Conversation
thanks for the fix! this lgtm
if optim_states and saved_optim_type != opt['optimizer']:
    # we changed from adam to adamax, or sgd to adam, or similar
    logging.warning('Not loading optim state since optim class changed.')
    return False
return True
elif optim_states:
    # check for any fp16/fp32 conversions we need to do
    optimstate_fp16 = 'loss_scaler' in optim_states
I can't leave a comment on it directly, but are the semantics correct for line 1099, the `elif not optimstate_fp16 and self.fp16` block? Are we returning `True` unconditionally there because of the lower-precision conversion?
agreed
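For reference, here is a minimal standalone sketch of the load-or-skip decision being discussed. The function name, the `use_fp16` flag, and the exact branch structure are assumptions for illustration; only the optimizer-class check and the `loss_scaler` probe come from the excerpt above.

```python
import logging


def should_load_optim_state(optim_states, saved_optim_type, opt, use_fp16):
    """Return True if the saved optimizer state should be loaded.

    Simplified illustration of the logic under review; not the actual
    ParlAI implementation.
    """
    if optim_states and saved_optim_type != opt['optimizer']:
        # optimizer class changed (e.g. adam -> adamax); state is incompatible
        logging.warning('Not loading optim state since optim class changed.')
        return False
    elif optim_states:
        # checkpoints written by an fp16 run carry a loss scaler in their state
        optimstate_fp16 = 'loss_scaler' in optim_states
        if optimstate_fp16 and not use_fp16:
            # fp16 checkpoint loaded into an fp32 run: state needs conversion
            logging.warning('Loading fp16 optim state into an fp32 run.')
            return True
        elif not optimstate_fp16 and use_fp16:
            # fp32 checkpoint loaded into an fp16 run: the branch the question
            # above refers to; it still loads, hence the unconditional True
            logging.warning('Loading fp32 optim state into an fp16 run.')
            return True
        # precision matches; load as-is
        return True
    # nothing saved, nothing to load
    return False
```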
self.scheduler = optim.lr_scheduler.LambdaLR(optimizer, self._linear_lr)

def _linear_lr(self, step):
    # this multiplicative factor ensures linear decay rate
    # lr_mult = float(self.max_lr_steps - step - 1) / float(self.max_lr_steps - step)
-   lr_mult = max(0.0, 1e-6 + (1.0 - step / self.max_lr_steps) * (1 - 1e-6))
+   lr_mult = max(0.0, 1.0 - step / self.max_lr_steps)
We don't need the `1e-6` anymore?
I made an executive call to let it actually go to 0 :P
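As a sanity check on the new behavior, here is a small self-contained LambdaLR example using the same linear multiplier; `max_lr_steps` and the toy optimizer are made up for illustration, but it shows the factor now reaching exactly 0.0 at `max_lr_steps` instead of bottoming out at the old `1e-6` floor.

```python
import torch
from torch import optim

max_lr_steps = 10  # illustrative value; in ParlAI this comes from the agent's opt


def linear_lr(step):
    # multiplicative factor: 1.0 at step 0, decaying linearly, clamped at 0.0
    return max(0.0, 1.0 - step / max_lr_steps)


params = [torch.nn.Parameter(torch.zeros(1))]
optimizer = optim.SGD(params, lr=0.1)
scheduler = optim.lr_scheduler.LambdaLR(optimizer, linear_lr)

for step in range(max_lr_steps + 1):
    print(step, scheduler.get_last_lr()[0])  # hits exactly 0.0 at step == max_lr_steps
    optimizer.step()
    scheduler.step()
```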
Patch description
Context:
Testing steps
Adjusted CI, new assertions.