This repository has been archived by the owner on Nov 3, 2023. It is now read-only.
[Torch classifier agent][bug fix] Fix optimizer loading in classifier agent #4406
Patch description
My jobs crashed with
KeyError: "param 'initial_lr' is not specified in param_groups[0] when resuming an optimizer"
during final evaluation. After some digging, I found the cause: when the model is loaded back, the classifier agent creates a new optimizer from model.parameters() instead of restoring the saved optimizer state:
ParlAI/parlai/core/torch_classifier_agent.py
Line 561 in 289942b
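The failure mode can be sketched in plain PyTorch, outside ParlAI. This is an illustration under assumptions, not the agent's actual code: the `Linear` model, the `warmup_factor` function, and the step count of 10 are hypothetical, and it assumes the scheduler is rebuilt with a `last_epoch` other than -1 on resume. A PyTorch learning-rate scheduler only writes `initial_lr` into the optimizer's param groups when it starts from scratch, so an optimizer built fresh from `model.parameters()` has no such key when a run is resumed.

```python
import torch
from torch.optim.lr_scheduler import LambdaLR


def warmup_factor(step, warmup_updates=100):
    # Hypothetical linear warmup: scale the learning rate up to 1.0.
    return min(1.0, (step + 1) / warmup_updates)


model = torch.nn.Linear(4, 2)  # stand-in model, not a ParlAI agent

# Fresh run: last_epoch defaults to -1, so the scheduler records
# 'initial_lr' in every param group of the optimizer.
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
scheduler = LambdaLR(optimizer, lr_lambda=warmup_factor)
has_initial_lr_fresh = "initial_lr" in optimizer.param_groups[0]

# Resume: an optimizer created anew from model.parameters() has no
# 'initial_lr', and a scheduler told to resume (last_epoch != -1)
# refuses to guess it.
new_optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
has_initial_lr_new = "initial_lr" in new_optimizer.param_groups[0]
try:
    LambdaLR(new_optimizer, lr_lambda=warmup_factor, last_epoch=10)
    resume_error = None
except KeyError as err:
    resume_error = str(err)

print(resume_error)
```

The message printed at the end should be the same `KeyError` text quoted in the patch description.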
Proposed changes:
Change this line to match what the torch generator agent does: load the saved optimizer state back instead of creating a new optimizer from model.parameters():
ParlAI/parlai/core/torch_generator_agent.py
Line 525 in 289942b
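A minimal sketch of the idea behind the fix, again in plain PyTorch rather than ParlAI code. The model, the `warmup_factor` function, and the five training steps are hypothetical; the only point is that `Optimizer.load_state_dict` restores the saved param groups, including `initial_lr`, so a scheduler can be rebuilt in resume mode without the `KeyError`.

```python
import torch
from torch.optim.lr_scheduler import LambdaLR


def warmup_factor(step, warmup_updates=100):
    # Hypothetical linear warmup: scale the learning rate up to 1.0.
    return min(1.0, (step + 1) / warmup_updates)


model = torch.nn.Linear(4, 2)  # stand-in model, not a ParlAI agent

# Original run: take a few steps inside the warmup period, then save.
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
scheduler = LambdaLR(optimizer, lr_lambda=warmup_factor)
for _ in range(5):
    optimizer.step()   # no gradients here, so parameters are unchanged
    scheduler.step()
saved_optimizer_state = optimizer.state_dict()
steps_taken = scheduler.last_epoch

# Resume: build the optimizer, then load the saved state back into it.
# The restored param groups carry 'initial_lr' from the original run.
restored_optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
restored_optimizer.load_state_dict(saved_optimizer_state)

# The scheduler can now be created in resume mode without raising.
restored_scheduler = LambdaLR(
    restored_optimizer, lr_lambda=warmup_factor, last_epoch=steps_taken
)
restored_group = restored_optimizer.param_groups[0]
print("initial_lr" in restored_group)
```

Loading the state dict also restores the current learning rate and any per-parameter optimizer state, which a freshly constructed optimizer would lose.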
Testing steps
To reproduce the issue, train a model with a high warmup setting, then load that model back when resuming training. The crash no longer occurs after the proposed change.