[fix] oss dict load fix #383
# Populate the sharded optimizer state on the fly
if self.param_to_rank[param] != self.rank:
    state_dict["state"][key] = None

if key in self.index_to_param:
this test was definitely wrong, I should have seen that..
@@ -391,16 +391,16 @@ def load_state_dict(self, state_dict: Dict[str, Any]) -> None:

# NOTE: PyTorch 1.5 does not index linearly but with the id(params) at saving time
# we work around that here by using the fact that the params are ordered as in the param_groups
pytorch15_index_redirect = {k: i for i, k in enumerate(state_dict["state"].keys())}
this is useless post pytorch 1.5
Context: planning a follow-up PR to find a unit test which exposes the previous bug (my guess is that the model used was too symmetrical), but trying to fix master ASAP in the meantime.
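For readers unfamiliar with the index redirect in the hunk above, here is a minimal sketch of what that dictionary comprehension computes. The sample keys and the helper name `build_index_redirect` are made up for illustration and are not part of fairscale; only the comprehension itself comes from the diff.

```python
from typing import Any, Dict


def build_index_redirect(state: Dict[Any, Any]) -> Dict[Any, int]:
    """Map every key of a saved optimizer state to its position in insertion order."""
    return {k: i for i, k in enumerate(state.keys())}


# Hypothetical saved state, keyed by id()-like integers instead of 0, 1, 2, ...
saved_state = {140231: {"step": 1}, 140987: {"step": 1}, 141555: {"step": 1}}

redirect = build_index_redirect(saved_state)
print(redirect)
```

The mapping is only meaningful if the saved keys appear in the same order as the parameters in `param_groups`, which is the assumption stated in the code comment of the diff.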
nice!
This reverts commit 8be9d93.
Summary: OSS removed the 'partition' key in their state dict to accommodate changing partition sizes. This requires an update on the fairseq side to not look into the parameter partition, just broadcast everything, and let the optimizer on each rank decide which parameters are relevant. This diff also needs D26419095 to function completely, and blefaudeux has made fixes upstream in facebookresearch/fairscale#383
Reviewed By: myleott
Differential Revision: D26382917
fbshipit-source-id: 95af1022be59e88814748acaee36a1a350f7dc5b
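The "broadcast everything and let each rank decide" idea from the summary can be sketched in plain Python, without any distributed setup. This is an illustration only, not the fairscale or fairseq implementation: the function `keep_local_state` and the sample data are hypothetical, and only the names `index_to_param`, `param_to_rank` and `rank` are borrowed from the diff in this PR.

```python
from typing import Any, Dict, Optional


def keep_local_state(
    full_state: Dict[int, Any],
    index_to_param: Dict[int, str],
    param_to_rank: Dict[str, int],
    rank: int,
) -> Dict[int, Optional[Any]]:
    """Keep the state of parameters owned by `rank`, blank out the rest."""
    local: Dict[int, Optional[Any]] = {}
    for key, value in full_state.items():
        param = index_to_param[key]
        # Every rank receives the full state; only the owner keeps the entry.
        local[key] = value if param_to_rank[param] == rank else None
    return local


# Hypothetical full optimizer state for three parameters, split over two ranks.
full_state = {0: {"momentum": 0.1}, 1: {"momentum": 0.2}, 2: {"momentum": 0.3}}
index_to_param = {0: "w0", 1: "w1", 2: "w2"}
param_to_rank = {"w0": 0, "w1": 1, "w2": 0}

rank1_state = keep_local_state(full_state, index_to_param, param_to_rank, rank=1)
print(rank1_state)
```

Because ownership is looked up at load time, the saved state does not need to record a partition, which is consistent with the summary's point about changing partition sizes.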
What does this PR do?
Fix for an issue that Pyspeech was seeing. It looks like I broke the logic in the last updates of #310.
The unit test did not catch that because the model was symmetrical enough; trying to find a better test.
The bug detection and fix suggestion are from Weiyi Zheng @zhengwy888
Fixes #380
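One possible reading of "symmetrical enough" is that all parameters in the test model had the same shape, so optimizer state attached to the wrong parameter would still pass a shape check. The sketch below illustrates that reading only; it is an assumption about the cause, not something the PR confirms, and the helper names and shapes are hypothetical.

```python
from typing import Dict, List, Tuple

Shape = Tuple[int, ...]


def mismatched_slots(param_shapes: List[Shape], state_shapes: Dict[int, Shape]) -> List[int]:
    """Return the parameter indices whose loaded state has the wrong shape."""
    return [i for i, shape in enumerate(param_shapes) if state_shapes[i] != shape]


def swapped(shapes: List[Shape]) -> Dict[int, Shape]:
    """Simulate an indexing bug that swaps the state of parameters 0 and 1."""
    return {0: shapes[1], 1: shapes[0]}


symmetric_model = [(4, 4), (4, 4)]   # hypothetical: identical parameter shapes
asymmetric_model = [(4, 4), (4, 2)]  # hypothetical: distinct parameter shapes

hidden = mismatched_slots(symmetric_model, swapped(symmetric_model))
exposed = mismatched_slots(asymmetric_model, swapped(asymmetric_model))
print(hidden, exposed)
```

Under this assumption, a test model with parameters of different shapes would make a mis-indexed state load fail visibly.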