Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[fix] oss dict load fix #383

Merged
merged 2 commits into from
Feb 12, 2021
Merged

[fix] oss dict load fix #383

merged 2 commits into from
Feb 12, 2021

Conversation

blefaudeux
Copy link
Contributor

@blefaudeux blefaudeux commented Feb 12, 2021

Before submitting

  • Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
  • Did you read the contributor guideline?
  • Did you make sure to update the docs?
  • Did you write any new necessary tests?

What does this PR do?

Fix for an issue that Pyspeech was seeing, it looks like I broke the logic in the last updates of #310
The unit test did not catch that because the model was symmetrical enough, tryign to find a better test
The bug detection and fix suggestion is from Weiyi Zheng @zhengwy888

Fixes #380

PR review

Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in Github issues there's a high chance it will not be merged.

Did you have fun?

Make sure you had fun coding 🙃

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Feb 12, 2021
@blefaudeux blefaudeux requested review from min-xu-ai, joshim5 and anj-s and removed request for min-xu-ai February 12, 2021 16:40

# Populate the sharded optimizer state on the fly
if self.param_to_rank[param] != self.rank:
state_dict["state"][key] = None

if key in self.index_to_param:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this test was definitely wrong, I should have seen that..

@@ -391,16 +391,16 @@ def load_state_dict(self, state_dict: Dict[str, Any]) -> None:

# NOTE: PyTorch 1.5 does not index linearly but with the id(params) at saving time
# we work around that here by using the fact that the params are ordered as in the param_groups
pytorch15_index_redirect = {k: i for i, k in enumerate(state_dict["state"].keys())}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is useless post pytorch 1.5

@blefaudeux
Copy link
Contributor Author

context: Planning for a follow up PR to find a unit test which exposes the previous bug (guess is that the model used was too symmetrical), but trying to fix master ASAP in the meantime

Copy link
Contributor

@min-xu-ai min-xu-ai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice!

@blefaudeux blefaudeux merged commit 8be9d93 into master Feb 12, 2021
blefaudeux added a commit that referenced this pull request Feb 12, 2021
blefaudeux added a commit that referenced this pull request Feb 12, 2021
facebook-github-bot pushed a commit to facebookresearch/fairseq that referenced this pull request Feb 12, 2021
Summary:
OSS removed the 'partition' key in their state dict to accommodate for changing partition size. This requires an update on the fairseq side to not look into the parameter partition, just broadcast everything, and let the optimizer on each rank decides which parameters are relevant.

This diff also needs D26419095 to function completely, and blefaudeux has made fixes upstream in facebookresearch/fairscale#383

Reviewed By: myleott

Differential Revision: D26382917

fbshipit-source-id: 95af1022be59e88814748acaee36a1a350f7dc5b
harkash pushed a commit to harkash/fairseq that referenced this pull request Feb 23, 2021
Summary:
OSS removed the 'partition' key in their state dict to accommodate for changing partition size. This requires an update on the fairseq side to not look into the parameter partition, just broadcast everything, and let the optimizer on each rank decides which parameters are relevant.

This diff also needs D26419095 to function completely, and blefaudeux has made fixes upstream in facebookresearch/fairscale#383

Reviewed By: myleott

Differential Revision: D26382917

fbshipit-source-id: 95af1022be59e88814748acaee36a1a350f7dc5b
jinyiyang-jhu pushed a commit to jinyiyang-jhu/fairseq-jyang that referenced this pull request Feb 26, 2021
Summary:
OSS removed the 'partition' key in their state dict to accommodate for changing partition size. This requires an update on the fairseq side to not look into the parameter partition, just broadcast everything, and let the optimizer on each rank decides which parameters are relevant.

This diff also needs D26419095 to function completely, and blefaudeux has made fixes upstream in facebookresearch/fairscale#383

Reviewed By: myleott

Differential Revision: D26382917

fbshipit-source-id: 95af1022be59e88814748acaee36a1a350f7dc5b
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[OSS] When loading a state, the insertion order for the params may not match the params_groups
3 participants