[fix] oss dict load fix #383

blefaudeux · 2021-02-12T16:39:34Z

Before submitting

Was this discussed/approved via a GitHub issue? (no need for typos, doc improvements)
Did you read the contributor guideline?
Did you make sure to update the docs?
Did you write any new necessary tests?

What does this PR do?

Fix for an issue that Pyspeech was seeing. It looks like I broke the logic in the last updates of #310.
The unit test did not catch it because the model was symmetrical enough; I am trying to find a better test.
The bug detection and fix suggestion are from Weiyi Zheng @zhengwy888.

Fixes #380

PR review

Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in GitHub issues, there's a high chance it will not be merged.

Did you have fun?

Make sure you had fun coding 🙃

blefaudeux · 2021-02-12T16:42:52Z

fairscale/optim/oss.py


            # Populate the sharded optimizer state on the fly
            if self.param_to_rank[param] != self.rank:
                state_dict["state"][key] = None

-            if key in self.index_to_param:


this test was definitely wrong, I should have seen that..
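
For readers following the hunk above, here is a minimal sketch of the masking step, using plain dictionaries in place of the real optimizer. The helper name `mask_foreign_state`, the string parameter names and the sample values are hypothetical. Only `param_to_rank`, `index_to_param`, `rank` and the `None` assignment come from the diff, and the actual fairscale code may differ.

```python
from typing import Any, Dict


def mask_foreign_state(
    state: Dict[int, Any],
    index_to_param: Dict[int, str],
    param_to_rank: Dict[str, int],
    rank: int,
) -> Dict[int, Any]:
    """Return a copy of `state` where entries owned by other ranks are None."""
    masked: Dict[int, Any] = {}
    for key, value in state.items():
        param = index_to_param[key]
        # Keep the state only if this rank owns the parameter.
        masked[key] = value if param_to_rank[param] == rank else None
    return masked


# Hypothetical layout: three parameters sharded over two ranks.
index_to_param = {0: "w0", 1: "b0", 2: "w1"}
param_to_rank = {"w0": 0, "b0": 0, "w1": 1}
state = {0: {"step": 3}, 1: {"step": 3}, 2: {"step": 3}}

print(mask_foreign_state(state, index_to_param, param_to_rank, rank=1))
# prints {0: None, 1: None, 2: {'step': 3}}
```

Unlike the in-place assignment in the diff, the sketch returns a new dictionary, which makes it easier to compare what two ranks would keep.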

blefaudeux · 2021-02-12T16:43:06Z

fairscale/optim/oss.py

@@ -391,16 +391,16 @@ def load_state_dict(self, state_dict: Dict[str, Any]) -> None:

        # NOTE: PyTorch 1.5 does not index linearly but with the id(params) at saving time
        # we work around that here by using the fact that the params are ordered as in the param_groups
+        pytorch15_index_redirect = {k: i for i, k in enumerate(state_dict["state"].keys())}


this is useless post pytorch 1.5
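
As an illustration of what the redirect in this hunk computes, here is a minimal sketch that assumes a state dict saved with id-like keys. The key values and buffer contents are made up, and the re-keying step is an assumption about how such a mapping could be used, not the code from `oss.py`.

```python
from typing import Any, Dict

# Hypothetical saved state, keyed by id(param)-like integers instead of 0, 1, 2.
state_dict: Dict[str, Any] = {
    "state": {
        140231: {"momentum_buffer": "buf_a"},
        140999: {"momentum_buffer": "buf_b"},
        141777: {"momentum_buffer": "buf_c"},
    }
}

# Same expression as in the diff: saved key -> linear position.
# Dictionaries preserve insertion order, so the position follows the saving order.
pytorch15_index_redirect = {k: i for i, k in enumerate(state_dict["state"].keys())}

# Assumed usage: re-key the state linearly, relying on the saved order
# matching the order of the param_groups, as the NOTE in the hunk states.
linear_state = {pytorch15_index_redirect[k]: v for k, v in state_dict["state"].items()}

print(pytorch15_index_redirect)
print(sorted(linear_state))
```

When the saved keys are already `0, 1, 2` in order, the mapping is the identity, which is consistent with the comment that it is not needed after PyTorch 1.5.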

blefaudeux · 2021-02-12T17:05:43Z

Context: I am planning a follow-up PR with a unit test that exposes the previous bug (my guess is that the model used was too symmetrical), but I am trying to fix master ASAP in the meantime.

min-xu-ai

nice!

This reverts commit 8be9d93.

Summary: OSS removed the 'partition' key from its state dict to accommodate a changing partition size. This requires an update on the fairseq side to stop looking into the parameter partition: just broadcast everything, and let the optimizer on each rank decide which parameters are relevant. This diff also needs D26419095 to function completely, and blefaudeux has made fixes upstream in facebookresearch/fairscale#383.

Reviewed By: myleott

Differential Revision: D26382917

fbshipit-source-id: 95af1022be59e88814748acaee36a1a350f7dc5b

blefaudeux added 2 commits February 12, 2021 16:15

WIP, needs to be fixed !

30c48ed

should be a fix, many thanks Weiyi Zheng

45f0925

facebook-github-bot added the CLA Signed label Feb 12, 2021

blefaudeux requested review from min-xu-ai, joshim5 and anj-s and removed request for min-xu-ai February 12, 2021 16:40

min-xu-ai approved these changes Feb 12, 2021

blefaudeux merged commit 8be9d93 into master Feb 12, 2021

blefaudeux added a commit that referenced this pull request Feb 12, 2021

Revert "[fix] oss dict load (#383)"

e97a704

This reverts commit 8be9d93.

blefaudeux mentioned this pull request Feb 12, 2021

Revert "[fix] oss dict load fix" #384

Merged

blefaudeux added a commit that referenced this pull request Feb 12, 2021

Revert "[fix] oss dict load (#383)" (#384)

b666d6a

This reverts commit 8be9d93.

blefaudeux mentioned this pull request Feb 13, 2021

[fix] OSS dict load/save fix - better fix than 383 and unit test #386

Merged
