
[sharded plugin] Fix check for fp16 precision #7825

Merged
merged 57 commits into from
Jun 4, 2021

Conversation

@shuyingsunshine21 (Contributor) commented Jun 3, 2021

What does this PR do?

Fixes a bug: the precision passed to the Trainer is 16, but the selected PrecisionPlugin may convert it to another value. For example, MixedPrecisionPlugin.precision is "mixed" (https://github.com/PyTorchLightning/pytorch-lightning/blob/master/pytorch_lightning/plugins/precision/mixed.py), so a check against the integer 16 never matches.
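The mismatch can be illustrated with a minimal sketch. The class and attribute names below are simplified stand-ins for illustration only; the real logic lives in pytorch_lightning.plugins.precision:

```python
# Minimal sketch of the bug (hypothetical, simplified stand-in classes;
# not the actual pytorch_lightning implementation).

class PrecisionPlugin:
    precision = 32

class MixedPrecisionPlugin(PrecisionPlugin):
    # When the user passes precision=16 with AMP, Lightning selects this
    # plugin, which reports its precision as the string "mixed", not 16.
    precision = "mixed"

class Trainer:
    def __init__(self, precision_plugin):
        self._plugin = precision_plugin

    @property
    def precision(self):
        # The Trainer exposes the plugin's converted value, not the raw
        # value the user passed in.
        return self._plugin.precision

trainer = Trainer(MixedPrecisionPlugin())

# The old check never matches under AMP, silently disabling fp16 broadcast:
old_is_fp16 = trainer.precision == 16
# The fixed check compares against the plugin's reported value:
new_is_fp16 = trainer.precision == "mixed"
```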

Before submitting

  • Was this discussed/approved via a GitHub issue? (not for typos and docs)
  • Did you read the contributor guideline, Pull Request section?
  • Did you make sure your PR does only one thing, instead of bundling different changes together?
  • Did you make sure to update the documentation with your changes? (if necessary)
  • Did you write any new necessary tests? (not for typos and docs)
  • Did you verify new and existing tests pass locally with your changes?
  • Did you update the CHANGELOG? (not for typos, docs, test updates, or internal minor changes/refactorings)

PR review

Anyone in the community is free to review the PR once the tests have passed.
Before you start reviewing make sure you have read Review guidelines. In short, see the following bullet-list:

  • Is this pull request ready for review? (if not, please submit in draft mode)
  • Check that all items from Before submitting are resolved
  • Make sure the title is self-explanatory and the description concisely explains the PR
  • Add labels and milestones (and optionally projects) to the PR so it can be classified

Did you have fun?

Make sure you had fun coding 🙃

Shuying Sun and others added 30 commits March 23, 2021 12:06
…oint_consolidate

Update test_all_gather_grad.py
…1-checkpoint_consolidate"

This reverts commit c5053da, reversing
changes made to 0d23d75.
This reverts commit 70fe5da.
This reverts commit a9aae99.
```diff
@@ -54,7 +54,7 @@ def _reinit_optimizers_with_oss(self):
         optim_class = type(optimizer)
         zero_optimizer = OSS(params=optimizer.param_groups, optim=optim_class, **optimizer.defaults)
         if _FAIRSCALE_OSS_FP16_BROADCAST_AVAILABLE:
-            is_fp16 = self.lightning_module.trainer.precision == 16
+            is_fp16 = self.lightning_module.trainer.precision == "mixed"
```
Contributor
Suggested change:

```diff
-is_fp16 = self.lightning_module.trainer.precision == "mixed"
+precision = self.lightning_module.trainer.precision
+is_fp16 = precision == "mixed" or precision == 16
```

Just in case we enable true fp16 later.
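The reviewer's suggestion above can be sketched as a small helper that accepts both the converted AMP value and a possible future true-fp16 value. This is an illustrative sketch, not the merged code:

```python
def is_fp16_precision(precision):
    """Return True for fp16-style precision settings.

    Covers the current AMP value ("mixed") while remaining correct if a
    true-fp16 mode reporting the integer 16 is enabled later, as the
    reviewer suggests.
    """
    return precision == "mixed" or precision == 16

# AMP precision plugins report "mixed"; a hypothetical true-fp16 mode
# would report 16; full precision (32) is not fp16.
assert is_fp16_precision("mixed")
assert is_fp16_precision(16)
assert not is_fp16_precision(32)
```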

@ananthsub ananthsub changed the title Fix precision bug [sharded plugin] Fix check for is fp16 precision Jun 3, 2021
@ananthsub ananthsub changed the title [sharded plugin] Fix check for is fp16 precision [sharded plugin] Fix check for fp16 precision Jun 3, 2021
@ananthsub ananthsub added the ready PRs ready to be merged label Jun 4, 2021
@justusschock justusschock enabled auto-merge (squash) June 4, 2021 05:42
@mergify mergify bot added the has conflicts label Jun 4, 2021
auto-merge was automatically disabled June 4, 2021 05:52

Head branch was pushed to by a user without write access

@mergify mergify bot removed the has conflicts label Jun 4, 2021
@shuyingsunshine21 (Contributor, Author) commented:
I would like to add a unit test asserting that the wrapped optimizer's broadcast_fp16 attribute is set correctly, but realized this might be difficult in a multi-node setting.

@awaelchli awaelchli added the bug Something isn't working label Jun 4, 2021
@awaelchli awaelchli added this to the v1.3.x milestone Jun 4, 2021
@awaelchli awaelchli merged commit ca89a7f into Lightning-AI:master Jun 4, 2021
@shuyingsunshine21 shuyingsunshine21 deleted the fix_precision_bug branch June 4, 2021 07:20
@ananthsub ananthsub mentioned this pull request Jun 7, 2021
@SeanNaren SeanNaren mentioned this pull request Jun 8, 2021
SeanNaren pushed a commit that referenced this pull request Jun 8, 2021
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>
(cherry picked from commit ca89a7f)
lexierule pushed a commit that referenced this pull request Jun 9, 2021
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>
(cherry picked from commit ca89a7f)
Labels
bug Something isn't working ready PRs ready to be merged
5 participants