
[BUGFIX] AMP + Precision unscale grad #4441

Merged: 23 commits from issue_4311 into master on Nov 2, 2020

Conversation

@tchaton (Contributor) commented Oct 30, 2020

What does this PR do?

This PR does the following (a sketch of the resulting pattern is shown below, after the issue references):

  • unscale gradients only when optimizer_step is actually going to be called, within AMPNativePlugin
  • call on_after_backward only once gradients have been accumulated
  • move gradient tracking out of model.backward, and run it only once gradients have been accumulated

Fixes #4151
Fixes #4427
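
In torch.cuda.amp terms, this amounts to calling scaler.unscale_(optimizer) only on the iteration where the optimizer actually steps. A minimal sketch of that pattern in plain PyTorch (not the PR's code; the model, toy data, and accumulation factor are placeholders for illustration):

```python
import torch

accumulate_grad_batches = 4  # assumed accumulation factor, for illustration
model = torch.nn.Linear(32, 2).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
scaler = torch.cuda.amp.GradScaler()
# toy data standing in for a real DataLoader
loader = [(torch.randn(8, 32), torch.randint(0, 2, (8,))) for _ in range(8)]

for batch_idx, (x, y) in enumerate(loader):
    with torch.cuda.amp.autocast():
        loss = torch.nn.functional.cross_entropy(model(x.cuda()), y.cuda())
    # backward on the scaled loss accumulates *scaled* gradients
    scaler.scale(loss / accumulate_grad_batches).backward()

    if (batch_idx + 1) % accumulate_grad_batches == 0:
        # unscale only now, right before the real optimizer step, so grad
        # clipping and gradient logging see true (unscaled) values
        scaler.unscale_(optimizer)
        torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)
        scaler.step(optimizer)
        scaler.update()
        optimizer.zero_grad()
```

Note that GradScaler.unscale_ may only be called once per optimizer between consecutive scaler.step calls (a second call raises a RuntimeError), which is why unscaling unconditionally on every backward breaks once gradients are accumulated over several batches.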

Before submitting

  • Was this discussed/approved via a GitHub issue? (not required for typo or docs fixes)
  • Did you read the contributor guideline, Pull Request section?
  • Did you make sure your PR does only one thing, instead of bundling different changes together? Otherwise, we ask you to create a separate PR for every change.
  • Did you make sure to update the documentation with your changes?
  • Did you write any new necessary tests?
  • Did you verify new and existing tests pass locally with your changes?
  • If you made a notable change (that affects users), did you update the CHANGELOG?

PR review

  • Is this pull request ready for review? (if not, please submit in draft mode)

Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in a GitHub issue, there's a high chance it will not be merged.

Did you have fun?

Make sure you had fun coding 🙃

@awaelchli (Contributor) commented:

@tchaton not sure if this is WIP but I believe the test is missing a trainer.fit(model).

codecov bot commented Oct 30, 2020

Codecov Report

Merging #4441 into master will decrease coverage by 0%.
The diff coverage is 75%.

@@          Coverage Diff           @@
##           master   #4441   +/-   ##
======================================
- Coverage      93%     93%   -0%     
======================================
  Files         113     113           
  Lines        8194    8197    +3     
======================================
+ Hits         7627    7629    +2     
- Misses        567     568    +1     

@tchaton (Contributor, Author) commented Oct 30, 2020

> @tchaton not sure if this is WIP but I believe the test is missing a trainer.fit(model).

Thanks @awaelchli, that was a stray copy/paste from a notebook.
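
For context, the point of the review comment is that a test which builds a model and a Trainer but never calls trainer.fit(model) exercises nothing. A hedged sketch of the shape of such a test (BoringModel's import path and the exact Trainer flags are assumptions, not the PR's actual test code):

```python
from pytorch_lightning import Trainer
from tests.base import BoringModel  # PL's minimal test model (assumed import path)


def test_amp_gradient_unscale(tmpdir):
    model = BoringModel()
    trainer = Trainer(
        default_root_dir=tmpdir,
        max_epochs=1,
        gpus=1,
        precision=16,
        accumulate_grad_batches=2,
    )
    trainer.fit(model)  # the call the draft test was missing
```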

@ydcjeff (Contributor) commented Oct 30, 2020

Just a note: the tests are skipped on Drone.

pep8speaks commented Oct 30, 2020

Hello @tchaton! Thanks for updating this PR.

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2020-11-02 14:07:43 UTC

@tchaton (Contributor, Author) left a comment:

> Just a note: the tests are skipped on Drone.

Yes, Drone is running PyTorch 1.5, and native AMP was introduced in PyTorch 1.6.
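
Since torch.cuda.amp only exists from PyTorch 1.6, the tests need a version gate. A sketch of the kind of skip the commit list describes ("skip if below 1.6 strict"); the exact decorator and test name in the PR may differ:

```python
from distutils.version import LooseVersion

import pytest
import torch


@pytest.mark.skipif(
    LooseVersion(torch.__version__) < LooseVersion("1.6.0"),
    reason="native AMP (torch.cuda.amp) requires PyTorch >= 1.6",
)
def test_amp_gradient_unscale():
    ...  # body elided; the decorator is the point here
```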

Review thread on pytorch_lightning/trainer/training_loop.py (resolved)
@ydcjeff (Contributor) commented Oct 30, 2020

@tchaton okay, shall we try to upgrade to 1.6?

@ydcjeff added the bug (Something isn't working) label Oct 30, 2020
@ydcjeff added this to the 1.0.x milestone Oct 30, 2020
Two review threads on tests/plugins/test_amp_plugin.py (outdated, resolved)
tchaton and others added 2 commits November 2, 2020 10:38
Co-authored-by: Sean Naren <sean.narenthiran@gmail.com>
Co-authored-by: Sean Naren <sean.narenthiran@gmail.com>
@ydcjeff (Contributor) commented Nov 2, 2020

@tchaton the changelog entry must go under [unreleased]

@tchaton added the ready (PRs ready to be merged) label Nov 2, 2020
@SeanNaren merged commit 102fa9e into master Nov 2, 2020
@SeanNaren deleted the issue_4311 branch Nov 2, 2020 16:36
@SeanNaren (Contributor) commented:
Thanks @tchaton, this was a good catch!

@SeanNaren mentioned this pull request Nov 3, 2020
Borda pushed a commit that referenced this pull request Nov 4, 2020
* move unscale within Native plugin

* remove gradient tracking from lightning backward

* forgot trainer.fit

* typo

* update

* cleanup

* set to 1.6

* typo

* skip if below 1.6 strict

* update changelog

* remove useless code

* Update tests/plugins/test_amp_plugin.py

Co-authored-by: Sean Naren <sean.narenthiran@gmail.com>

* Update tests/plugins/test_amp_plugin.py

Co-authored-by: Sean Naren <sean.narenthiran@gmail.com>

* update changelog

* Update CHANGELOG.md

Co-authored-by: Sean Naren <sean.narenthiran@gmail.com>
Co-authored-by: Jeff Yang <ydcjeff@outlook.com>

(cherry picked from commit 102fa9e)
rohitgr7 pushed a commit that referenced this pull request Nov 21, 2020
Labels: bug (Something isn't working), ready (PRs ready to be merged)
Projects: none yet
7 participants