Avoid wrapping LightningModule in DDP plugins when not fitting #9096
Conversation
Thanks for carrying this forward!
Branch force-pushed from 870790e to 1a2245a.
Codecov Report
@@            Coverage Diff            @@
##           master    #9096     +/-   ##
=========================================
- Coverage      92%      88%       -4%
=========================================
  Files         176      176
  Lines       14810    14817        +7
=========================================
- Hits        13663    13050      -613
- Misses       1147     1767      +620
@four4fish Isn't this a subset of the changes in #8632?
@awaelchli No, I have commandeered #8632, as Ning is focused on the data loader changes.
@four4fish Are we closing #8632 then? The contents are virtually identical; at least we should mark one of them as a draft.
@awaelchli Yeah! Will close #8632. Sorry, I should have marked it as a commandeered PR, thanks :)
LGTM!
pytorch_lightning/trainer/connectors/logger_connector/result.py (outdated review comment, resolved)
@four4fish - the tests are failing because the DeepSpeed plugin extends the DDP plugin. However, for DeepSpeed we should always wrap the model and not override the training/validation/test/predict steps (i.e. DeepSpeed should continue calling them through the wrapped model). This is another motivation for not relying on inheritance between these plugins, as both the wrapping and checkpoint behavior differ.
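For context, here is a minimal sketch of the behaviour under discussion, under stated assumptions: `TrainerFn`, `configure_ddp`, and `configure_deepspeed` below are simplified illustrative names, not Lightning's actual implementation. A DDP-style plugin skips the `DistributedDataParallel` wrapper outside of fitting, while a DeepSpeed-style plugin must keep wrapping unconditionally, which is why inheriting the conditional logic breaks its tests.

```python
# Sketch only: simplified stand-ins for the plugin behaviour discussed above.
from enum import Enum

import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel


class TrainerFn(Enum):
    FITTING = "fit"
    VALIDATING = "validate"
    TESTING = "test"
    PREDICTING = "predict"


def configure_ddp(model: torch.nn.Module, trainer_fn: TrainerFn) -> torch.nn.Module:
    # DDP-style behaviour after this PR: only wrap while fitting, since
    # validation/test/predict need no gradient synchronisation.
    if trainer_fn != TrainerFn.FITTING:
        return model
    if dist.is_available() and dist.is_initialized():
        return DistributedDataParallel(model)
    return model  # single-process fallback so the sketch runs without a process group


def configure_deepspeed(model: torch.nn.Module, trainer_fn: TrainerFn) -> torch.nn.Module:
    # DeepSpeed-style behaviour: the engine must always own the forward pass,
    # so the wrapping cannot be made conditional on the trainer function.
    # (Real DeepSpeed engine setup is elided; DDP stands in as a placeholder wrapper.)
    if dist.is_available() and dist.is_initialized():
        return DistributedDataParallel(model)
    return model
```

Because the two behaviours diverge here (and again for checkpointing), composing the plugins rather than inheriting from the DDP plugin would let DeepSpeed keep unconditional wrapping without overriding this logic.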
@@ -280,6 +280,9 @@ The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/).

- Fixed `DDP` "CUDA error: initialization error" due to a `copy` instead of `deepcopy` on `ResultCollection` ([#9239](https://github.com/PyTorchLightning/pytorch-lightning/pull/9239))

- Fixed wrapping issue: avoid wrapping LightningModule with data-parallel modules when not fitting in `DDPPlugin`, `DDPSpawnPlugin`, `DDPShardedPlugin`, `DDPSpawnShardedPlugin` ([#9096]https://github.com/PyTorchLightning/pytorch-lightning/pull/9096)
The link is broken
Can be fixed with another PR - no need to open one just for this
What does this PR do?
Pulls in the changes from #8632: avoid wrapping the LightningModule in data-parallel modules when the trainer is not fitting.
Fixes #6977
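As a rough illustration of the user-facing effect (a hedged sketch: `ToyModule` and the 1.4-era `Trainer(gpus=..., plugins=DDPPlugin())` arguments are assumptions for this example, not code from the PR), an evaluation-only entry point such as `trainer.validate` no longer wraps the LightningModule in `DistributedDataParallel`:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

import pytorch_lightning as pl
from pytorch_lightning.plugins import DDPPlugin


class ToyModule(pl.LightningModule):
    """Hypothetical minimal LightningModule, only here to keep the sketch self-contained."""

    def __init__(self):
        super().__init__()
        self.layer = torch.nn.Linear(4, 1)

    def validation_step(self, batch, batch_idx):
        (x,) = batch
        loss = self.layer(x).sum()
        self.log("val_loss", loss)
        return loss

    def val_dataloader(self):
        return DataLoader(TensorDataset(torch.randn(16, 4)), batch_size=4)

    def configure_optimizers(self):
        return torch.optim.SGD(self.parameters(), lr=0.1)


if __name__ == "__main__":
    model = ToyModule()
    # Assumed 1.4-era arguments: two GPUs plus an explicit DDPPlugin.
    trainer = pl.Trainer(gpus=2, plugins=DDPPlugin())
    # Before this PR, validate/test/predict also wrapped `model` in
    # DistributedDataParallel; with this change, only `trainer.fit(model)` does.
    trainer.validate(model)
```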
Does your PR introduce any breaking changes? If yes, please list them.
Before submitting
PR review
Anyone in the community is welcome to review the PR.
Before you start reviewing, make sure you have read the Review guidelines.
Did you have fun?
Make sure you had fun coding 🙃