Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] ref: decoupled ddp, ddp spawn #3733

Closed
wants to merge 119 commits into from
Closed

[WIP] ref: decoupled ddp, ddp spawn #3733

wants to merge 119 commits into from

Conversation

williamFalcon
Copy link
Contributor

@williamFalcon williamFalcon commented Sep 30, 2020

Ok... turned out to fix many problems:

  1. remove a bunch of rank_zero decorators.
    We MUST make sure we don't create information asymmetry in children processes. State must be EXACTLY the same. Rank zero is there to not write to disk and that is only where it should be used.

  2. Solved all the issues with DDP about setting VISIBLE_DEVICES and gpus indexing not working. Also solves the issues around calling fit or test or both.

  3. Also solved the issue of how exactly to test ddp since we call other scripts. Overall we should have no regressions going forward because of these tests.

Stability of ddp has been improved dramatically!

Fixes #3422
Fixes #3117
Fixes #3730
Fixes #3605
Fixes #3403
Fixes #2826
Fixes #3071
Fixes #3023
Fixes #2913
Fixes #2590
Fixes #2529

cc @edenafek to add other tickets this fixes

Edit

PR is too big, splitting in parts.

Part 1: #3766
part 2: #3767
part 3: #3770

@mergify mergify bot requested a review from a team September 30, 2020 02:31
@pep8speaks
Copy link

pep8speaks commented Sep 30, 2020

Hello @williamFalcon! Thanks for updating this PR.

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2020-10-03 16:24:51 UTC

@williamFalcon williamFalcon changed the title ref: decoupled ddp spawn ref: decoupled ddp, ddp spawn Sep 30, 2020
@Borda Borda added the refactor label Sep 30, 2020
@mergify
Copy link
Contributor

mergify bot commented Sep 30, 2020

This pull request is now in conflict... :(

williamFalcon added a commit that referenced this pull request Oct 1, 2020
williamFalcon added a commit that referenced this pull request Oct 1, 2020

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature. The key has expired.
* ref: part 4 of #3733

* ref: part 4 of #3733

* ref: part 4 of #3733
@mergify
Copy link
Contributor

mergify bot commented Oct 1, 2020

This pull request is now in conflict... :(

williamFalcon added a commit that referenced this pull request Oct 2, 2020
williamFalcon added a commit that referenced this pull request Oct 2, 2020
williamFalcon added a commit that referenced this pull request Oct 2, 2020
williamFalcon added a commit that referenced this pull request Oct 2, 2020
williamFalcon added a commit that referenced this pull request Oct 2, 2020
williamFalcon added a commit that referenced this pull request Oct 2, 2020
williamFalcon added a commit that referenced this pull request Oct 2, 2020
williamFalcon added a commit that referenced this pull request Oct 2, 2020
williamFalcon added a commit that referenced this pull request Oct 2, 2020
williamFalcon added a commit that referenced this pull request Oct 2, 2020
williamFalcon added a commit that referenced this pull request Oct 2, 2020
williamFalcon added a commit that referenced this pull request Oct 2, 2020
williamFalcon added a commit that referenced this pull request Oct 2, 2020
* ref: part 7 of #3733

* ref: part 7 of #3733
@mergify
Copy link
Contributor

mergify bot commented Oct 2, 2020

This pull request is now in conflict... :(

williamFalcon added a commit that referenced this pull request Oct 2, 2020
@edenlightning
Copy link
Contributor

maybe #3791 too?

williamFalcon added a commit that referenced this pull request Oct 2, 2020
williamFalcon added a commit that referenced this pull request Oct 3, 2020
@williamFalcon
Copy link
Contributor Author

Finished in #3819

williamFalcon added a commit that referenced this pull request Oct 3, 2020
* ref: finish #3733

* remove deprecated test

* remove deprecated test

* remove deprecated test

* remove deprecated test

* remove deprecated test

* remove deprecated test

* remove deprecated test

* remove deprecated test

* remove deprecated test

* remove deprecated test

* remove deprecated test

* remove deprecated test

* remove deprecated test

* Update pytorch_lightning/accelerators/ddp_backend.py

Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>

* remove deprecated test

* remove deprecated test

* remove deprecated test

Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>
@williamFalcon williamFalcon deleted the fw2 branch October 4, 2020 17:37
@Borda Borda added this to the 0.10.0 milestone Oct 7, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment