support launching Lightning ddp with traditional command #7480
Conversation
Codecov Report

@@           Coverage Diff           @@
##           master   #7480    +/-   ##
=======================================
- Coverage      93%     92%     -0%
=======================================
  Files         216     216
  Lines       14115   14112      -3
=======================================
- Hits        13088   13017     -71
- Misses       1027    1095     +68
Nice :)
This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 7 days if no further activity occurs. If you need further help see our docs: https://pytorch-lightning.readthedocs.io/en/latest/generated/CONTRIBUTING.html#pull-request or ask the assistance of a core contributor here or on Slack. Thank you for your contributions.
LGTM. Some comments:

- Can you merge master? To make sure none of the recent changes have any impact.
- Should we add a special test using `torch.distributed.launch`?
@carmocca adding a test for `torch.distributed.launch`: I added such calls to the special tests. It adds about 10s of special test time.
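For context, a minimal sketch of what such a special-test invocation could look like, launching the training script through the external launcher so that each worker receives `LOCAL_RANK` from it rather than from Lightning (the script name and process count are illustrative placeholders, not the PR's actual test):

```python
import subprocess
import sys

# Spawn a training script via torch.distributed.launch so that every worker
# gets LOCAL_RANK from the external launcher instead of from Lightning.
# "train_script.py" and --nproc_per_node=2 are placeholders for illustration.
subprocess.run(
    [
        sys.executable,
        "-m",
        "torch.distributed.launch",
        "--nproc_per_node=2",
        "train_script.py",
    ],
    check=True,
)
```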
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
LGTM!
Yesterday the GPU tests passed flawlessly. Of course they fail today when I get reviews.
What does this PR do?
Fixes #7003
The Lightning environment provides a convenient way to launch DDP multi-GPU experiments: it launches the required number of processes automatically under the hood, as explained in the docs. However, there is currently only a hacky way for the user to prevent this if they wish to launch all processes manually, through the command line or with utilities like `torch.distributed.launch`. This PR adds detection of the `LOCAL_RANK` variable in the environment and determines accordingly whether processes need to be launched or not; a sketch of the idea follows.
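As a rough illustration of the detection, assuming only the environment-variable convention that `torch.distributed.launch` sets `LOCAL_RANK` for each worker (the helper name below is hypothetical; the actual change lives in Lightning's DDP launching code):

```python
import os

def launcher_already_spawned_processes() -> bool:
    # Hypothetical helper, for illustration only. External launchers such as
    # torch.distributed.launch export LOCAL_RANK to each worker process, so
    # its presence signals that Lightning should skip spawning processes.
    return "LOCAL_RANK" in os.environ
```

With a change along these lines, a user can start all processes manually, e.g. `python -m torch.distributed.launch --nproc_per_node=2 train.py`, and Lightning will detect the externally set `LOCAL_RANK` and not launch additional processes.

Before submitting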
PR review
Anyone in the community is free to review the PR once the tests have passed.
Before you start reviewing, make sure you have read the Review guidelines.
Did you have fun?
I made sure I had fun coding 🙃