[doc] Fix to DDP tutorial #3120

Merged: 3 commits, Oct 30, 2024

c-p-i-o (Contributor) commented Oct 28, 2024

Summary:

  1. Add a "set_device" call to keep things consistent across all DDP tutorials.
    This was inspired by the following change in the pytorch/examples repo: "Update DDP tutorial for the correct order of set_device", pytorch/examples#1285 (review)
  2. Update the documentation to be clearer.

Test Plan:
Ran tutorial with the applied changes and we see:

Running basic DDP example on rank 3.
Running basic DDP example on rank 1.
Running basic DDP example on rank 2.
Running basic DDP example on rank 0.
Finished running basic DDP example on rank 0.
Finished running basic DDP example on rank 1.
Finished running basic DDP example on rank 3.
Finished running basic DDP example on rank 2.
Running DDP checkpoint example on rank 2.
Running DDP checkpoint example on rank 1.
Running DDP checkpoint example on rank 0.
Running DDP checkpoint example on rank 3.
Finished DDP checkpoint example on rank 0.
Finished DDP checkpoint example on rank 3.
Finished DDP checkpoint example on rank 1.
Finished DDP checkpoint example on rank 2.
Running DDP with model parallel example on rank 0.
Running DDP with model parallel example on rank 1.
Finished running DDP with model parallel example on rank 0.
Finished running DDP with model parallel example on rank 1.

Fixes #ISSUE_NUMBER

Description

Checklist

  • The issue that is being fixed is referred to in the description (see above "Fixes #ISSUE_NUMBER")
  • Only one issue is addressed in this pull request
  • Labels from the issue that this PR is fixing are added to this pull request
  • No unnecessary issues are included in this pull request.

@c-p-i-o requested review from svekars and wz337 on October 28, 2024 17:48
@c-p-i-o self-assigned this on Oct 28, 2024
pytorch-bot bot commented Oct 28, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/tutorials/3120

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 38b14d7 with merge base b465aa7:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

svekars (Contributor) commented Oct 28, 2024

linkcheck can be ignored for now

@c-p-i-o force-pushed the chirag/fix-ddp-tutorial branch 2 times, most recently from 70862b0 to 4413dc4, on October 29, 2024 00:51
@c-p-i-o changed the title from "[doc] minor fixups to DDP tutorial" to "[doc] Fix to DDP tutorial" on Oct 29, 2024
@c-p-i-o force-pushed the chirag/fix-ddp-tutorial branch 3 times, most recently from 3db89e9 to 68f865f, on October 29, 2024 20:47

Summary:
1. Add "set_device" call to keep things consistent between all DDP
tutorials.
This was inspired by the following change in the PyTorch repo:
pytorch/examples#1285 (review)
2. Fix up the tutorial and add additional prints when the model exits.

Test Plan:
Ran tutorial with the applied changes and we see:
"""
Running basic DDP example on rank 3.
Running basic DDP example on rank 1.
Running basic DDP example on rank 2.
Running basic DDP example on rank 0.
Finished running basic DDP example on rank 0.
Finished running basic DDP example on rank 1.
Finished running basic DDP example on rank 3.
Finished running basic DDP example on rank 2.
Running DDP checkpoint example on rank 2.
Running DDP checkpoint example on rank 1.
Running DDP checkpoint example on rank 0.
Running DDP checkpoint example on rank 3.
Finished DDP checkpoint example on rank 0.
Finished DDP checkpoint example on rank 3.
Finished DDP checkpoint example on rank 1.
Finished DDP checkpoint example on rank 2.
Running DDP with model parallel example on rank 0.
Running DDP with model parallel example on rank 1.
Finished running DDP with model parallel example on rank 0.
Finished running DDP with model parallel example on rank 1.
"""

@svekars merged commit 6a15041 into main on Oct 30, 2024
20 checks passed
@svekars deleted the chirag/fix-ddp-tutorial branch on October 30, 2024 17:38
svekars added a commit that referenced this pull request on Oct 31, 2024

Summary:
1. Add "set_device" call to keep things consistent between all DDP
tutorials.
This was inspired by the following change in the PyTorch repo:
pytorch/examples#1285 (review)
2. Fix up the tutorial and add additional prints when the model exits.

Test Plan:
Ran tutorial with the applied changes.

Co-authored-by: Svetlana Karslioglu <svekars@meta.com>
3 participants