remove ShardedDDP as it is deprecated #25702
Conversation
@sgugger sorry for bothering you, but would you mind taking a look at this PR?
cc @muellerzr and @pacman100 who will take over the Trainer.
force-pushed from b99de26 to e069269
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint.
force-pushed from b797639 to a56008a
Rebased my commits onto master HEAD to fix the merge conflict.
Thank you @statelesshz for removing the FairScale support as it is deprecated. The changes LGTM except for the TPU-related bits which Zach can review. Left a comment.
force-pushed from 2447a3b to 959cd5d
Thanks! This is looking great! Just a small note on gradient scaling: we need to either keep this logic in this PR for now and merge, or coordinate a PR in accelerate with the right logic before this merges. I don't particularly have a leaning one way or the other :)
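For context, here is a minimal sketch of the kind of gradient-scaling logic being discussed. This is generic PyTorch AMP, not the actual Trainer or accelerate internals; all names are illustrative.

```python
import torch

# Illustrative only: a generic fp16 training step with a GradScaler, the
# kind of logic that has to live either in Trainer or in accelerate.
model = torch.nn.Linear(10, 2).cuda()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
scaler = torch.cuda.amp.GradScaler()  # scales the loss to avoid fp16 underflow

inputs = torch.randn(8, 10, device="cuda")
labels = torch.randint(0, 2, (8,), device="cuda")

with torch.autocast(device_type="cuda", dtype=torch.float16):
    loss = torch.nn.functional.cross_entropy(model(inputs), labels)

scaler.scale(loss).backward()  # backward pass on the scaled loss
scaler.step(optimizer)         # unscales grads; skips the step on inf/NaN
scaler.update()                # adjusts the scale factor for the next step
optimizer.zero_grad()
```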
@muellerzr Could you please take a second look at this PR?
Thanks! This all looks great to me. Do you by chance have access to a multi-GPU system to run and test the sharded DDP changes yourself? Probably also good to have @pacman100 give it one more look as well :)
I've tested this PR using 4xA100-80G on FastChat with the following scripts.
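The actual FastChat launch scripts are not reproduced in this thread; as a hedged stand-in, a minimal multi-GPU test of the FSDP path that replaces ShardedDDP could look roughly like this (all names and shapes are placeholders, not the author's script):

```python
# Hypothetical minimal test, not the author's actual script. Launch with
# e.g. `torchrun --nproc_per_node=4 test_fsdp.py` on the 4xA100 node.
import torch
import torch.distributed as dist
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

dist.init_process_group("nccl")
torch.cuda.set_device(dist.get_rank() % torch.cuda.device_count())

model = torch.nn.Sequential(
    torch.nn.Linear(1024, 1024), torch.nn.ReLU(), torch.nn.Linear(1024, 1024)
).cuda()
model = FSDP(model)  # shards parameters/gradients/optimizer state across ranks

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
x = torch.randn(4, 1024, device="cuda")
loss = model(x).pow(2).mean()  # dummy loss, just to exercise backward/step
loss.backward()
optimizer.step()

dist.destroy_process_group()
```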
@pacman100 Could you please take a look once again :-) I think it's ready to be merged.
force-pushed from 665a8ab to 1554e7a
Resolved merge conflicts by rebasing onto the main branch.
force-pushed from 15facf5 to aceb42c
Rebased my commits onto master HEAD and resolved merge conflicts.
Hi there. This PR is approved and the tests are green :D
This PR is approved and the tests are green. @muellerzr Could you help merge it?
Thank you @statelesshz!
Let me rebase quickly and merge if tests are green
force-pushed from aceb42c to 140bd1d
Great, LGTM! Thanks @statelesshz
* remove ShardedDDP as it was deprecated
* apply review suggestion
* make style
* Oops, forgot to remove the compute_loss context manager in Seq2SeqTrainer
* remove the unnecessary conditional statement
* keep the logic of IPEX
* clean code
* mixed-precision setup & make fixup

Co-authored-by: statelesshz <jihuazhong1@huawei.com>
What does this PR do?
As mentioned previously, fairscale's ShardedDDP is deprecated, and PyTorch FSDP is the recommended method for scaling to large NN models. Now it's time to say goodbye to this library 👋.
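For readers migrating off the removed fairscale `sharded_ddp` option, here is a minimal sketch of enabling FSDP through `TrainingArguments` instead. The specific values are placeholders for illustration, not taken from this PR:

```python
from transformers import TrainingArguments

# Hypothetical settings; `fsdp="full_shard auto_wrap"` turns on PyTorch FSDP
# in Trainer, replacing the removed fairscale ShardedDDP path.
args = TrainingArguments(
    output_dir="out",
    per_device_train_batch_size=1,
    bf16=True,
    fsdp="full_shard auto_wrap",
)
```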
Before submitting
- Did you read the contributor guideline, Pull Request section?
- Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
- Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.
@muellerzr Good day. Could you please review this PR? Thanks 😄