[c10d] switch ProcessGroup::Work to be managed by intrusive_ptr #44046
Closed
Commits: 40, all by wanchaol. The first is 229d170, "[c10d] switch ProcessGroup::Work to be managed by intrusive_ptr"; the remaining 39 are each titled 'Update on "[c10d] switch ProcessGroup::Work to be managed by intrusiv…'.
This change will probably break third-party backends such as https://github.com/intel/torch-ccl/blob/master/src/ProcessGroupCCL.hpp#L136 and https://github.com/openucx/torch-ucc/blob/master/include/torch_ucc.hpp#L77.
I'm guessing this is necessary for TorchScript and there is no way around it, so should we ask the third-party libraries to make this change as well? (We can probably file issues on those repos.)
cc @agolynski Since this affects the c10d extension.
Yeah, agreed that we should ask them to make the changes. Do you have a list of third-party backends, or are these two the only ones currently using the c10d extension?
The ones I am aware of are Intel and UCX. See:
Could you please check with @agolynski? He might have more context.
Hello @chengjunlu @mshiryaev @Sergei-Lebedev @srinivas212
This PR will break the master -> master dependency in torch-ccl and torch-ucc.
In the near future we'll be changing the ProcessGroup API, which will break these repos as well. Would it be okay if you depended on 1.6 (upgrading to 1.7 when it is released) rather than on master in the meantime?
@Sergei-Lebedev @srinivas212: How does torch-ucc depend on pytorch? Do you require users to install pytorch from the master branch?
Hi @agolynski @mrshenli @pritamdamania87 - after discussions with both the Torch-UCC and Torch-CCL teams, the near-term plan is to fix the issue in the third-party repos once this change lands.
In general, it is best to keep third-party plugins in sync with master.
Going to land the stack soon. I created openucx/torch-ucc#23 and intel/torch-ccl#11 on the ucc and ccl repos to track the API migration.
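For readers following the migration, here is a rough, self-contained sketch of why an out-of-tree backend stops compiling once the base class returns an intrusive pointer instead of a `std::shared_ptr`. It does not use PyTorch headers: `RefCounted`, `IntrusivePtr`, `ProcessGroupBase`, `MyBackend`, and `MyBackendWork` are hypothetical stand-ins invented for illustration (in PyTorch the corresponding pieces are `c10::intrusive_ptr_target` and `c10::intrusive_ptr`), and the single `allreduce()` method with no arguments is a simplification, not the real signature.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <utility>

// Hypothetical stand-in for an intrusive-pointer target: the reference
// count is stored inside the object, not in a separate control block.
class RefCounted {
 public:
  RefCounted() = default;
  RefCounted(const RefCounted&) = delete;
  RefCounted& operator=(const RefCounted&) = delete;
  virtual ~RefCounted() = default;

  void retain() const { refs_.fetch_add(1); }
  // Returns true when the last reference has just been dropped.
  bool release() const { return refs_.fetch_sub(1) == 1; }
  std::size_t use_count() const { return refs_.load(); }

 private:
  mutable std::atomic<std::size_t> refs_{0};
};

// Hypothetical minimal intrusive smart pointer (illustration only).
template <class T>
class IntrusivePtr {
 public:
  IntrusivePtr() = default;
  explicit IntrusivePtr(T* p) : ptr_(p) { if (ptr_) ptr_->retain(); }
  IntrusivePtr(const IntrusivePtr& o) : ptr_(o.ptr_) { if (ptr_) ptr_->retain(); }
  IntrusivePtr(IntrusivePtr&& o) noexcept : ptr_(o.ptr_) { o.ptr_ = nullptr; }
  IntrusivePtr& operator=(IntrusivePtr o) noexcept {
    std::swap(ptr_, o.ptr_);  // copy-and-swap: old pointee released by `o`
    return *this;
  }
  ~IntrusivePtr() { if (ptr_ && ptr_->release()) delete ptr_; }

  T* operator->() const { return ptr_; }
  T* get() const { return ptr_; }

 private:
  T* ptr_ = nullptr;
};

// Simplified model of the work handle: it must now derive from the
// ref-counted base so the pointer can find the count inside the object.
struct Work : RefCounted {
  virtual bool isCompleted() const { return true; }
};

struct ProcessGroupBase {
  virtual ~ProcessGroupBase() = default;
  // Before the change (conceptually): virtual std::shared_ptr<Work> allreduce() = 0;
  virtual IntrusivePtr<Work> allreduce() = 0;
};

// A third-party backend must update every override to the new return type;
// an override still returning std::shared_ptr<Work> would fail to compile.
struct MyBackendWork : Work {};

struct MyBackend : ProcessGroupBase {
  IntrusivePtr<Work> allreduce() override {
    return IntrusivePtr<Work>(new MyBackendWork());
  }
};

// Small usage example: two handles share one count stored in the object.
std::size_t demo_use_count() {
  MyBackend backend;
  IntrusivePtr<Work> a = backend.allreduce();
  IntrusivePtr<Work> b = a;
  assert(b->isCompleted());
  return a->use_count();
}
```

Because the return type is part of a virtual function's signature, the migration in a backend is mostly mechanical: change the return type of each override and the call that constructs the work object.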