[Train] move collective implementations to train_fn_utils #55689
Signed-off-by: xgui <xgui@anyscale.com>
Code Review
This pull request refactors collective operations like barrier and broadcast_from_rank_zero by moving their implementations from the public API module ray.train.collective.collectives to the internal TrainFnUtils class. This is a good architectural improvement that cleans up the public API surface and centralizes training-related utilities. The tests have been updated to reflect these changes. The changes are logical and well-executed. I have a couple of minor suggestions to improve code clarity and avoid redundant function calls.
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Signed-off-by: Xinyuan <43737116+xinyuangui2@users.noreply.github.com>
Signed-off-by: xgui <xgui@anyscale.com>
@@ -0,0 +1,56 @@
import logging
nit: move this to the collective folder, as per @justinvyu's comment.
We discussed offline, and it made more sense to put it inside `_internal/execution`.
Signed-off-by: xgui <xgui@anyscale.com>
Why are these changes needed?
This PR moves the implementations of collectives to `TrainFnUtils`. This would unblock the local mode that is introduced in #55487.
Related issue number
Checks
git commit -s) in this PR.
scripts/format.sh to lint the changes in this PR.
method in Tune, I've added it in doc/source/tune/api/ under the corresponding .rst file.