[core][distributed] allow custom allreduce when pipeline parallel size > 1 #6117
Conversation
This looks good to me. Do we need a test?
I think you already have it:
I mean, should we also test both with and without ENABLE_ALL_REDUCE?

The current main does not test it. It is on by default, and may be turned off when the hardware does not support it.

Sure, that makes sense to me.

Merging, as the failed tests are unrelated.
When pipeline parallel size > 1, we can still use custom allreduce within the tensor parallel group.
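To illustrate the idea behind this change: with both tensor parallelism (TP) and pipeline parallelism (PP), only ranks within the same TP group ever take part in an allreduce, so a custom allreduce kernel can be set up per TP group independent of the PP size. The sketch below is a hypothetical illustration of that rank grouping, not vLLM's actual implementation; the function name `tp_groups` and the TP-major rank layout are assumptions for the example.

```python
# Hypothetical sketch of TP rank grouping under combined TP + PP.
# A custom allreduce only needs to span one TP group, so PP size > 1
# does not prevent using it (assumed layout: consecutive ranks share
# a TP group, i.e. TP-major ordering).

def tp_groups(world_size: int, tp_size: int) -> list[list[int]]:
    """Partition world ranks into TP groups of size tp_size."""
    assert world_size % tp_size == 0, "world size must be divisible by TP size"
    return [list(range(start, start + tp_size))
            for start in range(0, world_size, tp_size)]

# Example: 8 GPUs, TP=4, PP=2 -> two TP groups; one custom allreduce
# instance could be created for each group.
print(tp_groups(world_size=8, tp_size=4))  # [[0, 1, 2, 3], [4, 5, 6, 7]]
```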