-
Notifications
You must be signed in to change notification settings - Fork 102
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
TL/MLX5: adding mcast interface #784
Conversation
Can one of the admins verify this patch? |
ok to test |
@Sergei-Lebedev how do you retrigger CI? specifically the codestyle CI |
it auto restarts after new commit or force push, note that codestyle doesn't check PR title but commit title |
@Sergei-Lebedev Thanks for the constructive comments. I have resolved all of them. |
f8c482a
to
8a74c71
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey Mamzi, thanks for the clean implementation that smoothly fits into the existing tl.
I left some minor comments, some of them might not be relevant at this stage where I can't see the full collective implementation, so feel free to tell me what you think! :)
This PR and my open PR will have quite many conflicts, we should think about how to properly merge/rebase them two.
8a74c71
to
2950d04
Compare
@Sergei-Lebedev @samnordmann Thank you all for the comments. I have resolved all of them. Please feel free to approve the changes. @janjust FYI |
ebc38fc
to
9013651
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey Mamzi! Thanks for addressing the comments, I'm approving the PR. I added another minor comment. Can you also make sure to fix the errors revealed by Linter-NVIDIA / clang-tidy
check ?
9013651
to
76df0fb
Compare
@samnordmann Thanks for pointing out the issue. I have updated the commit. |
76df0fb
to
7eef93e
Compare
@MamziB Since this is only part of PR. I know you will address some of this in your presentation to WG. Some design questions:
|
@manjugv Please find the answers inline: - How do you create multicast groups? How do you plan to support shared memory? Hierarchy? Do you plan to support NVLINK multicast or other SM multicast? What reliability protocol do you plan to use? Can you describe it? How do you support large messages? How do you plan to support GPU memory? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please can you return UCC_OK
at team and context creation ?
7eef93e
to
65ee40e
Compare
@samnordmann Thanks for the new comments. I just resolved all of them. @manjugv Please let me know if you have any other comments regarding this PR. |
Thanks, it looks good!
|
|
@manjugv Thanks, I will add FAQ as suggested. I just also saw that CI tests have finished successfully. |
Co-authored-by: Manjunath Gorentla Venkata <manjugv@users.noreply.github.com>
Co-authored-by: Manjunath Gorentla Venkata <manjugv@users.noreply.github.com>
What
TL/MLX5/MCAST: multicast-related changes to mlx5 files WIP
Why ?
Implementing HW MCAST in UCC
FYI @janjust @Sergei-Lebedev