TL/MLX5: adding mcast interface #784

Merged
merged 2 commits into from
Jun 29, 2023

Conversation

MamziB
Collaborator

@MamziB MamziB commented May 22, 2023

What

TL/MLX5/MCAST: multicast-related changes to mlx5 files WIP

Why?

Implementing HW MCAST in UCC

FYI @janjust @Sergei-Lebedev

@swx-jenkins3

Can one of the admins verify this patch?

@Sergei-Lebedev
Contributor

ok to test

@MamziB MamziB changed the title from "TL/MLX5/MCAST: HW multicast to mlx5 addition WIP" to "TL/MLX5: adding mcast interface" May 24, 2023
@janjust
Collaborator

janjust commented May 24, 2023

@Sergei-Lebedev How do you retrigger CI, specifically the codestyle CI?

@Sergei-Lebedev
Contributor

Sergei-Lebedev commented May 25, 2023

> @Sergei-Lebedev how do you retrigger CI? specifically the codestyle CI

It restarts automatically after a new commit or a force push. Note that the codestyle check looks at the commit title, not the PR title.

@MamziB MamziB force-pushed the mamzi/mcast-merge branch from be7c0e7 to 45eca68 Compare May 25, 2023 19:11
src/components/tl/mlx5/mcast/tl_mcast.h (resolved, outdated)
src/components/tl/mlx5/tl_mlx5_team.c (resolved, outdated)
src/components/tl/mlx5/tl_mlx5_coll.h (resolved)
src/components/tl/mlx5/Makefile.am (resolved, outdated)
src/components/tl/mlx5/mcast/tl_mcast.h (resolved, outdated)
src/components/tl/mlx5/mcast/tl_mcast_context.c (resolved, outdated)
src/components/tl/mlx5/mcast/tl_mcast.h (resolved, outdated)
src/components/tl/mlx5/tl_mlx5_coll.h (resolved, outdated)
src/components/tl/mlx5/mcast/tl_mcast_coll.c (resolved, outdated)
src/components/tl/mlx5/tl_mlx5_coll.c (resolved, outdated)
@MamziB
Collaborator Author

MamziB commented May 31, 2023

@Sergei-Lebedev Thanks for the constructive comments. I have resolved all of them.

@MamziB MamziB force-pushed the mamzi/mcast-merge branch 2 times, most recently from f8c482a to 8a74c71 Compare May 31, 2023 02:41
@samnordmann samnordmann self-requested a review June 13, 2023 09:27
Collaborator

@samnordmann samnordmann left a comment

Hey Mamzi, thanks for the clean implementation that smoothly fits into the existing tl.
I left some minor comments. Some of them might not be relevant at this stage, since I can't see the full collective implementation yet, so feel free to tell me what you think! :)

This PR and my open PR will have quite a few conflicts, so we should think about how to merge or rebase the two properly.

src/components/tl/mlx5/mcast/tl_mlx5_mcast_coll.c (resolved, outdated)
src/components/tl/mlx5/mcast/tl_mlx5_mcast.h (resolved)
src/components/tl/mlx5/mcast/tl_mlx5_mcast.h (resolved, outdated)
src/components/tl/mlx5/mcast/tl_mlx5_mcast.h (resolved, outdated)
src/components/tl/mlx5/mcast/tl_mlx5_mcast_coll.h (resolved, outdated)
src/components/tl/mlx5/mcast/tl_mlx5_mcast_coll.c (resolved, outdated)
src/components/tl/mlx5/mcast/tl_mlx5_mcast_coll.c (resolved, outdated)
src/components/tl/mlx5/mcast/tl_mlx5_mcast_coll.c (resolved, outdated)
src/components/tl/mlx5/mcast/tl_mlx5_mcast.h (resolved, outdated)
src/components/tl/mlx5/mcast/tl_mlx5_mcast_coll.c (resolved, outdated)
@MamziB MamziB force-pushed the mamzi/mcast-merge branch from 8a74c71 to 2950d04 Compare June 13, 2023 19:14
@MamziB
Collaborator Author

MamziB commented Jun 13, 2023

@Sergei-Lebedev @samnordmann Thank you all for the comments. I have resolved all of them. Please feel free to approve the changes.
There were also a lot of merge conflicts that needed to be addressed. Now it should be free of any conflict with the latest master branch.

@janjust FYI

@MamziB MamziB force-pushed the mamzi/mcast-merge branch 2 times, most recently from ebc38fc to 9013651 Compare June 13, 2023 20:01
Collaborator

@samnordmann samnordmann left a comment

Hey Mamzi! Thanks for addressing the comments. I'm approving the PR. I added another minor comment. Can you also make sure to fix the errors reported by the Linter-NVIDIA / clang-tidy check?

src/components/tl/mlx5/mcast/tl_mlx5_mcast.h (resolved)
src/components/tl/mlx5/tl_mlx5_team.c (resolved)
@MamziB MamziB force-pushed the mamzi/mcast-merge branch from 9013651 to 76df0fb Compare June 14, 2023 17:11
@MamziB
Collaborator Author

MamziB commented Jun 14, 2023

@samnordmann Thanks for pointing out the issue. I have updated the commit.

@MamziB MamziB requested a review from samnordmann June 14, 2023 17:18
@manjugv
Contributor

manjugv commented Jun 23, 2023

@MamziB Since this is only part of the full PR, I know you will address some of this in your presentation to the WG. Some design questions:

  • How do you create multicast groups?
  • How do you plan to support shared memory? Hierarchy?
  • Do you plan to support NVLINK multicast or other SM multicast?
  • What reliability protocol do you plan to use? Can you describe it?
  • How do you support large messages? How do you plan to support GPU memory?

@MamziB
Collaborator Author

MamziB commented Jun 23, 2023

@manjugv Please find the answers inline:

How do you create multicast groups?
The mcast group is created during TL MLX5 team creation. Rank 0 in the new team calls rdma_join_multicast() and retrieves the gid and lid of the mcast group. It then calls bcast() to send this information to all other processes in the new team. Once they have received it, all processes join using the same gid and lid, which ensures they end up in the same mcast group.
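
As a rough illustration of the flow described above (a simulation only, not the PR's code), the sketch below models rank 0 joining first, broadcasting the group address, and the other ranks joining with what they received. `McastGroupInfo`, `join_multicast`, and `create_team_mcast_group` are hypothetical names, the gid and lid values are made up, and `join_multicast` is only a stand-in for rdma_join_multicast(): no RDMA call is made.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class McastGroupInfo:
    """Hypothetical container for what rank 0 learns after joining."""
    gid: str  # multicast group GID (made-up value below)
    lid: int  # multicast LID (made-up value below)


def join_multicast(requested: Optional[McastGroupInfo]) -> McastGroupInfo:
    """Stand-in for rdma_join_multicast(); no RDMA calls are made here.

    Modelling assumption: None means "rank 0 creates the group and
    learns its address"; otherwise the caller joins the given group.
    """
    if requested is None:
        return McastGroupInfo(gid="ff12:601b:ffff::1", lid=0xC000)
    return requested


def create_team_mcast_group(team_size: int) -> List[McastGroupInfo]:
    """Return the group each rank ends up in after team creation."""
    # Step 1: rank 0 joins first and learns the group's gid and lid.
    root_info = join_multicast(None)
    # Step 2: rank 0 broadcasts that info to the rest of the team
    # (modelled as every other rank receiving a copy).
    received = [root_info for _ in range(1, team_size)]
    # Step 3: the other ranks join using exactly the info they received.
    return [root_info] + [join_multicast(info) for info in received]
```

Calling `create_team_mcast_group(4)` returns four identical `McastGroupInfo` values, which is the property the broadcast step is meant to guarantee.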

How do you plan to support shared memory? Hierarchy?
We envision a two-level approach for our mcast-based bcast design. The root does an inter-node bcast using mcast, and then the node leaders use shared memory to propagate the data to all other processes on the same node.
A CL can therefore combine TL UCP (intra-node step) with TL MLX5 (inter-node step).
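
The two-level idea can be sketched as a small planning function. This is an illustrative model, not UCC code: `two_level_bcast_plan` and its `rank_to_node` argument are hypothetical, and the leader choice (the root leads its own node, the lowest rank leads every other node) is an assumption that the comment above does not specify.

```python
from typing import Dict, List, Tuple


def two_level_bcast_plan(
    root: int, rank_to_node: List[int]
) -> Tuple[List[int], Dict[int, List[int]]]:
    """Plan a broadcast: leaders get data inter-node, peers intra-node.

    rank_to_node[r] is the node hosting rank r.
    Returns (leaders, {leader: local ranks it forwards to}).
    """
    # Group ranks by the node they run on.
    nodes: Dict[int, List[int]] = {}
    for rank, node in enumerate(rank_to_node):
        nodes.setdefault(node, []).append(rank)

    leaders: List[int] = []
    intra: Dict[int, List[int]] = {}
    for _node, ranks in sorted(nodes.items()):
        # Assumption: root leads its node, lowest rank leads the others.
        leader = root if root in ranks else min(ranks)
        leaders.append(leader)  # reached by the inter-node mcast step
        intra[leader] = [r for r in ranks if r != leader]  # shared-memory step
    return leaders, intra
```

For six ranks on two nodes with root 1, `two_level_bcast_plan(1, [0, 0, 0, 1, 1, 1])` gives leaders `[1, 3]`, with rank 1 forwarding to ranks 0 and 2 and rank 3 forwarding to ranks 4 and 5.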

Do you plan to support NVLINK multicast or other SM multicast?
Absolutely. GPU memory bcast is on our roadmap and such technology can accelerate the intra-node step. If you know of any implementation that uses NVLINK multicast, please let us know.

What reliability protocol do you plan to use? Can you describe it?
We use a binomial tree to represent the mcast group. Each rank has at most one parent and two children. Each rank sends an ACK to its parent upon receiving the mcast packet and then waits until it has received ACK packets from its two children. If a rank does not receive a packet in a timely manner, it sends a NACK to its parent, and the parent then sends the missing packet directly.
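
A toy simulation of this ACK/NACK exchange for a single packet is sketched below. The exact tree layout is not given in the comment, so the sketch assumes a binary-heap numbering (parent of rank r is (r - 1) // 2), which satisfies the "at most one parent and two children" property. All function names are hypothetical, timeouts are modelled simply as membership in a `dropped` set, and a parent is assumed to hold the packet by the time it serves a NACK.

```python
from typing import List, Set, Tuple


def parent(rank: int) -> int:
    """Parent in the assumed binary-heap layout (valid for rank > 0)."""
    return (rank - 1) // 2


def children(rank: int, size: int) -> List[int]:
    """At most two children per rank in the assumed layout."""
    return [c for c in (2 * rank + 1, 2 * rank + 2) if c < size]


def reliable_mcast(size: int, dropped: Set[int]) -> Tuple[List[str], bool]:
    """Simulate one multicast packet sent by rank 0.

    Ranks in `dropped` miss the multicast, NACK their parent, and get
    the packet resent directly. Every rank ACKs its parent once it has
    the packet. Returns the event log and whether every rank collected
    an ACK from each of its children.
    """
    log: List[str] = []
    acks = [0] * size
    # Ascending order guarantees a parent is processed before its children.
    for rank in range(1, size):
        p = parent(rank)
        if rank in dropped:
            log.append(f"NACK {rank}->{p}")
            log.append(f"RESEND {p}->{rank}")
        log.append(f"ACK {rank}->{p}")
        acks[p] += 1
    complete = all(acks[r] == len(children(r, size)) for r in range(size))
    return log, complete
```

With five ranks and rank 3 missing the packet, `reliable_mcast(5, {3})` logs a NACK from rank 3 to rank 1, a direct resend, and then the ACK, and reports completion once every parent has heard from its children.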

How do you support large messages? How do you plan to support GPU memory?
The maximum mcast packet size is the MTU. For large messages, we therefore split the buffer into MTU-size chunks and multicast these chunks back-to-back.
For GPU memory, we plan to stage the GPU buffer into CPU memory and then use mcast to propagate the data. To hide this overhead, the staging of one chunk can be overlapped with the mcast operation of the previous chunk.
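
The chunking and the staging overlap can be sketched as follows. This is a minimal model under stated assumptions: `mtu_chunks` and `staged_pipeline` are hypothetical helpers, the 4096-byte MTU in the usage note is only an example value, and the pipeline is a plain two-stage schedule rather than anything taken from the PR.

```python
from typing import List, Tuple


def mtu_chunks(length: int, mtu: int) -> List[Tuple[int, int]]:
    """Split a buffer of `length` bytes into (offset, size) chunks <= mtu."""
    if mtu <= 0:
        raise ValueError("mtu must be positive")
    return [(off, min(mtu, length - off)) for off in range(0, length, mtu)]


def staged_pipeline(num_chunks: int) -> List[List[str]]:
    """Two-stage schedule for GPU buffers.

    At step i, chunk i is staged from GPU to host memory while chunk
    i - 1, staged one step earlier, is multicast.
    """
    steps: List[List[str]] = []
    for i in range(num_chunks + 1):
        ops: List[str] = []
        if i < num_chunks:
            ops.append(f"stage {i}")
        if i > 0:
            ops.append(f"mcast {i - 1}")
        steps.append(ops)
    return steps
```

For a 10000-byte message and a 4096-byte MTU, `mtu_chunks(10000, 4096)` yields two full chunks and one 1808-byte tail, and `staged_pipeline(3)` takes four steps instead of the six a fully serialized stage-then-send loop would need.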

Collaborator

@samnordmann samnordmann left a comment

Can you please return UCC_OK at team and context creation?

src/components/tl/mlx5/mcast/tl_mlx5_mcast_team.c (resolved, outdated)
src/components/tl/mlx5/mcast/tl_mlx5_mcast_context.c (resolved, outdated)
@MamziB MamziB force-pushed the mamzi/mcast-merge branch from 7eef93e to 65ee40e Compare June 28, 2023 18:59
@MamziB
Collaborator Author

MamziB commented Jun 28, 2023

@samnordmann Thanks for the new comments. I just resolved all of them. @manjugv Please let me know if you have any other comments regarding this PR.

@MamziB MamziB requested a review from manjugv June 28, 2023 19:00
@manjugv
Contributor

manjugv commented Jun 28, 2023

Thanks, it looks good!

  1. Mamzi, please add the FAQ as we discussed.
  2. I will let CI run, and then we can merge.

CC @Sergei-Lebedev

@MamziB
Collaborator Author

MamziB commented Jun 28, 2023

@manjugv Thanks, I will add the FAQ as suggested. I also just saw that the CI tests have finished successfully.

@Sergei-Lebedev Sergei-Lebedev merged commit 612b8d3 into openucx:master Jun 29, 2023
jeniaka pushed a commit to jeniaka/ucc that referenced this pull request Aug 14, 2023
Co-authored-by: Manjunath Gorentla Venkata <manjugv@users.noreply.github.com>
janjust pushed a commit to janjust/ucc that referenced this pull request Jan 31, 2024
Co-authored-by: Manjunath Gorentla Venkata <manjugv@users.noreply.github.com>
6 participants