Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

The Neighbor Exchange Algorithm for Allgather #822

Merged
merged 17 commits into from
Sep 5, 2023

Conversation

ikryukov
Copy link
Collaborator

@ikryukov ikryukov commented Aug 16, 2023

What

Implementation of Neighbor Exchange Allgather algorithm described in this paper: https://ieeexplore.ieee.org/document/1592302
Based on original implementation from OpenMPI: https://github.com/open-mpi/ompi/blob/8514e717215a42ddb6db065c3baae3a2779a072a/ompi/mca/coll/base/coll_base_allgather.c#L624

Why ?

According to mentioned paper: For the TCP/IP protocol, an algorithm that incorporates pair-wise exchange is the best for minimizing TCP/IP traffic and this algorithm require N / 2 steps instead of N - 1 like in ring.

@swx-jenkins3
Copy link

Can one of the admins verify this patch?

@Sergei-Lebedev
Copy link
Contributor

ok to test

@ikryukov ikryukov force-pushed the allgather_neighbor_exchange branch 2 times, most recently from 2a24add to 9d9c3da Compare August 16, 2023 14:22
src/components/tl/ucp/allgather/allgather.h Outdated Show resolved Hide resolved
src/components/tl/ucp/allgather/allgather_neighbor.c Outdated Show resolved Hide resolved
src/components/tl/ucp/allgather/allgather_neighbor.c Outdated Show resolved Hide resolved
src/components/tl/ucp/allgather/allgather_neighbor.c Outdated Show resolved Hide resolved
src/components/tl/ucp/allgather/allgather_neighbor.c Outdated Show resolved Hide resolved
src/components/tl/ucp/allgather/allgather_neighbor.c Outdated Show resolved Hide resolved
src/components/tl/ucp/allgather/allgather_neighbor.c Outdated Show resolved Hide resolved
src/components/tl/ucp/allgather/allgather_neighbor.c Outdated Show resolved Hide resolved
src/components/tl/ucp/allgather/allgather_neighbor.c Outdated Show resolved Hide resolved
src/components/tl/ucp/allgather/allgather_neighbor.c Outdated Show resolved Hide resolved
@ikryukov ikryukov marked this pull request as ready for review August 23, 2023 10:14
@Sergei-Lebedev
Copy link
Contributor

bot:retest

@ikryukov ikryukov force-pushed the allgather_neighbor_exchange branch from 6c33bb3 to 18669f0 Compare August 24, 2023 16:26
@shimmybalsam
Copy link
Collaborator

@Sergei-Lebedev should we add here ranks reordering? I assume it would improve performance substantially since each neighbor with default mapping will always cross cores...

@Sergei-Lebedev
Copy link
Contributor

@Sergei-Lebedev should we add here ranks reordering? I assume it would improve performance substantially since each neighbor with default mapping will always cross cores...

this algorithm won't work with reordering, if need reordering we should fallback to ring

@ikryukov ikryukov force-pushed the allgather_neighbor_exchange branch 2 times, most recently from d7034c7 to 22040c3 Compare August 30, 2023 08:48
@ikryukov ikryukov requested a review from samnordmann August 30, 2023 09:03
Copy link
Collaborator

@samnordmann samnordmann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @ikryukov for the revisions! It looks good to me. Can you just fix the last comment please?

@ikryukov ikryukov requested a review from samnordmann September 4, 2023 13:59
@ikryukov ikryukov force-pushed the allgather_neighbor_exchange branch 2 times, most recently from 226348e to 369b138 Compare September 4, 2023 15:47
@ikryukov ikryukov force-pushed the allgather_neighbor_exchange branch from 369b138 to 7b9acf2 Compare September 5, 2023 09:55
@Sergei-Lebedev Sergei-Lebedev merged commit 25a3d95 into openucx:master Sep 5, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants