
TL/UCP: avoid copy in knomial scatter #771

Merged


Sergei-Lebedev
Contributor

What

Adds a configuration parameter for the knomial scatter algorithm (part of the scatter-allgather broadcast) that receives data directly into the destination buffer, avoiding a memory copy as the last step of the algorithm.

Why?

Improves performance of large-message broadcast. Enabled by default only for GPU collectives.

```diff
@@ -62,13 +63,7 @@ void ucc_tl_ucp_scatter_knomial_progress(ucc_coll_task_t *coll_task)
    while (!ucc_knomial_pattern_loop_done(p)) {
        step_radix = ucc_kn_compute_step_radix(p);
        block_count = ucc_sra_kn_compute_block_count(count, rank, p);
        sbuf = (rank == root)
            ? args->src.info.buffer : args->dst.info.buffer;
        rbuf = args->dst.info.buffer;
```
Collaborator


Where are you setting rbuf instead?

Contributor Author


Can you elaborate, please? I don't understand the question.

Collaborator

shimmybalsam, May 24, 2023


Never mind, I see it is already set at the beginning of the progress function, so this line is redundant.

```c
task->scatter_kn.recv_offset = 0;
is_zcopy = UCC_TL_UCP_TEAM_LIB(team)->cfg.scatter_kn_enable_recv_zcopy;
if (((is_zcopy == UCC_CONFIG_AUTO) &&
     (args->src.info.mem_type != UCC_MEMORY_TYPE_HOST)) ||
```
Collaborator


Why not on host?

Contributor Author


This is tricky: depending on the CPU vendor, I see different results. On AMD it's better to do the extra copy, and on Intel it's better to receive at the correct offset. For GPUs it's always better to avoid the copy.

Sergei-Lebedev force-pushed the topic/scatter_no_copy branch from 2f26eaf to eae1a2e on June 1, 2023 07:12
Sergei-Lebedev merged commit 98e0e27 into openucx:master on June 1, 2023
Sergei-Lebedev deleted the topic/scatter_no_copy branch on June 1, 2023 09:53
janjust pushed a commit to janjust/ucc that referenced this pull request Jan 31, 2024
4 participants