TL/UCP: avoid copy in knomial scatter #771
@@ -62,13 +63,7 @@ void ucc_tl_ucp_scatter_knomial_progress(ucc_coll_task_t *coll_task)
while (!ucc_knomial_pattern_loop_done(p)) {
step_radix = ucc_kn_compute_step_radix(p);
block_count = ucc_sra_kn_compute_block_count(count, rank, p);
sbuf = (rank == root)
? args->src.info.buffer : args->dst.info.buffer;
rbuf = args->dst.info.buffer;
Where are you setting rbuf instead?
Can you elaborate, please? I don't understand the question.
Never mind, I see it is already set at the beginning of progress, so this assignment is redundant.
task->scatter_kn.recv_offset = 0;
is_zcopy = UCC_TL_UCP_TEAM_LIB(team)->cfg.scatter_kn_enable_recv_zcopy;
if (((is_zcopy == UCC_CONFIG_AUTO) &&
(args->src.info.mem_type != UCC_MEMORY_TYPE_HOST)) ||
Why not on host?
This is tricky: depending on the CPU vendor I see different results. On AMD it's better to have the extra copy, and on Intel it's better to receive at the correct offset. For GPUs it's always better to avoid the copy.
What
Adds a configuration parameter for the knomial scatter algorithm (part of scatter-allgather broadcast) that receives data directly into the destination buffer, avoiding the memory copy in the last step of the algorithm.
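To illustrate the difference between the two receive paths, here is a toy model in plain C. It is not UCC code: `memcpy` stands in for a network receive, and the function names `recv_with_copy` and `recv_zcopy` are hypothetical. The assumption is that the copy path lands the block at the start of the destination buffer and moves it to its final offset afterwards, while the zero-copy path lands it at the final offset directly.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy model only: memcpy stands in for a network receive.
 * All names here are hypothetical and not part of UCC. */

/* Copy path: land the block at offset 0, then move it to its final offset. */
static void recv_with_copy(char *dst, size_t dst_len, const char *wire,
                           size_t offset, size_t len)
{
    assert(offset + len <= dst_len);
    memcpy(dst, wire, len);          /* "receive" at the start of dst */
    memmove(dst + offset, dst, len); /* extra copy as the last step */
}

/* Zero-copy path: land the block directly at its final offset. */
static void recv_zcopy(char *dst, size_t dst_len, const char *wire,
                       size_t offset, size_t len)
{
    assert(offset + len <= dst_len);
    memcpy(dst + offset, wire, len); /* "receive" at the final offset */
}
```

Both functions leave the same bytes at `dst + offset`; the zero-copy variant simply skips the trailing `memmove`.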
Why?
Improves performance of large-message broadcast. By default it is enabled only for GPU collectives.
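The default behavior described above can be sketched as a small tri-state decision. This is a standalone illustration, not the PR's code: the enums `zcopy_cfg_t` and `mem_type_t` and the function `use_recv_zcopy` are hypothetical stand-ins, and since the condition in the diff is cut off after the AUTO branch, the handling of the explicit ON and OFF settings is an assumption.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the real configuration and memory-type enums. */
typedef enum { ZCOPY_OFF, ZCOPY_ON, ZCOPY_AUTO } zcopy_cfg_t;
typedef enum { MEM_HOST, MEM_GPU } mem_type_t;

/* Decide whether to receive directly at the final offset. */
static bool use_recv_zcopy(zcopy_cfg_t cfg, mem_type_t src_mem)
{
    assert(cfg == ZCOPY_OFF || cfg == ZCOPY_ON || cfg == ZCOPY_AUTO);
    if (cfg == ZCOPY_AUTO) {
        /* From the diff: AUTO enables zero-copy for non-host memory only. */
        return src_mem != MEM_HOST;
    }
    /* Assumption: an explicit ON or OFF overrides the memory type. */
    return cfg == ZCOPY_ON;
}
```

Keeping AUTO tied to the memory type matches the review discussion: avoiding the copy always helps on GPUs, while on host memory the better choice varies by CPU vendor, so it is left to an explicit setting.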