Refactor the `cuda_memcpy` functions to make them more usable #16945
impl::cuda_memcpy_async(
  dst.data(),
  src.data(),
  std::min(dst.size_bytes(), src.size_bytes()),
This might cause surprises to users if they unintentionally pass spans of different sizes. We could instead throw in this case.
Yes, this should be a runtime check and it should throw. If the caller wants to copy subspans, the caller can create subspans. Spans, as view types, are meant to make this easy.
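The suggestion above (throw on a size mismatch and let callers take subspans explicitly) can be illustrated with a minimal host-only sketch. Everything here is hypothetical: `simple_span` and `checked_copy` are stand-ins invented for illustration, not libcudf types, and `std::memcpy` takes the place of the actual CUDA copy.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <stdexcept>

// Hypothetical stand-in for a span type; not the cudf definition.
template <typename T>
struct simple_span {
  T* ptr;
  std::size_t count;

  T* data() const { return ptr; }
  std::size_t size() const { return count; }
  std::size_t size_bytes() const { return count * sizeof(T); }

  // Views make partial copies explicit at the call site.
  simple_span subspan(std::size_t offset, std::size_t n) const
  {
    if (offset > count || n > count - offset) { throw std::out_of_range("subspan out of range"); }
    return simple_span{ptr + offset, n};
  }
};

// Host-only sketch: std::memcpy stands in for the CUDA copy.
template <typename T>
void checked_copy(simple_span<T> dst, simple_span<T const> src)
{
  // Runtime check: refuse mismatched sizes instead of silently copying the minimum.
  if (dst.size() != src.size()) { throw std::invalid_argument("span size mismatch"); }
  std::memcpy(dst.data(), src.data(), src.size_bytes());
}

void example_usage()
{
  int const src[4] = {1, 2, 3, 4};
  int dst[4]       = {0, 0, 0, 0};

  // Passing spans of different sizes is reported as an error.
  bool threw = false;
  try {
    checked_copy(simple_span<int>{dst, 2}, simple_span<int const>{src, 4});
  } catch (std::invalid_argument const&) {
    threw = true;
  }
  assert(threw);

  // A caller who really wants a partial copy says so with a subspan.
  checked_copy(simple_span<int>{dst, 2}, simple_span<int const>{src, 4}.subspan(2, 2));
  assert(dst[0] == 3 && dst[1] == 4 && dst[2] == 0);
}
```

With the check in place, an accidental mismatch surfaces as an exception, while an intentional partial copy stays a one-liner.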
Sure.
Just one thing to keep in mind for this use case: `host_span{my_hv.data(), subsize}` is not the same as `host_span{my_hv}.subspan(0, subsize)`, because the first one will not know if it's pointing to pinned memory.
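To make the difference concrete, here is a small host-only sketch. The `mock_host_vector` and `mock_host_span` types are hypothetical stand-ins written for this illustration; they only model the idea that a span built from the owning container inherits the pinned flag, while a span built from a raw pointer and size cannot.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for cudf::detail::host_vector: a host buffer that
// remembers whether it was allocated as pinned memory.
template <typename T>
struct mock_host_vector {
  std::vector<T> storage;
  bool pinned;

  T* data() { return storage.data(); }
  std::size_t size() const { return storage.size(); }
};

// Hypothetical stand-in for a host span carrying an is_device_accessible flag.
template <typename T>
class mock_host_span {
 public:
  // Raw pointer + size: the memory kind is unknown, so assume pageable.
  mock_host_span(T* data, std::size_t size) : data_{data}, size_{size}, device_accessible_{false} {}

  // From the owning container: the flag is propagated.
  mock_host_span(mock_host_vector<T>& v)
    : data_{v.data()}, size_{v.size()}, device_accessible_{v.pinned}
  {
  }

  // A subspan views the same allocation, so it keeps the flag.
  mock_host_span subspan(std::size_t offset, std::size_t count) const
  {
    mock_host_span result(data_ + offset, count);
    result.device_accessible_ = device_accessible_;
    return result;
  }

  bool is_device_accessible() const { return device_accessible_; }
  std::size_t size() const { return size_; }

 private:
  T* data_;
  std::size_t size_;
  bool device_accessible_;
};

void example_usage()
{
  mock_host_vector<int> my_hv{std::vector<int>(8, 0), true};  // pretend this is pinned
  std::size_t const subsize = 4;

  mock_host_span<int> from_pointer(my_hv.data(), subsize);
  mock_host_span<int> from_subspan = mock_host_span<int>(my_hv).subspan(0, subsize);

  assert(from_pointer.size() == from_subspan.size());
  assert(!from_pointer.is_device_accessible());  // flag lost
  assert(from_subspan.is_device_accessible());   // flag preserved
}
```

Both spans cover the same four elements, but only the one derived via `subspan` would be eligible for a pinned-memory copy path in this model.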
done!
One namespace question. Otherwise LGTM.
source_data.size() * sizeof(T),
is_pinned ? host_memory_kind::PINNED : host_memory_kind::PAGEABLE,
stream);
cuda_memcpy_async<T>(ret, source_data, stream);
Oh, yes. This is much better.
LGTM. Aesthetically, much improved.
Barring @bdice's concern regarding the namespace, this looks good to ship, to my eyes.
/merge
Depends on #16945

Added `cudf::detail::device_scalar`, derived from `rmm::device_scalar`. The new class overrides function members that perform copies between host and device. New implementation uses a `cudf::detail::host_vector` as a bounce buffer to avoid performing a pageable copy. Replaced `rmm::device_scalar` with `cudf::detail::device_scalar` across libcudf.

Authors:
- Vukasin Milovanovic (https://github.com/vuule)
- Vyas Ramasubramani (https://github.com/vyasr)

Approvers:
- Basit Ayantunde (https://github.com/lamarrr)
- Vyas Ramasubramani (https://github.com/vyasr)
- David Wendt (https://github.com/davidwendt)

URL: #16947
Description
As we expanded the use of the `cuda_memcpy` functions, we realized that they are not very ergonomic, as they require the caller to query `is_device_accessible` and pass the correct `PAGEABLE`/`PINNED` enum based on this.

This PR aims to make the `cuda_memcpy` functions easier to use, and the call site changes hopefully showcase this. The new implementation takes spans as parameters and relies on `host_span::is_device_accessible` to enable copy strategies for pinned memory. Host spans set this flag during construction; creating a host span from a `cudf::detail::host_vector` will correctly propagate `is_device_accessible`. Thus, callers can simply call the `cuda_memcpy` functions with their containers as parameters and rely on implicit conversion to `host_span`/`device_span`. Bonus: there's no way to mix up host and device memory pointers 👍

Sharp edges:
- The API copies `min(input.size(), output.size())` bytes, as this is what we can do safely. This might cause surprises to users if they unintentionally pass spans of different sizes. We could instead throw in this case.
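The design described above can be sketched without CUDA. In the host-only mock below, every name (`copy_path`, `mock_host_vector`, `mock_device_vector`, `mock_host_span`, `mock_device_span`, `mock_memcpy`) is hypothetical and exists only for illustration; ordinary host memory stands in for device memory, and `std::memcpy` stands in for the CUDA copy. The sketch shows three ideas from the description: the copy path is chosen from the span's flag rather than by the caller, containers convert implicitly to spans, and distinct host and device span types keep the arguments from being swapped. It throws on mismatched sizes, which is the alternative raised in the last sentence of the description.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <stdexcept>
#include <vector>

enum class copy_path { PAGEABLE, PINNED };

// Hypothetical containers. Host memory stands in for device memory here.
template <typename T>
struct mock_host_vector {
  std::vector<T> storage;
  bool pinned;
};

template <typename T>
struct mock_device_vector {
  std::vector<T> storage;
};

// Hypothetical spans with non-explicit converting constructors, so the
// containers can be passed directly at the call site.
template <typename T>
struct mock_host_span {
  T const* ptr;
  std::size_t count;
  bool device_accessible;

  mock_host_span(mock_host_vector<T> const& v)
    : ptr{v.storage.data()}, count{v.storage.size()}, device_accessible{v.pinned}
  {
  }
};

template <typename T>
struct mock_device_span {
  T* ptr;
  std::size_t count;

  mock_device_span(mock_device_vector<T>& v) : ptr{v.storage.data()}, count{v.storage.size()} {}
};

// The parameter types encode the direction of the copy; swapping the two
// arguments does not compile. Returns the path that was selected.
template <typename T>
copy_path mock_memcpy(mock_device_span<T> dst, mock_host_span<T> src)
{
  if (dst.count != src.count) { throw std::invalid_argument("span size mismatch"); }
  // The caller no longer passes PINNED/PAGEABLE; the span carries that information.
  copy_path const path = src.device_accessible ? copy_path::PINNED : copy_path::PAGEABLE;
  std::memcpy(dst.ptr, src.ptr, src.count * sizeof(T));
  return path;
}

void example_usage()
{
  mock_host_vector<int> pinned_src{{1, 2, 3}, true};
  mock_host_vector<int> pageable_src{{4, 5, 6}, false};
  mock_device_vector<int> dst{std::vector<int>(3)};

  // Naming T explicitly lets the containers convert implicitly to spans.
  copy_path const first = mock_memcpy<int>(dst, pinned_src);
  assert(first == copy_path::PINNED);
  assert(dst.storage[0] == 1 && dst.storage[2] == 3);

  copy_path const second = mock_memcpy<int>(dst, pageable_src);
  assert(second == copy_path::PAGEABLE);
  assert(dst.storage[0] == 4 && dst.storage[2] == 6);
}
```

In this mock, the explicit `<int>` is what allows the implicit container-to-span conversions, since template argument deduction alone would not consider them.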