
Refactor the cuda_memcpy functions to make them more usable #16945

Merged: 9 commits into rapidsai:branch-24.12 on Oct 2, 2024

Conversation

@vuule (Contributor) commented on Sep 27, 2024:

Description

As we expanded the use of the cuda_memcpy functions, we realized that they are not very ergonomic: they require the caller to query is_device_accessible and pass the correct PAGEABLE/PINNED enum accordingly.

This PR aims to make the cuda_memcpy functions easier to use, and the call site changes hopefully showcase this. The new implementation takes spans as parameters and relies on host_span::is_device_accessible to select the copy strategy for pinned memory. Host spans set this flag during construction; creating a host span from a cudf::detail::host_vector correctly propagates is_device_accessible. Thus, callers can simply invoke the cuda_memcpy functions with their containers as arguments and rely on implicit conversion to host_span/device_span (see the sketch after the list below). Bonus: there is no way to mix up host and device memory pointers 👍

Sharp edges:

  • Conversion prevents template deduction, so calls that pass containers as arguments need to specify the template parameter explicitly (see the changes in this PR).
  • The API copies min(input.size(), output.size()) bytes, as this is what can be done safely. This might surprise users who unintentionally pass spans of different sizes; we could instead throw in this case.
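
To make the new call style concrete, here is a minimal sketch (the signatures and the `make_host_vector` factory are inferred from this PR's call sites and cudf's detail utilities, not quoted from the merged header):

```cpp
#include <cudf/detail/utilities/cuda_memcpy.hpp>
#include <cudf/detail/utilities/vector_factories.hpp>

#include <rmm/cuda_stream_view.hpp>
#include <rmm/device_uvector.hpp>

void copy_to_device(rmm::cuda_stream_view stream)
{
  // host_vector records at allocation time whether it is pinned
  // (device-accessible); host_span inherits that flag on conversion.
  auto h_data = cudf::detail::make_host_vector<int>(1024, stream);
  rmm::device_uvector<int> d_data(h_data.size(), stream);

  // No PAGEABLE/PINNED enum at the call site. The explicit <int> is needed
  // because the container-to-span conversion prevents template deduction
  // (the first sharp edge above).
  cudf::detail::cuda_memcpy_async<int>(d_data, h_data, stream);
}
```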

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

@vuule added labels: improvement (Improvement / enhancement to an existing function), non-breaking (Non-breaking change) — Sep 27, 2024
@vuule vuule self-assigned this Sep 27, 2024
@github-actions bot added label: libcudf (Affects libcudf (C++/CUDA) code.) — Sep 27, 2024
Review thread on the copy-size argument in the new implementation:

impl::cuda_memcpy_async(
  dst.data(),
  src.data(),
  std::min(dst.size_bytes(), src.size_bytes()),
A reviewer (Contributor) commented, quoting the PR description:
> This might cause surprises to users if they unintentionally pass spans of different sizes. We could instead throw in this case.

Yes, this should be a runtime check and it should throw. If the caller wants to copy subspans, the caller can create subspans. Spans, as view types, are meant to make this easy.

@vuule (Contributor, Author) replied:

Sure.
Just one thing to keep in mind for this use case:
host_span{my_hv.data(), subsize} is not the same as host_span{my_hv}.subspan(0, subsize), because the former does not know whether it points to pinned memory.
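
To illustrate that distinction (a sketch; it assumes `host_span::is_device_accessible()` is the flag's accessor and that `my_hv` is a pinned `cudf::detail::host_vector`):

```cpp
#include <cudf/detail/utilities/host_vector.hpp>
#include <cudf/utilities/span.hpp>

#include <cstddef>

void demo(cudf::detail::host_vector<int> const& my_hv, std::size_t subsize)
{
  // Raw pointer + size: the span has no way to know the allocation is pinned.
  auto raw = cudf::host_span<int const>{my_hv.data(), subsize};
  // raw.is_device_accessible() == false, even though the memory is pinned.

  // Container first, then subspan: the flag set at construction propagates.
  auto sub = cudf::host_span<int const>{my_hv}.subspan(0, subsize);
  // sub.is_device_accessible() == true for a pinned my_hv.
}
```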

@vuule (Contributor, Author) replied:
done!
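
With the size check in place, the host-to-device overload might look roughly like this — a sketch assembled from the names visible in this thread (`impl::cuda_memcpy_async`, `host_memory_kind`, `is_device_accessible`), not the merged code:

```cpp
namespace cudf::detail {

template <typename T>
void cuda_memcpy_async(device_span<T> dst, host_span<T const> src, rmm::cuda_stream_view stream)
{
  // Throw on mismatched sizes instead of silently copying the smaller
  // amount; callers who want a partial copy can pass subspans.
  CUDF_EXPECTS(dst.size() == src.size(), "Copy requires spans of equal sizes");

  // host_span carries is_device_accessible from construction, so the
  // pinned/pageable decision no longer burdens the caller.
  impl::cuda_memcpy_async(dst.data(),
                          src.data(),
                          src.size_bytes(),
                          src.is_device_accessible() ? host_memory_kind::PINNED
                                                     : host_memory_kind::PAGEABLE,
                          stream);
}

}  // namespace cudf::detail
```

A device-to-host overload would presumably mirror this with the span roles swapped.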

@vuule vuule marked this pull request as ready for review September 30, 2024 16:55
@vuule vuule requested a review from a team as a code owner September 30, 2024 16:55
@bdice (Contributor) left a comment:

One namespace question. Otherwise LGTM.

Review thread on cpp/include/cudf/detail/utilities/cuda_memcpy.hpp (outdated, resolved):
-   source_data.size() * sizeof(T),
-   is_pinned ? host_memory_kind::PINNED : host_memory_kind::PAGEABLE,
-   stream);
+ cuda_memcpy_async<T>(ret, source_data, stream);
A reviewer (Contributor) replied:
Oh, yes. This is much better.

@mythrocks (Contributor) left a comment:

LGTM. Aesthetically, much improved.

Barring @bdice's concern regarding the namespace, this looks good to ship, to my eyes.

@vuule added label: 5 - Ready to Merge (Testing and reviews complete, ready to merge) — Oct 1, 2024
@vuule (Contributor, Author) commented on Oct 2, 2024:

/merge

@rapids-bot rapids-bot bot merged commit 6c9064a into rapidsai:branch-24.12 Oct 2, 2024
100 checks passed
rapids-bot pushed a commit referencing this pull request on Oct 18, 2024:
Depends on #16945

Added `cudf::detail::device_scalar`, derived from `rmm::device_scalar`. The new class overrides the member functions that perform copies between host and device; the new implementation uses a `cudf::detail::host_vector` as a bounce buffer to avoid performing a pageable copy.

Replaced `rmm::device_scalar` with `cudf::detail::device_scalar` across libcudf.
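
A rough sketch of the bounce-buffer idea described above, assuming the class shape suggested by the commit message (the member name `value`, the synchronous `cuda_memcpy`, and `make_host_vector` are assumptions, not the merged code):

```cpp
#include <cudf/detail/utilities/cuda_memcpy.hpp>
#include <cudf/detail/utilities/vector_factories.hpp>
#include <cudf/utilities/span.hpp>

#include <rmm/cuda_stream_view.hpp>
#include <rmm/device_scalar.hpp>

namespace cudf::detail {

template <typename T>
class device_scalar : public rmm::device_scalar<T> {
 public:
  using rmm::device_scalar<T>::device_scalar;

  // Read the scalar through a host_vector bounce buffer (pinned when a
  // pinned allocation is available) instead of a raw pageable host address,
  // so the device-to-host copy can take the faster pinned path.
  [[nodiscard]] T value(rmm::cuda_stream_view stream) const
  {
    auto bounce = cudf::detail::make_host_vector<T>(1, stream);
    cuda_memcpy<T>(bounce, device_span<T const>{this->data(), 1}, stream);
    return bounce.front();
  }
};

}  // namespace cudf::detail
```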

Authors:
  - Vukasin Milovanovic (https://github.com/vuule)
  - Vyas Ramasubramani (https://github.com/vyasr)

Approvers:
  - Basit Ayantunde (https://github.com/lamarrr)
  - Vyas Ramasubramani (https://github.com/vyasr)
  - David Wendt (https://github.com/davidwendt)

URL: #16947
Labels
  • 5 - Ready to Merge (Testing and reviews complete, ready to merge)
  • improvement (Improvement / enhancement to an existing function)
  • libcudf (Affects libcudf (C++/CUDA) code.)
  • non-breaking (Non-breaking change)