
Refactor the cuda_memcpy functions to make them more usable #16945

Merged: 9 commits into rapidsai:branch-24.12 on Oct 2, 2024

Conversation

@vuule (Contributor) commented on Sep 27, 2024:

Description

As we expanded the use of the cuda_memcpy functions, we realized that they are not very ergonomic: they require the caller to query is_device_accessible and pass the correct PAGEABLE/PINNED enum accordingly.

This PR aims to make the cuda_memcpy functions easier to use, and the call site changes hopefully showcase this. The new implementation takes spans as parameters and relies on host_span::is_device_accessible to select the copy strategy for pinned memory. Host spans set this flag during construction; creating a host span from a cudf::detail::host_vector correctly propagates is_device_accessible. Thus, callers can simply invoke the cuda_memcpy functions with their containers as arguments and rely on implicit conversion to host_span/device_span (see the sketch after the list below). Bonus: there is no way to mix up host and device memory pointers 👍

Sharp edges:

  • Conversion prevents template deduction, so calls that pass containers as arguments need to specify the template parameter explicitly (see the changes in this PR).
  • The API copies min(input.size(), output.size()) bytes, as this is what can be done safely. This might surprise users who unintentionally pass spans of different sizes; we could instead throw in this case.
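
To make the new call style concrete, here is a minimal sketch (the signatures and the `make_host_vector` factory are inferred from this PR's call sites and cudf's detail utilities, not quoted from the merged header):

```cpp
#include <cudf/detail/utilities/cuda_memcpy.hpp>
#include <cudf/detail/utilities/vector_factories.hpp>

#include <rmm/cuda_stream_view.hpp>
#include <rmm/device_uvector.hpp>

void copy_to_device(rmm::cuda_stream_view stream)
{
  // host_vector records at allocation time whether it is pinned
  // (device-accessible); host_span inherits that flag on conversion.
  auto h_data = cudf::detail::make_host_vector<int>(1024, stream);
  rmm::device_uvector<int> d_data(h_data.size(), stream);

  // No PAGEABLE/PINNED enum at the call site. The explicit <int> is needed
  // because the container-to-span conversion prevents template deduction
  // (the first sharp edge above).
  cudf::detail::cuda_memcpy_async<int>(d_data, h_data, stream);
}
```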

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

@vuule added labels: improvement (Improvement / enhancement to an existing function), non-breaking (Non-breaking change) — Sep 27, 2024
@vuule vuule self-assigned this Sep 27, 2024
@github-actions bot added label: libcudf (Affects libcudf (C++/CUDA) code.) — Sep 27, 2024
Review thread on the copy-size argument in the new implementation:

impl::cuda_memcpy_async(
  dst.data(),
  src.data(),
  std::min(dst.size_bytes(), src.size_bytes()),
A reviewer (Contributor) commented, quoting the PR description:
> This might cause surprises to users if they unintentionally pass spans of different sizes. We could instead throw in this case.

Yes, this should be a runtime check and it should throw. If the caller wants to copy subspans, the caller can create subspans. Spans, as view types, are meant to make this easy.

@vuule (Contributor, Author) replied:

Sure.
Just one thing to keep in mind for this use case:
host_span{my_hv.data(), subsize} is not the same as host_span{my_hv}.subspan(0, subsize), because the former does not know whether it points to pinned memory.
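
To illustrate that distinction (a sketch; it assumes `host_span::is_device_accessible()` is the flag's accessor and that `my_hv` is a pinned `cudf::detail::host_vector`):

```cpp
#include <cudf/detail/utilities/host_vector.hpp>
#include <cudf/utilities/span.hpp>

#include <cstddef>

void demo(cudf::detail::host_vector<int> const& my_hv, std::size_t subsize)
{
  // Raw pointer + size: the span has no way to know the allocation is pinned.
  auto raw = cudf::host_span<int const>{my_hv.data(), subsize};
  // raw.is_device_accessible() == false, even though the memory is pinned.

  // Container first, then subspan: the flag set at construction propagates.
  auto sub = cudf::host_span<int const>{my_hv}.subspan(0, subsize);
  // sub.is_device_accessible() == true for a pinned my_hv.
}
```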

@vuule (Contributor, Author) replied:
done!
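
With the size check in place, the host-to-device overload might look roughly like this — a sketch assembled from the names visible in this thread (`impl::cuda_memcpy_async`, `host_memory_kind`, `is_device_accessible`), not the merged code:

```cpp
namespace cudf::detail {

template <typename T>
void cuda_memcpy_async(device_span<T> dst, host_span<T const> src, rmm::cuda_stream_view stream)
{
  // Throw on mismatched sizes instead of silently copying the smaller
  // amount; callers who want a partial copy can pass subspans.
  CUDF_EXPECTS(dst.size() == src.size(), "Copy requires spans of equal sizes");

  // host_span carries is_device_accessible from construction, so the
  // pinned/pageable decision no longer burdens the caller.
  impl::cuda_memcpy_async(dst.data(),
                          src.data(),
                          src.size_bytes(),
                          src.is_device_accessible() ? host_memory_kind::PINNED
                                                     : host_memory_kind::PAGEABLE,
                          stream);
}

}  // namespace cudf::detail
```

A device-to-host overload would presumably mirror this with the span roles swapped.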

@vuule vuule marked this pull request as ready for review September 30, 2024 16:55
@vuule vuule requested a review from a team as a code owner September 30, 2024 16:55
@bdice (Contributor) left a comment:

One namespace question. Otherwise LGTM.

Review thread on cpp/include/cudf/detail/utilities/cuda_memcpy.hpp (outdated, resolved):
-   source_data.size() * sizeof(T),
-   is_pinned ? host_memory_kind::PINNED : host_memory_kind::PAGEABLE,
-   stream);
+ cuda_memcpy_async<T>(ret, source_data, stream);
A reviewer (Contributor) replied:
Oh, yes. This is much better.

@mythrocks (Contributor) left a comment:

LGTM. Aesthetically, much improved.

Barring @bdice's concern regarding the namespace, this looks good to ship, to my eyes.

@vuule added label: 5 - Ready to Merge (Testing and reviews complete, ready to merge) — Oct 1, 2024
@vuule (Contributor, Author) commented on Oct 2, 2024:

/merge

@rapids-bot rapids-bot bot merged commit 6c9064a into rapidsai:branch-24.12 Oct 2, 2024
100 checks passed
rapids-bot pushed a commit referencing this pull request on Oct 18, 2024:
Depends on #16945

Added `cudf::detail::device_scalar`, derived from `rmm::device_scalar`. The new class overrides the member functions that perform copies between host and device; the new implementation uses a `cudf::detail::host_vector` as a bounce buffer to avoid performing a pageable copy.

Replaced `rmm::device_scalar` with `cudf::detail::device_scalar` across libcudf.
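
A rough sketch of the bounce-buffer idea described above, assuming the class shape suggested by the commit message (the member name `value`, the synchronous `cuda_memcpy`, and `make_host_vector` are assumptions, not the merged code):

```cpp
#include <cudf/detail/utilities/cuda_memcpy.hpp>
#include <cudf/detail/utilities/vector_factories.hpp>
#include <cudf/utilities/span.hpp>

#include <rmm/cuda_stream_view.hpp>
#include <rmm/device_scalar.hpp>

namespace cudf::detail {

template <typename T>
class device_scalar : public rmm::device_scalar<T> {
 public:
  using rmm::device_scalar<T>::device_scalar;

  // Read the scalar through a host_vector bounce buffer (pinned when a
  // pinned allocation is available) instead of a raw pageable host address,
  // so the device-to-host copy can take the faster pinned path.
  [[nodiscard]] T value(rmm::cuda_stream_view stream) const
  {
    auto bounce = cudf::detail::make_host_vector<T>(1, stream);
    cuda_memcpy<T>(bounce, device_span<T const>{this->data(), 1}, stream);
    return bounce.front();
  }
};

}  // namespace cudf::detail
```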

Authors:
  - Vukasin Milovanovic (https://github.com/vuule)
  - Vyas Ramasubramani (https://github.com/vyasr)

Approvers:
  - Basit Ayantunde (https://github.com/lamarrr)
  - Vyas Ramasubramani (https://github.com/vyasr)
  - David Wendt (https://github.com/davidwendt)

URL: #16947
Labels
  • 5 - Ready to Merge (Testing and reviews complete, ready to merge)
  • improvement (Improvement / enhancement to an existing function)
  • libcudf (Affects libcudf (C++/CUDA) code.)
  • non-breaking (Non-breaking change)