Provide a raft::copy overload for mdspan-to-mdspan copies #1818

Merged
merged 104 commits into from
Oct 6, 2023
Changes from 102 commits
Commits
e24fd2e
Initial commit
tarang-jain Apr 3, 2023
b8cda77
Merge branch 'branch-23.04' of https://github.com/rapidsai/raft into …
tarang-jain Apr 3, 2023
07dabfe
New commit
tarang-jain Apr 6, 2023
64eb461
Merge branch 'branch-23.06' of https://github.com/rapidsai/raft into …
tarang-jain Apr 6, 2023
21c2641
Update
tarang-jain Apr 6, 2023
c84daa6
Merge
tarang-jain Apr 6, 2023
4ad421b
Merge
tarang-jain Apr 6, 2023
ea11b07
Merge
tarang-jain Apr 6, 2023
ab19410
build
tarang-jain Apr 7, 2023
9870e9d
Test start
tarang-jain Apr 7, 2023
51a2581
Test start
tarang-jain Apr 7, 2023
552b21e
Merge branch 'branch-23.06' of https://github.com/rapidsai/raft into …
tarang-jain Apr 7, 2023
d0e7b2c
style changes
tarang-jain Apr 7, 2023
f72f7f8
merge
tarang-jain Apr 7, 2023
05f9daa
merge dependencies.yaml
tarang-jain Apr 7, 2023
0250931
Updates
tarang-jain Apr 10, 2023
057743d
Merge branch 'branch-23.06' of https://github.com/rapidsai/raft into …
tarang-jain Apr 10, 2023
20042b0
Debugging
tarang-jain Apr 12, 2023
2d189c3
Update gtest
tarang-jain Apr 19, 2023
53c4557
Merge branch 'branch-23.06' of https://github.com/rapidsai/raft into …
tarang-jain Apr 25, 2023
de753ae
Merge branch 'branch-23.06' of https://github.com/rapidsai/raft into …
tarang-jain Apr 27, 2023
2f8b294
Some updates after reviews
tarang-jain Apr 27, 2023
6539ef4
Use raft::resources
tarang-jain Apr 28, 2023
1709521
Merge branch 'branch-23.06' of https://github.com/rapidsai/raft into …
tarang-jain Apr 28, 2023
008bb5b
move exception
tarang-jain Apr 28, 2023
5b97273
Updates after PR Reviews
tarang-jain May 2, 2023
5be6ec2
Merge branch 'branch-23.06' of https://github.com/rapidsai/raft into …
tarang-jain May 2, 2023
838bfef
Add container policy
tarang-jain May 8, 2023
e035e2e
further changes with container policy
tarang-jain May 10, 2023
cd91a88
Merge branch 'branch-23.06' of https://github.com/rapidsai/raft into …
tarang-jain May 10, 2023
338c1a6
Some updates
tarang-jain May 12, 2023
6468c24
update container_policy
tarang-jain Jun 7, 2023
1bd5455
Merge branch 'branch-23.08' of https://github.com/rapidsai/raft into …
tarang-jain Jun 7, 2023
81c6a81
Working build
tarang-jain Jun 9, 2023
77ae593
Merge branch 'branch-23.08' of https://github.com/rapidsai/raft into …
tarang-jain Jun 9, 2023
451815e
Update buffer accessor policy
tarang-jain Jun 12, 2023
b553369
Merge branch 'branch-23.08' of https://github.com/rapidsai/raft into …
tarang-jain Jun 12, 2023
b410f36
Style changes
tarang-jain Jun 12, 2023
4731620
minor changes
tarang-jain Jun 13, 2023
238d010
combine owning buffer cpu/gpu
tarang-jain Jun 14, 2023
75cfcf1
update tests
tarang-jain Jun 20, 2023
7b1909f
Updates
tarang-jain Jul 3, 2023
5c041c4
Merge branch 'branch-23.08' of https://github.com/rapidsai/raft into …
tarang-jain Jul 3, 2023
0bf6f87
Merge branch 'branch-23.08' into tarbuf
wphicks Jul 3, 2023
1a1143f
Temporarily remove new files to bring back necessary ones
wphicks Jul 3, 2023
acceb61
Begin refactoring buffer container policies
wphicks Jul 5, 2023
fdefc34
Add placeholder resource for stream view in CUDA-free builds
wphicks Jul 10, 2023
24223ed
Add infrastructure for CUDA-free build
wphicks Jul 11, 2023
c6f6354
Merge branch 'branch-23.08' into fea-mdbuffer
wphicks Jul 11, 2023
4689052
Add initial set of CUDA-free tests
wphicks Jul 11, 2023
1b7e1e5
Add variant types to mdbuffer
wphicks Jul 17, 2023
5416ceb
Provide all mdarray/mdspan to mdbuffer conversions
wphicks Jul 18, 2023
355b3d4
Begin creating buffer copy utilities
wphicks Jul 31, 2023
601f65d
Merge branch 'branch-23.10' into fea-mdbuffer
wphicks Aug 18, 2023
4770a83
Correct computation of dest indices
wphicks Aug 18, 2023
28e8627
Merge branch 'branch-23.10' into fea-mdbuffer
wphicks Aug 22, 2023
8237a74
Temporarily remove simd-accelerated copy
wphicks Aug 23, 2023
022cf6e
Add initial mdspan copy utility implementation
wphicks Aug 29, 2023
a1776f4
Refactor copy properties detection
wphicks Aug 31, 2023
a970dad
Correct detection of mdspan copy paths
wphicks Sep 1, 2023
9a2fa9e
Correct build errors
wphicks Sep 1, 2023
eac9de6
Provide passing 3D host transpose tests
wphicks Sep 1, 2023
39cf094
Add working tests for cuBlas based transpose
wphicks Sep 1, 2023
760b656
Add incomplete kernel tests
wphicks Sep 5, 2023
f8d435f
Remove old mdspan copy header
wphicks Sep 5, 2023
4c4fbaf
Revert "Remove old mdspan copy header"
wphicks Sep 5, 2023
ad5c786
Remove correct mdspan copy header
wphicks Sep 5, 2023
2e433ba
Correct std::apply workaround in CUDA
wphicks Sep 6, 2023
d669e42
Provide fully working copy kernel
wphicks Sep 7, 2023
ed663c8
Begin adding SIMD support
wphicks Sep 11, 2023
ab809e8
Revert "Begin adding SIMD support"
wphicks Sep 11, 2023
49d871a
Disable initial SIMD implementation
wphicks Sep 11, 2023
cb24abc
Rename mdspan copy headers
wphicks Sep 11, 2023
2a83c1b
Remove mdbuffer work and document mdspan copy
wphicks Sep 11, 2023
4193b74
Merge branch 'branch-23.10' into fea-mdspan_copy
wphicks Sep 11, 2023
624e4f3
Remove un-needed changes left over from mdbuffer
wphicks Sep 12, 2023
e9ef750
Add testing for CUDA-disabled builds
wphicks Sep 12, 2023
06fe54d
Merge branch 'branch-23.10' into fea-mdspan_copy
wphicks Sep 12, 2023
92046e0
Fix style and revert some unnecessary changes
wphicks Sep 12, 2023
a0a5b69
Remove changes related to mdbuffer
wphicks Sep 12, 2023
58389ec
Remove change related to mdbuffer
wphicks Sep 12, 2023
0a19ae5
Correctly handle proxy references in mdspan copy kernel
wphicks Sep 12, 2023
0675207
Check for unique destination layout in any parallel copy
wphicks Sep 13, 2023
8ad9434
Use perfect forwarding for copy wrappers
wphicks Sep 13, 2023
fdbc9ee
Correct comment for dimension iteration order
wphicks Sep 13, 2023
21618ea
Add warning about copying to non-unique layouts
wphicks Sep 14, 2023
18d462e
Add benchmarks for mdspan copy
wphicks Sep 19, 2023
4700199
Merge branch 'branch-23.10' into fea-mdspan_copy
wphicks Sep 19, 2023
2cad1ed
Merge branch 'branch-23.10' into fea-mdspan_copy
wphicks Sep 19, 2023
6e91a1c
Correct check for assignability in mdspan copy
wphicks Sep 20, 2023
55e06fe
Add comment explaining intermediate storage
wphicks Sep 20, 2023
faa402a
Correct dtype compatibility test
wphicks Sep 21, 2023
2eba34d
Provide cleaner compile error for using copy with unsupported types
wphicks Sep 21, 2023
ca77cf0
Merge branch 'branch-23.10' into fea-mdspan_copy
wphicks Sep 22, 2023
4389b64
Update stream_view docs
wphicks Sep 22, 2023
7416b73
Merge branch 'branch-23.10' into fea-mdspan_copy
wphicks Sep 22, 2023
7f407ed
Merge branch 'branch-23.10' into fea-mdspan_copy
wphicks Sep 22, 2023
62ac60a
Update stream view docs
wphicks Sep 22, 2023
5bddcc8
Merge remote-tracking branch 'origin/fea-mdspan_copy' into fea-mdspan…
wphicks Sep 22, 2023
bd5a8f8
Merge branch 'branch-23.12' into fea-mdspan_copy
wphicks Oct 2, 2023
a8b17a8
Add static asserts for mdspan_copyable
wphicks Oct 2, 2023
722425c
Correct iteration in host-to-host copies
wphicks Oct 2, 2023
0863db0
Fix double-defined target from branch merge
wphicks Oct 4, 2023
5c4349e
Merge branch 'branch-23.12' into fea-mdspan_copy
cjnolet Oct 5, 2023
3 changes: 3 additions & 0 deletions cpp/bench/prims/CMakeLists.txt
@@ -32,6 +32,7 @@ function(ConfigureBench)
PRIVATE raft::raft
raft_internal
$<$<BOOL:${ConfigureBench_LIB}>:raft::compiled>
${RAFT_CTK_MATH_DEPENDENCIES}
benchmark::benchmark
Threads::Threads
$<TARGET_NAME_IF_EXISTS:OpenMP::OpenMP_CXX>
@@ -73,6 +74,8 @@ function(ConfigureBench)
endfunction()

if(BUILD_PRIMS_BENCH)
ConfigureBench(NAME CORE_BENCH PATH bench/prims/core/copy.cu bench/prims/main.cpp)

ConfigureBench(
NAME CLUSTER_BENCH PATH bench/prims/cluster/kmeans_balanced.cu bench/prims/cluster/kmeans.cu
bench/prims/main.cpp OPTIONAL LIB EXPLICIT_INSTANTIATE_ONLY
401 changes: 401 additions & 0 deletions cpp/bench/prims/core/copy.cu

Large diffs are not rendered by default.

74 changes: 74 additions & 0 deletions cpp/include/raft/core/copy.cuh
@@ -0,0 +1,74 @@
/*
* Copyright (c) 2023, NVIDIA CORPORATION.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/

#pragma once
#include <raft/core/detail/copy.hpp>
namespace raft {
/**
* @brief Copy data from one mdspan to another with the same extents
*
* This function copies data from one mdspan to another, regardless of whether
* or not the mdspans have the same layout, memory type (host/device/managed)
* or data type. So long as it is possible to convert the data type from source
* to destination, and the extents are equal, this function should be able to
* perform the copy. Any necessary device operations will be stream-ordered via the CUDA stream
* provided by the `raft::resources` argument.
*
* This header includes a custom kernel used for copying data between
* completely arbitrary mdspans on device. To compile this function in a
* non-CUDA translation unit, `raft/core/copy.hpp` may be used instead. The
* pure C++ header will correctly compile even without a CUDA compiler.
* Depending on the specialization, this CUDA header may invoke the kernel and
* therefore require a CUDA compiler.
*
* Limitations: Currently this function does not support copying directly
* between two arbitrary mdspans on different CUDA devices. It is assumed that the caller sets the
* correct CUDA device. Furthermore, host-to-host copies that require a transformation of the
* underlying memory layout are currently not performant, although they are supported.
*
* Note that when copying to an mdspan with a non-unique layout (i.e. the same
* underlying memory is addressed by more than one element index), the source
* must hold the same value at every index that maps to a given destination
* element. If this is not the case, the behavior is undefined. Some copies
* to non-unique layouts which are well-defined will nevertheless fail with an
* exception to avoid race conditions in the underlying copy.
*
* @tparam DstType An mdspan type for the destination container.
* @tparam SrcType An mdspan type for the source container.
* @param res raft::resources used to provide a stream for copies involving the
* device.
* @param dst The destination mdspan.
* @param src The source mdspan.
*/
template <typename DstType, typename SrcType>
detail::mdspan_copyable_with_kernel_t<DstType, SrcType> copy(resources const& res,
                                                             DstType&& dst,
                                                             SrcType&& src)
{
  detail::copy(res, std::forward<DstType>(dst), std::forward<SrcType>(src));
}

#ifndef RAFT_NON_CUDA_COPY_IMPLEMENTED
#define RAFT_NON_CUDA_COPY_IMPLEMENTED
template <typename DstType, typename SrcType>
detail::mdspan_copyable_not_with_kernel_t<DstType, SrcType> copy(resources const& res,
                                                                 DstType&& dst,
                                                                 SrcType&& src)
{
  detail::copy(res, std::forward<DstType>(dst), std::forward<SrcType>(src));
}
#endif
} // namespace raft
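
For readers skimming the diff, the layout- and dtype-converting behavior described in the `copy.cuh` docstring can be illustrated with a small host-only sketch. It does not use RAFT or mdspan: `view2d`, `row_major`, `col_major` and `naive_copy` are hypothetical names invented for this illustration, and the element-by-element loop is only an assumption about what such a copy must compute, not the dispatch logic in `detail/copy.hpp`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a 2D mdspan: a pointer plus extents and strides.
// This is NOT a RAFT type; it only models "same extents, different layout".
template <typename T>
struct view2d {
  T* data;
  std::size_t rows;
  std::size_t cols;
  std::size_t row_stride;
  std::size_t col_stride;

  T& operator()(std::size_t i, std::size_t j) const
  {
    return data[i * row_stride + j * col_stride];
  }
};

// Row-major: consecutive elements of a row are adjacent in memory.
template <typename T>
view2d<T> row_major(T* data, std::size_t rows, std::size_t cols)
{
  return {data, rows, cols, cols, 1};
}

// Column-major: consecutive elements of a column are adjacent in memory.
template <typename T>
view2d<T> col_major(T* data, std::size_t rows, std::size_t cols)
{
  return {data, rows, cols, 1, rows};
}

// Copy element by element, converting the value type on the way.
// The extents must match; the layouts and value types may differ.
template <typename Dst, typename Src>
void naive_copy(view2d<Dst> dst, view2d<Src> src)
{
  assert(dst.rows == src.rows && dst.cols == src.cols);
  for (std::size_t i = 0; i < src.rows; ++i) {
    for (std::size_t j = 0; j < src.cols; ++j) {
      dst(i, j) = static_cast<Dst>(src(i, j));
    }
  }
}
```

Copying a 2x3 row-major `int` buffer holding 1 through 6 into a column-major `double` buffer with this sketch leaves the destination memory as 1, 4, 2, 5, 3, 6. A loop of this shape is also the kind of host-to-host path the docstring warns is not performant when the layout has to change.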
69 changes: 69 additions & 0 deletions cpp/include/raft/core/copy.hpp
@@ -0,0 +1,69 @@
/*
* Copyright (c) 2023, NVIDIA CORPORATION.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/

#pragma once
#include <raft/core/detail/copy.hpp>
namespace raft {

#ifndef RAFT_NON_CUDA_COPY_IMPLEMENTED
#define RAFT_NON_CUDA_COPY_IMPLEMENTED
/**
* @brief Copy data from one mdspan to another with the same extents
*
* This function copies data from one mdspan to another, regardless of whether
* or not the mdspans have the same layout, memory type (host/device/managed)
* or data type. So long as it is possible to convert the data type from source
* to destination, and the extents are equal, this function should be able to
* perform the copy.
*
* This header does _not_ include the custom kernel used for copying data
* between completely arbitrary mdspans on device. For arbitrary copies of this
* kind, `#include <raft/core/copy.cuh>` instead. Specializations of this
* function that require the custom kernel will be SFINAE-omitted when this
* header is used instead of `copy.cuh`. This header _does_ support
* device-to-device copies that can be performed with cuBLAS or a
* straightforward cudaMemcpy. Any necessary device operations will be stream-ordered via the CUDA
* stream provided by the `raft::resources` argument.
*
* Limitations: Currently this function does not support copying directly
* between two arbitrary mdspans on different CUDA devices. It is assumed that the caller sets the
* correct CUDA device. Furthermore, host-to-host copies that require a transformation of the
* underlying memory layout are currently not performant, although they are supported.
*
* Note that when copying to an mdspan with a non-unique layout (i.e. the same
* underlying memory is addressed by more than one element index), the source
* must hold the same value at every index that maps to a given destination
* element. If this is not the case, the behavior is undefined. Some copies
* to non-unique layouts which are well-defined will nevertheless fail with an
* exception to avoid race conditions in the underlying copy.
*
* @tparam DstType An mdspan type for the destination container.
* @tparam SrcType An mdspan type for the source container.
* @param res raft::resources used to provide a stream for copies involving the
* device.
* @param dst The destination mdspan.
* @param src The source mdspan.
*/
template <typename DstType, typename SrcType>
detail::mdspan_copyable_not_with_kernel_t<DstType, SrcType> copy(resources const& res,
                                                                 DstType&& dst,
                                                                 SrcType&& src)
{
  detail::copy(res, std::forward<DstType>(dst), std::forward<SrcType>(src));
}
#endif

} // namespace raft
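
The two headers split the `copy` overloads by using an `enable_if`-style return type, so that exactly one overload is viable for any given pair of argument types. The following is a minimal sketch of that pattern only. Everything in namespace `sfinae_sketch` is hypothetical: the tag types, the `needs_kernel_v` trait (here simply "the two types differ") and the `copy_path` return value are assumptions made for illustration, whereas the real `detail::mdspan_copyable_*_t` aliases are defined in `detail/copy.hpp` and return `void`.

```cpp
#include <cassert>
#include <type_traits>

namespace sfinae_sketch {

// Hypothetical tags standing in for two mdspan types with different layouts.
struct row_major_view {};
struct col_major_view {};

// Returned only so the chosen overload can be observed; an assumption of
// this sketch, not part of the PR.
enum class copy_path { simple, kernel };

// Hypothetical trait: pretend the custom kernel is needed whenever the two
// view types differ.
template <typename Dst, typename Src>
constexpr bool needs_kernel_v = !std::is_same_v<std::decay_t<Dst>, std::decay_t<Src>>;

// Alias that only names a type when the kernel is required ...
template <typename Dst, typename Src>
using copyable_with_kernel_t = std::enable_if_t<needs_kernel_v<Dst, Src>, copy_path>;

// ... and its complement.
template <typename Dst, typename Src>
using copyable_not_with_kernel_t = std::enable_if_t<!needs_kernel_v<Dst, Src>, copy_path>;

// Overload that a CUDA header would provide.
template <typename Dst, typename Src>
constexpr copyable_with_kernel_t<Dst, Src> copy_sketch(Dst&&, Src&&)
{
  return copy_path::kernel;
}

// Overload that a pure C++ header would provide.
template <typename Dst, typename Src>
constexpr copyable_not_with_kernel_t<Dst, Src> copy_sketch(Dst&&, Src&&)
{
  return copy_path::simple;
}

}  // namespace sfinae_sketch
```

If only the second overload is visible, as when `copy.hpp` is included without `copy.cuh`, a call that needs the kernel has no viable candidate and fails to compile instead of silently choosing a wrong path. The `RAFT_NON_CUDA_COPY_IMPLEMENTED` guard in the diff keeps the non-kernel overload from being defined twice when both headers end up in the same translation unit.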
23 changes: 23 additions & 0 deletions cpp/include/raft/core/cuda_support.hpp
@@ -0,0 +1,23 @@
/*
* Copyright (c) 2023, NVIDIA CORPORATION.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
#pragma once
namespace raft {
#ifndef RAFT_DISABLE_CUDA
auto constexpr static const CUDA_ENABLED = true;
#else
auto constexpr static const CUDA_ENABLED = false;
#endif
} // namespace raft
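
A compile-time flag like `CUDA_ENABLED` is typically consumed with `if constexpr`. The sketch below shows one way that could look. It redefines the flag in a separate `flag_sketch` namespace so it compiles on its own, and `copy_backend` is a hypothetical helper that does not appear in the PR.

```cpp
#include <cassert>
#include <string>

namespace flag_sketch {

// Mirrors the shape of the flag in cuda_support.hpp; redefined here so the
// example is self-contained.
#ifndef RAFT_DISABLE_CUDA
constexpr bool CUDA_ENABLED = true;
#else
constexpr bool CUDA_ENABLED = false;
#endif

// Hypothetical helper: report which kind of copy path this build could use.
inline std::string copy_backend()
{
  if constexpr (CUDA_ENABLED) {
    return "device-capable";
  } else {
    return "host-only";
  }
}

}  // namespace flag_sketch
```

One caveat with this approach: outside a template, both branches of `if constexpr` are still fully compiled, so code that calls CUDA-only functions has to live in a template or behind the preprocessor guard for a CUDA-free build to succeed.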