-
Notifications
You must be signed in to change notification settings - Fork 342
Modernize Thrust examples #5670
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
6de8ae1
e614728
b6efb9b
3b62cc1
4731c69
989f690
60238f3
0c28975
ba240a9
38d6771
027fc15
88376b8
48f456a
a24acc2
bd59431
0828bb6
5e32929
0841ade
6e58d56
02b49b4
30a6d4f
3557c0d
d171ad8
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -7,8 +7,6 @@ | |
|
|
||
| #include <iostream> | ||
|
|
||
| #include "include/host_device.h" | ||
|
|
||
| // This example shows how to implement an arbitrary transformation of | ||
| // the form output[i] = F(first[i], second[i], third[i], ... ). | ||
| // In this example, we use a function with 3 inputs and 1 output. | ||
|
|
@@ -62,29 +60,20 @@ struct arbitrary_functor2 | |
|
|
||
| int main() | ||
| { | ||
| // allocate storage | ||
| thrust::device_vector<float> A(5); | ||
| thrust::device_vector<float> B(5); | ||
| thrust::device_vector<float> C(5); | ||
| // allocate and initialize | ||
| thrust::device_vector<float> A{3, 4, 0, 8, 2}; | ||
| thrust::device_vector<float> B{6, 7, 2, 1, 8}; | ||
| thrust::device_vector<float> C{2, 5, 7, 4, 3}; | ||
| thrust::device_vector<float> D1(5); | ||
|
|
||
| // clang-format off | ||
| // initialize input vectors | ||
| A[0] = 3; B[0] = 6; C[0] = 2; | ||
| A[1] = 4; B[1] = 7; C[1] = 5; | ||
| A[2] = 0; B[2] = 2; C[2] = 7; | ||
| A[3] = 8; B[3] = 1; C[3] = 4; | ||
| A[4] = 2; B[4] = 8; C[4] = 3; | ||
| // clang-format on | ||
|
|
||
| // apply the transformation | ||
| thrust::for_each(thrust::make_zip_iterator(A.begin(), B.begin(), C.begin(), D1.begin()), | ||
| thrust::make_zip_iterator(A.end(), B.end(), C.end(), D1.end()), | ||
| arbitrary_functor1()); | ||
|
|
||
| // print the output | ||
| std::cout << "Tuple functor" << std::endl; | ||
| for (int i = 0; i < 5; i++) | ||
| for (size_t i = 0; i < A.size(); i++) | ||
|
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Isn't the preferred form Also,
for (
auto it = make_zip_iterator(make_tuple(begin(A), begin(B), begin(C), begin(D))));
it != make_zip_iterator(make_tuple(end(A), end(B), end(C), end(D))));
++it)
{
std::cout << std::format("{} + {} * {} = {}\n", *it);
}Maybe the auto zip_begin(auto containers..) {
return make_zip_iterator(make_tuple(begin(containers)...));
}
auto zip_end(auto containers..) {
return make_zip_iterator(make_tuple(end(containers)...));
}In which case the above simplifies further to for (
auto it = zip_begin(A, B, C, D); it != zip_end(A, B, C, D); ++it)
{
std::cout << std::format("{} + {} * {} = {}\n", *it);
}
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Thank you for the feedback! You are always free to create a PR yourself or start a discussion.
I have no preference here. The PR improved the situation by not using a magic number, which is good.
Yes, but iterating 4 ranges at the same time using a zip may also be a bit over-engineered. Using an index if fine here IMO. Examples should be easy.
Correct. Feel free to propose a PR to replace them by
CCCL still supports C++17, but I don't see a blocker with using C++20 in examples only. I will start a discussion internally.
Again, for example code I have no preference here. I agree with this when writing library code.
I think this does not increase readability or clarity of the example.
We have that today, just construct the zip_iterator(begin(A), begin(B), begin(C);Should deduce |
||
| { | ||
| std::cout << A[i] << " + " << B[i] << " * " << C[i] << " = " << D1[i] << std::endl; | ||
| } | ||
|
|
@@ -97,7 +86,7 @@ int main() | |
|
|
||
| // print the output | ||
| std::cout << "N-ary functor" << std::endl; | ||
| for (int i = 0; i < 5; i++) | ||
| for (size_t i = 0; i < A.size(); i++) | ||
| { | ||
| std::cout << A[i] << " + " << B[i] << " * " << C[i] << " = " << D2[i] << std::endl; | ||
| } | ||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -16,6 +16,8 @@ | |
|
|
||
| #pragma once | ||
|
|
||
| #include <cuda/__cccl_config> | ||
|
|
||
| #if !_CCCL_HAS_CUDA_COMPILER() | ||
|
|
||
| # ifndef __host__ | ||
|
|
||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@alliepiper Is that fine? I believe we do not want to add all the noise of the
host_deviceincludes to all the examplesUh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@miscco I was thinking there was something else, why CI tests were failing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's good to show in the examples that users need to be mindful of
__host__ __device__annotations when targeting multiple backends. I'd almost prefer directly including the contents ofhost_device.hin each example, without the_CCCLmacros, to show users how to implement this themselves.It's noisy and repetitive, but might be worth doing with a good comment explaining why/when it's needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Otherwise, what you have here is functional and would be fine from a technical standpoint. I just don't like force-including important user-setup bits in example code, hidden magic is bad for teaching IMO.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Then we should move the examples, what does
device_vectoreven mean in the context of TBB and openMP?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Move them?
It's backed by host memory when those backends are used.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My point is that anything that uses a
device_vectorshould live inexamples/cudaand then use appropriately annotated functions.But anything in
examplesshould work out of the box for any backend without any magic interventionThere was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@miscco
device_vectorallocates memory on the current Thrust device system. If that is CUDA, it's CUDA device memory. If the device system is TBB, OMP or CPP, then adevice_vectorjust behaves like a host vector. This is so Thrust can switch backends with the preprocessor.