-
-
Notifications
You must be signed in to change notification settings - Fork 81
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Switch memory functions to only work with regions (and spans) #322
Comments
This would be a great feature for memory safety, one I actually had hoped already existed but I ended up here realizing that it's not yet available. :) |
@bgianfo : So, in your opinion, is it worth it to move away somewhat from the convention of the standard library - |
To give some context I am fairly new to the library, but I'm working on updating an older project and have been having good success so far. I think for the case where we copy a lot of arrays back and forth between device and host memory, I would like a way of using the type system to not have to keep passing the size of buffers in bytes around. I think having that information would also enable the library to even potentially catch corruptions in copy's before they happen. It sounds like your proposed "unique region" solution would solve this problem nicely. I think it would also enable a nice ergonomic enhancement where you could allocate host or device memory, and do the copy back to host/device with one call (essentially a "region clone"), something like: constexpr int mat_size = 100;
auto h_mat_1 = cuda::memory::host::make_unique<float[]>(mat_size);
auto h_mat_2 = cuda::memory::host::make_unique<float[]>(mat_size);
populate_matrices(h_mat_1.get(), h_mat_2.get(), mat_size);
auto d_mat_1 = cuda::memory::device::make_unique<float[]>(device, mat_size);
auto d_mat_2 = cuda::memory::device::make_unique<float[]>(device, mat_size);
cuda::memory::copy(d_mat_1.get(), h_mat_1.get(), mat_size * sizeof(float));
cuda::memory::copy(d_mat_2.get(), h_mat_2.get(), mat_size * sizeof(float)); Could be simplified to with some helper method to something like this, if we were able to retain information about region sizes + types: constexpr int mat_size = 100;
auto h_mat_1 = cuda::memory::host::make_unique_region<float[]>(mat_size);
auto h_mat_2 = cuda::memory::host::make_unique_region<float[]>(mat_size);
populate_matrices(h_mat_1.get(), h_mat_2.get(), mat_size);
auto d_mat_1 = h_mat_1.clone_to_device(device);
auto d_mat_2 = h_mat_2.clone_to_device(device); The same thing could be applied to memory zeroing: constexpr int mat_size = 100;
auto d_mat_1 = cuda::memory::device::make_unique_region<float[]>(device, mat_size);
d_mat_1.zero(); |
Well, I'll first say that I need feedback from at least a few users, including those with more experience than you describe, to make a decision... Secondly, let me play the "devil's advocate" here. You write:
but you already have that, with either
It's a bit verbose, but - is it not good enough? This is the dilemma. Note that I could, in theory, define a "unique region" class and not drop all of the pointer-and-address API functions (for copying and allocation). That's a third option, which has the benefit of letting everyone do exactly what they want, but the detriment of not promoting safety and correctness, and enlarging the already huge set of variants of each of the copy, set, allocate etc. functions |
* Added a `unique_region` class - non-templated * `unique_region`'s "decay" into `cuda::memory::region_t`'s - so there's no need to reimplement half the world for them * Added a missing variant of `cuda::memory::copy()` * Added an example program based on vectorAdd, which uses unique_regions (I would have replaced vectorAdd with it - had it not requirted to lines of source more than before...) * Removed some commented-out unique-pointer-related code
…an. It should provide the most flexibility in our example programs (and typical user programs) for direct use without explicit casting, re-specifying types and sizes, etc.
@bgianfo : Please have a look at the unique-regions branch. It has a |
I looked through the branch it looks great IMHO. In retrospect the |
@bgianfo : I'm actually worried that it's a bit too much to have all of:
and that maybe I should drop at least one of those. Not to mention how we also have |
Look at our modified vectorAdd example. It's certainly nicer than the original, but it's just sad that we have to repeat ourselves again and again with respect to lengths and sizes:
instead of being able to say:
This is in no small part due to our use of standard-library-like unique pointers rather than unique regions (see issue #291 ); and that in turn is partially motivated by how our memory functions are mostly pointer-oriented, not region-oriented. Now that we have region classes - why should we not just turn away from the raw CUDA API style of passing pointers around, and instead only work with sized ranges? (And perhaps also spans, for C++20; or even bundle a span implementation for older distros?)
The text was updated successfully, but these errors were encountered: