Switch memory functions to only work with regions (and spans) #322

Open
eyalroz opened this issue Apr 27, 2022 · 7 comments

Comments

@eyalroz
Owner

eyalroz commented Apr 27, 2022

Look at our modified vectorAdd example. It's certainly nicer than the original, but it's just sad that we have to repeat ourselves again and again with respect to lengths and sizes:

int numElements = 50000;
size_t size = numElements * sizeof(float); 
// ... snip ...
std::generate(h_A.get(), h_A.get() + numElements, generator);
// ... snip ...
cuda::memory::copy(d_A.get(), h_A.get(), size);
// ... snip ...
cuda::memory::copy(h_C.get(), d_C.get(), size);

instead of being able to say:

auto h_A = make_unique_span<float>(numElements);
std::generate(std::begin(h_A), std::end(h_A), generator);
cuda::memory::copy(d_A, h_A);
// ... snip ...
cuda::memory::copy(h_C, d_C);

This is in no small part due to our use of standard-library-like unique pointers rather than unique regions (see issue #291); and that, in turn, is partly motivated by our memory functions being mostly pointer-oriented rather than region-oriented. Now that we have region classes, why should we not just turn away from the raw CUDA API style of passing pointers around, and instead work only with sized ranges? (And perhaps also spans, for C++20; or even bundle a span implementation for older distros?)
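(As a reminder of what "region" means here: a cuda::memory::region_t is, at its core, just a start address plus a size in bytes. Roughly the following, given as an approximation for illustration rather than the library's actual definition:)

#include <cstddef>

// Rough approximation only: a memory "region" is an untyped, non-owning
// pair of a start address and a byte count.
struct region_sketch {
    void*       start;
    std::size_t size_in_bytes;
};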

@bgianfo

bgianfo commented Sep 13, 2023

This would be a great feature for memory safety, and one I had actually hoped already existed; I ended up here after realizing it's not yet available. :)

@eyalroz
Owner Author

eyalroz commented Sep 13, 2023

@bgianfo: So, in your opinion, is it worth it to move away somewhat from the conventions of the standard library (std::unique_ptr's and support for raw pointers), in favor of "unique regions" (i.e. basically spans with ownership)?
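(To make "spans with ownership" concrete, here is a minimal sketch in plain standard C++, with placeholder names rather than the library's actual design:)

#include <cstddef>
#include <memory>

// Minimal sketch of a "unique span": like a span, it knows its element type
// and length; like a unique_ptr, it owns the memory and frees it on destruction.
template <typename T, typename Deleter = std::default_delete<T[]>>
class unique_span_sketch {
public:
    unique_span_sketch(T* data, std::size_t size, Deleter deleter = {})
        : data_(data, deleter), size_(size) {}

    T*          data()  const noexcept { return data_.get(); }
    std::size_t size()  const noexcept { return size_; }
    T*          begin() const noexcept { return data(); }
    T*          end()   const noexcept { return data() + size_; }

private:
    std::unique_ptr<T[], Deleter> data_;
    std::size_t                   size_;
};

In the library's case, the deleter would presumably free device or pinned host memory as appropriate, rather than calling delete[].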

@bgianfo

bgianfo commented Sep 14, 2023

To give some context, I am fairly new to the library, but I'm working on updating an older project and have had good success so far. For the case where we copy a lot of arrays back and forth between device and host memory, I would like a way of using the type system so that I don't have to keep passing buffer sizes in bytes around. I think having that information available would also enable the library to potentially catch corruptions in copies before they happen.

It sounds like your proposed "unique region" solution would solve this problem nicely.
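(For example, once both sides of a copy carry their sizes, a mismatch can be rejected up front. Here is a rough sketch in plain C++20, with std::span and std::memcpy standing in for the library's types and the actual CUDA copy call:)

#include <cstring>
#include <span>
#include <stdexcept>

// Illustrative only: when both arguments know their sizes, a mismatch becomes
// an immediate error instead of silent memory corruption.
template <typename T>
void checked_copy(std::span<T> destination, std::span<T> source)
{
    if (destination.size() < source.size()) {
        throw std::invalid_argument("destination is smaller than the data being copied");
    }
    // A real implementation would dispatch to the appropriate CUDA memcpy;
    // std::memcpy keeps the sketch self-contained.
    std::memcpy(destination.data(), source.data(), source.size() * sizeof(T));
}

With raw pointers and a separately-passed byte count, the same mistake only shows up later as corrupted data or a crash.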

I think it would also enable a nice ergonomic enhancement where you could allocate host or device memory, and do the copy back to host/device with one call (essentially a "region clone"), something like:

constexpr int mat_size = 100;
auto h_mat_1 = cuda::memory::host::make_unique<float[]>(mat_size);
auto h_mat_2 = cuda::memory::host::make_unique<float[]>(mat_size);

populate_matrices(h_mat_1.get(), h_mat_2.get(), mat_size);

auto d_mat_1 = cuda::memory::device::make_unique<float[]>(device, mat_size);
auto d_mat_2 = cuda::memory::device::make_unique<float[]>(device, mat_size);
cuda::memory::copy(d_mat_1.get(), h_mat_1.get(), mat_size * sizeof(float));
cuda::memory::copy(d_mat_2.get(), h_mat_2.get(), mat_size * sizeof(float));

With some helper methods, and if we were able to retain information about region sizes and types, this could be simplified to something like:

constexpr int mat_size = 100;
auto h_mat_1 = cuda::memory::host::make_unique_region<float[]>(mat_size);
auto h_mat_2 = cuda::memory::host::make_unique_region<float[]>(mat_size);

populate_matrices(h_mat_1.get(), h_mat_2.get(), mat_size);

auto d_mat_1 = h_mat_1.clone_to_device(device);
auto d_mat_2 = h_mat_2.clone_to_device(device);

The same thing could be applied to memory zeroing:

constexpr int mat_size = 100;
auto d_mat_1 = cuda::memory::device::make_unique_region<float[]>(device, mat_size);
d_mat_1.zero();

@eyalroz
Owner Author

eyalroz commented Sep 14, 2023

Well, I'll first say that I need feedback from at least a few users, including those with more experience than you describe, to make a decision...

Secondly, let me play the "devil's advocate" here. You write:

I would like a way of using the type system to not have to keep passing the size of buffers in bytes around.

but you already have that, with either cuda::span<T> or cuda::memory::region_t:

constexpr int mat_size = 100;
auto h_mat_1 = cuda::memory::host::make_unique<float[]>(mat_size);
auto h_mat_2 = cuda::memory::host::make_unique<float[]>(mat_size);
auto h_span_1 = cuda::span<float> { h_mat_1.get(), mat_size };
auto h_span_2 = cuda::span<float> { h_mat_2.get(), mat_size };

populate_matrices(h_span_1, h_span_2);

auto d_mat_1 = cuda::memory::device::make_unique<float[]>(device, mat_size);
auto d_mat_2 = cuda::memory::device::make_unique<float[]>(device, mat_size);
auto d_span_1 = cuda::span<float> { d_mat_1.get(), mat_size };
auto d_span_2 = cuda::span<float> { d_mat_2.get(), mat_size };

cuda::memory::copy(d_span_1, h_span_1);
cuda::memory::copy(d_span_2, h_span_2);

It's a bit verbose, but is it not good enough? This is the dilemma.

Note that I could, in theory, define a "unique region" class and not drop all of the pointer-and-address API functions (for copying and allocation). That's a third option, which has the benefit of letting everyone do exactly what they want, but the detriment of not promoting safety and correctness, and of enlarging the already huge set of variants of each of the copy, set, allocate, etc. functions.

eyalroz added a commit that referenced this issue Feb 13, 2024
* Added a `unique_region` class - non-templated
* `unique_region`'s "decay" into `cuda::memory::region_t`'s - so there's no need to reimplement half the world for them
* Added a missing variant of `cuda::memory::copy()`
* Added an example program based on vectorAdd, which uses unique_regions (I would have replaced vectorAdd with it, had it not required two lines of source more than before...)
* Removed some commented-out unique-pointer-related code
eyalroz added a commit that referenced this issue Feb 13, 2024
…an. It should provide the most flexibility in our example programs (and typical user programs) for direct use without explicit casting, re-specifying types and sizes, etc.
@eyalroz
Owner Author

eyalroz commented Feb 13, 2024

@bgianfo : Please have a look at the unique-regions branch. It has a make_unique_span() for a typed owning contiguous container, and make_unique_region for an untyped owning version of cuda::memory::region_t.
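A rough usage sketch, extrapolated from the snippets earlier in this thread (the namespaces and parameter order here are guesses and may not match the branch exactly):

#include <algorithm>
#include <cstddef>
#include <cuda/api.hpp> // cuda-api-wrappers' catch-all header

int main()
{
    // Sketch only: factory names taken from the comment above; exact
    // namespaces and signatures may differ on the unique-regions branch.
    auto device = cuda::device::current::get();
    constexpr std::size_t num_elements = 50000;

    // Typed and owning: element type and length travel with the objects,
    // so the copy needs no explicit size in bytes.
    auto h_A = cuda::memory::host::make_unique_span<float>(num_elements);
    auto d_A = cuda::memory::device::make_unique_span<float>(device, num_elements);
    std::generate(std::begin(h_A), std::end(h_A), [] { return 1.0f; });
    cuda::memory::copy(d_A, h_A);

    // Untyped and owning: a plain region of a given size in bytes, which
    // decays to cuda::memory::region_t where one is expected.
    auto scratch = cuda::memory::device::make_unique_region(device, 4096);
    (void) scratch;
}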

@bgianfo

bgianfo commented Feb 13, 2024

I looked through the branch, and it looks great IMHO. In retrospect, the unique_span is exactly what I was hoping for, and it nicely solves the issues I was concerned about (ergonomics, verbosity, and the opportunity for bugs from pointer/size mismatches). I appreciate the time and effort to flesh this out. Thank you!

@eyalroz
Owner Author

eyalroz commented Feb 19, 2024

@bgianfo : I'm actually worried that it's a bit too much to have all of:

  • unique_ptr
  • unique_region
  • unique_span

and that maybe I should drop at least one of those. Not to mention that we also have the cuda::memory::pointer class, which could in theory be merged with at least one of them.
