Switch memory functions to only work with regions (and spans) #322

Open
eyalroz opened this issue Apr 27, 2022 · 7 comments

Comments

@eyalroz
Owner

eyalroz commented Apr 27, 2022

Look at our modified vectorAdd example. It's certainly nicer than the original, but it's just sad that we have to repeat ourselves again and again with respect to lengths and sizes:

int numElements = 50000;
size_t size = numElements * sizeof(float); 
// ... snip ...
std::generate(h_A.get(), h_A.get() + numElements, generator);
// ... snip ...
cuda::memory::copy(d_A.get(), h_A.get(), size);
// ... snip ...
cuda::memory::copy(h_C.get(), d_C.get(), size);

instead of being able to say:

auto h_A = make_unique_span<float>(numElements);
std::generate(std::begin(h_A), std::end(h_A), generator);
cuda::memory::copy(d_A, h_A);
// ... snip ...
cuda::memory::copy(h_C, d_C);

This is in no small part due to our use of standard-library-like unique pointers rather than unique regions (see issue #291); and that, in turn, is partly motivated by our memory functions being mostly pointer-oriented rather than region-oriented. Now that we have region classes, why should we not just turn away from the raw CUDA API style of passing pointers around, and instead work only with sized ranges? (And perhaps also spans, for C++20; or even bundle a span implementation for older distros?)
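(As a reminder of what "region" means here: a cuda::memory::region_t is, at its core, just a start address plus a size in bytes. Roughly the following, given as an approximation for illustration rather than the library's actual definition:)

#include <cstddef>

// Rough approximation only: a memory "region" is an untyped, non-owning
// pair of a start address and a byte count.
struct region_sketch {
    void*       start;
    std::size_t size_in_bytes;
};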

@bgianfo

bgianfo commented Sep 13, 2023

This would be a great feature for memory safety, and one I had actually hoped already existed; I ended up here after realizing it's not yet available. :)

@eyalroz
Owner Author

eyalroz commented Sep 13, 2023

@bgianfo: So, in your opinion, is it worth it to move away somewhat from the conventions of the standard library (std::unique_ptr's and support for raw pointers), in favor of "unique regions" (i.e. basically spans with ownership)?
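(To make "spans with ownership" concrete, here is a minimal sketch in plain standard C++, with placeholder names rather than the library's actual design:)

#include <cstddef>
#include <memory>

// Minimal sketch of a "unique span": like a span, it knows its element type
// and length; like a unique_ptr, it owns the memory and frees it on destruction.
template <typename T, typename Deleter = std::default_delete<T[]>>
class unique_span_sketch {
public:
    unique_span_sketch(T* data, std::size_t size, Deleter deleter = {})
        : data_(data, deleter), size_(size) {}

    T*          data()  const noexcept { return data_.get(); }
    std::size_t size()  const noexcept { return size_; }
    T*          begin() const noexcept { return data(); }
    T*          end()   const noexcept { return data() + size_; }

private:
    std::unique_ptr<T[], Deleter> data_;
    std::size_t                   size_;
};

In the library's case, the deleter would presumably free device or pinned host memory as appropriate, rather than calling delete[].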

@bgianfo

bgianfo commented Sep 14, 2023

To give some context, I am fairly new to the library, but I'm working on updating an older project and have had good success so far. For the case where we copy a lot of arrays back and forth between device and host memory, I would like a way of using the type system so that I don't have to keep passing buffer sizes in bytes around. I think having that information available would also enable the library to potentially catch corruptions in copies before they happen.

It sounds like your proposed "unique region" solution would solve this problem nicely.
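(For example, once both sides of a copy carry their sizes, a mismatch can be rejected up front. Here is a rough sketch in plain C++20, with std::span and std::memcpy standing in for the library's types and the actual CUDA copy call:)

#include <cstring>
#include <span>
#include <stdexcept>

// Illustrative only: when both arguments know their sizes, a mismatch becomes
// an immediate error instead of silent memory corruption.
template <typename T>
void checked_copy(std::span<T> destination, std::span<T> source)
{
    if (destination.size() < source.size()) {
        throw std::invalid_argument("destination is smaller than the data being copied");
    }
    // A real implementation would dispatch to the appropriate CUDA memcpy;
    // std::memcpy keeps the sketch self-contained.
    std::memcpy(destination.data(), source.data(), source.size() * sizeof(T));
}

With raw pointers and a separately-passed byte count, the same mistake only shows up later as corrupted data or a crash.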

I think it would also enable a nice ergonomic enhancement where you could allocate host or device memory, and do the copy back to host/device with one call (essentially a "region clone"), something like:

constexpr int mat_size = 100;
auto h_mat_1 = cuda::memory::host::make_unique<float[]>(mat_size);
auto h_mat_2 = cuda::memory::host::make_unique<float[]>(mat_size);

populate_matrices(h_mat_1.get(), h_mat_2.get(), mat_size);

auto d_mat_1 = cuda::memory::device::make_unique<float[]>(device, mat_size);
auto d_mat_2 = cuda::memory::device::make_unique<float[]>(device, mat_size);
cuda::memory::copy(d_mat_1.get(), h_mat_1.get(), mat_size * sizeof(float));
cuda::memory::copy(d_mat_2.get(), h_mat_2.get(), mat_size * sizeof(float));

With some helper methods, and if we were able to retain information about region sizes and types, this could be simplified to something like:

constexpr int mat_size = 100;
auto h_mat_1 = cuda::memory::host::make_unique_region<float[]>(mat_size);
auto h_mat_2 = cuda::memory::host::make_unique_region<float[]>(mat_size);

populate_matrices(h_mat_1.get(), h_mat_2.get(), mat_size);

auto d_mat_1 = h_mat_1.clone_to_device(device);
auto d_mat_2 = h_mat_2.clone_to_device(device);

The same thing could be applied to memory zeroing:

constexpr int mat_size = 100;
auto d_mat_1 = cuda::memory::device::make_unique_region<float[]>(device, mat_size);
d_mat_1.zero();

@eyalroz
Owner Author

eyalroz commented Sep 14, 2023

Well, I'll first say that I need feedback from at least a few users, including those with more experience than you describe, to make a decision...

Secondly, let me play the "devil's advocate" here. You write:

I would like a way of using the type system to not have to keep passing the size of buffers in bytes around.

but you already have that, with either cuda::span<T> or cuda::memory::region_t:

constexpr int mat_size = 100;
auto h_mat_1 = cuda::memory::host::make_unique<float[]>(mat_size);
auto h_mat_2 = cuda::memory::host::make_unique<float[]>(mat_size);
auto h_span_1 = cuda::span<float> { h_mat_1.get(), mat_size };
auto h_span_2 = cuda::span<float> { h_mat_2.get(), mat_size };

populate_matrices(h_span_1, h_span_2);

auto d_mat_1 = cuda::memory::device::make_unique<float[]>(device, mat_size);
auto d_mat_2 = cuda::memory::device::make_unique<float[]>(device, mat_size);
auto d_span_1 = cuda::span<float> { d_mat_1.get(), mat_size };
auto d_span_2 = cuda::span<float> { d_mat_2.get(), mat_size };

cuda::memory::copy(d_span_1, h_span_1);
cuda::memory::copy(d_span_2, h_span_2);

It's a bit verbose, but is it not good enough? This is the dilemma.

Note that I could, in theory, define a "unique region" class and not drop all of the pointer-and-address API functions (for copying and allocation). That's a third option, which has the benefit of letting everyone do exactly what they want, but the detriment of not promoting safety and correctness, and of enlarging the already huge set of variants of each of the copy, set, allocate, etc. functions.

eyalroz added a commit that referenced this issue Feb 13, 2024
* Added a `unique_region` class - non-templated
* `unique_region`'s "decay" into `cuda::memory::region_t`'s - so there's no need to reimplement half the world for them
* Added a missing variant of `cuda::memory::copy()`
* Added an example program based on vectorAdd, which uses unique_regions (I would have replaced vectorAdd with it, had it not required two lines of source more than before...)
* Removed some commented-out unique-pointer-related code
eyalroz added a commit that referenced this issue Feb 13, 2024
…an. It should provide the most flexibility in our example programs (and typical user programs) for direct use without explicit casting, re-specifying types and sizes, etc.
@eyalroz
Owner Author

eyalroz commented Feb 13, 2024

@bgianfo : Please have a look at the unique-regions branch. It has a make_unique_span() for a typed owning contiguous container, and make_unique_region for an untyped owning version of cuda::memory::region_t.
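A rough usage sketch, extrapolated from the snippets earlier in this thread (the namespaces and parameter order here are guesses and may not match the branch exactly):

#include <algorithm>
#include <cstddef>
#include <cuda/api.hpp> // cuda-api-wrappers' catch-all header

int main()
{
    // Sketch only: factory names taken from the comment above; exact
    // namespaces and signatures may differ on the unique-regions branch.
    auto device = cuda::device::current::get();
    constexpr std::size_t num_elements = 50000;

    // Typed and owning: element type and length travel with the objects,
    // so the copy needs no explicit size in bytes.
    auto h_A = cuda::memory::host::make_unique_span<float>(num_elements);
    auto d_A = cuda::memory::device::make_unique_span<float>(device, num_elements);
    std::generate(std::begin(h_A), std::end(h_A), [] { return 1.0f; });
    cuda::memory::copy(d_A, h_A);

    // Untyped and owning: a plain region of a given size in bytes, which
    // decays to cuda::memory::region_t where one is expected.
    auto scratch = cuda::memory::device::make_unique_region(device, 4096);
    (void) scratch;
}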

@bgianfo

bgianfo commented Feb 13, 2024

I looked through the branch, and it looks great IMHO. In retrospect, the unique_span is exactly what I was hoping for, and it nicely solves the issues I was concerned about (ergonomics, verbosity, and the opportunity for bugs from pointer/size mismatches). I appreciate the time and effort to flesh this out. Thank you!

@eyalroz
Owner Author

eyalroz commented Feb 19, 2024

@bgianfo : I'm actually worried that it's a bit too much to have all of:

  • unique_ptr
  • unique_region
  • unique_span

and that maybe I should drop at least one of those. Not to mention that we also have the cuda::memory::pointer class, which could in theory be merged with at least one of them.
