-
Notifications
You must be signed in to change notification settings - Fork 194
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[REVIEW] mdspan integration. #437
Conversation
Please consider using GitHub's Draft PR feature instead of Some useful information about Draft PRs:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approving ops-codeowner
file changes
Thank you for sharing. Converted the PR to draft. |
IMO, it's less important to exactly mirror
Do we really think this it is necessary/a good idea to have a host-side accessor to device memory that automatically copies D2H? |
cpp/include/raft/linalg/mdarray.h
Outdated
/** | ||
* @brief Indexing operator, use it sparingly since it triggers a device<->host copy. | ||
*/ | ||
template <typename... IndexType> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does it make sense to throw a compiler warning when this function is used for device arrays?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can limit it to debug build if necessary. I think it's fine though, this is similar to thrust::device_vector.
Thanks for the suggestions.
For now, we just want to mimic the interface of |
5409a45
to
201ffea
Compare
Sorry for the force push, moved the file into |
Note to myself: Build 11.0,driver-450,centos7,3.7 (Jan 12, 2022 10:06:24 PM)
11.2,driver-495,ubuntu18.04,3.8
11.5,driver-495,ubuntu20.04,3.8
|
cpp/include/raft/mdarray.h
Outdated
template <typename... IndexType> | ||
auto operator()(IndexType&&... indices) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You could add an overload of operator()
that takes a stream as it's first argument in order to provide stream-ordered access.
This should work without causing an ambiguous overload because cuda_stream_view
is strongly typed.
template <typename... IndexType>
auto operator()(rmm::cuda_stream_view stream, IndexType&&... indices)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the suggestion, Currently, the mdarray class leaves the cuda stream to container policy. Adding an extra stream parameter might cause conflict with the stream in the policy. I'm not entirely sure whether do we need an extra stream for accessing the operator.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Then, maybe we should add view()
overloads that would accept a new container policy as an argument? So that it would be easy to construct different views per-stream.
cpp/include/raft/mdarray.h
Outdated
auto view() const noexcept { return const_view_type(c_.data(), map_, cp_.view()); } | ||
|
||
[[nodiscard]] constexpr auto size() const noexcept -> size_type { return this->view().size(); } | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
resize
functionality would be helpful. It would be tricky to promise std::vector::resize
kind of functionality though. If we resize to a larger size, keeping the values at lower indices the same as the old values is not trivial.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to be concerned about the resize
method might violate the shape of the mdarray
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do you mean resizing a 3-dimensional array with 2-dimensional map? If so, that can be made as a compilation error, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rg20 are you suggesting resize would allocate a larger / smaller chunk of device memory (like std::vector::resize, which changes the data shape) or are you suggesting a numpy reshape where we change the shape but the amount of memory stays the same?
Won't changing the amount of memory allocated will always have an impact on the resulting shape?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am talking more of std::vector::resize
. Consider an example where I have an mdarray
as a member variable but I would only initialize it much later (I can do this because there is a default
constructor). Typically, we would just do resize whenever we need (for std::vector
or rmm::device_uvector
).
We can still manage with the existing implementation because we can just create a new 'mdarray' and just move
it. My suggestion is more of making it look like other containers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm guessing we coulda accomplish this with dynamic extents?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I did not realize that the extents in the current implementation are compile-time and so is the specification of C++ standard.
We could only do this using dynamic extents. Is there a particular reason (other than the standard specification) the extents have to be compile time?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a particular reason (other than the standard specification) the extents have to be compile time?
The spec is reason enough. It is that way because there are optimization benefits to extents being statically known when possible, e.g., for loop unrolling, vectorization, modulo optimization, etc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like the extent itself can be dynamic and is supported by the spec and this implementation. So when the extent is dynamic, we could allow resizing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rg20, I do agree with your general idea here. There are many times we allocate an empty array (especially in the case of sparse formats where we know we need 3 arrays but don't yet know the number of nonzeros) so we allocate an empty array and resize later.
Agreed!
Agreed! With reshape, we just need a runtime check if the total size is the same or not. |
I will write a document for the requested features after the PR is finished. The PR at its current state should be sufficient for replacing pointer and |
detail::matrix_extent extents{n_rows, n_cols}; | ||
return host_matrix_view<ElementType>{ptr, extents}; | ||
} | ||
/** |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we provide an option with a default here to allow for both c- and f-contiguous layout?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added the optional layout parameter along with a test showing how to see whether input is c/f-contiguous. Hope that helps.
Also, due to the implementation of submdspan, the returned slice has default_accessor
instead of our host/device tagged accessor. I will investigate it further, for now, it will be a minor inconvenience.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks great! Though, I might suggest we expose a version of the layout to the public API so that users don't have to call into the detail namespace. If we can, it would also be nice to rename the public version to "layout::colmajor" and "layout::rowmajor" (maybe just a simple typedef or "using" would be enough?) to make it easier on the users.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@cjnolet There are additional details in here.
I might suggest we expose a version of the layout to the public API so that users don't have to call into the detail namespace
Agreed, I will expose some more things from mdspan ref impl. This will replace stdex
with raft
. (no detail
in here so I'm not sure if I understand you correctly).
If we can, it would also be nice to rename the public version to "layout::colmajor" and "layout::rowmajor"
This might be misleading. (correct me if I'm wrong) According to the design of stdex::mdspan
, stdex::layout_right
is c-contiguous, which implies row-major and contiguous. An array can be row-major but not contiguous. From this perspective, I think it might be better if we stick to the mdspan naming.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I saw your PR on using it for distance metrics. I think it would be better if we can pass the mdspan all the way down to the kernels and be careful around the extent
and stride
when using it with cublas. Converting a span directly to a pointer is dangerous.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(no detail in here so I'm not sure if I understand you correctly).
I'm referring to detail::stdex::layout_right
here.
This might be misleading.
I'm proposing both to be used. Many developers are going to come into this API understanding things like row-/col-major and c/f-contiguity but not understanding the right vs left verbiage. We can cover both cases if we provide both here. We can constrain the API if needed, but if we plan to support non-contiguous layouts in the future, I agree the names could best reflect that.
I think it would be better if we can pass the mdspan all the way down to the kernels and be careful around the extent and stride when using it with cublas.
Thanks for taking a look at the PR! It was intended to initiate discussions like this. RAFT's primitives are pretty extensive at this point and re-writing all of the kernels up front to suit this new API would be an extremely massive undertaking. I understand the mdspan's underlying indexing can be very flexible but I'd like to take a little step back and think through ways we can provide this functionality incrementally (within RAFT) without rewriting the entire library. With contiguity assumed (as it is currently), the row- vs col-major cases seem like a pretty simple introduction to using the mdspan as a facade. Shouldn't we be able to constrain these cases by testing the layout (potentially against the size of the pointer)? I think as we expand the kernels to use the mdspan directly for indexing then the API will just become more flexible. For now, the entire RAFT is assuming pointers w/ contiguous layouts already.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added type alias along with brief documents.
@trivialfis, your work on the new array and span abstractions is critical to both the success of RAFT's new 22.04 API and many other projects across the C++ layer of the RAPIDS analytics stack. Since just about all of RAFT's primitives take raw pointers currently, I think initially we can expose what you've got so far to the public API while we continue to iterate and improve its use in the lower-level bits as well (as an example, the existing Since the mdarray/mdspan/span are so far along, my suggestion would be to capture some of the more significant remaining features (e.g. padding, resizing, flexible indexing types) in Github issues. |
BUILD_EXPORT_SET raft-exports | ||
INSTALL_EXPORT_SET raft-exports | ||
CPM_ARGS | ||
GIT_REPOSITORY https://github.com/trivialfis/mdspan |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We might also want to consider whether the mdspan fork should be moved into an official rapidsai repository for the time being.
detail::matrix_extent extents{n_rows, n_cols}; | ||
return host_matrix_view<ElementType>{ptr, extents}; | ||
} | ||
/** |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(no detail in here so I'm not sure if I understand you correctly).
I'm referring to detail::stdex::layout_right
here.
This might be misleading.
I'm proposing both to be used. Many developers are going to come into this API understanding things like row-/col-major and c/f-contiguity but not understanding the right vs left verbiage. We can cover both cases if we provide both here. We can constrain the API if needed, but if we plan to support non-contiguous layouts in the future, I agree the names could best reflect that.
I think it would be better if we can pass the mdspan all the way down to the kernels and be careful around the extent and stride when using it with cublas.
Thanks for taking a look at the PR! It was intended to initiate discussions like this. RAFT's primitives are pretty extensive at this point and re-writing all of the kernels up front to suit this new API would be an extremely massive undertaking. I understand the mdspan's underlying indexing can be very flexible but I'd like to take a little step back and think through ways we can provide this functionality incrementally (within RAFT) without rewriting the entire library. With contiguity assumed (as it is currently), the row- vs col-major cases seem like a pretty simple introduction to using the mdspan as a facade. Shouldn't we be able to constrain these cases by testing the layout (potentially against the size of the pointer)? I think as we expand the kernels to use the mdspan directly for indexing then the API will just become more flexible. For now, the entire RAFT is assuming pointers w/ contiguous layouts already.
The one last thing to do for this PR is to decide how to handle the upstream changes. There are 2 commits specifically: trivialfis/mdspan@0193f07: Fix a compiler issue. I think the first one should be easy to upstream, not sure about the second one yet. |
@trivialfis, I agree and I think this PR doesn't need to rely on the second item. I think we can open issues for the indexing types, padding, and resize/reshape capabilities as follow-on items and get this PR in after the compiler issue has been merged upstream. What do you think? |
rerun tests |
rerun tests |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
@gpucibot merge |
This is an early version of the integration, it doesn't have padding support, the file is placed under test directory. Opening a PR so that we can continue the design discussion with actual code we can reference.
For
mdspan
, it's currently using a fork of mine, but in general, the required changes are trivial so it's directly integrated as a cmake module. Padding will add some more complexity, but I think we can implement it as custom layouts.For
mdarray
, due to theoperator()
, copy constructor, and CUDA stream, I copied the reference implementation with some substantial changes. (need to add license).operator()
, I have created adevice_reference
that's a simplified version ofthrust::device_reference
but supports the CUDA stream.extent
template parameter, I have changed the signature to matchmdspan
.rmm::device_uvector
with a custom policy.container_policy
formdspan::accessor_policy
. This is done by having multiple overloads ofaccess
methods, includingreference access(container const& c)
andreference access(pointer ptr)
, and the latter is a host device function. I have split it up into 2 policies as the container policy might have unwanted states for kernels like CUDA stream.Lastly, I have added host and device varients of both mdarray and mdspan.
@cjnolet @divyegala @achirkin
Related:
Todos: