
[REVIEW] mdspan integration. #437

Merged
merged 15 commits into from
Feb 22, 2022
Conversation

trivialfis
Member

@trivialfis trivialfis commented Jan 5, 2022

This is an early version of the integration: it doesn't have padding support yet, and the file is placed under the test directory. I'm opening a PR so that we can continue the design discussion with actual code to reference.

For mdspan, this currently uses a fork of mine, but the required changes are trivial, so it's integrated directly as a CMake module. Padding will add some more complexity, but I think we can implement it as a custom layout.

For mdarray, due to the operator(), the copy constructor, and CUDA stream support, I copied the reference implementation with some substantial changes (the license still needs to be added).

  • For the operator(), I have created a device_reference that's a simplified version of thrust::device_reference but supports the CUDA stream.
  • For supporting CUDA stream, I have removed most of the constructors to make sure there's a valid stream being passed into mdarray. Also, a new container policy is required to store the stream.
  • For the extent template parameter, I have changed the signature to match mdspan.
  • To integrate rmm allocator, I made a thin wrapper over rmm::device_uvector with a custom policy.
  • For the container policy, the reference implementation reuses container_policy as mdspan::accessor_policy. This is done with multiple overloads of the access methods, including reference access(container const& c) and reference access(pointer ptr), where the latter is a host/device function. I have split it into 2 policies, since the container policy might carry state that is unwanted in kernels, such as the CUDA stream.

Lastly, I have added host and device variants of both mdarray and mdspan.
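The proxy pattern behind the device_reference described above can be sketched in plain, host-only C++. This is a hypothetical illustration, not the PR's actual code: the real class stores an rmm::cuda_stream_view and moves bytes with cudaMemcpyAsync, while here plain host memory stands in for device memory so the sketch stays runnable.

```cpp
#include <cassert>

// Hypothetical, host-only sketch of the proxy-reference pattern behind
// device_reference: reads and writes go through explicit copy operations
// rather than a direct T& dereference.
template <typename T>
class device_reference {
 public:
  // The real constructor also takes an rmm::cuda_stream_view.
  explicit device_reference(T* ptr) : ptr_(ptr) {}

  // Read path: in the real implementation this is a device-to-host copy
  // ordered on the stored stream, followed by a stream sync.
  operator T() const { return *ptr_; }

  // Write path: in the real implementation, a host-to-device copy ordered
  // on the stored stream.
  device_reference& operator=(T const& v) {
    *ptr_ = v;
    return *this;
  }

 private:
  T* ptr_;  // conceptually points into device memory
};
```

The owning container's operator() would construct and return such a proxy instead of a raw T&, which is why a valid stream must be available at construction time.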

@cjnolet @divyegala @achirkin

Related:

Todos:

  • Finalize the design doc.
  • Add host/device accessor policy.
  • Add view type.
  • Change default accessor.
  • Add padding support.
  • Upstream changes to mdspan.
  • Add more tests.

@trivialfis trivialfis added the 5 - DO NOT MERGE Hold off on merging; see PR for details label Jan 5, 2022
@trivialfis trivialfis requested review from a team as code owners January 5, 2022 13:32
@trivialfis trivialfis added feature request New feature or request non-breaking Non-breaking change labels Jan 6, 2022
@ajschmidt8
Member

Please consider using GitHub's Draft PR feature instead of WIP tags in the future. Draft PRs have the benefit of preventing notifications to codeowners until PRs are marked Ready for Review. CI will still run on Draft PRs.

Some useful information about Draft PRs:

Member

@ajschmidt8 ajschmidt8 left a comment

Approving ops-codeowner file changes

@trivialfis trivialfis marked this pull request as draft January 11, 2022 04:23
@trivialfis
Member Author

Thank you for sharing. Converted the PR to draft.

@jrhemstad

IMO, it's less important to exactly mirror mdarray. mdspan is the important vocabulary type to use at interface boundaries.

Specifically, I have created a device_ref that's a simplified version of thrust::device_reference but supports the CUDA stream.

Do we really think it is necessary/a good idea to have a host-side accessor to device memory that automatically copies D2H?

/**
 * @brief Indexing operator; use it sparingly since it triggers a device<->host copy.
 */
template <typename... IndexType>
auto operator()(IndexType&&... indices)

@rg20 rg20 Jan 12, 2022


Does it make sense to emit a compiler warning when this function is used for device arrays?

Member Author

@trivialfis trivialfis Jan 13, 2022


We can limit it to debug builds if necessary. I think it's fine, though; this is similar to thrust::device_vector.

@trivialfis
Member Author

@jrhemstad

IMO, it's less important to exactly mirror mdarray. mdspan is the important vocabulary type to use at interface boundaries.

Thanks for the suggestions.

Do we really think it is necessary/a good idea to have a host-side accessor to device memory that automatically copies D2H?

For now, we just want to mimic the interface of mdarray as much as possible to avoid reinventing the wheel. The feature is not that important beyond debugging purposes.

cpp/include/raft/linalg/mdarray.h (outdated; resolved)
@trivialfis
Member Author

trivialfis commented Jan 13, 2022

Sorry for the force push; I moved the file into raft/ and rebased onto the latest branch.

@trivialfis
Member Author

trivialfis commented Jan 13, 2022

Note to myself:

Build 11.0,driver-450,centos7,3.7 (Jan 12, 2022 10:06:24 PM)

Error: Internal Compiler Error (codegen): "there was an error in verifying the lgenfe output!"

11.2,driver-495,ubuntu18.04,3.8

05:06:31 i8** null
05:06:31 i8* %call14 = call i32 null(void (i8*)* null, i8* null, i8** null), !dbg !968
05:06:31 : parse Invalid record (Producer: 'LLVM7.0.1' Reader: 'LLVM 7.0.1')

11.5,driver-495,ubuntu20.04,3.8

21:09:53 ../test/mdarray.cu(39): error: identifier "__assert_fail" is undefined

Comment on lines 333 to 389
template <typename... IndexType>
auto operator()(IndexType&&... indices)


You could add an overload of operator() that takes a stream as its first argument in order to provide stream-ordered access.

This should work without causing an ambiguous overload because cuda_stream_view is strongly typed.

  template <typename... IndexType>
  auto operator()(rmm::cuda_stream_view stream, IndexType&&... indices)

Member Author


Thank you for the suggestion. Currently, the mdarray class leaves the CUDA stream to the container policy. Adding an extra stream parameter might conflict with the stream in the policy, and I'm not entirely sure whether we need an extra stream for the access operator.

Contributor

Then maybe we should add view() overloads that accept a new container policy as an argument, so that it would be easy to construct different views per stream.

auto view() const noexcept { return const_view_type(c_.data(), map_, cp_.view()); }

[[nodiscard]] constexpr auto size() const noexcept -> size_type { return this->view().size(); }

@rg20 rg20 Jan 13, 2022

A resize functionality would be helpful, though it would be tricky to promise std::vector::resize-like semantics: when resizing to a larger size, preserving the values at the lower indices is not trivial.

Member Author

We need to be concerned that the resize method might violate the shape of the mdarray.


Do you mean resizing a 3-dimensional array with a 2-dimensional map? If so, that can be made a compilation error, right?

Member

@rg20 are you suggesting resize would allocate a larger / smaller chunk of device memory (like std::vector::resize, which changes the data shape) or are you suggesting a numpy reshape where we change the shape but the amount of memory stays the same?

Won't changing the amount of memory allocated always have an impact on the resulting shape?


I am talking more of std::vector::resize. Consider an example where I have an mdarray as a member variable but I would only initialize it much later (I can do this because there is a default constructor). Typically, we would just do resize whenever we need (for std::vector or rmm::device_uvector).

We can still manage with the existing implementation because we can just create a new 'mdarray' and just move it. My suggestion is more of making it look like other containers.

Member

I'm guessing we could accomplish this with dynamic extents?


Ah, I did not realize that the extents in the current implementation are compile-time, and so is the C++ standard's specification.

We could only do this using dynamic extents. Is there a particular reason (other than the standard specification) the extents have to be compile time?


Is there a particular reason (other than the standard specification) the extents have to be compile time?

The spec is reason enough. It is that way because there are optimization benefits to extents being statically known when possible, e.g., for loop unrolling, vectorization, modulo optimization, etc.
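The trade-off described above can be sketched in a few lines of C++. This is an illustrative model only (the sentinel mirrors std::dynamic_extent; it is not the mdspan reference implementation): a static extent is baked into the type and costs no storage, while a dynamic extent must be stored and read at runtime.

```cpp
#include <cassert>
#include <cstddef>

// Sentinel mirroring std::dynamic_extent (illustrative stand-in).
inline constexpr std::size_t dynamic_extent = static_cast<std::size_t>(-1);

template <std::size_t Extent>
struct extent_holder {
  // Static extent: no storage needed, the size is a compile-time constant,
  // so loop bounds over it can be unrolled or vectorized.
  static constexpr std::size_t size() { return Extent; }
};

template <>
struct extent_holder<dynamic_extent> {
  std::size_t value;  // dynamic extent: must be carried at runtime
  constexpr std::size_t size() const { return value; }
};
```

This is why resizing only makes sense when the extent is dynamic: a static extent is part of the type itself.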


Looks like the extent itself can be dynamic and is supported by the spec and this implementation. So when the extent is dynamic, we could allow resizing.

Member

@rg20, I do agree with your general idea here. There are many times we allocate an empty array and resize it later, especially in the case of sparse formats, where we know we need 3 arrays but don't yet know the number of nonzeros.

@rg20

rg20 commented Jan 26, 2022

@rg20 @trivialfis

Proper resizing will be more useful for the sparse case, since those will always be vectors. One reason this is important is to ensure the sparse format always has a consistent and valid state: its 3 mdarray instance variables remain valid arrays, starting with a size of 0 and resizing once the number of nonzeros is known (by manipulating those instances directly instead of creating new instances). Even though the device memory underneath might change, the mdarray itself hides that from us and always presents usable memory up to the product of the extents.

Agreed!

I also think that a reshape functionality would be very useful. There are many cases where we may even reshape into a vector so we can perform some vector-wise operation on all the elements before reshaping back into a 2d matrix. If it's too hard to add reshaping to the mdspan without upstreaming the changes, we could also just use a helper function for this for now since the mdspan is so lightweight:

auto reshaped = raft::reshape(mdspan_obj, new_extents, stream);

Agreed! With reshape, we just need a runtime check if the total size is the same or not.
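The helper suggested above can be sketched concisely: since an mdspan is a non-owning view, "reshape" only has to rebuild the extents over the same pointer after the runtime check that the total element count is preserved. The types and the reshape name below are illustrative stand-ins, not RAFT's actual API.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>

// Minimal 2-D extents and view types standing in for mdspan (illustrative).
struct extents2d {
  std::size_t rows, cols;
  constexpr std::size_t size() const { return rows * cols; }
};

template <typename T>
struct matrix_view {
  T* data;
  extents2d ext;
};

// Reshape: same memory, new mapping, guarded by a runtime size check.
template <typename T>
matrix_view<T> reshape(matrix_view<T> in, extents2d new_ext) {
  if (in.ext.size() != new_ext.size()) {
    throw std::invalid_argument("reshape: total element count must be preserved");
  }
  return matrix_view<T>{in.data, new_ext};
}
```

Because no data moves, the helper is cheap; only padded layouts (where the reshaped strides may not be expressible) would need more care.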

@trivialfis
Member Author

trivialfis commented Jan 26, 2022

I will write a document for the requested features after this PR is finished. The PR in its current state should be sufficient to replace raw pointers and rmm::device_uvector. Sorry for keeping the PR's scope smaller than desired. To me, array indexing and slicing are difficult and error-prone (that's why I want to eliminate the use of raw pointers), so I need a clear description of things like what to do when reshaping a padded array before implementing them. My apologies for the lack of efficiency.

detail::matrix_extent extents{n_rows, n_cols};
return host_matrix_view<ElementType>{ptr, extents};
}
/**
Member

Can we provide an option with a default here to allow for both c- and f-contiguous layout?

Member Author

Added the optional layout parameter, along with a test showing how to check whether an input is c/f-contiguous. Hope that helps.

Also, due to the implementation of submdspan, the returned slice has default_accessor instead of our host/device-tagged accessor. I will investigate further; for now, it is a minor inconvenience.

Member

This looks great! Though, I might suggest we expose a version of the layout to the public API so that users don't have to call into the detail namespace. If we can, it would also be nice to rename the public version to "layout::colmajor" and "layout::rowmajor" (maybe just a simple typedef or "using" would be enough?) to make it easier on the users.

Member Author

@cjnolet There are additional details in here.

I might suggest we expose a version of the layout to the public API so that users don't have to call into the detail namespace

Agreed, I will expose some more things from mdspan ref impl. This will replace stdex with raft. (no detail in here so I'm not sure if I understand you correctly).

If we can, it would also be nice to rename the public version to "layout::colmajor" and "layout::rowmajor"

This might be misleading (correct me if I'm wrong). According to the design of stdex::mdspan, stdex::layout_right is c-contiguous, which implies both row-major and contiguous. An array can be row-major but not contiguous. From that perspective, I think it might be better to stick to the mdspan naming.
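The distinction drawn above is easy to show with an index calculation. A 2x2 window into a 4x4 row-major matrix is still row-major, but its row stride (4) is larger than its column count (2), so its elements are not contiguous in memory; mdspan expresses such views with layout_stride, while layout_right additionally guarantees stride == column count. The helper below is purely illustrative.

```cpp
#include <cassert>
#include <cstddef>

// Linear index of element (i, j) in a row-major view with an explicit
// row stride. Contiguous (layout_right) is the special case
// row_stride == number of columns.
constexpr std::size_t strided_index(std::size_t i,
                                    std::size_t j,
                                    std::size_t row_stride) {
  return i * row_stride + j;
}
```

For the 2x2 window, element (1, 0) lands 4 slots into the buffer rather than 2, leaving a gap of untouched elements between rows.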

Member Author

@trivialfis trivialfis Jan 28, 2022

I saw your PR on using it for distance metrics. I think it would be better if we could pass the mdspan all the way down to the kernels, being careful about extent and stride when using it with cuBLAS. Converting a span directly to a pointer is dangerous.

Member

@cjnolet cjnolet Jan 28, 2022

(no detail in here so I'm not sure if I understand you correctly).

I'm referring to detail::stdex::layout_right here.

This might be misleading.

I'm proposing both to be used. Many developers are going to come into this API understanding things like row-/col-major and c/f-contiguity but not understanding the right vs left verbiage. We can cover both cases if we provide both here. We can constrain the API if needed, but if we plan to support non-contiguous layouts in the future, I agree the names could best reflect that.

I think it would be better if we can pass the mdspan all the way down to the kernels and be careful around the extent and stride when using it with cublas.

Thanks for taking a look at the PR! It was intended to initiate discussions like this. RAFT's primitives are pretty extensive at this point, and rewriting all of the kernels up front to suit this new API would be a massive undertaking. I understand the mdspan's underlying indexing can be very flexible, but I'd like to take a little step back and think through ways we can provide this functionality incrementally (within RAFT) without rewriting the entire library. With contiguity assumed (as it is currently), the row- vs col-major cases seem like a pretty simple introduction to using the mdspan as a facade. Shouldn't we be able to constrain these cases by testing the layout (potentially against the size of the pointer)? I think as we expand the kernels to use the mdspan directly for indexing, the API will become more flexible. For now, all of RAFT assumes pointers with contiguous layouts anyway.

Member Author

Added type aliases along with brief documentation.

@cjnolet
Member

cjnolet commented Jan 26, 2022

Sorry for keeping the PR's scope to be smaller than desired. To me array indexing and slicing are difficult and error prone (that's why I want to eliminate the use of raw pointer), I need to have a clear description of things like what to do when reshaping a padded array before implementing them. My apologies for the lack of efficiency.

@trivialfis, your work on the new array and span abstractions is critical both to the success of RAFT's new 22.04 API and to many other projects across the C++ layer of the RAPIDS analytics stack. Since just about all of RAFT's primitives currently take raw pointers, I think we can initially expose what you've got so far in the public API while we continue to iterate and improve its use in the lower-level bits (as an example, the existing pairwise_distances function can go from pairwise_distances(handle, in_ptr, m, n, k, out_ptr, metric); to pairwise_distances(handle, in_mdspan, out_mdspan, metric); and can delegate to the existing function in the detail namespace).

Since the mdarray/mdspan/span are so far along, my suggestion would be to capture some of the more significant remaining features (e.g. padding, resizing, flexible indexing types) in Github issues.
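The delegation idea described in the comment above can be illustrated with a toy routine. Everything here is a hypothetical stand-in (the view type, extents, and scale_inplace are not RAFT's API, and pairwise_distances itself is not reproduced); the point is only the facade shape: the view-based overload unpacks extents and a raw pointer, then calls the existing pointer-based implementation.

```cpp
#include <cassert>
#include <cstddef>

// Illustrative stand-ins for an mdspan-like view (not RAFT's actual types).
struct extents2d { std::size_t rows, cols; };

template <typename T>
struct matrix_view {
  T* data;
  extents2d ext;
};

// Stand-in for an existing detail-namespace, pointer-based primitive.
template <typename T>
void scale_inplace(T* in, std::size_t m, std::size_t n, T factor) {
  for (std::size_t i = 0; i < m * n; ++i) in[i] *= factor;
}

// New public overload: callers pass a view, no loose dimension arguments,
// and the body simply delegates to the pointer-based routine.
template <typename T>
void scale_inplace(matrix_view<T> in, T factor) {
  scale_inplace(in.data, in.ext.rows, in.ext.cols, factor);
}
```

Because the view-based overload is a thin shim, primitives can be migrated to the new API incrementally without rewriting their kernels up front.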

cpp/include/raft/mdarray.hpp (resolved)
BUILD_EXPORT_SET raft-exports
INSTALL_EXPORT_SET raft-exports
CPM_ARGS
GIT_REPOSITORY https://github.com/trivialfis/mdspan
Member

We might also want to consider whether the mdspan fork should be moved into an official rapidsai repository for the time being.


@trivialfis
Member Author

The last remaining thing to do for this PR is to decide how to handle the upstream changes. There are two commits specifically:

trivialfis/mdspan@0193f07: Fix a compiler issue.
trivialfis/mdspan@f18e694: Avoid size_t as index type.

I think the first one should be easy to upstream, not sure about the second one yet.

@cjnolet
Member

cjnolet commented Feb 7, 2022

I think the first one should be easy to upstream, not sure about the second one yet

@trivialfis, I agree and I think this PR doesn't need to rely on the second item. I think we can open issues for the indexing types, padding, and resize/reshape capabilities as follow-on items and get this PR in after the compiler issue has been merged upstream. What do you think?

@trivialfis trivialfis requested review from a team as code owners February 14, 2022 19:28
@ajschmidt8
Member

rerun tests

@cjnolet
Member

cjnolet commented Feb 18, 2022

rerun tests

Member

@cjnolet cjnolet left a comment

LGTM!

@cjnolet
Member

cjnolet commented Feb 22, 2022

@gpucibot merge

@rapids-bot rapids-bot bot merged commit 57703c5 into rapidsai:branch-22.04 Feb 22, 2022
Labels
CMake cpp feature request New feature or request non-breaking Non-breaking change

10 participants