Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add segmented array type #1545

Merged
merged 3 commits into from
May 9, 2024
Merged

Add segmented array type #1545

merged 3 commits into from
May 9, 2024

Conversation

MarcelKoch
Copy link
Member

@MarcelKoch MarcelKoch commented Feb 14, 2024

This PR adds a class to store a segmented array. A flat ginkgo array is partitioned into multiple segments by an additional index offset array. The segment i starts within the flat buffer at index offsets[i], and ends (exclusively) at index offsets[i + 1]. The class only provides access to the flat buffer and the offsets.

Used in: #1543

PR Stack:


Old Description

This PR adds a type collection::array which is a std::vector like class that stores multiple arrays. These arrays are all subviews of a shared buffer.

I think technically this could be considered as a batch::array, but I'm not sure if we want to go that way. As a batch::array this would interleave the dependencies between the batched and non-batched stuff, and I'm not sure if that is a good idea.

@MarcelKoch MarcelKoch self-assigned this Feb 14, 2024
@ginkgo-bot ginkgo-bot added the mod:core This is related to the core module. label Feb 14, 2024
@upsj
Copy link
Member

upsj commented Feb 14, 2024

Do you actually need these individual sub-ranges to be materialized as array views? Because otherwise this is just the typical row_ptr - col_idxs structure, and we might be better served with a fully functional std::span replacement and a range-of-ranges interface

@MarcelKoch
Copy link
Member Author

At least for CPU kernels the arrays are quite handy. But you're right, something like std::span would be better suited.

@upsj
Copy link
Member

upsj commented Feb 14, 2024

I've had it in my mind to provide a nice abstraction over these segmented arrays (including range-for support) for a while already, if you want I can resurrect my prototype code

@MarcelKoch
Copy link
Member Author

Sure, if it's not too much work, we could use that. I can also take that over if you want.

Copy link

sonarcloud bot commented Feb 16, 2024

@MarcelKoch
Copy link
Member Author

MarcelKoch commented Feb 16, 2024

@upsj maybe this could be formulated using ranges and accessors. Accessing the segmented array could be thought of as [i][j], where i is the array index, and j the index within the array. But I don't know enough of ranges and accessor to judge if that would be appropriate.

I don't think that is a good abstraction, since the size per dimension isn't constant.

Copy link
Member

@upsj upsj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is a useful construct, I would try to make it more generic in some contexts (index type) and more restricted in others (host-accessibility). I would imagine a device adapter based on what I implemented as segmented_range in #1582 might be useful here.

Also we would need some tests and documentation before we can merge this

* \tparam T value type stored in the arrays
*/
template <typename T>
struct array {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I would call this a segmented array, since the corresponding operations are referred to a segmented sort/reduction/... in papers and the cub library. That would also avoid the need for a separate namespace

Comment on lines 92 to 95
for (size_type i = 0; i < sizes.size(); ++i) {
elems_[i] = make_array_view(
exec, sizes[i], buffer_.get_data() + offsets_.get_data()[i]);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this design might encourage poor performance - it invites using a kernel invocation for each MPI rank, which might scale poorly. Instead I would try to do this with a single kernel for all ranks, and a corresponding device view.


private:
gko::array<T> buffer_;
gko::array<size_type> offsets_;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Considering #430, I would prefer if we used int64 or even a generic integer type.

@MarcelKoch MarcelKoch added this to the Ginkgo 1.8.0 milestone Apr 5, 2024
Copy link
Member

@pratikvn pratikvn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with Tobias that calling it segmented_array would be better. Otherwise, I dont have any other concerns.

include/ginkgo/core/base/collection.hpp Outdated Show resolved Hide resolved
Copy link
Member

@upsj upsj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM mostly, does it need to be part of the public interface, or can we put it in detail first? I believe it might change significantly with segmented ranges.

core/base/segmented_array.cpp Outdated Show resolved Hide resolved
core/base/segmented_array.cpp Outdated Show resolved Hide resolved
core/base/segmented_array.cpp Outdated Show resolved Hide resolved
include/ginkgo/core/base/segmented_array.hpp Outdated Show resolved Hide resolved
@MarcelKoch
Copy link
Member Author

@upsj I don't think this will change with the segmented ranges at all. The ranges are only relevant on the backend side. The index_map has this type as constructor parameter, and accessors to it, so I would not put it into the detail namespace.

@MarcelKoch MarcelKoch added the 1:ST:ready-for-review This PR is ready for review label May 3, 2024
include/ginkgo/core/base/segmented_array.hpp Show resolved Hide resolved
include/ginkgo/core/base/segmented_array.hpp Outdated Show resolved Hide resolved
include/ginkgo/core/base/segmented_array.hpp Outdated Show resolved Hide resolved
include/ginkgo/core/base/segmented_array.hpp Outdated Show resolved Hide resolved
@pratikvn
Copy link
Member

pratikvn commented May 3, 2024

Maybe also update the PR name ?

@MarcelKoch MarcelKoch changed the title Add array collection type Add segmented array type May 3, 2024
@MarcelKoch MarcelKoch requested a review from pratikvn May 3, 2024 14:28
Copy link
Member

@pratikvn pratikvn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly looks good. Just one question regarding the _const_x naming for const getters.

include/ginkgo/core/base/segmented_array.hpp Outdated Show resolved Hide resolved
include/ginkgo/core/base/segmented_array.hpp Outdated Show resolved Hide resolved
include/ginkgo/core/base/segmented_array.hpp Outdated Show resolved Hide resolved
include/ginkgo/core/base/segmented_array.hpp Outdated Show resolved Hide resolved
@MarcelKoch MarcelKoch force-pushed the array-collection branch 3 times, most recently from e20d10c to cf07464 Compare May 6, 2024 16:38
@MarcelKoch MarcelKoch requested a review from pratikvn May 6, 2024 16:39
template <typename T>
T* segmented_array<T>::get_flat_data()
{
return buffer_.get_data();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can move this directly within the class, right ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can actually move them into the .cpp. I did that, but I won't wait for the full CI pipeline to merge it.

MarcelKoch and others added 2 commits May 8, 2024 20:53
Co-authored-by: Pratik Nayak <pratik.nayak@kit.edu>
@MarcelKoch MarcelKoch added 1:ST:ready-to-merge This PR is ready to merge. and removed 1:ST:ready-for-review This PR is ready for review 1:ST:run-full-test labels May 8, 2024
@MarcelKoch MarcelKoch merged commit 2aca32d into develop May 9, 2024
10 of 15 checks passed
@MarcelKoch MarcelKoch deleted the array-collection branch May 9, 2024 07:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
1:ST:ready-to-merge This PR is ready to merge. mod:core This is related to the core module.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants