RFC: Advanced global memory access patterns #436

All, 

I have seen the issue of strided global memory access come up from time to time, and @dhinf has started implementing something along those lines for his `HaloMatrix` implementation (for the DART part, see https://github.com/dash-project/dash/blob/feat-halo/dart-impl/mpi/src/dart_communication.c#L490). In general, both DASH and DART currently support copying only to and from contiguous memory regions, so strided access has to be expressed as one individual operation per element/block, which is less than ideal. To give us and our users more flexibility, I think we should agree on an interface for handling advanced (strided, indexed?) memory access patterns. I currently see three options here:

1) Introduce additional DART functions for strided/indexed access, e.g., `dart_get_strided`, `dart_get_strided_handle`, etc. For DART-MPI, these functions would internally create and destroy the corresponding MPI data types on the fly. This would lead to an explosion in the number of DART abstractions, which would be difficult to maintain and inflexible when it comes to future extensions.
2) Extend existing DART functions with parameters for strides/indexes. We could have DASH abstractions wrapping these functions to provide sensible defaults, i.e., to maintain a version that resembles the current interface. However, this is still inflexible when it comes to future extensions beyond strided access.
3) Introduce derived types in DART. These types cache the access patterns and can be created and destroyed at any time. On the pro side, ~we could keep the current DART `put`/`get` interface as it is~ the current DART `put`/`get` would only be extended by a second type to allow strided-to-contiguous transfers and vice versa. However, we need to beef up the type handling in DART, i.e., go from `enum` to opaque pointers, and be careful not to bind ourselves too much to the way MPI does things. At least for strided and indexed access though, the patterns could be easily implemented manually for a backend that does not support derived datatypes and for the shared memory window optimization. The type management interface could look like the following:
```
dart_ret_t dart_type_create_strided(basetype, stride, blocklen, newtype);
dart_ret_t dart_type_create_indexed(basetype, blocklens[], offsets[], newtype);
dart_ret_t dart_type_destroy(type);
```
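To make the opaque-handle idea and the manual fallback more concrete, here is a minimal sketch in plain C. Everything prefixed with `sketch_` is hypothetical and not part of DART; in particular, the argument types, the meaning of `stride` (distance between block starts, counted in base elements), and the pointer-taking destroy function are assumptions made for illustration only. The gather routine shows how a backend without derived datatypes might serve a strided-to-contiguous transfer with plain `memcpy` calls.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins; DART's real return and type handling may differ. */
typedef enum { SKETCH_OK = 0, SKETCH_ERR_INVAL = 1 } sketch_ret_t;

/* Opaque handle: callers only ever hold a pointer to the descriptor. */
typedef struct sketch_type_s *sketch_type_t;

struct sketch_type_s {
  size_t elem_size; /* size of one base element in bytes          */
  size_t stride;    /* distance between block starts, in elements */
  size_t blocklen;  /* number of contiguous elements per block    */
};

/* Create a strided type descriptor (assumed semantics, see lead-in). */
sketch_ret_t sketch_type_create_strided(size_t elem_size, size_t stride,
                                        size_t blocklen,
                                        sketch_type_t *newtype)
{
  if (newtype == NULL || elem_size == 0 || blocklen == 0 || stride < blocklen)
    return SKETCH_ERR_INVAL;
  sketch_type_t t = malloc(sizeof(*t));
  if (t == NULL)
    return SKETCH_ERR_INVAL;
  t->elem_size = elem_size;
  t->stride    = stride;
  t->blocklen  = blocklen;
  *newtype = t;
  return SKETCH_OK;
}

/* Release the descriptor and reset the caller's handle to NULL. */
sketch_ret_t sketch_type_destroy(sketch_type_t *type)
{
  if (type == NULL || *type == NULL)
    return SKETCH_ERR_INVAL;
  free(*type);
  *type = NULL;
  return SKETCH_OK;
}

/* Manual fallback: gather `nblocks` strided blocks from `src` into the
 * contiguous buffer `dst`, one memcpy per block. */
sketch_ret_t sketch_gather(void *dst, const void *src, size_t nblocks,
                           sketch_type_t type)
{
  assert(dst != NULL && src != NULL && type != NULL);
  const size_t block_bytes  = type->blocklen * type->elem_size;
  const size_t stride_bytes = type->stride * type->elem_size;
  char       *out = dst;
  const char *in  = src;
  for (size_t b = 0; b < nblocks; ++b)
    memcpy(out + b * block_bytes, in + b * stride_bytes, block_bytes);
  return SKETCH_OK;
}
```

With `stride = 4` and `blocklen = 2` on an `int` array, gathering two blocks copies elements 0, 1, 4, and 5 into the destination. The reverse direction (contiguous-to-strided) would swap the roles of the two offsets in the loop.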
Type creation/destruction in MPI seems to be fast enough to be done on the fly (~0.3us for `MPI_Type_vector` with Intel/Open MPI/Cray), so we do not have to expose type management to DASH users but could instead offer an extended version of `dash::copy` with stride definitions (not sure whether `indexed` is relevant for users). At the same time, DASH data structures and operations relying on strided access can create a type and cache it for as long as it is required.
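The caching idea could look roughly like the following sketch, again in plain C. The names `cached_type_t`, `type_cache_t`, and `type_cache_get` are hypothetical, and the descriptor merely stands in for whatever a backend would actually create (for DART-MPI, presumably an MPI datatype). The fixed-size, round-robin cache is an assumption chosen for brevity, not a proposal for the eviction policy.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical descriptor standing in for a backend-specific type. */
typedef struct {
  size_t stride;   /* distance between block starts, in elements */
  size_t blocklen; /* contiguous elements per block              */
  int    valid;    /* non-zero once the slot holds a type        */
} cached_type_t;

#define CACHE_SLOTS 4

typedef struct {
  cached_type_t slots[CACHE_SLOTS];
  size_t        next;      /* slot to overwrite on the next miss */
  unsigned      creations; /* number of types created so far     */
} type_cache_t;

/* Return a descriptor for (stride, blocklen), creating it only on a miss. */
const cached_type_t *type_cache_get(type_cache_t *cache,
                                    size_t stride, size_t blocklen)
{
  assert(cache != NULL);
  for (size_t i = 0; i < CACHE_SLOTS; ++i) {
    cached_type_t *t = &cache->slots[i];
    if (t->valid && t->stride == stride && t->blocklen == blocklen)
      return t; /* hit: reuse the existing type */
  }
  /* Miss: overwrite the next slot. A real backend would destroy the
   * evicted type here before creating the new one. */
  cached_type_t *t = &cache->slots[cache->next];
  cache->next = (cache->next + 1) % CACHE_SLOTS;
  t->stride   = stride;
  t->blocklen = blocklen;
  t->valid    = 1;
  cache->creations++;
  return t;
}
```

A zero-initialized `type_cache_t` is an empty cache. Requesting the same `(stride, blocklen)` pair twice returns the same descriptor and increments `creations` only once, which is the behavior a data structure with a fixed access pattern would rely on.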

I am strongly leaning towards option 3, as it seems to be the cleanest and most extensible, but I am open to discussion and any other proposals. Please let me know what you think.
