non-contiguous data-handling strategy #64
Closed
Removes `KokkosComm_pack_traits.hpp` and replaces it with a more comprehensive non-contiguous data-handling interface.

Warning: under construction
Outline
A `KokkosComm::NonContigXXX{SendRecv, Reduce, Alltoall}` is responsible for converting `View` arguments to buffer/count/datatype tuples for the underlying MPI calls.
- `...SendRecv` handles send- and recv-like calls.
- `...Reduce` handles reduce-like calls.
- `...Alltoall` handles all-to-all-like calls.

I found it very challenging to collapse these down to a single interface because the underlying MPI operations take a variety of parameters with different meanings. However, in the final evaluation, some refactoring to reduce the size of the interface may be possible.
On the plus side, if a vendor library only implements a subset of our communication interface, we would only need to implement a subset of the non-contiguous handling interface as well.
The basic outline of every MPI implementation now looks like this (using `recv` as an example).

Line (2) looks at the incoming view and does something specific to the `Kokkos::deep_copy` non-contiguous approach to decide what intermediate data to allocate, how to convert that into MPI arguments, etc. The result is stashed in a `CtxBufCount` object, so named because it is used for any MPI calls that take a buffer and a count.

Line (3): fence, because `space` may be allocating our incoming data.

Line (4-6): in this case, after packing, a single `MPI_Recv` call is made, but in general that is not a requirement of the approach. So we consume all generated arguments and make the appropriate calls.

Line (7): some work may need to be done on the receive side as well. There is no fence here; any required operations are inserted into `space`.

`NC` in this example is a struct that implements this `pre_recv` and `post_recv` interface. Other structs with the same interface can be used to implement different non-contiguous data-handling strategies.
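
To make the `pre_recv` / `post_recv` flow concrete, here is a minimal, self-contained sketch of the pattern. It deliberately uses neither Kokkos nor MPI: `StridedView`, `Space`, `BufCount`, `fake_mpi_recv`, `PackingNC`, and `DirectNC` are hypothetical stand-ins invented for illustration, and the layout of `CtxBufCount` shown here is an assumption, not the definition used in this PR. The step comments only loosely correspond to the line references above.

```cpp
#include <cstddef>
#include <functional>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for a strided (possibly non-contiguous) Kokkos::View.
struct StridedView {
  double *data;
  std::size_t extent;
  std::size_t stride;
  double &operator()(std::size_t i) const { return data[i * stride]; }
};

// Hypothetical stand-in for an execution space: work is queued, fence() runs it.
struct Space {
  std::vector<std::function<void()>> queue;
  void enqueue(std::function<void()> f) { queue.push_back(std::move(f)); }
  void fence() {
    for (auto &f : queue) f();
    queue.clear();
  }
};

// One buffer/count pair, i.e. the arguments for one receive call.
struct BufCount {
  double *buf;
  int count;
};

// Assumed layout: intermediate storage plus the generated call arguments.
struct CtxBufCount {
  std::shared_ptr<std::vector<double>> scratch;  // kept alive by queued work
  std::vector<BufCount> args;
};

// Stand-in for MPI_Recv: copies `count` values from an in-memory "wire".
inline void fake_mpi_recv(double *buf, int count, const double *wire) {
  for (int i = 0; i < count; ++i) buf[i] = wire[i];
}

// Strategy 1: receive into a contiguous scratch buffer, then unpack.
struct PackingNC {
  static CtxBufCount pre_recv(Space &, const StridedView &v) {
    CtxBufCount ctx;
    ctx.scratch = std::make_shared<std::vector<double>>(v.extent);
    ctx.args.push_back({ctx.scratch->data(), static_cast<int>(v.extent)});
    return ctx;
  }
  static void post_recv(Space &space, const StridedView &v, const CtxBufCount &ctx) {
    auto scratch = ctx.scratch;  // shared ownership, like a ref-counted View
    space.enqueue([scratch, v] {
      for (std::size_t i = 0; i < v.extent; ++i) v(i) = (*scratch)[i];
    });
  }
};

// Strategy 2: assumes the view is contiguous (stride == 1), so receive in place.
struct DirectNC {
  static CtxBufCount pre_recv(Space &, const StridedView &v) {
    CtxBufCount ctx;
    ctx.args.push_back({v.data, static_cast<int>(v.extent)});
    return ctx;
  }
  static void post_recv(Space &, const StridedView &, const CtxBufCount &) {}
};

template <typename NC>
void recv(Space &space, const StridedView &v, const std::vector<double> &wire) {
  CtxBufCount ctx = NC::pre_recv(space, v);  // step (2): strategy-specific setup
  space.fence();                             // step (3): setup must be complete
  const double *cursor = wire.data();
  for (const BufCount &a : ctx.args) {       // steps (4-6): consume all arguments
    fake_mpi_recv(a.buf, a.count, cursor);
    cursor += a.count;
  }
  NC::post_recv(space, v, ctx);              // step (7): queued, not fenced
}
```

In this sketch, calling `recv<PackingNC>(space, view, wire)` leaves the unpack queued in `space`, so the strided view is only updated once the caller fences. `recv<DirectNC>` writes straight into the view and queues nothing. Swapping the template argument is the only change needed to switch strategies, which is the property the `NC` interface is meant to provide.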