-
We could have one discussion track for every one of your bullets!
This point alone is not trivial to solve. We cannot know what's happening with the receiver, so dealing with layouts will be difficult, especially for non-SPMD codes.
All your concerns are genuine, and we must soon clarify what relationship we want between execution space instances and MPI communicators.
-
We did experiment with a merged … If the user wants to advance program execution conditioned on an operation request (aka the handle from an async MPI call), they will likely need to either poll the …
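For reference, a minimal sketch of what polling such a request handle could look like at the plain MPI level; this assumes a raw MPI_Request and does not reflect the wrapper's own handle type:

```cpp
// Sketch only: advance other work while polling an asynchronous request.
// Assumes a plain MPI_Request; the wrapper's own handle type is not shown.
#include <mpi.h>

void poll_until_done(MPI_Request& req) {
  int done = 0;
  while (!done) {
    MPI_Test(&req, &done, MPI_STATUS_IGNORE);  // non-blocking completion check
    // ... other useful work can be interleaved here while the message is in flight ...
  }
}
```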
-
Initial thoughts
Definitely want support for this - I see it as a main selling point for the initial MPI wrapper.
I think we should expose some kind of trait to allow users to customize handling of non-contiguous data. We would also use the system internally and provide a few implementations. Very bare-bones versions of this exist right now: one uses an intermediate view plus deep_copy, and another constructs an MPI datatype. It would be cool to make the trait interface expressive enough to support send-by-chunk too. This is also the extension point for folks who want to do research with special hardware or software: they can implement a custom specialization of the trait. Maybe we could allow the default to be overridden globally somehow. A rough sketch of what such a trait could look like is below.
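The name `PackTraits`, its members, and the rank-1 restriction in this sketch are all assumptions for illustration, not the actual interface; only the deep_copy-based default is shown:

```cpp
// Hypothetical customization point for non-contiguous data (assumed names).
#include <Kokkos_Core.hpp>

template <typename View, typename Enable = void>
struct PackTraits {
  // Contiguous scratch view in the same memory space as the source (rank-1 for brevity).
  using Staging = Kokkos::View<typename View::non_const_value_type*,
                               typename View::memory_space>;

  static Staging pack(const View& v) {
    Staging tmp("pack_scratch", v.extent(0));
    Kokkos::deep_copy(tmp, v);  // gather the (possibly strided) data into contiguous storage
    return tmp;
  }
};

// A user (or the library itself) could specialize PackTraits to build an MPI
// derived datatype instead, send in chunks, or target special hardware.
```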
send from
I think we should support this. The default would be
This may be "backend" specific; I don't know how we can detect at compile time that a send and a recv are incompatible, though. We could error out at run time or have a fallback path.
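One possible reading of a run-time fallback path, sketched under the assumption that the incompatibility is MPI not being able to take the buffer directly: stage it through a host mirror first. The flag, the helper name, and the contiguous rank-1 double view are all assumptions; real detection would be backend- and library-specific:

```cpp
// Hypothetical fallback: stage device data through a host mirror if it cannot
// be handed to MPI directly. Assumes a contiguous, rank-1 View of double.
#include <Kokkos_Core.hpp>
#include <mpi.h>

template <typename View>
void send_with_fallback(const View& v, int dest, int tag, MPI_Comm comm, bool direct_ok) {
  if (direct_ok) {
    MPI_Send(v.data(), static_cast<int>(v.size()), MPI_DOUBLE, dest, tag, comm);
  } else {
    auto host = Kokkos::create_mirror_view_and_copy(Kokkos::HostSpace{}, v);
    MPI_Send(host.data(), static_cast<int>(host.size()), MPI_DOUBLE, dest, tag, comm);
  }
}
```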
I think we can strive to support this. May need to be integrated with non-contiguous handling. I'm unfamiliar with the consequences of accessor stuff at the moment.
I think we should keep allocations alive (lifetime errors for our users can be hard to debug). Currently type-erased copies of views are stashed in the
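A bare sketch of that keep-alive idea, just to make the type erasure concrete; the `Req` class and `keep_until_wait` are assumed names, not the actual implementation:

```cpp
// Sketch only: stash type-erased copies of views on the request so the
// allocations outlive the asynchronous operation.
#include <Kokkos_Core.hpp>
#include <memory>
#include <mpi.h>
#include <vector>

class Req {
  MPI_Request req_ = MPI_REQUEST_NULL;
  std::vector<std::shared_ptr<void>> held_;  // type-erased view copies

 public:
  template <typename View>
  void keep_until_wait(const View& v) {
    // Copying the view handle bumps its reference count; erasing the type lets
    // one container hold views of any type, layout, or memory space.
    held_.push_back(std::make_shared<View>(v));
  }

  void wait() {
    MPI_Wait(&req_, MPI_STATUS_IGNORE);
    held_.clear();  // safe to release the allocations now
  }
};
```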
I envisioned the semantics as: the operation is ordered w.r.t. the provided instance. However, fencing that instance does not guarantee that the communication finished:
```cpp
Kokkos::Cuda space;
auto Req = irecv(space, ...); // calls MPI_Irecv, but doesn't actually use the space
space.fence();                // may or may not have some effect on communication
Req.wait();                   // implicitly use (and implicitly fence) space
```
Maybe our APIs should only accept a space argument at the point where something is (possibly) inserted into the space:
```cpp
Kokkos::Cuda space;
auto Req = irecv(...); // no space work done since space is not an argument
space.fence();         // obviously unrelated to communication
Req.wait(space);       // explicitly use (and implicitly fence) space
```
-
Some thoughts I shared with Jan on the question of what the limitations of the interface should be for now, or at a minimum in which order we should prioritize things. These are just some random things that came to mind: