Problem
This ticket introduces MPI_Pbuf_prepare, a call that guarantees remote buffers are available before MPI_Pready is called in partitioned communication. This guarantee is important for optimizations of MPI_Pready that can be implemented on accelerators such as GPUs and FPGAs.
Proposal
Introduce MPI_Pbuf_prepare and MPI_Pbuf_prepareall, which provide remote buffer readiness guarantees from MPI. This enables a GPU/accelerator-side MPI implementation of MPI_Pready with a single code path, which is ideal for those architectures. MPI_Pbuf_prepare allows the MPI library to efficiently use accelerator-triggered communication, set up on the host CPU, for kernel-triggered communication. By avoiding buffer management and branching code paths, MPI_Pready and MPI_Parrived can be implemented with fast instructions on data-flow-centric architectures.
The proposed operation flow is as follows (see the sketch after this list):

1. Sender and receiver set up partitioned operations with MPI_Psend_init/MPI_Precv_init and activate them with MPI_Start.
2. The sender calls MPI_Pbuf_prepare (or MPI_Pbuf_prepareall), which returns once the receive-side buffers are guaranteed to be available.
3. The sender marks partitions ready with MPI_Pready, which can now take a single, branch-free fast path; the receiver checks partition arrival with MPI_Parrived.
4. Both sides complete the operation with MPI_Wait (and restart or free the requests as usual).

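A minimal sender-side sketch in C of this flow, assuming MPI_Pbuf_prepare takes the active partitioned request as its only argument (the authoritative binding is in the linked pull request):

```c
#include <mpi.h>

/* Sender-side sketch. Assumption: MPI_Pbuf_prepare(MPI_Request) blocks
 * until the matching receive-side buffers are guaranteed to be available. */
void send_partitioned(double *buf, int partitions, MPI_Count part_count,
                      int dest, MPI_Comm comm)
{
    MPI_Request req;
    MPI_Psend_init(buf, partitions, part_count, MPI_DOUBLE, dest, 0,
                   comm, MPI_INFO_NULL, &req);
    MPI_Start(&req);

    /* Proposed call: after it returns, the receiver's buffers are ready,
     * so MPI_Pready needs no buffer-availability checks. */
    MPI_Pbuf_prepare(req);

    for (int p = 0; p < partitions; p++) {
        /* ... compute partition p, e.g. inside a GPU kernel ... */
        MPI_Pready(p, req);   /* single fast code path */
    }

    MPI_Wait(&req, MPI_STATUS_IGNORE);
    MPI_Request_free(&req);
}
```

The point of the guarantee is that the loop body contains no buffer-management branches, so MPI_Pready can reduce to a simple trigger operation on data-flow-centric hardware.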
Changes to the Text
This ticket adds two calls, MPI_Pbuf_prepare and MPI_Pbuf_prepareall, to the partitioned communication chapter.
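For orientation, plausible C bindings by analogy with MPI_Start/MPI_Startall; these are assumptions here, and the normative bindings are given in the referenced pull requests:

```c
/* Assumed bindings, by analogy with MPI_Start/MPI_Startall; see the
 * referenced pull request for the normative text. */
int MPI_Pbuf_prepare(MPI_Request request);
int MPI_Pbuf_prepareall(int count, MPI_Request array_of_requests[]);
```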
Impact on Implementations
Implementations will have to add support for these calls. This involves implementing a clear-to-send/ready-to-send (CTS/RTS) style handshake for synchronization; a user-level illustration follows.
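The CTS half of such a handshake can be illustrated at user level with a zero-byte control message; the tag and helper names below are invented for this sketch and are not part of the proposal:

```c
#include <mpi.h>

#define CTS_TAG 999  /* arbitrary tag, for this sketch only */

/* Receiver side: once the partitioned receive is started and its buffers
 * are registered, signal clear-to-send (CTS) to the sender. */
static void signal_buffers_ready(int sender, MPI_Comm ctrl_comm)
{
    MPI_Send(NULL, 0, MPI_BYTE, sender, CTS_TAG, ctrl_comm);
}

/* Sender side: the guarantee MPI_Pbuf_prepare provides, sketched at user
 * level: block until the receiver's CTS arrives before any MPI_Pready. */
static void wait_for_cts(int receiver, MPI_Comm ctrl_comm)
{
    MPI_Recv(NULL, 0, MPI_BYTE, receiver, CTS_TAG, ctrl_comm,
             MPI_STATUS_IGNORE);
}
```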
Impact on Users
Users will have a new mechanism that helps them write accelerator-side code around MPI_Pready with consistently optimized performance.
References
Pull request Synchronization on Partitioned Communication for Accelerator Optimization
Semantics table pull request
Please review only the changes for this ticket, to avoid mixing in pending partitioned communication merges.