Description
Problem
The monolithic ULFM proposal has been split into smaller slices so that the MPI Forum can focus on individual topics.
Main topic issue
#20
Proposal
The second topic slice contains the following concepts for communicators:
- MPI_COMM_AGREE
Changes to the Text
Addition of an FT chapter containing the proposed constructs
Read text (Sept'23) https://github.com/mpi-forum/mpi-standard/pull/715/commits/9e81233953a280f867eb48fbe890f5108a5ed9af
no-no reading (diff from Sept'23) https://github.com/mpi-forum/mpi-standard/pull/715/commits/58283a760a35934c0331744f8245e552644d252a
Impact on Implementations
Implementations may optionally implement fault tolerance.
Implementations must add the procedure MPI_COMM_AGREE. Implementations that do not support FT can provide a stub based on MPI_ALLREDUCE that is not fault tolerant.
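A minimal sketch of what such a non-fault-tolerant stub would compute, assuming MPI_COMM_AGREE keeps the semantics of ULFM's MPIX_Comm_agree (every rank ends up with the bitwise AND of all contributed flags). The helper name `agree_stub_result` is hypothetical and not part of the proposal; it models the reduction over a local array so the logic can be checked without an MPI runtime. The comment shows the MPI_Allreduce call that would replace the loop in a real stub.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not part of the proposal): returns the value a
 * non-fault-tolerant stub would leave in *flag on every rank, i.e. the
 * bitwise AND of the flags contributed by all ranks.
 *
 * In a real MPI program the loop below would be replaced by:
 *   MPI_Allreduce(MPI_IN_PLACE, flag, 1, MPI_INT, MPI_BAND, comm);
 * Unlike a fault-tolerant agreement, that call gives no guarantees
 * when a process fails during the operation. */
int agree_stub_result(const int *flags, size_t nranks)
{
    assert(flags != NULL && nranks > 0);

    int agreed = ~0; /* all bits set: identity element for bitwise AND */
    for (size_t r = 0; r < nranks; ++r) {
        agreed &= flags[r];
    }
    return agreed;
}
```

With this model, three ranks contributing 1, 0, and 1 all obtain 0, so a single rank reporting a problem is enough to change the result on every rank.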
Impact on Users
Users can react to fault events, validate progress in collective phases, and synchronize knowledge of failures across ranks. Slice 3 will add features for repairing communicators, as needed to use collective operations and process respawning after a fault.
References and Pull Requests
https://github.com/mpi-forum/mpi-standard/pull/715