ULFM Fault Tolerance (slice 3: shrink) #583

@abouteiller

Problem

The monolithic ULFM proposal has been split into smaller slices so that the MPI Forum can focus on individual topics.

Main topic issue
#20

Proposal

The third topic slice contains the following concepts for communicators:

  • MPI_COMM_SHRINK

Changes to the Text

Addition of an FT chapter containing the proposed constructs

Impact on Implementations

Implementations may optionally implement fault tolerance.
Implementations must add the procedure MPI_COMM_SHRINK (implementations that do not support FT can provide a stub based on MPI_COMM_DUP that is not fault tolerant).

Impact on Users

Users can repair communicators as needed, so that collective operations and process respawning can be used after a fault.

References and Pull Requests

https://github.com/mpi-forum/mpi-standard/pull/877

Metadata

Labels

  • mpi-next: For inclusion in the MPI 5.1 or 6.0 standard
  • scheduled reading: Reading is scheduled for the next meeting
  • wg-ft: Fault Tolerance Working Group

Projects

Status

In Progress
