Merged in abouteiller/ulfm2/feature/README (pull request open-mpi#1)

Feature/README
abouteiller · Aug 28, 2017 · 42a3858
2 parents 97070fa + 1434c0e

Showing 2 changed files with 311 additions and 1 deletion.
diff --git a/README.ULFM b/README.ULFM
@@ -0,0 +1,310 @@
+Copyright (c) 2012-2017 The University of Tennessee and The University
+                        of Tennessee Research Foundation.  All rights
+                        reserved.
+
+$COPYRIGHT$
+
+Additional copyrights may follow
+
+$HEADER$
+
+===========================================================================
+
+Found a bug?  Got a question?  Want to make a suggestion?  Want to
+contribute to ULFM Open MPI?  Working on a cool use-case?
+Please let us know!
+
+The best way to report bugs, send comments, or ask questions is to
+sign up on the ULFM user mailing list:
+
+    <ulfm+subscribe@googlegroups.com>
+
+Because of spam, only subscribers are allowed to post to this list
+(ensure that you subscribe with and post from exactly the same e-mail
+address -- joe@example.com is considered different from
+joe@mycomputer.example.com!).  Visit this page to subscribe to the
+list:
+
+    <https://groups.google.com/forum/#!forum/ulfm>
+
+When submitting questions and problems, be sure to include as much
+extra information as possible.  This web page details all the
+information that we request in order to provide assistance:
+
+    <http://www.open-mpi.org/community/help/>
+
+Thanks for your time.
+
+===========================================================================
+
+Much, much more information (tutorials, examples, build instructions for
+leading TOP500 systems) is also available on the Fault Tolerance Research
+Hub website:
+
+    <https://fault-tolerance.org>
+
+===========================================================================
+
+If you want to cite a general reference for ULFM, please use:
+
+Wesley Bland, Aurelien Bouteiller, Thomas Hérault, George Bosilca, Jack J.
+Dongarra: Post-failure recovery of MPI communication capability: Design and
+rationale. IJHPCA 27(3): 244-254 (2013)
+http://journals.sagepub.com/doi/10.1177/1094342013488238
+
+===========================================================================
+
+Network Support
+---------------
+
+- There are four main MPI network models available in Open MPI: "ob1", "cm",
+  "yalla", and "ucx". Only "ob1" is adapted to support fault tolerance.
+  "ob1" uses BTL ("Byte Transfer Layer") components for each supported
+  network.
+  - "ob1" supports a variety of networks that can be used in
+    combination with each other:
+    - Loopback (send-to-self)                               (FT supported)
+    - TCP                                                   (FT supported)
+    - OpenFabrics: InfiniBand, iWARP, and RoCE              (FT supported)
+    - uGNI (Cray Gemini, Aries)                             (FT supported)
+    - Shared memory (Vader)   (FT supported; CMA, XPMEM, KNEM modes untested)
+    - Intel Phi SCIF                                         (FT untested)
+    - SMCUDA                                                 (FT untested)
+    - Cisco usNIC                                            (FT untested)
+
+===========================================================================
+
+Building ULFM Open MPI
+----------------------
+
+shell$ ./configure --with-ft [...options...]
+shell$ make [-j N] all install
+    (use an integer value of N for parallel builds)
+
+There are many available configure options (see "./configure --help"
+for a full list); a summary of the more commonly used ones is included
+in the regular Open MPI README file.
+
+Notable differences in ULFM Open MPI behavior regarding configure options
+are the following:
+
+DIFFERENCES WITH OPEN MPI INSTALLATION OPTIONS
+
+--with-ft=TYPE
+  Specify the type of fault tolerance to enable.  Options: mpi (ULFM MPI
+  draft standard), LAM (LAM/MPI-like), cr (Checkpoint/Restart).  Fault
+  tolerance support is enabled by default (as if --with-ft=mpi were
+  implicitly present on the configure line).
+  You may specify `--without-ft` to compile an almost stock Open MPI.
+
+--with-platform=FILE
+  Load configure options for the build from FILE.  When
+  --with-ft=mpi is set, the file `contrib/platform/ft_mpi_ulfm` is
+  loaded by default. This file disables components that are known to
+  be unable to sustain failures, or that are insufficiently tested.
+  You may edit this file and/or override these options on the
+  command line to re-enable these components.
+
+--enable-mca-no-build=LIST
+  Comma-separated list of <type>-<component> pairs that will not be
+  built. For example, "--enable-mca-no-build=btl-portals,oob-ud" will
+  disable building the portals BTL and the ud OOB component. When
+  --with-ft=mpi is set, this list is populated with the content of
+  the aforementioned platform file. You may override the default list
+  with this parameter.
+
+--with-pmi
+--with-slurm
+  Force the building of SLURM scheduler support.
+  Fault tolerance is tested with SLURM. Use `mpirun` inside an
+  `salloc` or `sbatch` allocation. Do not launch with `srun`: the
+  scheduler would kill your application upon the first failure.
+
+--with-sge
+  This is untested with fault tolerance.
+
+--with-tm=<directory>
+  Force the building of PBS/Torque scheduler support.
+  PBS is tested with fault tolerance. Use `mpirun` in a `qsub`
+  allocation.
+
+--disable-mpi-thread-multiple
+  Disable the MPI thread level MPI_THREAD_MULTIPLE (it is enabled by
+  default).
+  Fault tolerance with multiple threads is only lightly tested.
+
+--disable-oshmem
+  Disable building the OpenSHMEM implementation (by default, it is
+  enabled).
+  ULFM Fault Tolerance does not apply to OpenSHMEM.
+
+===========================================================================
+
+ULFM Open MPI Version Numbers and Binary Compatibility
+------------------------------------------------------
+
+Starting from ULFM Open MPI version 2.0, ULFM Open MPI is binary compatible
+with the corresponding Open MPI master branch and compatible releases (see
+the binary compatibility and version number section in the Open MPI README).
+That is, applications compiled with a compatible Open MPI can run with the
+ULFM Open MPI `mpirun` and MPI libraries. Conversely, _as long as the
+application does not employ any of the MPIX functions_, which are
+defined exclusively in ULFM Open MPI, an application compiled with
+ULFM Open MPI can be launched with a compatible Open MPI `mpirun` and run
+with the non-fault-tolerant MPI library.
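One way to honor this restriction while keeping a single source tree is to guard every MPIX use at compile time. The minimal sketch below assumes that ULFM Open MPI's `<mpi-ext.h>` defines `MPIX_ERR_PROC_FAILED` as a preprocessor macro and uses it as the feature test; `APP_HAVE_ULFM`, `app_repair_comm()`, and `print_ft_support()` are hypothetical names, while `MPIX_Comm_revoke` and `MPIX_Comm_shrink` are ULFM extension functions. The MPI headers are left out so the sketch compiles standalone, which means the fallback branch is always taken here.

```c
#include <stdio.h>

/* A real application includes <mpi.h> and then <mpi-ext.h> here and is
 * built with mpicc. Both are omitted so this sketch compiles on its own;
 * without them MPIX_ERR_PROC_FAILED is undefined and the fallback is used. */

#if defined(MPIX_ERR_PROC_FAILED)
#define APP_HAVE_ULFM 1 /* ULFM extensions present: MPIX_* calls allowed */
#else
#define APP_HAVE_ULFM 0 /* stock MPI: never reference MPIX_* symbols */
#endif

#if APP_HAVE_ULFM
/* Hypothetical helper, compiled only against ULFM Open MPI: revoke the
 * communicator so that every process stops using it, then build a new
 * communicator containing only the surviving processes. */
int app_repair_comm(MPI_Comm comm, MPI_Comm *newcomm)
{
    MPIX_Comm_revoke(comm);
    return MPIX_Comm_shrink(comm, newcomm);
}
#endif

/* Report which failure-handling path this build contains. */
void print_ft_support(FILE *out)
{
    fprintf(out, "ULFM extensions: %s\n",
            APP_HAVE_ULFM ? "available" : "not available");
}
```

Built against a stock Open MPI, the MPIX branch disappears and the binary runs with the non-fault-tolerant library. Built against ULFM Open MPI, the binary references MPIX symbols and therefore needs ULFM Open MPI at run time; every caller of `app_repair_comm()` must sit behind the same guard.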
+
+
+===========================================================================
+
+The following frameworks/components are UNTESTED. They should work,
+but use at your own risk with FT.
+    btl-usnic, btl-portals4, btl-scif, btl-smcuda, pml-monitoring,
+    pml-v, vprotocol, crcp
+The following frameworks/components are UNTESTED, and probably
+won't work, but you may try them.
+    coll-cuda, coll-fca, coll-hcoll, coll-portals4
+The following frameworks/components are NOT WORKING. Do not enable
+these with --with-ft=mpi.
+    mtl, pml-bfo, pml-cm, pml-crcpw, pml-yalla, pml-ucx
+
+Frameworks that are not listed below are unmodified and
+support fault tolerance. Listed frameworks are modified (and work after
+a failure), disabled, or untested (they work before a failure, but may
+malfunction after a failure).
+
+
+Frameworks modified in ULFM Open MPI:
+-------------------------------------
+
+coll      - MPI collective algorithms
+              "tuned", "basic", modified to handle errors
+              "fca", "hcoll", "ml", "portals4" disabled, untested
+fbtl      - file byte transfer layer: abstraction for individual
+            read/write operations for OMPIO
+               Unmodified, untested
+fcoll     - collective read and write operations for MPI I/O
+               Unmodified, untested
+fs        - file system functions for MPI I/O
+               Unmodified, untested
+io        - MPI I/O
+               Unmodified, not fault tolerant (post failure abort)
+mtl       - Matching transport layer, used for MPI point-to-point
+            messages on some types of networks
+               Disabled, not fault tolerant
+osc       - MPI one-sided communications
+               Unmodified, not fault tolerant (post failure deadlock)
+pml       - MPI point-to-point management layer
+               "ob1" modified to handle errors (other components disabled)
+sharedfp  - shared file pointer operations for MPI I/O
+               Unmodified, untested
+vprotocol - Protocols for the "v" PML
+               Disabled, untested
+
+Back-end run-time environment (RTE) component frameworks:
+---------------------------------------------------------
+
+All components unmodified.
+
+Miscellaneous frameworks:
+-------------------------
+
+btl         - Point-to-point Byte Transfer Layer
+                Supported BTLs modified to remove unconditional abort on error.
+threads/wait_sync
+                Added a global interrupt for wait_sync objects
+
+
+===========================================================================
+
+Changelog
+---------
+
+### Release 2.0
+
+Focus has been toward integration with current Open MPI master,
+performance, and stability.
+
+- ULFM is now based upon Open MPI master branch (#xxyyzz).
+- Fault Tolerance is enabled by default and is controlled with MCA variables.
+- Added support for multithreaded modes (MPI_THREAD_MULTIPLE, etc.)
+- Added support for non-blocking collective operations (NBC).
+- Added support for CMA shared memory transport (Vader).
+- Added support for advanced failure detection at the MPI level.
+  Implements the algorithm described in "Failure detection and
+  propagation in HPC systems." <https://doi.org/10.1109/SC.2016.26>.
+- Removed the need for special handling of CID allocation.
+- Non-usable components are automatically removed from the build during configure.
+- RMA, FILES, and TOPO components are enabled by default; using them in a fault
+  tolerant execution emits a warning that they may cause undefined behavior after a failure.
+- Bugfixes:
+    - Code cleanup and performance cleanup in non-FT builds; --without-ft at
+      configure time gives an almost stock Open MPI.
+    - Code cleanup and performance cleanup in FT builds with FT runtime disabled;
+      --mca ft_enable_mpi false thoroughly disables FT runtime activities.
+    - Some error cases would return ERR_PENDING instead of ERR_PROC_FAILED in
+      collective operations.
+    - Some test calls could set ERR_PENDING or ERR_PROC_FAILED instead of
+      ERR_PROC_FAILED_PENDING for ANY_SOURCE receives.
+
+KNOWN LIMITATIONS:
+
+- ORTE daemon failures may cause a full application abort in some instances.
+- ORTE daemons may stall after application processes have finalized in
+  post-failure executions.
+- TOPO, FILE, and RMA are not fault tolerant.
+- There is a tradeoff between failure detection accuracy and performance.
+  Maximum accuracy requires MPI_THREAD_MULTIPLE, which affects the latency
+  of MPI applications that are not thread-aware. The current default
+  favors application performance at the expense of detection accuracy.
+  End-users can control this tradeoff by setting the following MCA
+  parameters:
+      - mpi_ft_detector_period (default 1e-1 s)
+      - mpi_ft_detector_timeout (default 3e-1 s)
+      - mpi_ft_detector_thread (default false)
+- The failure detector operates on MPI_COMM_WORLD exclusively. Failures of
+  processes connected through MPI_COMM_CONNECT/ACCEPT or created by
+  MPI_COMM_SPAWN may occasionally go undetected.
+- Failures during some NBC collectives may not be recovered properly.
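To build intuition for the detection tradeoff, the sketch below models a heartbeat detector in the spirit of the paper cited in the Release 2.0 notes. It is a simplified model under stated assumptions, not ULFM's implementation: each process is watched by an observer, sends a heartbeat every `mpi_ft_detector_period` seconds, and is suspected once `mpi_ft_detector_timeout` seconds pass without one. Network latency and the time needed to propagate the failure notification are ignored; `detector_params` and the three functions are hypothetical names.

```c
#include <assert.h>

/* Parameters of a simplified heartbeat failure detector (an assumption
 * made for illustration; this is not ULFM's implementation). */
typedef struct {
    double period;  /* seconds between heartbeats (cf. mpi_ft_detector_period) */
    double timeout; /* silence before suspicion (cf. mpi_ft_detector_timeout)  */
} detector_params;

/* Cost: heartbeats emitted per second by each monitored process. */
double heartbeat_rate(detector_params p)
{
    assert(p.period > 0.0);
    return 1.0 / p.period;
}

/* Speed: worst-case delay between a failure and its local detection.
 * The process dies just after sending a heartbeat, so the observer waits
 * a full timeout before suspecting it. */
double worst_case_detection(detector_params p)
{
    return p.timeout;
}

/* Accuracy: how late a heartbeat may arrive (relative to when it was
 * expected) before the observer falsely suspects a live process. */
double late_heartbeat_tolerance(detector_params p)
{
    assert(p.timeout > p.period);
    return p.timeout - p.period;
}
```

Under this model the defaults (0.1 s period, 0.3 s timeout) give 10 heartbeats per second per monitored process, a worst-case local detection delay of about 0.3 s, and about 0.2 s of slack for a late heartbeat. Lowering the timeout detects failures sooner but makes false suspicions more likely; lowering the period (for example with `mpirun --mca mpi_ft_detector_period 0.05 ...`) adds slack at the cost of twice the heartbeat traffic.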
+
+
+### Release 1.1
+
+Focus has been toward improving stability, feature coverage for intercomms, and following
+the updated specification for MPI_ERR_PROC_FAILED_PENDING.
+
+- Forked from Open MPI 1.5.5 devel branch
+- Addition of the MPI_ERR_PROC_FAILED_PENDING error code, as per newer specification revision. Properly returned from point-to-point, non-blocking ANY_SOURCE operations.
+- Aliased MPI_ERR_PROC_FAILED, MPI_ERR_PROC_FAILED_PENDING, and MPI_ERR_REVOKED to the corresponding standard-blessed extension names MPIX_ERR_xxx.
+- Support for Intercommunicators:
+    - Support for the blocking version of the agreement, MPI_COMM_AGREE on Intercommunicators.
+    - MPI_COMM_REVOKE tested on intercommunicators.
+- Disabled completely (.ompi_ignore) many untested components.
+- Changed the default ORTE failure notification propagation aggregation delay from 1s to 25ms.
+- Added an OMPI internal failure propagator; failure propagation between SM domains is now immediate.
+- Bugfixes:
+    - SendRecv would not always report MPI_ERR_PROC_FAILED correctly.
+    - SendRecv could incorrectly update the status with errors pertaining to the Send portion of the Sendrecv.
+    - Revoked send operations are now always completed or remotely cancelled, and can no longer deadlock.
+    - Cancelled send operations to a dead peer will not trigger an assert when the BTL reports that same failure.
+    - Repeat calls to operations returning MPI_ERR_PROC_FAILED will eventually return MPI_ERR_REVOKED when another process revokes the communicator.
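These error classes call for different reactions. The sketch below encodes one common policy (an assumption; applications may choose differently): on MPIX_ERR_PROC_FAILED_PENDING the ANY_SOURCE request is still pending, so acknowledge the failures and keep waiting; on MPIX_ERR_PROC_FAILED revoke the communicator so the other processes learn about the failure; on MPIX_ERR_REVOKED shrink to a communicator of survivors. To compile without MPI it uses a hypothetical local enum in place of the real error classes; `choose_action()` and all `APP_` names are illustrative only.

```c
/* Hypothetical stand-ins for the ULFM error classes, so that this sketch
 * compiles without MPI. Real code obtains the class with MPI_Error_class()
 * and compares it with MPIX_ERR_PROC_FAILED, MPIX_ERR_PROC_FAILED_PENDING,
 * and MPIX_ERR_REVOKED from <mpi-ext.h>. */
enum app_err_class {
    APP_ERR_SUCCESS,
    APP_ERR_PROC_FAILED,         /* a peer involved in the operation died     */
    APP_ERR_PROC_FAILED_PENDING, /* ANY_SOURCE receive: a possible sender died */
    APP_ERR_REVOKED,             /* the communicator has been revoked         */
    APP_ERR_OTHER
};

/* What the application should do next (the comments name the ULFM call
 * each action would use in a real program). */
enum app_action {
    APP_ACT_CONTINUE,     /* operation completed normally                     */
    APP_ACT_ACK_AND_WAIT, /* MPIX_Comm_failure_ack, then wait on the request  */
    APP_ACT_REVOKE,       /* MPIX_Comm_revoke, so all processes stop using it */
    APP_ACT_SHRINK,       /* MPIX_Comm_shrink, to continue with survivors     */
    APP_ACT_ABORT         /* not a process-failure error: give up             */
};

/* One common recovery policy; applications may choose differently. */
enum app_action choose_action(enum app_err_class ec)
{
    switch (ec) {
    case APP_ERR_SUCCESS:
        return APP_ACT_CONTINUE;
    case APP_ERR_PROC_FAILED_PENDING:
        return APP_ACT_ACK_AND_WAIT;
    case APP_ERR_PROC_FAILED:
        return APP_ACT_REVOKE;
    case APP_ERR_REVOKED:
        return APP_ACT_SHRINK;
    default:
        return APP_ACT_ABORT;
    }
}
```

Revoking on MPIX_ERR_PROC_FAILED is what lets repeated calls on other processes eventually report MPIX_ERR_REVOKED, at which point every survivor reaches the shrink branch.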
+
+### Release 1.0
+
+Focus has been toward improving performance, both before and after the occurrence of failures. The list of new features includes:
+- Support for the non-blocking version of the agreement, MPI_COMM_IAGREE.
+- Compliance with the latest ULFM specification draft. In particular, the MPI_COMM_(I)AGREE semantic has changed.
+- New algorithm to perform agreements, with a truly logarithmic complexity in the number of ranks, which translates into huge performance boosts in MPI_COMM_(I)AGREE and MPI_COMM_SHRINK.
+- New algorithm to perform communicator revocation. MPI_COMM_REVOKE performs a reliable broadcast with a fixed maximum output degree, which scales logarithmically with the number of ranks.
+- Improved support for our traditional network layer:
+    - TCP: fully tested
+    - SM: fully tested (with the exception of XPMEM, which remains unsupported)
+- Added support for High Performance networks
+    - Open IB: reasonably tested
+    - uGNI: reasonably tested
+- The tuned collective module is now enabled by default (reasonably tested); expect a huge performance boost compared to the former basic default setting.
+- Back-ported PBS/ALPS fixes from Open MPI
+- Back-ported OpenIB bug/performance fixes from Open MPI
+- Improved the context ID allocation algorithm to reduce the overhead of Shrink
+- Miscellaneous bug fixes
+
+
+
diff --git a/VERSION b/VERSION
@@ -26,7 +26,7 @@ release=0
 # requirement is that it must be entirely printable ASCII characters
 # and have no white space.
 
-greek=a1
+greek=ft-ulfm-a1
 
 # If repo_rev is empty, then the repository version number will be
 # obtained during "make dist" via the "git describe --tags --always"