Releases: pmodels/mpich
v4.2.0rc1
v4.2.0b1
v4.1.2
Changes in 4.1.2
- Update UCX module to include fixes for building with GCC 13
- Update libfabric module to 1.18.0 with additional fixes for building with recent versions of LLVM/Clang
- Fix compiler wrapper scripts to be compatible with CUDA memory hooks
- Fix MPIX_WAITALL_ENQUEUE to make a copy of the input request array
- Fix bug in MPI_ALLREDUCE that could result in ranks receiving different floating point values
- Fix potential deadlock when progressing RMA windows
- Fix potential crash in MPI_REDUCE with non-zero root and MPI_IN_PLACE
- Fix potential crash during probe with libfabric CXI provider
- Fix MPI_PARRIVED when the partitioned request is inactive
- Fix potential bug when an attribute delete callback deletes another attribute on the same object
- Fix build issue in ROMIO Lustre driver
- Improve Fortran 2008 binding support detection during configure
- Report an error if the collective tuning JSON file fails to open
- Several fixes for testsuite programs and build configuration
v4.1.1
Changes in 4.1.1
- Update embedded UCX module to 1.13.1. Fixes a build issue with binutils >= 2.39.
- Update yaksa module. Support explicit NVCC setting by the user. Fixes a build issue when there is no libtool available in PATH.
- Fix ch4:ucx initialization when configured with --enable-ch4-vci-method=implicit.
- Fix potential error handler leak during MPI_SESSION_FINALIZE
- Fix value of MPI_UNDEFINED in mpif.h binding
- Fix MPI_IALLTOALLW with MPI_IN_PLACE
- Fix send attribute handling in IPC path
- Fix a bug in persistent MPI_ALLGATHER
- Fix tests for use with non-MPICH libraries
- Add missing MPI_T_ERR_NOT_ACCESSIBLE error code
- Fix manpages for MPIX functions
v4.1
Changes in 4.1
- Thread critical sections in ch4 changed to per-VCI granularity.
- Testsuite (test/mpi) is configured separately from mpich configure.
- Added options in autogen to accelerate CI builds, including using pre-built sub-modules. Added -yaksa-depth option to generate shallower yaksa pup code for faster builds and smaller binaries.
- Support singleton init using hydra.
- On OSX, the link option flat_namespace is no longer turned on by default.
- Generate mpi.mod Fortran interfaces using Python 3. For many compilers, including gfortran, flags such as -fallow-argument-mismatch are no longer necessary.
- Fixed message queue debugger interface in ch4.
- PMI (src/pmi) is refactored as a subdir and can be separately distributed.
- Added MPIX_Comm_get_failed.
- Experimental MPIX stream API to enable explicit thread contexts.
- Experimental MPIX gpu enqueue API. It currently only supports CUDA streams.
- Delays GPU resource allocation in yaksa.
- CH3 nemesis ofi netmod is removed.
- New collective algorithms. All collective algorithms are listed in src/mpi/coll/coll_algorithms.txt.
- Removed hydra2. Its unique features, including tree-launching, will be ported to hydra in a future release.
- Added in-repository wiki documentation.
- Added stream workq to support optimizations for enqueue operations.
- Better support for large count APIs by eliminating type conversion issues.
- Hydra now uses libpmi (src/pmi) for handling PMI messages.
- Many bug fixes and enhancements.
v4.0.3
Changes in 4.0.3
- Fix message queue dumping interface support
- Fix multinic usage in ch4:ofi
- Fix bug in MPI_WIN_CREATE in ch4:ucx when UCX >= 1.13.0
- Fix MPIR_pmi_barrier when PMI2 is used
- Fix ROMIO lazy mutex initialization
- Fix build with HIP support
- Fix potential dynamic process message mixups in ch3
- Add missing const to MPI_Pready_list array_of_partitions argument
- Add support for C++ datatypes even when the C++ binding is disabled
- Add support for Intel OneAPI compilers
v4.0.2
Changes in 4.0.2
- Fix CUDA configuration logic in yaksa
- Fix support for dynamic process functionality with PMI2 clients
- Fix non-zero appnum bug in PMI2 server in Hydra
- Fix MPI_Op support for types created with MPI_Type_create_f90_xxx
- Fix building ch4 with Intel compilers on macOS
- Fix Level Zero properties initialization in MPL. Thanks to Brice Videau for the report and patch.
- Use standard names for CPU affinity functions with POSIX threads. Fixes building against Musl libc. Thanks to Mosè Giordano for the report and patch.
- Add elemental to eq/neq operators in Fortran 2008 binding
- Workaround for inter-process mutex bug on FreeBSD
v4.0.1
Changes in 4.0.1
- Multiple fixes for NVIDIA/PGI HPC Compilers support
- Fix ch4:ofi:gni provider capability set
- Fix MPI_SESSION_INIT "thread_level" info hint
- Fix build on macOS with --disable-shared
- Fix QMPI function definitions
- Fix support for "host" info hint in MPI_COMM_SPAWN[_MULTIPLE]
- Fix manpage generation
- Add missing MPI_F_sync_reg function
- Add missing const to MPI_Psend_init buffer argument
- Make Python 3 optional in configure script
- Remove -Wl,flat_namespace from compile wrappers by default (macOS only)
- Update UCX module to v1.12.0
- Update yaksa module to support latest Ampere compute capability
v4.0
Changes in 4.0
- All MPI-4 APIs have been implemented. Major MPI-4 features include MPI sessions, partitioned point-to-point communication, events in the MPI tool information interface, large-count functions, persistent collectives, MPI_Comm_idup_with_info, MPI_Isendrecv and MPI_Isendrecv_replace, MPI_Info_get_string, and MPI_Comm_split_type with the new split types MPI_COMM_TYPE_HW_GUIDED and MPI_COMM_TYPE_HW_UNGUIDED.
- Add QMPI (experimental) support.
- Add MPIX_Delete_error_{class,code,string}.
- MPI_Info objects can be accessed before MPI_Init{_thread}.
- Generate C API interface functions, including man page notes and error checking, using Python scripts.
- Generate Fortran (mpif.h, mpi_f08) bindings using Python scripts.
- Generate collective entrance functions and generate per-algorithm tests.
- Support explicit --without-cuda configure option.
- Drop support for UCX versions < 1.7.0.
- Configure now optionally requires Python 3 (when F08 is enabled).
- Multi-NIC support in ch4:ofi.
- Default to ch4:ofi when configure doesn't have a clear choice. Add a message block at the end of configure to advise the user.
- Multiple VCIs are fully implemented, including the active message fallback paths.
- Extend IPC to support non-contiguous datatypes.
- Add AMD GPU support using HIP.
- Add generic RNDV callback mechanism with active messages.
- Refactor ch4 dynamic process functions.
- Avoid building MPL and hwloc multiple times.
- Fix MPIX_Query_cuda_support.
- Many bug fixes and code clean-ups.