Notes on Conference Call on Dual Serial MPI builds

Date

September 14th, 2020

Participants

(in no particular order)

Mikael Ohman
Davide Vanzo
Sam Moors
Bennet Fauber
Maxime Boissonneault
Kenneth Hoste
Bart Oldeman
Jörg Saßmannshausen
Alex Domingo

Notes

Description of the scope and goals of the project. No questions raised.

Point about the packages that should have dual serial/MPI builds:

HDF5: already done
Boost: done, pending some changes to its easyblock. However, there are no issues with the serial build of Boost.
Maxime and Bart suggest that next in line should be FFTW and VTK

Point about packages that can be moved to lower toolchains by depending on serial packages

Packages that sit in gompi/iimpi only because of a dependency on Boost or HDF5 are easy to spot
Packages sitting in foss/intel are more complicated, because we need to verify that they indeed cannot use MPI or BLAS/LAPACK
We should start by checking what has already been done in Compute Canada
Reviewers of easyconfigs in EasyBuild usually check whether a package can be moved to a lower toolchain, so the number of misplaced packages in foss/intel should be small
We could also consider building certain software that supports MPI without it when MPI is not useful, since its MPI-related dependencies will then be disabled automatically
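Spotting the gompi/iimpi candidates can be partly automated. The sketch below is a hypothetical illustration, not an existing EasyBuild tool: it works on an in-memory list of `Easyconfig` records (a made-up class, not parsed from real easyconfig files) and assumes a reviewer has already set the hypothetical `uses_mpi_directly` flag. It only checks for a dependency on one of the dual libraries; other MPI-based dependencies (e.g. ScaLAPACK) would still need a manual check.

```python
from dataclasses import dataclass, field
from typing import List

# Toolchains that include MPI: gompi (GCC + OpenMPI) and iimpi (Intel compilers + Intel MPI).
MPI_TOOLCHAINS = {"gompi", "iimpi"}

# Libraries that get (or already have) dual serial/MPI builds.
DUAL_LIBRARIES = {"Boost", "HDF5"}


@dataclass
class Easyconfig:
    name: str
    toolchain: str
    dependencies: List[str] = field(default_factory=list)
    # Hypothetical flag: set by a reviewer if the package itself calls MPI.
    uses_mpi_directly: bool = False


def move_down_candidates(easyconfigs: List[Easyconfig]) -> List[str]:
    """Packages in an MPI toolchain that do not use MPI themselves but depend on a dual library."""
    candidates = []
    for ec in easyconfigs:
        if ec.toolchain not in MPI_TOOLCHAINS or ec.uses_mpi_directly:
            continue
        if DUAL_LIBRARIES.intersection(ec.dependencies):
            candidates.append(ec.name)
    return candidates


# Hypothetical example data, not real easyconfigs.
ecs = [
    Easyconfig("PackageA", "gompi", ["HDF5"]),
    Easyconfig("PackageB", "gompi", ["Boost"], uses_mpi_directly=True),
    Easyconfig("PackageC", "GCCcore", ["zlib"]),
]
print(move_down_candidates(ecs))  # ['PackageA']
```

Once the serial variants of Boost and HDF5 exist, each reported package could be tried against them in a lower toolchain.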

Point on name scheme

Using different package names is needed if the serial and MPI builds of a given package should be loaded at the same time
The requirements on the naming scheme might differ depending on the module system (traditional vs hierarchical)
In Compute Canada, serial and MPI modules use different names, but they cannot be loaded at the same time (this is enforced through Lmod)
If both modules can be loaded at the same time, then we might face a situation where a given package supporting MPI will load (for instance) Boost-MPI, but at the same time that package might depend on a non-MPI library that loads Boost-serial. Then what happens? Will there be conflicts?
- Relying on path precedence in LD_LIBRARY_PATH can lead to trouble: if the wrong library is loaded, it can have missing or incompatible symbols
- Order also matters for the lookup of symbols in RPATH
- Boost is a good case because its libraries are already well separated: the file names of the shared objects are different
- Serial packages should have no issues if they load the libraries of the MPI variant, as those libraries might just have additional symbols
- We should look at how Cray handles serial and MPI packages, since it already has such a split
- There is the option to rename symbols to avoid collision between the serial and MPI libraries, but that adds a whole new level of complexity
- It is not clear if such issues will indeed arise. If that is the case the easy solution is to also add dual builds of those libraries needed by packages with MPI support, to ensure that all loaded dependencies use MPI
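The path-precedence concern can be illustrated with a toy model of first-match lookup. This is a simplified sketch: the real dynamic loader also consults RPATH/RUNPATH entries, the ld.so cache and default directories, while the code below models only LD_LIBRARY_PATH. The install prefixes are hypothetical, and a dictionary stands in for the file system.

```python
from typing import Dict, Optional, Set


def resolve_library(soname: str, ld_library_path: str,
                    installed: Dict[str, Set[str]]) -> Optional[str]:
    """Return the directory `soname` would be loaded from, or None.

    Directories are searched in LD_LIBRARY_PATH order and the first match wins.
    `installed` maps each directory to the file names it contains; it stands in
    for the real file system.
    """
    for directory in ld_library_path.split(":"):
        if soname in installed.get(directory, set()):
            return directory
    return None


# Hypothetical install prefixes for the two Boost variants.
installed = {
    "/apps/Boost.serial/lib": {"libboost_filesystem.so"},
    "/apps/Boost.MPI/lib": {"libboost_filesystem.so", "libboost_mpi.so"},
}
path = "/apps/Boost.serial/lib:/apps/Boost.MPI/lib"

print(resolve_library("libboost_filesystem.so", path, installed))  # /apps/Boost.serial/lib
print(resolve_library("libboost_mpi.so", path, installed))         # /apps/Boost.MPI/lib
```

Swapping the two path entries makes `libboost_filesystem.so` resolve to the MPI build instead. That is harmless only if the MPI library is a superset of the serial one. MPI-only libraries such as `libboost_mpi.so` have distinct file names, so they are unambiguous.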
Having X.serial and X.MPI builds that can be loaded at the same time will be helpful for complex workflows that combine MPI and non-MPI packages; otherwise we might end up with a full split of the tree. This might be especially important for bio workflows.
Users might want to load X.serial and X.MPI at the same time mostly for convenience, not because it is really required
The non-MPI and MPI libraries cannot be hot-swapped, as this can lead to more unforeseen problems. The library stack should be consistent from top to bottom.
There is a risk of walking towards a split of the whole tree between serial and MPI builds. For instance, a large package such as TensorFlow might have a few components using MPI, but many of the packages depending on TensorFlow cannot use any of the MPI features. Will this require a split of TensorFlow.serial and TensorFlow.MPI? It's difficult to foresee the extent of it.
One of the core goals is to provide users with an MPI-free environment if MPI is not needed
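The question of an MPI package that also pulls in a serial variant through another dependency can be checked mechanically once the dependency graph is known. The sketch below assumes the tentative X.serial / X.MPI naming. It uses a hypothetical in-memory graph (`SimCode` and `MeshLib` are made-up names) and reports every base name for which both variants end up in the same environment.

```python
from typing import Dict, List, Set


def dependency_closure(module: str, deps: Dict[str, List[str]]) -> Set[str]:
    """All modules loaded, directly or transitively, when `module` is loaded."""
    seen: Set[str] = set()
    stack = [module]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        stack.extend(deps.get(current, []))
    return seen


def mixed_variants(modules: Set[str]) -> Set[str]:
    """Base names present both as a '.serial' and as an '.MPI' variant."""
    serial = {m[: -len(".serial")] for m in modules if m.endswith(".serial")}
    mpi = {m[: -len(".MPI")] for m in modules if m.endswith(".MPI")}
    return serial & mpi


# Hypothetical dependency graph: an MPI code that also needs a serial-only library.
deps = {
    "SimCode.MPI": ["Boost.MPI", "MeshLib"],
    "MeshLib": ["Boost.serial"],
}
print(sorted(mixed_variants(dependency_closure("SimCode.MPI", deps))))  # ['Boost']
```

Each reported name marks a library that would need a closer look. In the worst case, it is a candidate for an additional dual build so that the whole stack uses MPI consistently.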

Point on module visibility

The modules of serial builds could be made hidden as these modules are only intended to be used as dependencies
Both modules are visible in Compute Canada, which allows users developing code to avoid loading MPI when it is not needed
As long as the description of modules is clear enough, it should not be a problem to have both modules visible.

Point on toolchain of the serial builds

Performance of dual libraries might be critical, but the cases considered so far will perform equally well with GCC-based toolchains as with ICC-based toolchains
Using GCCcore for the serial builds could be very useful to reduce the number of new easyconfigs and provide dependencies at even lower levels
Using GCCcore might cause trouble if its MPI counterpart uses Intel compilers. In such a case loading both modules at the same time will probably raise linking issues.

Concluding remarks

It is desirable that the serial and MPI modules of a given package can be loaded at the same time
- Pro: Minimize the amount of splitting in the easyconfig tree. Only the dual builds of some selected libraries will be needed.
- Pro: Users will gain a lot more flexibility to mix non-MPI and MPI modules
- Con: Loading both modules might cause linking issues with dependent packages. However, since we are choosing which libraries will be made dual, this can be tailored to our needs and problems will be analyzed on a case-by-case basis.
Packages will have a suffix for serial builds and another suffix for MPI builds. Tentative names X.serial and X.MPI.
- Pro: Having explicit suffixes avoids wrong assumptions from the users. It is important that the chosen suffixes are as self-explanatory as possible.
- Con: The base name of the package loses its purpose. The default package name could be used to load either the serial or the MPI variant, based on the state of the environment.
Both the serial and MPI modules will be visible by default. Making any of those modules invisible should be decided by each site.
The toolchain of the serial builds will be decided on a per package basis.
- Pro: in cases where GCCcore is suitable, it will be used to provide a lower-level dependency
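The idea of letting the bare base name pick a variant from the state of the environment can be expressed as a small decision rule. This is only a sketch of the logic in Python. In practice it would have to live in the module system itself (for instance in Lmod module files), and the set of module names that count as "MPI is loaded" (`MPI_MODULES`) is a hypothetical assumption.

```python
from typing import Iterable

# Hypothetical: names of modules whose presence means "an MPI stack is loaded".
MPI_MODULES = {"OpenMPI", "impi"}


def variant_name(base: str, mpi: bool) -> str:
    """Module name following the tentative X.serial / X.MPI convention."""
    return f"{base}.MPI" if mpi else f"{base}.serial"


def resolve_default(base: str, loaded_modules: Iterable[str]) -> str:
    """Pick the variant a bare base name should load, given the loaded modules."""
    mpi_loaded = any(m.split("/")[0] in MPI_MODULES for m in loaded_modules)
    return variant_name(base, mpi_loaded)


print(resolve_default("HDF5", ["GCC/9.3.0", "OpenMPI/4.0.3"]))  # HDF5.MPI
print(resolve_default("HDF5", ["GCC/9.3.0"]))                   # HDF5.serial
```

Keeping the explicit X.serial / X.MPI names as the real modules, with the base name only as a convenience alias, would preserve the self-explanatory suffixes argued for above.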
