Cray XC Build Instructions
The OFI build is the easiest and most portable way to build SOS for Cray XC systems. The libfabric repository is located here:
Libfabric can be set up to use the GNI provider with the following configuration (as of libfabric v1.6.1):
$ ./configure --prefix=<libfabric_install_dir> --enable-gni --disable-verbs --disable-sockets --disable-udp --disable-psm --disable-tcp --disable-efa
where <libfabric_install_dir> is an appropriate installation path.
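After running make and make install for libfabric, you may want to confirm that the installation actually exposes the GNI provider before building SOS. The following is a minimal sketch, not part of the official instructions: check_gni_provider is a hypothetical helper name, and it assumes that the fi_info utility installed under <libfabric_install_dir>/bin prints a "provider: gni" line when that provider is available.

```bash
# Hypothetical helper: check that a libfabric install reports the GNI provider.
# Argument: the directory that was passed to libfabric's --prefix.
check_gni_provider() {
    local fi_info="$1/bin/fi_info"
    if [ ! -x "$fi_info" ]; then
        echo "error: $fi_info not found; did 'make install' succeed?" >&2
        return 1
    fi
    # -p restricts the listing to one provider; look for its name in the
    # output rather than relying on fi_info's exit status.
    "$fi_info" -p gni 2>/dev/null | grep -q 'provider: gni'
}
```

For example, check_gni_provider /path/to/libfabric_install_dir && echo "GNI provider found" (substitute your own installation path).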
On the NERSC Cori system, ensure that you have loaded the Cray-provided PrgEnv-gnu module. The GNI provider requires C11 atomics, and the Intel compilers (at least up to version 18.0.1) have limited support for the atomic integer types.
Use the following configure options to build SOS with the libfabric GNI provider, Cray XPMEM, and Cray PMI:
$ ./autogen.sh
$ ./configure --prefix=<SOS_install_dir> --with-ofi=<libfabric_install_dir> --with-xpmem=/opt/cray/xpmem/default --with-pmi=/opt/cray/pe/pmi/default --enable-ofi-mr=basic --enable-completion-polling --disable-fortran
$ make
$ make install
where <SOS_install_dir> is an appropriate installation path.
If you prefer not to use XPMEM for on-node communication, you must enable hard polling via the --enable-hard-polling flag.
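The choice between XPMEM and hard polling can be captured in a small wrapper so that the rest of the configure line stays the same. This sketch uses a hypothetical function name (sos_configure_opts) and a hypothetical USE_XPMEM variable; it simply prints the options from the Cray PMI configure command above, substituting --enable-hard-polling for --with-xpmem when USE_XPMEM=0. It assumes that neither installation path contains spaces.

```bash
# Hypothetical helper: print the SOS configure options used above, one per line.
# Arguments: SOS install prefix, libfabric install prefix.
# USE_XPMEM=0 drops XPMEM and enables hard polling instead (default: XPMEM on).
sos_configure_opts() {
    local prefix="$1" ofi_dir="$2"
    printf '%s\n' "--prefix=$prefix" "--with-ofi=$ofi_dir"
    if [ "${USE_XPMEM:-1}" = "1" ]; then
        printf '%s\n' "--with-xpmem=/opt/cray/xpmem/default"
    else
        # Hard polling is required when XPMEM is not used for on-node traffic.
        printf '%s\n' "--enable-hard-polling"
    fi
    # The remaining options match the Cray PMI configure line above.
    printf '%s\n' "--with-pmi=/opt/cray/pe/pmi/default" \
        "--enable-ofi-mr=basic" "--enable-completion-polling" "--disable-fortran"
}
```

For example: ./configure $(USE_XPMEM=0 sos_configure_opts "$SOS_INSTALL_DIR" "$LIBFABRIC_INSTALL_DIR"), where the two variables (also hypothetical) hold your installation paths.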
Because Cray PMI limits the number of key-value store (KVS) entries, you will need to set the following environment variable before running OpenSHMEM jobs:
$ export PMI_MAX_KVS_ENTRIES=10000000
Note that the maximum number of KVS entries may need to be increased further for larger jobs.
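One way to follow this advice in a job script, sketched here as a suggestion rather than a requirement, is to apply the recommended value only as a default, so that a larger value already exported for a big job is not overwritten:

```bash
# Keep any value already exported (e.g. a larger one for a big job);
# otherwise fall back to the value recommended above.
export PMI_MAX_KVS_ENTRIES="${PMI_MAX_KVS_ENTRIES:-10000000}"
```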
On NERSC systems, you can optionally use Slurm PMI instead; in that case, include the following configure options for SOS:
--with-pmi --with-pmi-libdir=/usr/lib64/slurmpmi --enable-pmi1
Note that when enabling Slurm PMI, you may need to ensure that the appropriate library can be found at run time:
$ export LD_LIBRARY_PATH=/usr/lib64/slurmpmi:$LD_LIBRARY_PATH
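If LD_LIBRARY_PATH starts out empty, a command of this form leaves a trailing colon, and the dynamic linker treats an empty entry in LD_LIBRARY_PATH as the current directory. A slightly more careful alternative is sketched below with a hypothetical prepend_path function (bash-specific, because it uses indirect expansion); it adds the directory only once and never creates an empty entry.

```bash
# Hypothetical helper: prepend a directory to a colon-separated variable
# (such as LD_LIBRARY_PATH) unless it is already present.
prepend_path() {
    local var="$1" dir="$2"
    local current="${!var}"          # indirect expansion: value of $var
    case ":${current}:" in
        *":${dir}:"*) ;;             # already present: leave unchanged
        *)
            if [ -n "$current" ]; then
                export "$var=${dir}:${current}"
            else
                export "$var=${dir}"  # avoid a trailing empty entry
            fi ;;
    esac
}

prepend_path LD_LIBRARY_PATH /usr/lib64/slurmpmi
```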
To test your build on a Cray XC system using Slurm, use the following make check command line:
$ make check TEST_RUNNER="srun -n 2 -C haswell --exclusive"
On Cori, testing SOS may be easier if you build the unit tests on the login node, reserve one or more interactive nodes, and then run make check with the appropriate launcher:
$ make check TESTS=    # Build the tests on the login node
$ salloc -N 2 -C haswell -t 00:30:00 -q debug    # Reserve 2 interactive nodes
$ make check TEST_RUNNER="srun -N 2 -n 8"    # Run the tests on the interactive compute nodes
If your Cray XC system uses aprun, use:
$ make check TEST_RUNNER="aprun -n 2"
The SOS installation should be added to your PATH. If you followed the build instructions above, you can do this by running the following command:
$ export PATH=<SOS_install_dir>/bin:$PATH
Once SOS is in your PATH, you can use the compiler wrapper oshcc to compile your application and the launcher wrapper oshrun to run it.
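As a quick smoke test of the wrappers, the following sketch writes a minimal OpenSHMEM program, compiles it with oshcc, and launches two PEs. sos_hello and LAUNCHER are hypothetical names, and the default launch command oshrun -n 2 is an assumption; set LAUNCHER to something like "srun -n 2" or "aprun -n 2" if that better matches how jobs are launched on your system, as in the make check examples above.

```bash
# Hypothetical helper: build and run a two-PE OpenSHMEM "hello" program.
# LAUNCHER (assumed default: "oshrun -n 2") is word-split on purpose.
sos_hello() {
    local workdir
    command -v oshcc >/dev/null 2>&1 || { echo "error: oshcc not in PATH" >&2; return 1; }
    workdir=$(mktemp -d) || return 1
    cat > "$workdir/hello.c" <<'EOF'
#include <stdio.h>
#include <shmem.h>

int main(void) {
    shmem_init();
    printf("Hello from PE %d of %d\n", shmem_my_pe(), shmem_n_pes());
    shmem_finalize();
    return 0;
}
EOF
    oshcc -o "$workdir/hello" "$workdir/hello.c" || return 1
    ${LAUNCHER:-oshrun -n 2} "$workdir/hello"
}
```

Run it with sos_hello on a node where jobs can be launched (for example, inside an salloc allocation); each PE should print one line.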
Additionally, you can set the SMA_SYMMETRIC_HEAP_USE_HUGE_PAGES environment variable to back the symmetric heap with huge pages, when applicable.
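For example, the variable can be exported before launching a job. This sketch assumes that a value of 1 enables the feature; check the SOS documentation for the values your version accepts.

```bash
# Assumption: "1" enables huge pages for the SOS symmetric heap.
export SMA_SYMMETRIC_HEAP_USE_HUGE_PAGES=1
```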
If you get an error like the following when libsma is being linked:
CC runtime-pmi2.lo
CCLD libsma.la
ld: cannot find -lalpsutil
ld: cannot find -lalpslli
ld: cannot find -lugni
Makefile:577: recipe for target 'libsma.la' failed
make[2]: *** [libsma.la] Error 1
This likely means that your libfabric needs to be rebuilt. This can happen after an OS upgrade on the system: Cray software installed in /opt/cray/... is sometimes removed during such upgrades, leaving the libfabric.la file with out-of-date linking information for libtool.
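Before rebuilding, you can confirm the diagnosis by checking whether the paths recorded in libfabric.la still exist. The sketch below is a hypothetical helper (check_la_deps); it assumes the standard libtool layout, where link dependencies are stored on a single dependency_libs='...' line, and it reports any -L directory or absolute library path that is missing.

```bash
# Hypothetical helper: list -L directories and absolute paths in a libtool
# .la file that no longer exist. Returns 1 if anything is missing.
check_la_deps() {
    local la="$1" missing=0 deps flag
    [ -r "$la" ] || { echo "error: cannot read $la" >&2; return 2; }
    # libtool records dependencies as: dependency_libs=' -L/dir -lfoo ...'
    deps=$(sed -n "s/^dependency_libs='\(.*\)'\$/\1/p" "$la")
    for flag in $deps; do
        case "$flag" in
            -L*)
                if [ ! -d "${flag#-L}" ]; then
                    echo "missing: ${flag#-L}"
                    missing=1
                fi ;;
            /*)
                if [ ! -e "$flag" ]; then
                    echo "missing: $flag"
                    missing=1
                fi ;;
        esac
    done
    return "$missing"
}
```

For example: check_la_deps /path/to/libfabric_install_dir/lib/libfabric.la (the .la file is typically in the lib directory of the libfabric install). One or more "missing:" lines suggest that libfabric should be reconfigured and rebuilt against the current Cray software.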
There is a separate OFI libfabric-cray fork that provides special support for the GNI provider, but this fork is not regularly tested with SOS.
SPECIAL NOTE FOR NERSC USERS AS OF 8/8/17: The Cray PE group appears to have moved the install location of Cray PMI, probably as part of a Cray PE software release (which is separate from the Cray OS software releases). As a result, older configure lines need to be modified slightly (the configure command above already includes this change):
--with-pmi=/opt/cray/pe/pmi/default should be used in place of /opt/cray/pmi/default.
At the time of this writing, SOS does not do a good job of setting the rpath for its shared libraries and binaries (including the make check tests). This means you will likely need to adjust your LD_LIBRARY_PATH so that the right libfabric.so and the right libpmi.so are picked up. This is left as an exercise to the reader.
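As a starting point, a sketch like the following puts the libfabric and Cray PMI library directories ahead of everything else on LD_LIBRARY_PATH. Both directory values are assumptions (the PMI path in particular, /opt/cray/pe/pmi/default/lib64, may differ on your system), and show_sos_libs is a hypothetical helper that uses ldd to show which libfabric.so and libpmi.so a binary actually resolves.

```bash
# Assumed locations: export these beforehand if yours differ.
LIBFABRIC_INSTALL_DIR="${LIBFABRIC_INSTALL_DIR:-/path/to/libfabric_install_dir}"
PMI_LIB_DIR="${PMI_LIB_DIR:-/opt/cray/pe/pmi/default/lib64}"

# Prepend both directories; ${VAR:+...} avoids a trailing empty entry when
# LD_LIBRARY_PATH starts out empty.
export LD_LIBRARY_PATH="$LIBFABRIC_INSTALL_DIR/lib:$PMI_LIB_DIR${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"

# Hypothetical helper: show which libfabric and libpmi a binary will load.
show_sos_libs() {
    ldd "$1" | grep -E 'lib(fabric|pmi)'
}
```

For example, run show_sos_libs on your application or on one of the make check test binaries and confirm that the listed paths point at the installations you intended.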
Regardless of which PMI you used when configuring SOS, as of this writing you will need to add the following to your .bashrc file (or the equivalent file for your shell):
export SMA_OFI_ATOMIC_CHECKS_WARN=1
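If you maintain your environment with a setup script, a small guard keeps repeated runs from appending duplicate lines to your startup file. add_to_rc is a hypothetical helper, shown here as a sketch:

```bash
# Hypothetical helper: append a line to a shell startup file only if that
# exact line is not already present.
add_to_rc() {
    local rc="$1" line="$2"
    touch "$rc" || return 1
    # -x: whole-line match, -F: fixed string, -q: no output
    grep -qxF -- "$line" "$rc" || printf '%s\n' "$line" >> "$rc"
}
```

For example: add_to_rc "$HOME/.bashrc" 'export SMA_OFI_ATOMIC_CHECKS_WARN=1'. The same helper can be used for the PMI_MAX_KVS_ENTRIES setting described above.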