
Cray XC Build Instructions

David Ozog edited this page Nov 17, 2022 · 5 revisions

Building and Installing Sandia-OpenSHMEM (SOS) on a Cray XC:

OFI libfabric

The OFI build is the easiest and most portable way to build SOS for Cray XC systems. The libfabric repository is located here:

OFI libfabric

Libfabric may be set up to use the GNI provider with the following configuration (as of libfabric v1.6.1):

$ ./configure --prefix=<libfabric_install_dir> --enable-gni --disable-verbs --disable-sockets --disable-udp --disable-psm --disable-tcp --disable-efa

where <libfabric_install_dir> is an appropriate installation path.
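
When the install path changes between builds, it can help to generate the configure line from a variable instead of editing it by hand. The following is a minimal sketch: the function name `libfabric_configure_cmd`, the variable `LIBFABRIC_PREFIX`, and its default path are illustrative assumptions, not part of libfabric. It only prints the command so it can be inspected before being run from the libfabric source directory.

```shell
# Assumed variable name; the default path is only a placeholder.
LIBFABRIC_PREFIX="${LIBFABRIC_PREFIX:-$HOME/opt/libfabric}"

# Print the configure command (GNI enabled, other providers disabled)
# for the install prefix given as the first argument.
libfabric_configure_cmd() {
    printf '%s' "./configure --prefix=$1 --enable-gni"
    for provider in verbs sockets udp psm tcp efa; do
        printf ' --disable-%s' "$provider"
    done
    printf '\n'
}

libfabric_configure_cmd "$LIBFABRIC_PREFIX"
```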

On the NERSC Cori system, ensure you have loaded the Cray-provided PrgEnv-gnu module. The GNI provider requires C11 atomics, but the Intel compilers (at least up to 18.0.1) have limited support for the atomic integer types.
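
A quick way to confirm the programming environment before configuring is to inspect the list of loaded modules. This sketch assumes the module system exports the colon-separated `LOADEDMODULES` variable, as Environment Modules normally does. The function name `prgenv_gnu_loaded` is hypothetical, and the suggested `module swap` command assumes PrgEnv-intel is the environment currently loaded.

```shell
# Return 0 if a PrgEnv-gnu module (with or without a version suffix)
# appears in the colon-separated LOADEDMODULES list.
prgenv_gnu_loaded() {
    case ":${LOADEDMODULES:-}:" in
        *:PrgEnv-gnu:*|*:PrgEnv-gnu/*) return 0 ;;
        *) return 1 ;;
    esac
}

if prgenv_gnu_loaded; then
    echo "PrgEnv-gnu is loaded"
else
    # Assumes PrgEnv-intel is what is currently loaded.
    echo "PrgEnv-gnu not loaded; try: module swap PrgEnv-intel PrgEnv-gnu" >&2
fi
```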

Building SOS to use the libfabric GNI provider

Use the following configure options to build SOS to use the libfabric GNI provider, with Cray XPMEM and Cray PMI:

$ ./autogen.sh
$ ./configure --prefix=<SOS_install_dir> --with-ofi=<libfabric_install_dir> --with-xpmem=/opt/cray/xpmem/default --with-pmi=/opt/cray/pe/pmi/default --enable-ofi-mr=basic --enable-completion-polling --disable-fortran
$ make
$ make install

where <SOS_install_dir> is an appropriate installation path.

If you prefer not to use XPMEM for on-node communication, you must enable hard polling via the --enable-hard-polling flag.

Due to a feature of Cray PMI, you will need to set the following environment variable before running OpenSHMEM jobs:

$ export PMI_MAX_KVS_ENTRIES=10000000

Note that the maximum number of KVS entries may need to be increased depending on your job size.
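
In a job script, one option is to raise the limit only when the current setting is lower, so that a larger value chosen for a big job is not overwritten. This is a sketch with a hypothetical helper name, `ensure_min_kvs_entries`. It assumes any existing value is a plain integer.

```shell
# Raise PMI_MAX_KVS_ENTRIES to at least the value in $1,
# keeping any larger value that is already set.
ensure_min_kvs_entries() {
    min="$1"
    current="${PMI_MAX_KVS_ENTRIES:-0}"
    if [ "$current" -lt "$min" ]; then
        PMI_MAX_KVS_ENTRIES="$min"
    fi
    export PMI_MAX_KVS_ENTRIES
}

ensure_min_kvs_entries 10000000
echo "PMI_MAX_KVS_ENTRIES=$PMI_MAX_KVS_ENTRIES"
```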

On the NERSC systems, one can optionally use Slurm PMI, in which case include the following configure options for SOS:

--with-pmi --with-pmi-libdir=/usr/lib64/slurmpmi --enable-pmi1

Note that when enabling Slurm PMI, you may need to ensure the appropriate library can be found at run time:

$ export LD_LIBRARY_PATH=/usr/lib64/slurmpmi:$LD_LIBRARY_PATH
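
If this export lives in a shell startup file, the same directory can be prepended repeatedly. The sketch below avoids that by adding the directory only when it is not already listed. The helper name `prepend_ld_library_path` is an assumption for illustration.

```shell
# Prepend a directory to LD_LIBRARY_PATH unless it is already listed.
prepend_ld_library_path() {
    dir="$1"
    case ":${LD_LIBRARY_PATH:-}:" in
        *":$dir:"*) ;;  # already present, nothing to do
        *) LD_LIBRARY_PATH="$dir${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}" ;;
    esac
    export LD_LIBRARY_PATH
}

prepend_ld_library_path /usr/lib64/slurmpmi
```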

Testing the build

To test your build on a Cray XC system using Slurm, use the following make check command line:

$ make check TEST_RUNNER="srun -n 2 -C haswell --exclusive"

Testing SOS on Cori may be easier if you build the unit tests on the login node, reserve one or more interactive nodes, and then run make check with the appropriate launcher:

$ make check TESTS=                            # Build on the login node
$ salloc -N 2 -C haswell -t 00:30:00 -q debug  # Reserve 2 interactive nodes
$ make check TEST_RUNNER="srun -N 2 -n 8"      # Run tests on the interactive compute nodes

If your Cray XC system uses aprun, use:

$ make check TEST_RUNNER="aprun -n 2"
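
For scripts that must work on both kinds of system, the launcher can be chosen by probing PATH. This is a sketch only: the function name `pick_test_runner` and the preference for srun over aprun are assumptions. Site-specific options such as `-C haswell` would still need to be added by hand.

```shell
# Echo a TEST_RUNNER string for whichever launcher is found on PATH.
# Preferring srun over aprun is an arbitrary choice for this sketch.
pick_test_runner() {
    if command -v srun >/dev/null 2>&1; then
        echo "srun -n 2"
    elif command -v aprun >/dev/null 2>&1; then
        echo "aprun -n 2"
    else
        return 1
    fi
}

if runner=$(pick_test_runner); then
    # Print the command instead of running it, so it can be reviewed first.
    echo "make check TEST_RUNNER=\"$runner\""
else
    echo "no srun or aprun found on PATH" >&2
fi
```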

Compiling and running OpenSHMEM programs

Add the bin directory of your SOS installation to your PATH. If you followed the build instructions on this page, run the following command:

$ export PATH=<SOS_install_dir>/bin:$PATH

Once SOS is in your path, you can use the compiler wrapper oshcc to compile your application and the launcher wrapper oshrun to run it.
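
As a quick smoke test of the wrappers, the sketch below writes a minimal OpenSHMEM program, then compiles and launches it only if oshcc and oshrun are found. The file name `hello_shmem.c` is arbitrary, and `-n 2` assumes the underlying launcher accepts that option for the number of PEs. On most systems the launch step must run inside a job allocation.

```shell
# Write a minimal OpenSHMEM program (the file name is arbitrary).
cat > hello_shmem.c <<'EOF'
#include <stdio.h>
#include <shmem.h>

int main(void) {
    shmem_init();
    printf("Hello from PE %d of %d\n", shmem_my_pe(), shmem_n_pes());
    shmem_finalize();
    return 0;
}
EOF

# Compile and launch only if the SOS wrappers are on PATH.
if command -v oshcc >/dev/null 2>&1 && command -v oshrun >/dev/null 2>&1; then
    oshcc -o hello_shmem hello_shmem.c
    oshrun -n 2 ./hello_shmem   # assumes the launcher accepts -n
else
    echo "oshcc/oshrun not found; add <SOS_install_dir>/bin to PATH" >&2
fi
```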

Additionally, you can set the SMA_SYMMETRIC_HEAP_USE_HUGE_PAGES environment variable when applicable.

Troubleshooting

If you get an error like this when libsma is being linked:

CC       runtime-pmi2.lo
CCLD     libsma.la
ld: cannot find -lalpsutil
ld: cannot find -lalpslli
ld: cannot find -lugni
Makefile:577: recipe for target 'libsma.la' failed
make[2]: *** [libsma.la] Error 1

this likely means that your libfabric needs to be rebuilt. This can happen after an OS upgrade on the system. Cray software installed in /opt/cray/... is sometimes de-installed during such upgrades, resulting in the libfabric.la file having out-of-date linking information for libtool.
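
One way to check for this condition before rebuilding is to look for stale directories in the libtool archive. The sketch below reads the `dependency_libs` line of a `.la` file and prints every `-L` directory that no longer exists. The function name `missing_la_dirs` and the variable `LIBFABRIC_LA` are hypothetical, and the path to `libfabric.la` depends on where libfabric was installed.

```shell
# Print each -L directory from a .la file's dependency_libs line
# that no longer exists on disk.
missing_la_dirs() {
    la_file="$1"
    # The line looks like: dependency_libs=' -L/dir -lfoo /path/libbar.la'
    deps=$(sed -n "s/^dependency_libs='\(.*\)'\$/\1/p" "$la_file")
    for item in $deps; do
        case "$item" in
            -L*) dir="${item#-L}"
                 [ -d "$dir" ] || echo "$dir" ;;
        esac
    done
}

# Assumed variable: set it to <libfabric_install_dir>/lib/libfabric.la
LIBFABRIC_LA="${LIBFABRIC_LA:-}"
if [ -f "$LIBFABRIC_LA" ]; then
    missing_la_dirs "$LIBFABRIC_LA"
fi
```

Any directory printed is a hint that the archive refers to software that has since been removed.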

Older notes (possibly obsolete):

There is a separate OFI libfabric-cray fork that provides special support for the GNI provider, but this fork is not regularly tested with SOS.

SPECIAL NOTE FOR NERSC USERS AS OF 8/8/17: It appears that the Cray PE group has moved the install location of Cray PMI. This is probably associated with a particular release of the Cray PE software (which is released separately from the Cray OS software). So, the configure options need to be slightly modified:

--with-pmi=/opt/cray/pe/pmi/default

should be used in place of /opt/cray/pmi/default.
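
Because the location differs between Cray PE releases, a build script can probe for whichever directory exists. This is a sketch under that assumption. The helper name `first_existing_dir` is hypothetical, and only the two paths mentioned in this note are checked.

```shell
# Echo the first argument that is an existing directory.
first_existing_dir() {
    for d in "$@"; do
        if [ -d "$d" ]; then
            echo "$d"
            return 0
        fi
    done
    return 1
}

# Try the newer location first, then the older one.
if pmi_dir=$(first_existing_dir /opt/cray/pe/pmi/default /opt/cray/pmi/default); then
    echo "--with-pmi=$pmi_dir"
else
    echo "Cray PMI not found in either location" >&2
fi
```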

Woes of not using RPATH (OBSOLETE - THIS PROBLEM IS FIXED IN 1.3.2 and newer releases of SOS)

At the time of this writing, SOS doesn't do a good job of setting RPATH for shared libraries and binaries (including the make check tests). That means you will likely need to adjust your LD_LIBRARY_PATH to pick up the right libfabric.so and the right libpmi.so. This is left as an exercise to the reader.

Regardless of which PMI you used when configuring SOS, as of this writing, you will need to add the following to your .bashrc file or equivalent for your favorite shell:

export SMA_OFI_ATOMIC_CHECKS_WARN=1