forked from Sandia-OpenSHMEM/SOS
Sandia SHMEM is an implementation of the OpenSHMEM Standard over multiple Networking APIs, including Portals4. Sandia SHMEM is designed to be a low-overhead implementation of the SHMEM standard which takes advantage of the many features of the advanced networking APIs, including the Portals 4 specification.
kseager/sandia-shmem
Sandia OpenSHMEM
----------------

* About

Sandia OpenSHMEM is an implementation of the OpenSHMEM specification over
Portals 4.0, the Open Fabrics Interface (OFI), and XPMEM.

* Building

The Sandia OpenSHMEM implementation uses the Autoconf/Automake/Libtool
build system.  The standard GNU configure script and make system are used,
as follows:

  % ./configure <options>
  % make
  % make check
  % make install

The "make check" step is not strictly necessary, but is a good idea.  It
honors the TEST_RUNNER and NPROCS make variables, which can be used to
override the defaults, e.g. "make check NPROCS=4" or "make check
TEST_RUNNER='mpiexec -n 2 -ppn 1 -hosts compute1,compute2'".

Sandia OpenSHMEM must be configured to use either the Portals 4 or the OFI
network transport, but not both.  It can optionally be configured to use
XPMEM or CMA to optimize communication between PEs within the same shared
memory domain.

Options to configure include:

  --prefix=<DIR>          Install implementation in <DIR>, default:
                          /usr/local
  --with-portals4=<DIR>   Find the Portals 4 library in <DIR>
  --with-ofi=<DIR>        Find the libfabric library in <DIR>
  --with-xpmem=<DIR>      Find the XPMEM library in <DIR>
  --with-cma              Use cross-memory attach for on-node
                          communication
  --with-pmi=<DIR>        Location of PMI installation.  Configure will
                          automatically look for the PMI runtime provided
                          by the Portals 4 reference implementation
  --enable-pmi-simple     Include support for interfacing with a PMI 1.0
                          launcher.  The launcher must be provided by a
                          separate package, such as MPICH, Hydra, or SLURM.
  --enable-error-checking Enable error checking in SHMEM calls.  This will
                          increase the overhead of communication
                          operations.
  --enable-hard-polling   When using only the network transport, the
                          implementation uses counting events to block
                          while waiting for local memory changes.  On some
                          implementations, enabling hard polling may
                          increase target side message rate.
  --enable-remote-virtual-addressing
                          Enable optimizations assuming the symmetric heap
                          is always symmetric with regard to virtual
                          address.  This may cause applications to abort
                          during shmem_init() if such a symmetric heap
                          cannot be created, but will reduce the
                          instruction count for some operations and so the
                          overhead of communication calls.  This
                          optimization also requires that the Portals 4
                          implementation support BIND_INACCESSIBLE on LEs.
  --disable-fortran       Disable the Fortran bindings.  This may be useful
                          if the machine has a Fortran compiler which does
                          not support ISO_C_BINDING.
  --enable-nonblocking-fence
                          By default, shmem_fence() is equivalent to
                          shmem_quiet(), which can be a lengthy operation.
                          Enabling this feature moves the ordering point
                          from shmem_fence() to the next put-like call,
                          which can help improve overlap in some cases.
  --with-total-data-ordering=<yes|no|check>
                          If a network supports total data ordering (that
                          is, ordering guarantees to two different
                          addresses on the same target node), this option
                          can remove the shmem_quiet() from shmem_fence()
                          calls when sending short messages.  The option
                          does, however, force ordering requirements on the
                          network, so experimentation may be necessary to
                          determine the best configuration.
                            yes    Always assume total data ordering is
                                   available and abort a job if that's not
                                   the case.
                            no     Never use total data ordering
                                   optimizations.
                            check  Slightly higher overhead than "yes", but
                                   provides a fallback if the network
                                   doesn't provide total data ordering.

There are many other options to configure to influence performance and
behavior.  See 'configure --help' for documentation on available options.

* SHMEM Runtime Support

Environment variables:

  SMA_VERSION: if defined, print SHMEM version during start_pes().

  SMA_INFO: if defined, print (stdout) SHMEM environment variables.

  SMA_SYMMETRIC_SIZE (default: 64 MiB)
        The allocated size of the symmetric heap which shmalloc() and
        shfree() operate on.  The size value can be scaled with a suffix of
          'K' for kilobytes (B * 1024),
          'M' for Megabytes (KiB * 1024),
          'G' for Gigabytes (MiB * 1024)

  SMA_BOUNCE_SIZE (default: 2 KiB)
        The maximum size of a bounce buffer for put messages.  Messages
        greater than the immediate send value for the underlying network
        but no larger than this threshold will be copied into a bounce
        buffer and then sent.

  SMA_COLL_CROSSOVER (default: 4)
        For num_pes < SMA_COLL_CROSSOVER, collective algorithms are serial
        instead of tree based.

  SMA_COLL_RADIX (default: 4)
        Controls the width of the n-ary tree for collectives, such that
        each node will fanout-send to a max of approximately
        SMA_COLL_RADIX nodes.

  SMA_SYMMETRIC_HEAP_USE_MALLOC (default: 0)
        If set to a non-zero integer, malloc() is used instead of mmap()
        to allocate the symmetric heap.

  SMA_BARRIER_ALGORITHM (default: auto)
        Algorithm to use for barriers.  Default is to auto-select (which
        may result in different algorithms being used for different PE
        sets).  Options are: auto, linear, tree, dissem.

  SMA_BCAST_ALGORITHM (default: auto)
        Algorithm to use for broadcasts.  Default is to auto-select (which
        may result in different algorithms being used for different PE
        sets).  Options are: auto, linear, tree.

  SMA_REDUCE_ALGORITHM (default: auto)
        Algorithm to use for reductions.  Default is to auto-select (which
        may result in different algorithms being used for different PE
        sets).  Options are: auto, linear, tree, recdbl.

  SMA_COLLECT_ALGORITHM (default: auto)
        Algorithm to use for allgathers.  Default is to auto-select (which
        may result in different algorithms being used for different PE
        sets).  Options are: auto, linear.

  SMA_FCOLLECT_ALGORITHM (default: auto)
        Algorithm to use for allgathers with fixed contribution amounts.
        Default is to auto-select (which may result in different algorithms
        being used for different PE sets).  Options are: auto, linear,
        ring, recdbl.  Note that recursive doubling (recdbl) will fall back
        to ring if the PE set is not a power of two in size.

  SMA_CMA_PUT_MAX (default: 8192)
        With '--with-cma', shmem puts of length <= SMA_CMA_PUT_MAX use
        process_vm_writev(); longer puts use the Portals4 transport put.

  SMA_CMA_GET_MAX (default: 16384)
        With '--with-cma', shmem gets of length <= SMA_CMA_GET_MAX use
        process_vm_readv(); longer gets use the Portals4 transport get.

  SMA_SYMMETRIC_HEAP_USE_HUGE_PAGES (default: off)
        If defined, large pages will be used to back the symmetric heap.
        This feature is only available on Linux.

  SMA_SYMMETRIC_HEAP_PAGE_SIZE (default: 2MB)
        Used to specify a large page size when using large pages to back
        the symmetric heap.  Ignored if SMA_SYMMETRIC_HEAP_USE_HUGE_PAGES
        is not set.  Refer to SMA_SYMMETRIC_SIZE for input syntax.

OFI Transport Environment variables:

  SMA_OFI_PROVIDER (default: auto)
        The name of the provider that should be used by the OFI transport.
        Shell-style wildcards, including * and ?, are allowed.  The fi_info
        utility included with libfabric can help identify the desired
        provider.

  SMA_OFI_FABRIC (default: auto)
        The name of the fabric that should be used by the OFI transport.
        Shell-style wildcards, including * and ?, are allowed.  The fi_info
        utility included with libfabric can help identify the desired
        fabric.

  SMA_OFI_DOMAIN (default: auto)
        The name of the fabric domain that should be used by the OFI
        transport.  Shell-style wildcards, including * and ?, are allowed.
        The fi_info utility included with libfabric can help identify the
        desired fabric domain.

  SMA_OFI_ATOMIC_CHECKS_WARN (default: off)
        If defined, OFI will print a warning instead of aborting when the
        fabric provider doesn't support every data type x op combination.

Debugging Environment variables:

  SMA_DEBUG (default: off)
        If defined, enables debugging messages from the OpenSHMEM runtime.

  SMA_TRAP_ON_ABORT (default: off)
        If defined, generate a trap when aborting an OpenSHMEM program.
        This can be used to interface with a debugger or generate core
        files.
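
The suffix rules given for SMA_SYMMETRIC_SIZE (and reused by SMA_SYMMETRIC_HEAP_PAGE_SIZE) can be illustrated with a small parser. This is only a sketch of the documented syntax, not code from the Sandia OpenSHMEM sources: the function name parse_size_sketch is hypothetical, and it assumes whole-number values with the uppercase suffixes 'K', 'M' and 'G' listed above. The real runtime may accept other forms.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper (not part of Sandia OpenSHMEM): converts a size
 * string such as "64M" into a byte count using the suffix rules from the
 * README (K = B * 1024, M = KiB * 1024, G = MiB * 1024).
 * Returns 0 on success, -1 on malformed or out-of-range input. */
int parse_size_sketch(const char *text, uint64_t *bytes_out)
{
    assert(text != NULL && bytes_out != NULL);

    /* Assumption: the value must start with a digit, so empty strings,
     * signs and leading whitespace are rejected. */
    if (*text < '0' || *text > '9')
        return -1;

    errno = 0;
    char *end = NULL;
    unsigned long long value = strtoull(text, &end, 10);
    if (errno != 0)
        return -1;

    /* Map the optional suffix to a power-of-two shift. */
    unsigned shift = 0;
    switch (*end) {
    case '\0':                      break;  /* plain byte count */
    case 'K':  shift = 10; end++;   break;
    case 'M':  shift = 20; end++;   break;
    case 'G':  shift = 30; end++;   break;
    default:   return -1;                   /* unknown suffix */
    }
    if (*end != '\0')
        return -1;                          /* trailing characters */

    /* Reject values that would overflow 64 bits once scaled. */
    if (value > (UINT64_MAX >> shift))
        return -1;

    *bytes_out = (uint64_t)value << shift;
    return 0;
}
```

Under these assumptions, parse_size_sketch("64M", &n) stores 67108864 in n, which matches the documented 64 MiB default of SMA_SYMMETRIC_SIZE.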
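
Two of the selection rules stated above for collectives are simple enough to write down directly: the serial/tree switch controlled by SMA_COLL_CROSSOVER, and the fallback of recursive doubling (recdbl) to ring when the PE set size is not a power of two. The sketch below encodes only those two documented rules. All type and function names are hypothetical, and the library's actual auto-selection logic may take further factors into account.

```c
#include <stdbool.h>

/* Hypothetical names for illustration; none of these come from the
 * Sandia OpenSHMEM sources. */
enum coll_shape_sketch   { COLL_SERIAL, COLL_TREE };
enum fcollect_alg_sketch { FCOLLECT_RING, FCOLLECT_RECDBL };

/* README rule: for num_pes < SMA_COLL_CROSSOVER, collectives are serial
 * instead of tree based. */
enum coll_shape_sketch auto_shape_sketch(int num_pes, int crossover)
{
    return (num_pes < crossover) ? COLL_SERIAL : COLL_TREE;
}

/* True when n has exactly one bit set, i.e. n is 1, 2, 4, 8, ... */
bool is_power_of_two_sketch(unsigned n)
{
    return n != 0 && (n & (n - 1)) == 0;
}

/* README rule: a request for recdbl falls back to ring when the PE set
 * size is not a power of two. */
enum fcollect_alg_sketch resolve_recdbl_sketch(unsigned pe_set_size)
{
    return is_power_of_two_sketch(pe_set_size) ? FCOLLECT_RECDBL
                                               : FCOLLECT_RING;
}
```

With the default SMA_COLL_CROSSOVER of 4, a 3-PE set would be handled serially and a 4-PE set with a tree. A recdbl request on 6 PEs would resolve to ring, while 8 PEs would keep recdbl.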