The paper will be presented at the SC '24 (The International Conference for High Performance Computing, Networking, Storage, and Analysis).
- CPU-driven Multicast-based Allgather/Broadcast integrated within the UCC library:
./ucc/src/components/tl/spin
- collectives backend source-code../osu-micro-benchmarks-7.3
- OSU MPI benchmarks supporting per-iteration timing for Allgather/Broadcast collectives./ompi/
and./ucx
- subrepos with project dependencies
- DPA-offloaded collective receive datapath PoC:
./coll-offloading/*
./coll-offloading/README.md
- detailed manual for building/running PoC.
- Multicast-based Allgather traffic model:
./sim/estimate_allgather_cost.py
Prerequisites: InfiniBand-based cluster and user permissions to create multicast-groups through OpenSM.
$ git submodule update --init --recursive
$ bash ./build.sh
$ source ./sourceme.sh
$ cd ./osu-micro-benchmarks-7.3/ && ./configure CC=mpicc CXX=mpicxx && make -j
OpenMPI 5.0.3 compiled with the support of UCC multicast backend will be stored in ./ompi/build
. Multicast-based backend can be tested by running:
mpirun -x LD_LIBRARY_PATH -x PATH --hostfile <machine_hostfile> -np <nprocs> --mca coll_ucc_enable 1 --mca coll_ucc_priority 100 ./osu-micro-benchmarks-7.3/c/mpi/collective/blocking/osu_allgather -m 4096:8388608
Prerequisites: Two servers with BlueField-3 NIC and DOCA v2.2.0.
See ./coll-offloading/README.md
for building and running instructions instructions.
Run python3 ./sim/estimate_allgather_cost.py
to get .csv output with traffic estimations for different Allgather algorithms.