MPI: CMake 3.18+ support #1114

Closed
Robadob opened this issue Sep 25, 2023 · 3 comments · Fixed by #1090

Robadob commented Sep 25, 2023

Device linking was failing when building the tests_mpi target of the distributed_ensemble branch on Bede with CMake 3.18, but was fixed when using CMake 3.22.

We may need to consider updating the minimum required CMake version.
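As an aside, a minimal sketch of the "raise the minimum version" option mentioned above, assuming the 3.20.1 boundary identified later in this thread (the thread ultimately settles on a configure-time warning instead):

```cmake
# Illustrative only: requiring a CMake release that contains the
# device-link fix (3.20.1, per the comments below) would rule out the
# failing configuration entirely.
cmake_minimum_required(VERSION 3.20.1)
```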

CMake 3.18

[ 93%] Linking CUDA device code CMakeFiles/tests_mpi.dir/cmake_device_link.o
cd /users/robadob/fgpu2/build/tests && /opt/software/builder/developers/tools/cmake/3.18.4/1/default/bin/cmake -E cmake_link_script CMakeFiles/tests_mpi.dir/dlink.txt --verbose=1
/opt/software/builder/developers/compilers/cuda/11.4.1/1/default/bin/nvcc -forward-unknown-to-host-compiler -O3 -DNDEBUG --generate-code=arch=compute_70,code=[compute_70,sm_70] -Wno-deprecated-gpu-targets -Xcompiler=-Wl,-rpath,-Wl,/opt/software/builder/developers/libraries/openmpi/4.0.5/1/gcc-native-cuda-11.4.1/lib,-Wl,--enable-new-dtags,-pthread -Xcompiler=-fPIC -Wno-deprecated-gpu-targets -shared -dlink CMakeFiles/tests_mpi.dir/test_cases/simulation/test_mpi_ensemble.cu.o CMakeFiles/tests_mpi.dir/helpers/host_reductions_common.cu.o CMakeFiles/tests_mpi.dir/helpers/device_initialisation.cu.o CMakeFiles/tests_mpi.dir/helpers/main.cu.o -o CMakeFiles/tests_mpi.dir/cmake_device_link.o   -L/opt/software/builder/developers/compilers/cuda/11.4.1/1/default/targets/ppc64le-linux/lib/stubs  ../lib/Release/libflamegpu.a ../lib/libgtest.a  ../lib/Release/libtinyxml2.a -lstdc++fs -ldl -lcudadevrt -lcudart  -L"/opt/software/builder/developers/compilers/cuda/11.4.1/1/default/lib64"
gcc: error: unrecognized command-line option ‘-Wl’; did you mean ‘-W’?
gcc: error: unrecognized command-line option ‘-rpath’
gcc: error: unrecognized command-line option ‘-Wl’; did you mean ‘-W’?
gcc: error: unrecognized command-line option ‘-Wl’; did you mean ‘-W’?
gmake[3]: *** [tests/CMakeFiles/tests_mpi.dir/build.make:155: tests/CMakeFiles/tests_mpi.dir/cmake_device_link.o] Error 1

CMake 3.22

[ 97%] Linking CUDA device code CMakeFiles/tests_mpi.dir/cmake_device_link.o
cd /users/robadob/fgpu2/build/tests && /users/robadob/miniconda/miniconda/envs/cmake/bin/cmake -E cmake_link_script CMakeFiles/tests_mpi.dir/dlink.txt --verbose=1
/opt/software/builder/developers/compilers/cuda/11.4.1/1/default/bin/nvcc -forward-unknown-to-host-compiler -O3 -DNDEBUG --generate-code=arch=compute_70,code=[compute_70,sm_70] -Wno-deprecated-gpu-targets -Xcompiler=-fPIC -Wno-deprecated-gpu-targets -shared -dlink CMakeFiles/tests_mpi.dir/test_cases/simulation/test_mpi_ensemble.cu.o CMakeFiles/tests_mpi.dir/helpers/host_reductions_common.cu.o CMakeFiles/tests_mpi.dir/helpers/device_initialisation.cu.o CMakeFiles/tests_mpi.dir/helpers/main.cu.o -o CMakeFiles/tests_mpi.dir/cmake_device_link.o   -L/opt/software/builder/developers/compilers/cuda/11.4.1/1/default/targets/ppc64le-linux/lib/stubs  ../lib/Release/libflamegpu.a ../lib/libgtest.a  ../lib/Release/libtinyxml2.a -ldl -lpthread -lcudadevrt -lcudart  -L"/opt/software/builder/developers/compilers/cuda/11.4.1/1/default/lib64"

Robadob commented Sep 25, 2023

@ptheywood's research from Slack chat

Looks like it's an MPI-specific thing:

https://discourse.cmake.org/t/unable-to-link-cuda-device-code-with-mpich-implementation/3006/8

Which leads to an issue that was fixed by a merge request.

https://gitlab.kitware.com/cmake/cmake/-/issues/21887

https://gitlab.kitware.com/cmake/cmake/-/merge_requests/5966 is the MR.

Which looks like it was part of 3.20.1.


ptheywood commented Sep 29, 2023

Unable to reproduce this on x86_64 Ubuntu machines, which don't seem to require the flags to be passed.

We probably just want to emit a dev warning on <= 3.20.1 that it might error, as it's not a universal MPI + old CMake error.

ptheywood added a commit that referenced this issue Oct 2, 2023

ptheywood commented Oct 2, 2023

I've just confirmed on Bede that CMake 3.20.0 fails to link with the error above, while 3.20.1 does work (when MPI is enabled and the MPI installation requires some extra env variables to be passed to the host linker, as the OpenMPI install on Bede does).

So adding a message(WARNING ...) when MPI is enabled and found but CMake is < 3.20.1, as part of #1090, would be the way to address this (I'll quickly add one).
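A minimal sketch of what such a configure-time warning might look like; the FLAMEGPU_ENABLE_MPI option name is taken from the merge commit below, while the MPI_FOUND check is assumed from CMake's standard FindMPI module, so this is illustrative rather than the exact change merged in #1090:

```cmake
# Sketch only: warn at configure time if MPI support is requested and found,
# but the CMake version predates the device-link fix that landed in 3.20.1.
if(FLAMEGPU_ENABLE_MPI AND MPI_FOUND AND CMAKE_VERSION VERSION_LESS "3.20.1")
    message(WARNING
        "CMake ${CMAKE_VERSION} may fail to device link MPI-enabled CUDA targets "
        "when the MPI installation requires extra flags to be passed to the host "
        "linker (e.g. the OpenMPI install on Bede). CMake >= 3.20.1 is recommended.")
endif()
```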

ptheywood added a commit that referenced this issue Oct 13, 2023
ptheywood added a commit that referenced this issue Nov 21, 2023
ptheywood changed the title from "CMake 3.18+ support" to "MPI: CMake 3.18+ support" on Dec 11, 2023
ptheywood added a commit that referenced this issue Dec 12, 2023
ptheywood added a commit that referenced this issue Dec 13, 2023
ptheywood added a commit that referenced this issue Dec 15, 2023
ptheywood added a commit that referenced this issue Dec 15, 2023
ptheywood added a commit that referenced this issue Dec 16, 2023
Add optional support for distributing simulations within an ensemble across multiple machines via MPI

* CMake: Add FLAMEGPU_ENABLE_MPI option.
* Breaking: CUDAEnsemble::getLogs() now returns a map
The key corresponds to the index of the corresponding RunPlan within the input RunPlanVector.
* BugFix: The "ensemble log already exists" exception message contained a bad filepath.
* BugFix: Replace occurrences of throw with THROW
* MPI tests in new target tests_mpi
* Warn about MPI link failures with CMake < 3.20.1
* Warn at CMake Configure when mpich forces -flto.
* CI: Add MPI CI
* Assigns GPUs to MPI ranks per node, allowing more flexible MPI configurations
MPI ensembles can use multiple MPI ranks per node, evenly(ish) distributing GPUs across the ranks on each shared-memory system.
If more MPI ranks are used on a node than GPUs, additional ranks will do nothing and a warning is reported.

I.e. any number of MPI ranks can be launched, but only a sensible number will be used.

If the user specifies device indices, those will be load balanced; otherwise all visible devices within the node will be balanced.

Only one rank per node sends the device string back for telemetry; the others send back an empty string (the assembleGPUsString method expects a message from each rank in the world).

If no valid CUDA devices are provided, an exception is raised.

Device allocation is implemented in a static method so it can be tested programmatically, without launching the test N times with different MPI configurations.

Closes #1114

---------

Co-authored-by: Peter Heywood <p.heywood@sheffield.ac.uk>