MPI: CMake 3.18+ support #1114

Closed
Robadob opened this issue Sep 25, 2023 · 3 comments · Fixed by #1090

Robadob commented Sep 25, 2023

Device linking was failing when building the tests_mpi target of the distributed_ensemble branch on Bede with CMake 3.18, but was fixed when using CMake 3.22.

We may need to consider updating the minimum required CMake version.
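As an aside, a minimal sketch of the "raise the minimum version" option mentioned above, assuming the 3.20.1 boundary identified later in this thread (the thread ultimately settles on a configure-time warning instead):

```cmake
# Illustrative only: requiring a CMake release that contains the
# device-link fix (3.20.1, per the comments below) would rule out the
# failing configuration entirely.
cmake_minimum_required(VERSION 3.20.1)
```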

CMake 3.18

[ 93%] Linking CUDA device code CMakeFiles/tests_mpi.dir/cmake_device_link.o
cd /users/robadob/fgpu2/build/tests && /opt/software/builder/developers/tools/cmake/3.18.4/1/default/bin/cmake -E cmake_link_script CMakeFiles/tests_mpi.dir/dlink.txt --verbose=1
/opt/software/builder/developers/compilers/cuda/11.4.1/1/default/bin/nvcc -forward-unknown-to-host-compiler -O3 -DNDEBUG --generate-code=arch=compute_70,code=[compute_70,sm_70] -Wno-deprecated-gpu-targets -Xcompiler=-Wl,-rpath,-Wl,/opt/software/builder/developers/libraries/openmpi/4.0.5/1/gcc-native-cuda-11.4.1/lib,-Wl,--enable-new-dtags,-pthread -Xcompiler=-fPIC -Wno-deprecated-gpu-targets -shared -dlink CMakeFiles/tests_mpi.dir/test_cases/simulation/test_mpi_ensemble.cu.o CMakeFiles/tests_mpi.dir/helpers/host_reductions_common.cu.o CMakeFiles/tests_mpi.dir/helpers/device_initialisation.cu.o CMakeFiles/tests_mpi.dir/helpers/main.cu.o -o CMakeFiles/tests_mpi.dir/cmake_device_link.o   -L/opt/software/builder/developers/compilers/cuda/11.4.1/1/default/targets/ppc64le-linux/lib/stubs  ../lib/Release/libflamegpu.a ../lib/libgtest.a  ../lib/Release/libtinyxml2.a -lstdc++fs -ldl -lcudadevrt -lcudart  -L"/opt/software/builder/developers/compilers/cuda/11.4.1/1/default/lib64"
gcc: error: unrecognized command-line option ‘-Wl’; did you mean ‘-W’?
gcc: error: unrecognized command-line option ‘-rpath’
gcc: error: unrecognized command-line option ‘-Wl’; did you mean ‘-W’?
gcc: error: unrecognized command-line option ‘-Wl’; did you mean ‘-W’?
gmake[3]: *** [tests/CMakeFiles/tests_mpi.dir/build.make:155: tests/CMakeFiles/tests_mpi.dir/cmake_device_link.o] Error 1

CMake 3.22

[ 97%] Linking CUDA device code CMakeFiles/tests_mpi.dir/cmake_device_link.o
cd /users/robadob/fgpu2/build/tests && /users/robadob/miniconda/miniconda/envs/cmake/bin/cmake -E cmake_link_script CMakeFiles/tests_mpi.dir/dlink.txt --verbose=1
/opt/software/builder/developers/compilers/cuda/11.4.1/1/default/bin/nvcc -forward-unknown-to-host-compiler -O3 -DNDEBUG --generate-code=arch=compute_70,code=[compute_70,sm_70] -Wno-deprecated-gpu-targets -Xcompiler=-fPIC -Wno-deprecated-gpu-targets -shared -dlink CMakeFiles/tests_mpi.dir/test_cases/simulation/test_mpi_ensemble.cu.o CMakeFiles/tests_mpi.dir/helpers/host_reductions_common.cu.o CMakeFiles/tests_mpi.dir/helpers/device_initialisation.cu.o CMakeFiles/tests_mpi.dir/helpers/main.cu.o -o CMakeFiles/tests_mpi.dir/cmake_device_link.o   -L/opt/software/builder/developers/compilers/cuda/11.4.1/1/default/targets/ppc64le-linux/lib/stubs  ../lib/Release/libflamegpu.a ../lib/libgtest.a  ../lib/Release/libtinyxml2.a -ldl -lpthread -lcudadevrt -lcudart  -L"/opt/software/builder/developers/compilers/cuda/11.4.1/1/default/lib64"

Robadob commented Sep 25, 2023

@ptheywood's research from Slack chat

Looks like it's an MPI-specific thing:

https://discourse.cmake.org/t/unable-to-link-cuda-device-code-with-mpich-implementation/3006/8

Which leads to an issue that was fixed by a merge request.

https://gitlab.kitware.com/cmake/cmake/-/issues/21887

https://gitlab.kitware.com/cmake/cmake/-/merge_requests/5966 is the MR.

Which looks like it was part of 3.20.1.


ptheywood commented Sep 29, 2023

Unable to reproduce this on x86_64 Ubuntu machines, which don't seem to require the flags to be passed.

We probably just want to emit a dev warning on <= 3.20.1 that it might error, as it's not a universal MPI + old CMake error.

ptheywood added a commit that referenced this issue Oct 2, 2023

ptheywood commented Oct 2, 2023

I've just confirmed on Bede that CMake 3.20.0 fails to link with the error above, while 3.20.1 does work (when MPI is enabled and the MPI installation requires some extra env variables to be passed to the host linker, as the OpenMPI install on Bede does).

So adding a message(WARNING ...) when MPI is enabled and found but CMake is < 3.20.1, as part of #1090, would be the way to address this (I'll quickly add one).
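A minimal sketch of what such a configure-time warning might look like; the FLAMEGPU_ENABLE_MPI option name is taken from the merge commit below, while the MPI_FOUND check is assumed from CMake's standard FindMPI module, so this is illustrative rather than the exact change merged in #1090:

```cmake
# Sketch only: warn at configure time if MPI support is requested and found,
# but the CMake version predates the device-link fix that landed in 3.20.1.
if(FLAMEGPU_ENABLE_MPI AND MPI_FOUND AND CMAKE_VERSION VERSION_LESS "3.20.1")
    message(WARNING
        "CMake ${CMAKE_VERSION} may fail to device link MPI-enabled CUDA targets "
        "when the MPI installation requires extra flags to be passed to the host "
        "linker (e.g. the OpenMPI install on Bede). CMake >= 3.20.1 is recommended.")
endif()
```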

ptheywood added a commit that referenced this issue Oct 13, 2023
ptheywood added a commit that referenced this issue Nov 21, 2023
ptheywood changed the title from "CMake 3.18+ support" to "MPI: CMake 3.18+ support" on Dec 11, 2023
ptheywood added a commit that referenced this issue Dec 12, 2023
ptheywood added a commit that referenced this issue Dec 13, 2023
ptheywood added a commit that referenced this issue Dec 15, 2023
ptheywood added a commit that referenced this issue Dec 15, 2023
ptheywood added a commit that referenced this issue Dec 16, 2023
Add optional support for distributing simulations within an ensemble across multiple machines via MPI

* CMake: Add FLAMEGPU_ENABLE_MPI option.
* Breaking: CUDAEnsemble::getLogs() now returns a map
The key corresponds to the index of the corresponding RunPlan within the input RunPlanVector.
* BugFix: The "ensemble log already exists" exception message contained a bad filepath.
* BugFix: Replace occurrences of throw with THROW
* MPI tests in new target tests_mpi
* Warn about MPI link failures with CMake < 3.20.1
* Warn at CMake Configure when mpich forces -flto.
* CI: Add MPI CI
* Assigns GPUs to MPI ranks per node, allowing more flexible MPI configurations
MPI ensembles can use multiple MPI ranks per node, evenly(ish) distributing GPUs across the ranks on each shared-memory system.
If more MPI ranks are used on a node than GPUs, additional ranks will do nothing and a warning is reported.

I.e. any number of MPI ranks can be launched, but only a sensible number will be used.

If the user specifies device indices, those will be load balanced; otherwise all visible devices within the node will be balanced.

Only one rank per node sends the device string back for telemetry; the others send back an empty string (the assembleGPUsString method expects a message from each rank in the world).

If no valid CUDA devices are provided, an exception is raised.

Device allocation is implemented in a static method so it can be tested programmatically, without launching the test N times with different MPI configurations.

Closes #1114

---------

Co-authored-by: Peter Heywood <p.heywood@sheffield.ac.uk>