
CUDA-aware MPI build on fresh Ubuntu 24.04 LTS, MPIX_Query_cuda_support() returns zero #13130

Open
niklebedenko opened this issue Mar 6, 2025 · 16 comments

@niklebedenko

Background information

What version of Open MPI are you using? (e.g., v4.1.6, v5.0.1, git branch name and hash, etc.)

v5.0.7

Describe how Open MPI was installed (e.g., from a source/distribution tarball, from a git clone, from an operating system distribution package, etc.)

Obtained from https://download.open-mpi.org/release/open-mpi/v5.0/openmpi-5.0.7.tar.gz

If you are building/installing from a git clone, please copy-n-paste the output from git submodule status.

N/A

Please describe the system on which you are running

  • Operating system/version: Ubuntu 24.04 LTS
  • Computer hardware: x86_64
  • Network type: single node, single GPU

Details of the problem

I'm really struggling to run CUDA-aware MPI on just one node. I want to do this so that I can test my code locally before deploying to a cluster. I've reproduced this on a fresh install of Ubuntu 24.04 on two different machines.

Here are my install steps:

tar xf openmpi-5.0.7.tar.gz
cd openmpi-5.0.7
mkdir build
cd build
../configure --with-cuda=/usr/local/cuda --prefix=/opt/openmpi | tee config.out
make -j$(nproc) all | tee make.out
sudo make install
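
As a quick sanity check after installing (not one of my original steps; it just queries the installed library for the CUDA accelerator component), one can run:

/opt/openmpi/bin/ompi_info | grep cuda   # should list the cuda accelerator component, among others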

Now, I build a very simple test program:

// mpi_check.c
#include "mpi.h"
#include <stdio.h>

#if !defined(OPEN_MPI) || !OPEN_MPI
#error This source code uses an Open MPI-specific extension
#endif

/* Needed for MPIX_Query_cuda_support(), below */
#include "mpi-ext.h"

int main(int argc, char* argv[]) {
        MPI_Init(&argc, &argv);

        printf("Compile time check:\n");
#if defined(MPIX_CUDA_AWARE_SUPPORT) && MPIX_CUDA_AWARE_SUPPORT
        printf("This MPI library has CUDA-aware support.\n");
#else
        printf("This MPI library does not have CUDA-aware support.\n");
#endif /* MPIX_CUDA_AWARE_SUPPORT */

        printf("Run time check:\n");
#if defined(MPIX_CUDA_AWARE_SUPPORT)
        if (1 == MPIX_Query_cuda_support()) {
                printf("This MPI library has CUDA-aware support.\n");
        }
        else {
                printf("This MPI library does not have CUDA-aware support.\n");
        }
#endif /* MPIX_CUDA_AWARE_SUPPORT */

        MPI_Finalize();

        return 0;
}

This was built and run with:

/opt/openmpi/bin/mpicc mpi_check.c -o mpi_check

/opt/openmpi/bin/mpirun -n 1 ./mpi_check

Then, we get this output:

Compile time check:
This MPI library has CUDA-aware support.
Run time check:
This MPI library does not have CUDA-aware support.

However, if I just run ./mpi_check, i.e. without mpirun, I get this output:

Authorization required, but no authorization protocol specified

Compile time check:
This MPI library has CUDA-aware support.
Run time check:
This MPI library has CUDA-aware support.

There are no other MPI installations, and this was reproduced on two independent machines.

Perhaps I'm missing a step, or missing some configuration, but I've tried lots of variations of each of the above commands to no avail, and (I think?) I've followed the install instructions in the documentation correctly. So I believe it is a bug.

If I'm missing something, please let me know. Also please let me know if you'd like the config.out and make.out log files.

@bosilca (Member) commented Mar 7, 2025

I am unable to replicate this: as long as I run your example on a machine with CUDA devices attached, it works as expected. If I run on a machine without GPUs, it fails at runtime.

Can you run mpirun -np 1 --mca accelerator_base_verbose 100 ./mpi_check to see if there is anything interesting in the output of the accelerator module?

@niklebedenko (Author)

Thanks for responding so quickly; here's the output you asked for:

$ /opt/openmpi/bin/mpirun -np 1 --mca accelerator_base_verbose 100 ./mpi_check
[<hostname>:670389] mca: base: components_register: registering framework accelerator components
[<hostname>:670389] mca: base: components_register: found loaded component null
[<hostname>:670389] mca: base: components_register: component null register function successful
[<hostname>:670389] mca: base: components_open: opening accelerator components
[<hostname>:670389] mca: base: components_open: found loaded component null
[<hostname>:670389] mca: base: components_open: component null open function successful
[<hostname>:670389] select: initializing accelerator component null
[<hostname>:670389] selected null
Compile time check:
This MPI library has CUDA-aware support.
Run time check:
This MPI library does not have CUDA-aware support.

I've also tried to add --mca accelerator cuda, but that gives:

$ /opt/openmpi/bin/mpirun -np 1 --mca accelerator_base_verbose 100 --mca accelerator cuda ./mpi_check
--------------------------------------------------------------------------
A requested component was not found, or was unable to be opened.  This
means that this component is either not installed or is unable to be
used on your system (e.g., sometimes this means that shared libraries
that the component requires are unable to be found/loaded).  Note that
Open MPI stopped checking at the first component that it did not find.

Host:      <hostname>
Framework: accelerator
Component: cuda
--------------------------------------------------------------------------
[<hostname>:671182] *** Process received signal ***
[<hostname>:671182] Signal: Segmentation fault (11)
[<hostname>:671182] Signal code: Address not mapped (1)
[<hostname>:671182] Failing at address: (nil)
[<hostname>:671182] [ 0] /lib/x86_64-linux-gnu/libc.so.6(+0x45330)[0x791767e45330]
[<hostname>:671182] *** End of error message ***
--------------------------------------------------------------------------
prterun noticed that process rank 0 with PID 671182 on node <hostname> exited on
signal 11 (Segmentation fault).
--------------------------------------------------------------------------

I think these lines from config.out might also be helpful:

$ grep "cuda" config.out
configure: running /bin/bash ../../../3rd-party/openpmix/configure --disable-option-checking '--prefix=/opt/openmpi' --without-tests-examples --enable-pmix-binaries --disable-pmix-backward-compatibility --disable-visibility --disable-devel-check '--with-cuda=/usr/local/cuda' --cache-file=/dev/null --srcdir=../../../3rd-party/openpmix
checking for subdir args...  '--disable-option-checking' '--prefix=/opt/openmpi' '--without-tests-examples' '--enable-pmix-binaries' '--disable-pmix-backward-compatibility' '--disable-visibility' '--disable-devel-check' '--with-cuda=/usr/local/cuda'
configure: running /bin/bash ../../../3rd-party/prrte/configure --disable-option-checking '--prefix=/opt/openmpi' --enable-prte-ft --with-proxy-version-string=5.0.7 --with-proxy-package-name="Open MPI" --with-proxy-bugreport="https://www.open-mpi.org/community/help/" --disable-devel-check --enable-prte-prefix-by-default '--with-cuda=/usr/local/cuda' --cache-file=/dev/null --srcdir=../../../3rd-party/prrte
checking for subdir args...  '--disable-option-checking' '--prefix=/opt/openmpi' '--enable-prte-ft' '--with-proxy-version-string=5.0.7' '--with-proxy-package-name=Open MPI' '--with-proxy-bugreport=https://www.open-mpi.org/community/help/' '--disable-devel-check' '--enable-prte-prefix-by-default' '--with-cuda=/usr/local/cuda'
checking for subdir args...  '--with-cuda=/usr/local/cuda' '--prefix=/opt/openmpi'
checking which components should be run-time loadable... rcache-rgpusm rcache-gpusm btl-smcuda accelerator-ze accelerator-rocm accelerator-cuda (default)
checking for m4 configure components in framework accelerator... cuda, rocm
--- MCA component accelerator:cuda (m4 configuration macro)
checking for MCA component accelerator:cuda compile mode... dso
checking if --with-cuda is set... found (/usr/local/cuda/include/cuda.h)
checking for cuda pkg-config name... /usr/local/cuda/targets/x86_64-linux/lib/stubs/pkgconfig/cuda.pc
checking if cuda pkg-config module exists... no
checking for cuda header at /usr/local/cuda/include... found
checking for cuda library (cuda) in /usr/local/cuda/targets/x86_64-linux/lib/stubs... found
checking for cuda cppflags... -I/usr/local/cuda/include
checking for cuda ldflags... -L/usr/local/cuda/targets/x86_64-linux/lib/stubs
checking for cuda libs... -lcuda
checking for cuda static libs... -lcuda
checking for cuda.h... yes
checking if cuda requires libnl v1 or v3... none
checking if have cuda support... yes (-I/usr/local/cuda/include)
checking if MCA component accelerator:cuda can compile... yes
checking for m4 configure components in framework btl... ofi, portals4, sm, smcuda, tcp, uct, ugni, usnic
--- MCA component btl:smcuda (m4 configuration macro)
checking for MCA component btl:smcuda compile mode... dso
checking if --with-cuda is set... found (/usr/local/cuda/include/cuda.h)
checking for cuda pkg-config name... (cached) /usr/local/cuda/targets/x86_64-linux/lib/stubs/pkgconfig/cuda.pc
checking if cuda pkg-config module exists... (cached) no
checking for cuda header at /usr/local/cuda/include... found
checking for cuda library (cuda) in /usr/local/cuda/targets/x86_64-linux/lib/stubs... found
checking for cuda cppflags... -I/usr/local/cuda/include
checking for cuda ldflags... -L/usr/local/cuda/targets/x86_64-linux/lib/stubs
checking for cuda libs... -lcuda
checking for cuda static libs... -lcuda
checking for cuda.h... (cached) yes
checking if cuda requires libnl v1 or v3... (cached) none
checking if have cuda support... yes (-I/usr/local/cuda/include)
checking if MCA component btl:smcuda can compile... yes
checking if --with-cuda is set... found (/usr/local/cuda/include/cuda.h)
checking for cuda pkg-config name... (cached) /usr/local/cuda/targets/x86_64-linux/lib/stubs/pkgconfig/cuda.pc
checking if cuda pkg-config module exists... (cached) no
checking for cuda header at /usr/local/cuda/include... found
checking for cuda library (cuda) in /usr/local/cuda/targets/x86_64-linux/lib/stubs... found
checking for cuda cppflags... -I/usr/local/cuda/include
checking for cuda ldflags... -L/usr/local/cuda/targets/x86_64-linux/lib/stubs
checking for cuda libs... -lcuda
checking for cuda static libs... -lcuda
checking for cuda.h... (cached) yes
checking if cuda requires libnl v1 or v3... (cached) none
checking if have cuda support... yes (-I/usr/local/cuda/include)
checking if --with-cuda is set... found (/usr/local/cuda/include/cuda.h)
checking for cuda pkg-config name... (cached) /usr/local/cuda/targets/x86_64-linux/lib/stubs/pkgconfig/cuda.pc
checking if cuda pkg-config module exists... (cached) no
checking for cuda header at /usr/local/cuda/include... found
checking for cuda library (cuda) in /usr/local/cuda/targets/x86_64-linux/lib/stubs... found
checking for cuda cppflags... -I/usr/local/cuda/include
checking for cuda ldflags... -L/usr/local/cuda/targets/x86_64-linux/lib/stubs
checking for cuda libs... -lcuda
checking for cuda static libs... -lcuda
checking for cuda.h... (cached) yes
checking if cuda requires libnl v1 or v3... (cached) none
checking if have cuda support... yes (-I/usr/local/cuda/include)
checking for m4 configure components in framework coll... cuda, ftagree, hcoll, monitoring, portals4, sm, ucc
--- MCA component coll:cuda (m4 configuration macro)
checking for MCA component coll:cuda compile mode... static
checking if MCA component coll:cuda can compile... yes
configure: running /bin/bash '../../../3rd-party/romio341/configure'  FROM_OMPI=yes CC="gcc" CFLAGS="-O3 -DNDEBUG  -finline-functions -mcx16 -D__EXTENSIONS__" CPPFLAGS="" FFLAGS="" LDFLAGS="" --enable-shared --disable-static  --prefix=/opt/openmpi --disable-aio --disable-weak-symbols --enable-strict --disable-f77 --disable-f90 ac_cv_lib_cuda_cuMemGetAddressRange=no ac_cv_lib_cudart_cudaStreamSynchronize=no --cache-file=/dev/null --srcdir=../../../3rd-party/romio341 --disable-option-checking
configure: running /bin/bash ../../../../3rd-party/romio341/mpl/configure --disable-option-checking '--prefix=/opt/openmpi' --disable-versioning --enable-embedded 'FROM_OMPI=yes' 'CC=gcc' 'CFLAGS=-O3 -DNDEBUG  -finline-functions -mcx16 -D__EXTENSIONS__' 'CPPFLAGS=' 'FFLAGS=' 'LDFLAGS=' '--enable-shared' '--disable-static' '--disable-aio' '--disable-weak-symbols' '--enable-strict' '--disable-f77' '--disable-f90' 'ac_cv_lib_cuda_cuMemGetAddressRange=no' 'ac_cv_lib_cudart_cudaStreamSynchronize=no' --cache-file=/dev/null --srcdir=../../../../3rd-party/romio341/mpl
checking for cuda_runtime_api.h... no
checking for cudaStreamSynchronize in -lcudart... (cached) no
checking for cuda.h... no
checking for cuMemGetAddressRange in -lcuda... (cached) no
checking for available MPI Extensions... affinity, cuda, ftmpi, rocm, shortfloat
--- MPI Extension cuda
checking if MPI Extension cuda can compile... yes
checking if MPI Extension cuda has C bindings... yes (required)
checking if MPI Extension cuda has mpif.h bindings... no
checking if MPI Extension cuda has "use mpi" bindings... no
checking if MPI Extension cuda has "use mpi_f08" bindings... no
config.status: creating opal/mca/accelerator/cuda/Makefile
config.status: creating opal/mca/btl/smcuda/Makefile
config.status: creating ompi/mca/coll/cuda/Makefile
config.status: creating ompi/mpiext/cuda/Makefile
config.status: creating ompi/mpiext/cuda/c/Makefile
config.status: creating ompi/mpiext/cuda/c/mpiext_cuda_c.h

It seems that most of the components can find cuda.h, apart from one: the (recently-removed) romio341 thing.

Finally, both machines have CUDA-capable GPUs installed. Here's some relevant output:

$ nvidia-smi
Fri Mar  7 08:25:24 2025       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.183.01             Driver Version: 535.183.01   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce RTX 4090        Off | 00000000:02:00.0 Off |                  Off |
|  0%   32C    P8              12W / 450W |     67MiB / 24564MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
                                                                                         
+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A      1914      G   /usr/lib/xorg/Xorg                           39MiB |
|    0   N/A  N/A      2442      G   /usr/bin/gnome-shell                         15MiB |
+---------------------------------------------------------------------------------------+

$ /usr/local/cuda/bin/nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2025 NVIDIA Corporation
Built on Wed_Jan_15_19:20:09_PST_2025
Cuda compilation tools, release 12.8, V12.8.61
Build cuda_12.8.r12.8/compiler.35404655_0

Admittedly, there's a mismatch between the driver's CUDA version (12.2) and the toolkit's (12.8). Not sure if that's the issue.

@bosilca (Member) commented Mar 7, 2025

You are correct: most of the components built as DSOs found CUDA. However, coll:cuda decided to be built statically and failed.

Something is weird with your build: in the end the CUDA accelerator component was not built, which is why you don't see it in the output I asked you for, and why it fails when you try to force-load it. If you go into the build directory and then into opal/mca/accelerator/cuda, do you see a Makefile, and if yes, what is the output of make clean && make V=1?
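
Spelled out as a command sequence (a sketch only, assuming the out-of-tree build layout used above):

cd openmpi-5.0.7/build/opal/mca/accelerator/cuda
ls Makefile              # confirm the component's build directory was generated
make clean && make V=1   # rebuild, printing the full compiler/linker command lines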

@niklebedenko (Author) commented Mar 7, 2025

There is a Makefile. Here's the output now:

$ make clean
test -z "*~ .#*" || rm -f *~ .#*
rm -rf .libs _libs
test -z "mca_accelerator_cuda.la" || rm -f mca_accelerator_cuda.la
rm -f ./so_locations
test -z "" || rm -f 
rm -f *.o
rm -f *.lo
$ make V=1
depbase=`echo accelerator_cuda_component.lo | sed 's|[^/]*$|.deps/&|;s|\.lo$||'`;\
/bin/bash ../../../../libtool  --tag=CC   --mode=compile gcc -DHAVE_CONFIG_H -I. -I../../../../../opal/mca/accelerator/cuda -I../../../../opal/include -I../../../../ompi/include -I../../../../oshmem/include -I../../../../ompi/mpiext/cuda/c -I../../../../ompi/mpiext/rocm/c  -I/usr/local/cuda/include -iquote../../../../.. -iquote../../../.. -iquote../../../../../opal/include -iquote../../../../../ompi/include -iquote../../../../../oshmem/include  -I/usr/lib/x86_64-linux-gnu/pmix2/include -I/usr/lib/x86_64-linux-gnu/pmix2/include/pmix  -O3 -DNDEBUG  -finline-functions -mcx16 -MT accelerator_cuda_component.lo -MD -MP -MF $depbase.Tpo -c -o accelerator_cuda_component.lo ../../../../../opal/mca/accelerator/cuda/accelerator_cuda_component.c &&\
mv -f $depbase.Tpo $depbase.Plo
libtool: compile:  gcc -DHAVE_CONFIG_H -I. -I../../../../../opal/mca/accelerator/cuda -I../../../../opal/include -I../../../../ompi/include -I../../../../oshmem/include -I../../../../ompi/mpiext/cuda/c -I../../../../ompi/mpiext/rocm/c -I/usr/local/cuda/include -iquote../../../../.. -iquote../../../.. -iquote../../../../../opal/include -iquote../../../../../ompi/include -iquote../../../../../oshmem/include -I/usr/lib/x86_64-linux-gnu/pmix2/include -I/usr/lib/x86_64-linux-gnu/pmix2/include/pmix -O3 -DNDEBUG -finline-functions -mcx16 -MT accelerator_cuda_component.lo -MD -MP -MF .deps/accelerator_cuda_component.Tpo -c ../../../../../opal/mca/accelerator/cuda/accelerator_cuda_component.c  -fPIC -DPIC -o .libs/accelerator_cuda_component.o
depbase=`echo accelerator_cuda.lo | sed 's|[^/]*$|.deps/&|;s|\.lo$||'`;\
/bin/bash ../../../../libtool  --tag=CC   --mode=compile gcc -DHAVE_CONFIG_H -I. -I../../../../../opal/mca/accelerator/cuda -I../../../../opal/include -I../../../../ompi/include -I../../../../oshmem/include -I../../../../ompi/mpiext/cuda/c -I../../../../ompi/mpiext/rocm/c  -I/usr/local/cuda/include -iquote../../../../.. -iquote../../../.. -iquote../../../../../opal/include -iquote../../../../../ompi/include -iquote../../../../../oshmem/include  -I/usr/lib/x86_64-linux-gnu/pmix2/include -I/usr/lib/x86_64-linux-gnu/pmix2/include/pmix  -O3 -DNDEBUG  -finline-functions -mcx16 -MT accelerator_cuda.lo -MD -MP -MF $depbase.Tpo -c -o accelerator_cuda.lo ../../../../../opal/mca/accelerator/cuda/accelerator_cuda.c &&\
mv -f $depbase.Tpo $depbase.Plo
libtool: compile:  gcc -DHAVE_CONFIG_H -I. -I../../../../../opal/mca/accelerator/cuda -I../../../../opal/include -I../../../../ompi/include -I../../../../oshmem/include -I../../../../ompi/mpiext/cuda/c -I../../../../ompi/mpiext/rocm/c -I/usr/local/cuda/include -iquote../../../../.. -iquote../../../.. -iquote../../../../../opal/include -iquote../../../../../ompi/include -iquote../../../../../oshmem/include -I/usr/lib/x86_64-linux-gnu/pmix2/include -I/usr/lib/x86_64-linux-gnu/pmix2/include/pmix -O3 -DNDEBUG -finline-functions -mcx16 -MT accelerator_cuda.lo -MD -MP -MF .deps/accelerator_cuda.Tpo -c ../../../../../opal/mca/accelerator/cuda/accelerator_cuda.c  -fPIC -DPIC -o .libs/accelerator_cuda.o
/bin/bash ../../../../libtool  --tag=CC   --mode=link gcc  -O3 -DNDEBUG  -finline-functions -mcx16 -module -avoid-version -L/usr/local/cuda/targets/x86_64-linux/lib/stubs -L/usr/lib/x86_64-linux-gnu/pmix2/lib -Wl,-rpath -Wl,/usr/lib/x86_64-linux-gnu/pmix2/lib -Wl,--enable-new-dtags -o mca_accelerator_cuda.la -rpath /opt/openmpi/lib/openmpi accelerator_cuda_component.lo accelerator_cuda.lo ../../../../opal/libopen-pal.la -lcuda -lm -levent_core -levent_pthreads -lhwloc -lpmix
libtool: link: gcc -shared  -fPIC -DPIC  .libs/accelerator_cuda_component.o .libs/accelerator_cuda.o   -Wl,-rpath -Wl,/home/<user>/Downloads/openmpi-5.0.7/build/opal/.libs -Wl,-rpath -Wl,/opt/openmpi/lib -L/usr/local/cuda/targets/x86_64-linux/lib/stubs -L/usr/lib/x86_64-linux-gnu/pmix2/lib ../../../../opal/.libs/libopen-pal.so -ldl -lcuda -lm -levent_core -levent_pthreads -lhwloc -lpmix  -O3 -mcx16 -Wl,-rpath -Wl,/usr/lib/x86_64-linux-gnu/pmix2/lib -Wl,--enable-new-dtags   -Wl,-soname -Wl,mca_accelerator_cuda.so -o .libs/mca_accelerator_cuda.so
libtool: link: ( cd ".libs" && rm -f "mca_accelerator_cuda.la" && ln -s "../mca_accelerator_cuda.la" "mca_accelerator_cuda.la" )

@bosilca (Member) commented Mar 7, 2025

The CUDA accelerator component is built, but not loaded.

  1. Let's check that it is installed properly. What is the output of make install V=1 in the same directory as above? Do you see the MCA module in ${INSTALLDIR}/lib/openmpi/ ?
  2. What is ompi_info reporting? Do you see the CUDA accelerator component in the output? If yes, what is the output of ompi_info --param accelerator cuda -l 9? (A condensed command sequence follows below.)
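
A condensed version of those checks (a sketch, assuming the build tree and the --prefix=/opt/openmpi install used earlier in this thread):

cd opal/mca/accelerator/cuda        # from the top of the build directory
sudo make install V=1
ls /opt/openmpi/lib/openmpi/        # look for mca_accelerator_cuda.so
/opt/openmpi/bin/ompi_info | grep cuda
/opt/openmpi/bin/ompi_info --param accelerator cuda -l 9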

@niklebedenko (Author) commented Mar 7, 2025

$ sudo make install V=1
make[1]: Entering directory '/home/<user>/Downloads/openmpi-5.0.7/build/opal/mca/accelerator/cuda'
make[1]: Nothing to be done for 'install-exec-am'.
 /usr/bin/mkdir -p '/opt/openmpi/share/openmpi'
 /usr/bin/install -c -m 644 ../../../../../opal/mca/accelerator/cuda/help-accelerator-cuda.txt '/opt/openmpi/share/openmpi'
 /usr/bin/mkdir -p '/opt/openmpi/lib/openmpi'
 /bin/bash ../../../../libtool   --mode=install /usr/bin/install -c   mca_accelerator_cuda.la '/opt/openmpi/lib/openmpi'
libtool: warning: relinking 'mca_accelerator_cuda.la'
libtool: install: (cd /home/<user>/Downloads/openmpi-5.0.7/build/opal/mca/accelerator/cuda; /bin/bash "/home/<user>/Downloads/openmpi-5.0.7/build/libtool"  --tag CC --mode=relink gcc -O3 -DNDEBUG -finline-functions -mcx16 -module -avoid-version -L/usr/local/cuda/targets/x86_64-linux/lib/stubs -L/usr/lib/x86_64-linux-gnu/pmix2/lib -Wl,-rpath -Wl,/usr/lib/x86_64-linux-gnu/pmix2/lib -Wl,--enable-new-dtags -o mca_accelerator_cuda.la -rpath /opt/openmpi/lib/openmpi accelerator_cuda_component.lo accelerator_cuda.lo ../../../../opal/libopen-pal.la -lcuda -lm -levent_core -levent_pthreads -lhwloc -lpmix )
libtool: relink: gcc -shared  -fPIC -DPIC  .libs/accelerator_cuda_component.o .libs/accelerator_cuda.o   -Wl,-rpath -Wl,/opt/openmpi/lib -L/usr/local/cuda/targets/x86_64-linux/lib/stubs -L/usr/lib/x86_64-linux-gnu/pmix2/lib -L/opt/openmpi/lib -lopen-pal -ldl -lcuda -lm -levent_core -levent_pthreads -lhwloc -lpmix  -O3 -mcx16 -Wl,-rpath -Wl,/usr/lib/x86_64-linux-gnu/pmix2/lib -Wl,--enable-new-dtags   -Wl,-soname -Wl,mca_accelerator_cuda.so -o .libs/mca_accelerator_cuda.so
libtool: install: /usr/bin/install -c .libs/mca_accelerator_cuda.soT /opt/openmpi/lib/openmpi/mca_accelerator_cuda.so
libtool: install: /usr/bin/install -c .libs/mca_accelerator_cuda.lai /opt/openmpi/lib/openmpi/mca_accelerator_cuda.la
libtool: finish: PATH="/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/snap/bin:/sbin" ldconfig -n /opt/openmpi/lib/openmpi
----------------------------------------------------------------------
Libraries have been installed in:
   /opt/openmpi/lib/openmpi

If you ever happen to want to link against installed libraries
in a given directory, LIBDIR, you must either use libtool, and
specify the full pathname of the library, or use the '-LLIBDIR'
flag during linking and do at least one of the following:
   - add LIBDIR to the 'LD_LIBRARY_PATH' environment variable
     during execution
   - add LIBDIR to the 'LD_RUN_PATH' environment variable
     during linking
   - use the '-Wl,-rpath -Wl,LIBDIR' linker flag
   - have your system administrator add LIBDIR to '/etc/ld.so.conf'

See any operating system documentation about shared libraries for
more information, such as the ld(1) and ld.so(8) manual pages.
----------------------------------------------------------------------
make[1]: Leaving directory '/home/<user>/Downloads/openmpi-5.0.7/build/opal/mca/accelerator/cuda'
$ cd /opt/openmpi/lib/openmpi/
$ ls
libompi_dbg_msgq.la  mca_accelerator_cuda.la  mca_btl_smcuda.la  mca_rcache_gpusm.la  mca_rcache_rgpusm.la
libompi_dbg_msgq.so  mca_accelerator_cuda.so  mca_btl_smcuda.so  mca_rcache_gpusm.so  mca_rcache_rgpusm.so
$ /opt/openmpi/bin/ompi_info | grep "cuda"
  Configure command line: '--with-cuda=/usr/local/cuda' '--prefix=/opt/openmpi'
          MPI extensions: affinity, cuda, ftmpi, rocm, shortfloat
         MCA accelerator: cuda (MCA v2.1.0, API v1.0.0, Component v5.0.7)
                 MCA btl: smcuda (MCA v2.1.0, API v3.3.0, Component v5.0.7)
                MCA coll: cuda (MCA v2.1.0, API v2.4.0, Component v5.0.7)
$ /opt/openmpi/bin/ompi_info --param accelerator cuda -l 9
         MCA accelerator: cuda (MCA v2.1.0, API v1.0.0, Component v5.0.7)

FYI I'm still getting the same behaviour from the compile / run commands for mpi_check.c.

@bosilca (Member) commented Mar 7, 2025

Everything seems to be in place, but the CUDA accelerator component is not loaded. Please run ldd /opt/openmpi/lib/openmpi/mca_accelerator_cuda.so or readelf -d /opt/openmpi/lib/openmpi/mca_accelerator_cuda.so.

@niklebedenko (Author)

$ ldd /opt/openmpi/lib/openmpi/mca_accelerator_cuda.so
        linux-vdso.so.1 (0x0000720a8b478000)
        libopen-pal.so.80 => /opt/openmpi/lib/libopen-pal.so.80 (0x0000720a8b355000)
        libcuda.so.1 => /lib/x86_64-linux-gnu/libcuda.so.1 (0x0000720a89600000)
        libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x0000720a89200000)
        libevent_core-2.1.so.7 => /lib/x86_64-linux-gnu/libevent_core-2.1.so.7 (0x0000720a8b307000)
        libevent_pthreads-2.1.so.7 => /lib/x86_64-linux-gnu/libevent_pthreads-2.1.so.7 (0x0000720a8b302000)
        libhwloc.so.15 => /lib/x86_64-linux-gnu/libhwloc.so.15 (0x0000720a8b29f000)
        libpmix.so.2 => /usr/lib/x86_64-linux-gnu/pmix2/lib/libpmix.so.2 (0x0000720a88e00000)
        libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x0000720a89517000)
        libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x0000720a8b29a000)
        libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x0000720a8b295000)
        librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x0000720a8b28e000)
        /lib64/ld-linux-x86-64.so.2 (0x0000720a8b47a000)
        libudev.so.1 => /lib/x86_64-linux-gnu/libudev.so.1 (0x0000720a894e4000)
        libmunge.so.2 => /lib/x86_64-linux-gnu/libmunge.so.2 (0x0000720a8b286000)
        libcap.so.2 => /lib/x86_64-linux-gnu/libcap.so.2 (0x0000720a8b279000)
$ readelf -d /opt/openmpi/lib/openmpi/mca_accelerator_cuda.so

Dynamic section at offset 0x4d60 contains 28 entries:
  Tag        Type                         Name/Value
 0x0000000000000001 (NEEDED)             Shared library: [libopen-pal.so.80]
 0x0000000000000001 (NEEDED)             Shared library: [libcuda.so.1]
 0x0000000000000001 (NEEDED)             Shared library: [libc.so.6]
 0x000000000000000e (SONAME)             Library soname: [mca_accelerator_cuda.so]
 0x000000000000001d (RUNPATH)            Library runpath: [/opt/openmpi/lib:/usr/lib/x86_64-linux-gnu/pmix2/lib]
 0x000000000000000c (INIT)               0x2000
 0x000000000000000d (FINI)               0x3f7c
 0x0000000000000019 (INIT_ARRAY)         0x5d50
 0x000000000000001b (INIT_ARRAYSZ)       8 (bytes)
 0x000000000000001a (FINI_ARRAY)         0x5d58
 0x000000000000001c (FINI_ARRAYSZ)       8 (bytes)
 0x000000006ffffef5 (GNU_HASH)           0x2f0
 0x0000000000000005 (STRTAB)             0x980
 0x0000000000000006 (SYMTAB)             0x338
 0x000000000000000a (STRSZ)              1570 (bytes)
 0x000000000000000b (SYMENT)             24 (bytes)
 0x0000000000000003 (PLTGOT)             0x5fe8
 0x0000000000000002 (PLTRELSZ)           1080 (bytes)
 0x0000000000000014 (PLTREL)             RELA
 0x0000000000000017 (JMPREL)             0x1500
 0x0000000000000007 (RELA)               0x1068
 0x0000000000000008 (RELASZ)             1176 (bytes)
 0x0000000000000009 (RELAENT)            24 (bytes)
 0x000000006ffffffe (VERNEED)            0x1028
 0x000000006fffffff (VERNEEDNUM)         1
 0x000000006ffffff0 (VERSYM)             0xfa2
 0x000000006ffffff9 (RELACOUNT)          30
 0x0000000000000000 (NULL)               0x0

@bosilca (Member) commented Mar 7, 2025

Everything looks normal. Let's make sure launching an app does not screw up the environment: mpirun -np 1 ldd /opt/openmpi/lib/openmpi/mca_accelerator_cuda.so

@niklebedenko (Author) commented Mar 7, 2025

$ /opt/openmpi/bin/mpirun -np 1 ldd /opt/openmpi/lib/openmpi/mca_accelerator_cuda.so
        linux-vdso.so.1 (0x0000711305890000)
        libopen-pal.so.80 => /opt/openmpi/lib/libopen-pal.so.80 (0x000071130576d000)
        libcuda.so.1 => /lib/x86_64-linux-gnu/libcuda.so.1 (0x0000711303a00000)
        libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x0000711303600000)
        libevent_core-2.1.so.7 => /lib/x86_64-linux-gnu/libevent_core-2.1.so.7 (0x000071130571f000)
        libevent_pthreads-2.1.so.7 => /lib/x86_64-linux-gnu/libevent_pthreads-2.1.so.7 (0x000071130571a000)
        libhwloc.so.15 => /lib/x86_64-linux-gnu/libhwloc.so.15 (0x00007113056b7000)
        libpmix.so.2 => /usr/lib/x86_64-linux-gnu/pmix2/lib/libpmix.so.2 (0x0000711303200000)
        libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x0000711303917000)
        libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007113056b2000)
        libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007113056ad000)
        librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007113056a6000)
        /lib64/ld-linux-x86-64.so.2 (0x0000711305892000)
        libudev.so.1 => /lib/x86_64-linux-gnu/libudev.so.1 (0x0000711305673000)
        libmunge.so.2 => /lib/x86_64-linux-gnu/libmunge.so.2 (0x000071130390f000)
        libcap.so.2 => /lib/x86_64-linux-gnu/libcap.so.2 (0x0000711303902000)

By the way, I've not added /opt/openmpi/bin to my PATH, in case that matters.

@bosilca (Member) commented Mar 7, 2025

That might be a reason, though not PATH but LD_LIBRARY_PATH. Most of the components are built statically into libmpi.so, with a few exceptions, and the CUDA-based components are among those exceptions. But I'm slightly skeptical, as ompi_info managed to find the CUDA shared library, and all the processes are local, so they should inherit the mpirun environment.

But just in case you can try

export LD_LIBRARY_PATH=/opt/openmpi/lib:$LD_LIBRARY_PATH
/opt/openmpi/bin/mpirun -np 1 -x LD_LIBRARY_PATH ./mpi_check

I'm running out of ideas unfortunately.

@niklebedenko (Author) commented Mar 7, 2025

Unfortunately still the same behaviour :(

Thank you so much for taking the time.

You said you were unable to reproduce this error; could you tell me what setup you used on your end to produce a working CUDA-aware OpenMPI build on Ubuntu 24.04 LTS? If there's a Docker container with a working installation that I could run my code in, that would work too.

I'm also really puzzled that simply running ./mpi_check shows CUDA awareness:

$ ./mpi_check 
Authorization required, but no authorization protocol specified

Compile time check:
This MPI library has CUDA-aware support.
Run time check:
This MPI library has CUDA-aware support.

@nariaki3551 commented Mar 14, 2025

#12334 (comment) might be related.

I encountered the same issue, and it was resolved by adding the --with-cuda-libdir=/usr/local/cuda/lib64/stubs option when running configure.

# mpirun -n 1 ./mpi_check
Compile time check:
This MPI library has CUDA-aware support.
Run time check:
This MPI library has CUDA-aware support.

Environment: OpenMPI: v5.0.3, Ubuntu: 20.04.6 LTS
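
For reference, that flag in a full configure invocation (a sketch only; it assumes the same source layout and CUDA install path as the original report, so adjust the paths for your system):

../configure --with-cuda=/usr/local/cuda --with-cuda-libdir=/usr/local/cuda/lib64/stubs --prefix=/opt/openmpi | tee config.out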

@niklebedenko (Author)

Hey thanks for your advice! I tried this, but still no luck :/

My exact installation instructions this time (still on Ubuntu 24.04 LTS, but now with OpenMPI v5.0.3 and no --prefix=/opt/openmpi):

# clean up old install
cd /path/to/openmpi-<version>/build
sudo make uninstall
make clean
cd ../..
rm -rf openmpi-<version>

# also ensure there's no existing openmpi install
sudo apt-get remove --purge openmpi*
which mpicc # should return nothing
which mpirun # should return nothing
ls /usr/local/lib/openmpi/ # should return nothing, or complain about nonexistent dir
ls /usr/local/lib # should have nothing to do with OpenMPI

# install CUDA toolkit from https://developer.nvidia.com/cuda-downloads
nvidia-smi # should print out some basic info about the CUDA driver. In my case: NVIDIA-SMI 550.144.03, CUDA Version: 12.4
nvcc --version # should print out basic info about NVIDIA compiler. In my case: Cuda compilation tools, release 12.8, V12.8.9

# set up env variables appropriately
echo "export LD_LIBRARY_PATH=/usr/local/lib:/usr/local/lib/openmpi:\$LD_LIBRARY_PATH" >> ~/.bashrc
source ~/.bashrc

# just in case, reboot

# check the paths are ok:
echo $LD_LIBRARY_PATH
# output for me: /usr/local/lib:/usr/local/lib/openmpi:/usr/local/cuda/lib64

# rebuild from scratch
wget https://download.open-mpi.org/release/open-mpi/v5.0/openmpi-5.0.3.tar.gz
tar xf openmpi-5.0.3.tar.gz
cd openmpi-5.0.3
mkdir build
cd build
../configure --with-cuda=/usr/local/cuda --with-cuda-libdir=/usr/local/cuda/lib64/stubs | tee config.out
make -j$(nproc) all | tee make.out
sudo make install | tee install.out

# just in case, reboot (again)

vim mpi_check.c
# paste the contents of `mpi_check.c` from above
mpicc mpi_check.c -o mpi_check
mpirun -n 1 ./mpi_check # passes compile time check, fails run time check
./mpi_check # passes both checks

I've also tried this with OpenMPI v5.0.7 and v5.0.6, with and without an install prefix, with and without --with-cuda-libdir, and with and without the LD_LIBRARY_PATH setting. I keep getting the same behaviour with the ldd and ompi_info checks above, and the same weird thing where ./mpi_check passes both tests but mpirun -n 1 ./mpi_check fails the runtime check. I suspect it must be something to do with the versions of the other components (Ubuntu 24.04 LTS, NVIDIA-SMI 550.144.03, NVCC V12.8.9).

Maybe I'm not removing old installs properly? If anyone has any other ideas, or maybe a Docker container with a working installation, I would really appreciate it :)

@niklebedenko (Author) commented Apr 2, 2025

For anyone stumbling across this with the same issue, I've made a Dockerfile which appears to do the trick. It does mean that you will have to run your code in a container (requiring the NVIDIA Container Toolkit to be able to attach GPUs, etc.), but at least it builds CUDA-aware MPI correctly.

FROM nvidia/cuda:12.8.1-devel-ubuntu22.04

# Prevent interactive prompts during package installation
ENV DEBIAN_FRONTEND=noninteractive
ENV PATH=${PATH}:/usr/local/cuda/bin

# No clue why you need to install CUDA when you're already using a CUDA base image,
# but otherwise you don't have `libcuda.so.1`.
RUN apt-get update && apt-get install -y \
    build-essential \
    wget \
    git \
    python3 \
    python3-pip \
    pkg-config \
    libevent-dev \
    file \
    cuda \
    && rm -rf /var/lib/apt/lists/*

# Install OpenMPI with CUDA support
RUN wget https://download.open-mpi.org/release/open-mpi/v5.0/openmpi-5.0.3.tar.gz && \
    tar xzf openmpi-5.0.3.tar.gz && \
    cd openmpi-5.0.3 && \
    ./configure --prefix=/usr/local \
               --with-cuda=/usr/local/cuda \
               --with-cuda-libdir=/usr/local/cuda/lib64/stubs && \
    make -j$(nproc) && \
    make install && \
    ldconfig && \
    cd .. && \
    rm -rf openmpi-5.0.3 openmpi-5.0.3.tar.gz

# Set up environment variables for OpenMPI
ENV PATH=/usr/local/bin:$PATH
ENV LD_LIBRARY_PATH=/usr/local/lib:$LD_LIBRARY_PATH
ENV OMPI_DIR=/usr/local
ENV OMPI_VERSION=5.0.3
ENV OMPI_ALLOW_RUN_AS_ROOT=1
ENV OMPI_ALLOW_RUN_AS_ROOT_CONFIRM=1

# Create a test directory
WORKDIR /mpi_check
COPY mpi_check.c .

# Compile the test program
RUN mpicc mpi_check.c -o mpi_check

# Add a simple test to verify the installation
RUN which mpirun && \
    mpirun --version && \
    ldd $(which mpirun) && \
    file $(which mpirun)

# Set the entrypoint
ENTRYPOINT ["/bin/bash"]

(Edit: This uses Ubuntu 22.04 + MPI v5.0.3, but I've since tried this with Ubuntu 24.04 + MPI v5.0.7 and it also works fine)

Installation steps:

  • Ensure you have Docker installed
  • Copy the Dockerfile into an empty dir.
  • Copy mpi_check.c (as above) into the same dir.
  • Build the container with docker build -t cuda_mpi . (this will take a few minutes)
  • Open a session inside the container with docker run -it cuda_mpi (see the note below on exposing the GPU to the container)
  • Execute mpirun -n 1 ./mpi_check inside the container.
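
If the program inside the container needs to see the host GPU, the container typically has to be started with GPU access enabled. A minimal sketch, assuming the NVIDIA Container Toolkit is installed on the host (this flag is not part of the steps above):

docker run --gpus all -it cuda_mpi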

Result:

# mpirun -n 1 ./mpi_check
Compile time check:
This MPI library has CUDA-aware support.
Run time check:
This MPI library has CUDA-aware support.

I have no idea why this works, but it does. I hope this helps someone else. The Ubuntu 24.04 build really ought to be fixed, but in the meantime, use this workaround.

@ggouaillardet (Contributor)

@niklebedenko once you have manually built and installed the accelerator/cuda component, can you please once again run

/opt/openmpi/bin/mpirun -np 1 --mca accelerator_base_verbose 100 --mca accelerator cuda ./mpi_check

At first glance, it is not clear why the accelerator/cuda component was not built.
Can you please compress and upload your config.log?

Also, since we are all running out of ideas, can you please confirm

/opt/openmpi/bin/mpirun -np 1 nvidia-smi

works as expected (i.e., the same output as when run without mpirun)?
