
mpirun with OpenCL #12813

Closed
TheFloatingBrain opened this issue Sep 16, 2024 · 5 comments

TheFloatingBrain commented Sep 16, 2024

Background information

I am trying to run MEEP on my GPU. I have both an Nvidia card and an integrated Radeon card. MEEP has a feature, Parallel MEEP, where scripts written using MEEP are launched through mpirun; I have successfully done this on the CPU. However, I would really like to speed things up using the GPU, and for now I would like to avoid the proprietary Nvidia driver on Linux, since it is high maintenance and taints my kernel. Instead, I would like to try running MEEP on my Nvidia GPU using the open source Mesa OpenCL driver. I saw that Open MPI does seem to support OpenCL [0], [1], [2], [3]. I would like to avoid edits to the actual MEEP code. Parallel MEEP seems to be chunked, which from what I read is a necessity for running the code on the GPU?

It does seem possible to interface with the GPU through mpirun; conda gave me this message:

On Linux, Open MPI is built with UCX support but it is disabled by default.                                                                                                                     
To enable it, first install UCX (conda install -c conda-forge ucx).                                                                                                                             
Afterwards, set the environment variables                                                                                                                                                       
OMPI_MCA_pml=ucx OMPI_MCA_osc=ucx
before launching your MPI processes.
Equivalently, you can set the MCA parameters in the command line:
mpiexec --mca pml ucx --mca osc ucx ...


On Linux, Open MPI is built with CUDA awareness but it is disabled by default.
To enable it, please set the environment variable
OMPI_MCA_opal_cuda_support=true
before launching your MPI processes.
Equivalently, you can set the MCA parameter in the command line:
mpiexec --mca opal_cuda_support 1 ...
Note that you might also need to set UCX_MEMTYPE_CACHE=n for CUDA awareness via
UCX. Please consult UCX documentation for further details.
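
As a side note, from my reading of the Open MPI docs (not something I have verified against this exact build), CUDA awareness can apparently be checked at compile time and at run time from C through the mpi-ext.h extension header. A minimal sketch:

#include <stdio.h>
#include <mpi.h>
#include <mpi-ext.h>   /* Open MPI extension header */

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    /* Compile-time check: was this Open MPI built with CUDA support? */
#if defined(MPIX_CUDA_AWARE_SUPPORT) && MPIX_CUDA_AWARE_SUPPORT
    printf("compile-time CUDA awareness: yes\n");
#else
    printf("compile-time CUDA awareness: no or unknown\n");
#endif

    /* Run-time check: is CUDA support actually enabled right now? */
#if defined(MPIX_CUDA_AWARE_SUPPORT)
    printf("run-time CUDA awareness: %s\n",
           MPIX_Query_cuda_support() ? "yes" : "no");
#endif

    MPI_Finalize();
    return 0;
}

If I understand correctly, that would tell me whether the OMPI_MCA_opal_cuda_support setting above is actually taking effect.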

What version of Open MPI are you using? (e.g., v4.1.6, v5.0.1, git branch name and hash, etc.)

Describe how Open MPI was installed (e.g., from a source/distribution tarball, from a git clone, from an operating system distribution package, etc.)

Using dnf on Fedora... conda? (See above)

Please describe the system on which you are running

  • Operating system/version: Fedora 40
  • Computer hardware: Laptop with integrated AMD Radeon (Ryzen 7 Series APU), and Nvidia RTX 3000 Series GPU
  • Network type: Local

Details of the problem

I'm sorry, this is a bit of a noob question. I described the background information above; I am simply having difficulty figuring out how to use mpirun with OpenCL. Does it have such an interface? Do I need to recompile MEEP? If so, what should I do specifically? Is what I am trying to do possible?

P.S. Getting both GPUs and the CPU in the game would be great as well, but that might be a separate question.

wenduwan (Contributor) commented

@TheFloatingBrain Could you please provide your Open MPI version? Pasting the output of ompi_info could be a good start.

TheFloatingBrain (Author) commented

@wenduwan Thanks for the reply. I have to enter the following commands before I can run ompi_info:

> source /etc/profile.d/modules.sh
> module load mpi/openmpi-x86_64

According to this (the command is not recognized otherwise)

> ompi_info
                 Package: Open MPI mockbuild@02dc1f9e2ab145fdb212b01bdd462369
                          Distribution
                Open MPI: 5.0.2
  Open MPI repo revision: v5.0.2
   Open MPI release date: Feb 06, 2024
                 MPI API: 3.1.0
            Ident string: 5.0.2
                  Prefix: /usr/lib64/openmpi
 Configured architecture: x86_64-pc-linux-gnu
           Configured by: mockbuild
           Configured on: Mon Mar  4 00:00:00 UTC 2024
          Configure host: 02dc1f9e2ab145fdb212b01bdd462369
  Configure command line: '--prefix=/usr/lib64/openmpi'
                          '--mandir=/usr/share/man/openmpi-x86_64'
                          '--includedir=/usr/include/openmpi-x86_64'
                          '--sysconfdir=/etc/openmpi-x86_64'
                          '--disable-silent-rules' '--enable-builtin-atomics'
                          '--enable-ipv6' '--enable-mpi-java'
                          '--enable-mpi1-compatibility' '--enable-sphinx'
                          '--with-prrte=external' '--with-sge'
                          '--with-valgrind' '--enable-memchecker'
                          '--with-hwloc=/usr' '--with-libevent=external'
                          '--with-pmix=external'
                Built by: mockbuild
                Built on: Mon Mar  4 00:00:00 UTC 2024
              Built host: 02dc1f9e2ab145fdb212b01bdd462369
              C bindings: yes
             Fort mpif.h: yes (all)
            Fort use mpi: yes (full: ignore TKR)
       Fort use mpi size: deprecated-ompi-info-value
        Fort use mpi_f08: yes
 Fort mpi_f08 compliance: The mpi_f08 module is available, but due to
                          limitations in the gfortran compiler and/or Open
                          MPI, does not support the following: array
                          subsections, direct passthru (where possible) to
                          underlying Open MPI's C functionality
  Fort mpi_f08 subarrays: no
           Java bindings: yes
  Wrapper compiler rpath: runpath
              C compiler: gcc
     C compiler absolute: /bin/gcc
  C compiler family name: GNU
      C compiler version: 14.0.1
            C++ compiler: g++
   C++ compiler absolute: /bin/g++
           Fort compiler: gfortran
       Fort compiler abs: /bin/gfortran
         Fort ignore TKR: yes (!GCC$ ATTRIBUTES NO_ARG_CHECK ::)
   Fort 08 assumed shape: yes
      Fort optional args: yes
          Fort INTERFACE: yes
    Fort ISO_FORTRAN_ENV: yes
       Fort STORAGE_SIZE: yes
      Fort BIND(C) (all): yes
      Fort ISO_C_BINDING: yes
 Fort SUBROUTINE BIND(C): yes
       Fort TYPE,BIND(C): yes
 Fort T,BIND(C,name="a"): yes
            Fort PRIVATE: yes
           Fort ABSTRACT: yes
       Fort ASYNCHRONOUS: yes
          Fort PROCEDURE: yes
         Fort USE...ONLY: yes
           Fort C_FUNLOC: yes
 Fort f08 using wrappers: yes
         Fort MPI_SIZEOF: yes
             C profiling: yes
   Fort mpif.h profiling: yes
  Fort use mpi profiling: yes
   Fort use mpi_f08 prof: yes
          Thread support: posix (MPI_THREAD_MULTIPLE: yes, OPAL support: yes,
                          OMPI progress: no, Event lib: yes)
           Sparse Groups: no
  Internal debug support: no
  MPI interface warnings: yes
     MPI parameter check: runtime
Memory profiling support: no
Memory debugging support: no
              dl support: yes
   Heterogeneous support: no
       MPI_WTIME support: native
     Symbol vis. support: yes
   Host topology support: yes
            IPv6 support: yes
          MPI extensions: affinity, cuda, ftmpi, rocm, shortfloat
 Fault Tolerance support: yes
          FT MPI support: yes
  MPI_MAX_PROCESSOR_NAME: 256
    MPI_MAX_ERROR_STRING: 256
     MPI_MAX_OBJECT_NAME: 64
        MPI_MAX_INFO_KEY: 36
        MPI_MAX_INFO_VAL: 256
       MPI_MAX_PORT_NAME: 1024
  MPI_MAX_DATAREP_STRING: 128
         MCA accelerator: null (MCA v2.1.0, API v1.0.0, Component v5.0.2)
           MCA allocator: basic (MCA v2.1.0, API v2.0.0, Component v5.0.2)
           MCA allocator: bucket (MCA v2.1.0, API v2.0.0, Component v5.0.2)
           MCA backtrace: execinfo (MCA v2.1.0, API v2.0.0, Component v5.0.2)
                 MCA btl: self (MCA v2.1.0, API v3.3.0, Component v5.0.2)
                 MCA btl: ofi (MCA v2.1.0, API v3.3.0, Component v5.0.2)
                 MCA btl: sm (MCA v2.1.0, API v3.3.0, Component v5.0.2)
                 MCA btl: tcp (MCA v2.1.0, API v3.3.0, Component v5.0.2)
                 MCA btl: uct (MCA v2.1.0, API v3.3.0, Component v5.0.2)
                 MCA btl: usnic (MCA v2.1.0, API v3.3.0, Component v5.0.2)
                  MCA dl: dlopen (MCA v2.1.0, API v1.0.0, Component v5.0.2)
                  MCA if: linux_ipv6 (MCA v2.1.0, API v2.0.0, Component
                          v5.0.2)
                  MCA if: posix_ipv4 (MCA v2.1.0, API v2.0.0, Component
                          v5.0.2)
         MCA installdirs: env (MCA v2.1.0, API v2.0.0, Component v5.0.2)
         MCA installdirs: config (MCA v2.1.0, API v2.0.0, Component v5.0.2)
          MCA memchecker: valgrind (MCA v2.1.0, API v2.0.0, Component v5.0.2)
              MCA memory: patcher (MCA v2.1.0, API v2.0.0, Component v5.0.2)
               MCA mpool: hugepage (MCA v2.1.0, API v3.1.0, Component v5.0.2)
             MCA patcher: overwrite (MCA v2.1.0, API v1.0.0, Component
                          v5.0.2)
              MCA rcache: grdma (MCA v2.1.0, API v3.3.0, Component v5.0.2)
           MCA reachable: weighted (MCA v2.1.0, API v2.0.0, Component v5.0.2)
               MCA shmem: mmap (MCA v2.1.0, API v2.0.0, Component v5.0.2)
               MCA shmem: posix (MCA v2.1.0, API v2.0.0, Component v5.0.2)
               MCA shmem: sysv (MCA v2.1.0, API v2.0.0, Component v5.0.2)
                MCA smsc: cma (MCA v2.1.0, API v1.0.0, Component v5.0.2)
             MCA threads: pthreads (MCA v2.1.0, API v1.0.0, Component v5.0.2)
               MCA timer: linux (MCA v2.1.0, API v2.0.0, Component v5.0.2)
                 MCA bml: r2 (MCA v2.1.0, API v2.1.0, Component v5.0.2)
                MCA coll: adapt (MCA v2.1.0, API v2.4.0, Component v5.0.2)
                MCA coll: basic (MCA v2.1.0, API v2.4.0, Component v5.0.2)
                MCA coll: han (MCA v2.1.0, API v2.4.0, Component v5.0.2)
                MCA coll: inter (MCA v2.1.0, API v2.4.0, Component v5.0.2)
                MCA coll: libnbc (MCA v2.1.0, API v2.4.0, Component v5.0.2)
                MCA coll: self (MCA v2.1.0, API v2.4.0, Component v5.0.2)
                MCA coll: sync (MCA v2.1.0, API v2.4.0, Component v5.0.2)
                MCA coll: tuned (MCA v2.1.0, API v2.4.0, Component v5.0.2)
                MCA coll: ftagree (MCA v2.1.0, API v2.4.0, Component v5.0.2)
                MCA coll: monitoring (MCA v2.1.0, API v2.4.0, Component
                          v5.0.2)
                MCA coll: sm (MCA v2.1.0, API v2.4.0, Component v5.0.2)
                MCA fbtl: posix (MCA v2.1.0, API v2.0.0, Component v5.0.2)
                MCA fbtl: pvfs2 (MCA v2.1.0, API v2.0.0, Component v5.0.2)
               MCA fcoll: dynamic (MCA v2.1.0, API v2.0.0, Component v5.0.2)
               MCA fcoll: dynamic_gen2 (MCA v2.1.0, API v2.0.0, Component
                          v5.0.2)
               MCA fcoll: individual (MCA v2.1.0, API v2.0.0, Component
                          v5.0.2)
               MCA fcoll: vulcan (MCA v2.1.0, API v2.0.0, Component v5.0.2)
                  MCA fs: pvfs2 (MCA v2.1.0, API v2.0.0, Component v5.0.2)
                  MCA fs: ufs (MCA v2.1.0, API v2.0.0, Component v5.0.2)
                MCA hook: comm_method (MCA v2.1.0, API v1.0.0, Component
                          v5.0.2)
                  MCA io: ompio (MCA v2.1.0, API v2.0.0, Component v5.0.2)
                  MCA io: romio341 (MCA v2.1.0, API v2.0.0, Component v5.0.2)
                 MCA mtl: ofi (MCA v2.1.0, API v2.0.0, Component v5.0.2)
                 MCA mtl: psm2 (MCA v2.1.0, API v2.0.0, Component v5.0.2)
                  MCA op: avx (MCA v2.1.0, API v1.0.0, Component v5.0.2)
                 MCA osc: sm (MCA v2.1.0, API v3.0.0, Component v5.0.2)
                 MCA osc: monitoring (MCA v2.1.0, API v3.0.0, Component
                          v5.0.2)
                 MCA osc: rdma (MCA v2.1.0, API v3.0.0, Component v5.0.2)
                 MCA osc: ucx (MCA v2.1.0, API v3.0.0, Component v5.0.2)
                MCA part: persist (MCA v2.1.0, API v4.0.0, Component v5.0.2)
                 MCA pml: cm (MCA v2.1.0, API v2.1.0, Component v5.0.2)
                 MCA pml: monitoring (MCA v2.1.0, API v2.1.0, Component
                          v5.0.2)
                 MCA pml: ob1 (MCA v2.1.0, API v2.1.0, Component v5.0.2)
                 MCA pml: ucx (MCA v2.1.0, API v2.1.0, Component v5.0.2)
                 MCA pml: v (MCA v2.1.0, API v2.1.0, Component v5.0.2)
            MCA sharedfp: individual (MCA v2.1.0, API v2.0.0, Component
                          v5.0.2)
            MCA sharedfp: lockedfile (MCA v2.1.0, API v2.0.0, Component
                          v5.0.2)
            MCA sharedfp: sm (MCA v2.1.0, API v2.0.0, Component v5.0.2)
                MCA topo: basic (MCA v2.1.0, API v2.2.0, Component v5.0.2)
                MCA topo: treematch (MCA v2.1.0, API v2.2.0, Component
                          v5.0.2)
           MCA vprotocol: pessimist (MCA v2.1.0, API v2.0.0, Component
                          v5.0.2)


TheFloatingBrain commented Sep 22, 2024

Looking through my links, some of those resources are old; has OpenCL support been removed?

It seems hwloc supports an OpenCL plugin; does Open MPI?

open-mpi/hwloc#641
https://www-lb.open-mpi.org/projects/hwloc/doc/v2.11.1/a00356.php

I feel I should also explain what I am looking for. AFAIK, Open MPI uses CUDA as a sort of "implementation" and calls out to the CUDA driver (or ROCm driver) to actually execute code for generalized tasks. What I am wondering is: can the same thing be done with an OpenCL driver, or is there an extension to do so? Please correct me if my understanding is mistaken.
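
To make my mental model concrete, here is a rough sketch of what I understand "CUDA awareness" to mean (buffer size and ranks are made-up placeholders): a pointer returned by cudaMalloc can be handed straight to MPI_Send/MPI_Recv, and Open MPI moves the GPU-resident data itself rather than launching any compute kernels of its own.

#include <mpi.h>
#include <cuda_runtime.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* Allocate the buffer in GPU memory, not host memory. */
    double *d_buf;
    cudaMalloc((void **)&d_buf, 1024 * sizeof(double));

    /* With a CUDA-aware build, the device pointer goes straight to MPI;
     * the library handles staging/pipelining the GPU memory. */
    if (rank == 0)
        MPI_Send(d_buf, 1024, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD);
    else if (rank == 1)
        MPI_Recv(d_buf, 1024, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);

    cudaFree(d_buf);
    MPI_Finalize();
    return 0;
}

So what I am really asking is whether an equivalent path exists (or could exist) for cl_mem buffers allocated through the Mesa OpenCL driver.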

I imagine something similar happens on CPUs; the underlying implementation might be pthreads or something.

TheFloatingBrain (Author) commented

Please see Issue #12831

TheFloatingBrain (Author) commented

I would be curious to know: can one send/recv from OpenCL device buffers?
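
Right now the only pattern I can see (a sketch with placeholder names, assuming Open MPI has no OpenCL-aware transport) is to stage through host memory, i.e. copy the cl_mem buffer back with clEnqueueReadBuffer and hand MPI an ordinary host pointer:

#include <stdlib.h>
#include <mpi.h>
#include <CL/cl.h>

/* Hypothetical helper: send `count` doubles held in an OpenCL device
 * buffer to rank `dest` by staging them through host memory first. */
static void send_cl_buffer(cl_command_queue queue, cl_mem device_buf,
                           size_t count, int dest, MPI_Comm comm)
{
    double *host_buf = malloc(count * sizeof(double));

    /* Blocking read (CL_TRUE): copies the device buffer into host memory. */
    clEnqueueReadBuffer(queue, device_buf, CL_TRUE, 0,
                        count * sizeof(double), host_buf,
                        0, NULL, NULL);

    /* MPI only ever sees an ordinary host pointer here. */
    MPI_Send(host_buf, (int)count, MPI_DOUBLE, dest, 0, comm);

    free(host_buf);
}

What I would like to know is whether that extra copy can be avoided, i.e. whether MPI_Send/MPI_Recv could ever take the cl_mem (or its mapped pointer) directly, the way the CUDA path does.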
