
HOWTO enable CUDA


Installation of CUDA-enabled PETSc

The most recent version of PETSc with CUDA bindings that has been tested in COOLFluiD is 3.9.3. This is also the default version inside $YOUR_COOLFluiD/tools/scripts/install-coolfluid-deps.pl.

If you want to try your luck with the latest development version of PETSc (which may have API incompatibilities with the current PETSc linear system solver interface in COOLFluiD), you can download it from the PETSc website, rename the archive to petsc-3.9.3.tar.gz and copy it inside $YOUR_COOLFluiD/packages. Otherwise (recommended solution!) you can just use the 3.9.3 version which is already inside that folder. In order to install an optimized PETSc version with CUDA bindings, run the following command from inside $YOUR_COOLFluiD/tools/scripts:

./install-coolfluid-deps.pl --install=petsc --install-petsc-dir=$PETSC_INSTALLDIR --install-mpi-dir=$MPI_INSTALLDIR --tmp-dir=$TMP_DIR --cuda-dir=$CUDA_DIR --install-dir=$PETSC_INSTALLDIR --debug=0 CXXOPTFLAGS="-fPIC -O3" COPTFLAGS="-fPIC -O3" FOPTFLAGS="-fPIC -O3"
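
The command above relies on several placeholder variables. As a minimal sketch, assuming hypothetical installation paths that you must adapt to your own system, they could be set as follows before running the script:

export MPI_INSTALLDIR=/opt/openmpi        # hypothetical path to your MPI installation
export PETSC_INSTALLDIR=/opt/petsc-cuda   # hypothetical target directory for PETSc
export CUDA_DIR=/usr/local/cuda           # hypothetical path to the CUDA toolkit
export TMP_DIR=/tmp/petsc-build           # hypothetical scratch directory for the build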


Configuration for activating CUDA bindings

The following three lines must appear in $YOUR_COOLFluiD/coolfluid.conf, where $CUDA_DIR is substituted with the actual full path to your CUDA installation:

cudac = $CUDA_DIR/bin/nvcc
cuda_dir = $CUDA_DIR
withcuda = 1
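
For example, assuming a CUDA toolkit installed under /usr/local/cuda (a common but not universal location), those lines would read:

cudac = /usr/local/cuda/bin/nvcc
cuda_dir = /usr/local/cuda
withcuda = 1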

When configuring COOLFluiD, either ./prepare.pl --build=cuda (optimized with debugging) or ./prepare.pl --build=cudarelease (more optimized, w/o debugging) must be used. Those build modes automatically set some tricky compilation flags.

NOTE: Only --build=cudarelease can be used on Mac OS X, since --build=cuda lets the code compile but leads to run-time errors when allocating memory on the device. This issue is due to the assertion definition, which is enabled only in debugging mode.
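
For reference, assuming prepare.pl sits at the top of your COOLFluiD tree as in the standard layout, the two invocations are:

cd $YOUR_COOLFluiD
./prepare.pl --build=cuda          # optimized with debugging (not usable on Mac OS X)
./prepare.pl --build=cudarelease   # more optimized, w/o debugging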


Installation and configuration for using PARALUTION

PARALUTION is a powerful linear system solver package specifically designed for GPU architectures. Unfortunately, the free version only supports running on a single GPU. In our experience, PARALUTION has shown much better performance than PETSc for the CUDA implementation of GMRES with an ILU preconditioner.

In order to use PARALUTION, you first have to download it from the PARALUTION website, install it and then add the following two lines to $YOUR_COOLFluiD/coolfluid.conf:

with_paralution = 1
paralution_dir = $PARALUTION_INSTALL_DIR

where $PARALUTION_INSTALL_DIR is the full path to your installation directory for PARALUTION.
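
For example, with a hypothetical installation under /opt/paralution, these lines would become:

with_paralution = 1
paralution_dir = /opt/paralution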


NOTE: For now, only a few solvers have been ported to GPU using CUDA, namely:

  • the Finite Volume MHD solver (2D/3D);
  • the Finite Volume Maxwell solver (only 2D);
  • the Finite Volume MultiFluidMHD solver (only 2D w/o diffusive terms);
  • the Finite Volume DOM radiation solver;
  • the PETSc wrapper;
  • the PARALUTION wrapper.

The following regression testcases show examples of how to configure such GPU-enabled solvers:

  • MHD 2D nozzle (explicit)
  • MHD 2D nozzle (implicit using PETSc)
  • MultiFluidMHD 2D circular polarized waves (implicit using PARALUTION)

In particular, any CFcase using PETSC and FiniteVolume can easily be converted to solve the linear system on GPUs instead of CPUs by setting the flag UseGPU = true, as in the following example:

Simulator.SubSystem.LinearSystemSolver = PETSC
Simulator.SubSystem.LSSNames = BwdEulerLSS # alias of PETSC needed for configuration
Simulator.SubSystem.BwdEulerLSS.Data.UseGPU = true # solve the linear system on the GPU
Simulator.SubSystem.CellCenterFVM.JacobianSparsity = FVMCellCenteredNoBlock # non-block Jacobian sparsity
