QUDA Build With CMake


QUDA and CMake

Starting with version 0.9, QUDA must be built using CMake. To build QUDA with CMake you need cmake 3.15 or later for QUDA version 1.1, and 3.18 or later with the current develop branch. Try

cmake --version

to make sure you have cmake and that your version is recent enough. If you do not have CMake on your system, follow the instructions below; otherwise, skip to the Building QUDA using CMake section.

For multi-GPU builds with OpenMPI, we recommend using at least version 4.0.x, and compiling OpenMPI with a recent version of UCX (at least 1.10) for CUDA-aware MPI support.

Obtaining CMake

You are likely going to build QUDA on a remote machine with a module system. Try

module avail cmake

to see if the module loader has a CMake option. If it does not have a CMake module, please ask the system administrator to add one. In the meantime, you can download the source code from https://cmake.org/download/. Once you've gone through the build steps of CMake, prepend the directory containing the cmake binary to your PATH so that your environment can access it.
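
If you end up building CMake from source, here is a minimal sketch of the usual bootstrap flow (the version number and install prefix are placeholders; substitute your actual download):

tar xf cmake-3.27.9.tar.gz
cd cmake-3.27.9
./bootstrap --prefix=$HOME/cmake-install
make -j
make install
export PATH=$HOME/cmake-install/bin:$PATH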

Building QUDA using CMake

It is recommended to build QUDA in a separate folder (out-of-source). This has the advantage that you don't need separate copies of the QUDA source code on disk to build different configurations (e.g. for different GPU architectures), nor do you need to trigger a full rebuild in your local QUDA copy to build for a different architecture. For example, suppose you have a machine with two GPU partitions, one with NVIDIA P100 cards and the other with NVIDIA V100. You can download one copy of the QUDA source code (typically named quda) and then keep two build directories (say, build_p100 and build_v100), as sketched below. When the source code is updated or modified, you need only change the source once, then update each build as required.
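
A minimal sketch of this layout, assuming the quda source directory sits alongside the build directories (the architecture flags match the P100/V100 example above):

mkdir build_p100 build_v100
cd build_p100 && cmake ../quda -DQUDA_GPU_ARCH=sm_60 && cd ..
cd build_v100 && cmake ../quda -DQUDA_GPU_ARCH=sm_70 && cd ..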

cmake vs ccmake

After downloading QUDA from github, create a build directory and cd into it (the name is arbitrary - here we use build):

mkdir build
cd build

There are two methods one can use to configure the build. The first is to use ccmake:

ccmake

ccmake ../quda

Note: for this to work, you may first need to run

cmake [-DQUDA_TARGET_TYPE=<TARGET>] ../quda

and then launch ccmake. This will bring up a text-based GUI for all the QUDA CMake options. If you take this route, note that pressing the t key in the GUI reveals extra CMAKE options. This can seem a little daunting at first, but the majority of the options you see here are populated automatically. Options are grouped into two main parts, CMAKE options (revealed by hitting t) and QUDA options, each prefixed accordingly. CMAKE options concern HOW to build QUDA, while QUDA options concern WHAT parts to build.

The CMAKE options CMAKE_CUDA_HOST_COMPILER, CMAKE_CXX_COMPILER and CMAKE_C_COMPILER dictate which host C++ and C compilers to use. If you want to use a specific compiler, you must set these manually.

The QUDA options, such as QUDA_DIRAC_CLOVER, instruct CMake which parts of QUDA to build; this one builds the Wilson-clover kernels. If you wish to use a feature, you must set its option to ON. If you do not need a specific part of QUDA, it is strongly recommended that you turn the corresponding option OFF, so that QUDA does not compile unwanted parts of the code.

After changing the options to your preferences, press c to configure. This forces CMake to find further tools and libraries (such as locating MPI if you build with MPI support). New variables may appear at this stage, which may require you to configure multiple times. As soon as the Press [g] to generate and exit option is shown at the bottom of the screen, you may use it and CMake will generate your configuration.

cmake

If using the text GUI is not to your liking, then you can configure QUDA directly using cmake. For example,

cmake ../quda -DQUDA_MPI=ON
cmake .

This will configure QUDA with the default options, except QUDA_MPI will be turned ON. Make sure you use the correct architecture for your GPUs in the first configuration step. The default architecture is sm_70, but you may want to specify a different architecture, such as -DQUDA_GPU_ARCH=sm_60 for a Pascal GPU or -DQUDA_GPU_ARCH=sm_80 for an A100. The second cmake . command (with no other arguments) is often required to ensure that all configuration is completed. Without this second step, some configuration may not be complete (this is equivalent to ccmake requiring multiple configuration passes).
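
Putting these together, a sketch of an MPI-enabled configuration for A100 hardware (adjust the architecture and options to your machine):

cmake ../quda -DQUDA_GPU_ARCH=sm_80 -DQUDA_MPI=ON
cmake .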

Building

In either case, once QUDA has been configured, you can build with

make -j N

where N is the number of available CPU cores, or alternatively just make -j, which oversubscribes the CPU cores. The latter approach typically gives the shortest compile time.
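
On Linux, a small sketch that sets N to the number of available cores (nproc is a standard coreutils utility):

make -j $(nproc)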

Reducing QUDA's build time

Due to QUDA's extensive use of templates, compiling QUDA can take a long time to complete. For this reason, we have provided a variety of CMake options to constrain the compilation trajectory, which can dramatically reduce compilation time.

First and foremost, only enable the Dirac operators that you intend to use. By default all Dirac operators are enabled, so you need to disable the ones you do not want. The following is the list of Dirac operators present in QUDA; each can be disabled on the cmake command line with -D or set directly in ccmake, e.g., -DQUDA_DIRAC_WILSON=OFF would disable Wilson fermions (see the sketch after this list).

  • QUDA_DIRAC_WILSON - Wilson Dirac operators
  • QUDA_DIRAC_CLOVER - Wilson-clover operators (implies QUDA_DIRAC_WILSON)
  • QUDA_DIRAC_TWISTED_MASS - Twisted-mass Dirac operators
  • QUDA_DIRAC_NDEG_TWISTED_MASS - Non-degenerate twisted-mass operators
  • QUDA_DIRAC_TWISTED_CLOVER - Twisted-clover Dirac operators (implies QUDA_DIRAC_CLOVER)
  • QUDA_DIRAC_CLOVER_HASENBUSCH** - Specialized operator for Hasenbusch preconditioning (implies QUDA_DIRAC_CLOVER and QUDA_DIRAC_TWISTED_CLOVER)
  • QUDA_DIRAC_STAGGERED - Naive staggered and improved (HISQ) staggered operators
  • QUDA_DIRAC_DOMAIN_WALL - Shamir Domain-wall (4-d and 5-d preconditioned) and Möbius operators
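
For example, a sketch that disables just the domain-wall operators while leaving the rest at their defaults (which operators to disable is, of course, your choice):

cmake ../quda -DQUDA_DIRAC_DOMAIN_WALL=OFF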

To simplify this process, we have also added the flag -DQUDA_DIRAC_DEFAULT_OFF for use on the command line, which by default disables all Dirac operators. This can then be selectively overridden by enabling specific operators. For example, for a build that only supports staggered fermions, one can use

cmake ../quda -DQUDA_DIRAC_DEFAULT_OFF=ON -DQUDA_DIRAC_STAGGERED=ON

The following are advanced options that can be specified directly to cmake with -D or set using ccmake under advanced options (a combined sketch follows the list):

  • QUDA_PRECISION=n - where n is a 4-bit number that specifies which precisions to enable (8 - double, 4 - single, 2 - half, 1 - quarter). The default value is 14 (= 8 + 4 + 2), which corresponds to double, single, and half enabled and quarter disabled.
  • QUDA_RECONSTRUCT=n - where n is a 3-bit number that specifies which reconstructs to enable (4 - reconstruct-no, 2 - reconstruct-12/13, 1 - reconstruct-8/9). The default value is 7, which enables all reconstruct types.
  • QUDA_FAST_COMPILE_REDUCE=ON** - compiles reduction kernels with block-size 32 only, dramatically accelerating compilation of the reduction kernels (reduce_quda.cu, multi_reduce_quda.cu, etc.). Additionally, the multi-blas kernels will not employ the warp-shuffle optimization. This will affect performance, so it should be used for fast debugging or development builds; hence the default value is OFF.
  • QUDA_FAST_COMPILE_DSLASH=ON** - disables some dslash specialization optimizations at the cost of performance. The performance penalty is up to 20% (depending on the action), but the compilation time approximately halves.
  • QUDA_MAX_MULTI_BLAS_N=1 - disables some kernel fusion optimization for BLAS routines

** - signifies this option is post QUDA 1.0
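
For example, a sketch of a fast development build that combines these options with a reduced precision set (the specific values here are illustrative, not a recommendation):

cmake ../quda -DQUDA_PRECISION=12 -DQUDA_FAST_COMPILE_REDUCE=ON -DQUDA_FAST_COMPILE_DSLASH=ON -DQUDA_MAX_MULTI_BLAS_N=1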

By default, QUDA builds as a shared library and takes advantage of rpath to avoid needing to set LD_LIBRARY_PATH in most cases. If, for some reason, you would prefer a static library build you can set QUDA_BUILD_SHAREDLIB=OFF. We do not recommend this because it creates a large spike in link time and binary sizes.
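
If you do need a static build, the sketch is just the one flag added at configure time:

cmake ../quda -DQUDA_BUILD_SHAREDLIB=OFF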

Improving build times with Ninja

You can use Ninja instead of make to improve parallel builds by specifying it as the CMake generator in the initial cmake run:

cmake -GNinja ...

and then build using

ninja

or just use

cmake --build .

Improving link times with Mold

A further reduction of the overall build time can be achieved by using an alternative linker such as LLVM's lld or mold. To use mold, you can simply prefix the build command:

mold -run ninja
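
For lld, one common approach (a sketch, assuming your compiler supports the -fuse-ld=lld flag) is to pass the linker flag at configure time:

cmake ../quda -DCMAKE_EXE_LINKER_FLAGS=-fuse-ld=lld -DCMAKE_SHARED_LINKER_FLAGS=-fuse-ld=lld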

Specifying a separate Eigen installation (if absolutely necessary)

By default, QUDA will automatically download Eigen (version 3.3.9 at the time of writing) as part of the build process. As part of this, the CMake configuration scripts bake in a checksum to verify the download. While neither of these is an issue the majority of the time, we do provide a way via cmake to specify a local copy of Eigen, which bypasses the download (for example, on a machine without an external internet connection) and the checksum verification (in the rare case where the downloaded tarball is updated and the checksum changes).

As an example, one can download Eigen from https://gitlab.com/libeigen/eigen/-/archive/3.3.9/eigen-3.3.9.tar.bz2, untar it, and then specify the installation location via:

cmake -DQUDA_DOWNLOAD_EIGEN=OFF -DEIGEN_INCLUDE_DIR=${HOME}/eigen-3.3.9/ [...]

Update EIGEN_INCLUDE_DIR as appropriate for your download location.

Building QUDA with clang as CUDA compiler

While QUDA can be built using clang as the compiler, this support is still considered early: it might not work for all possible options, and performance may not be as expected!

The development version of QUDA supports building QUDA with clang as the CUDA compiler.

To enable the use of clang as the CUDA compiler, execute the initial cmake call with the options

-DCMAKE_CUDA_COMPILER=clang++ -DCMAKE_CXX_COMPILER=clang++

You might need to specify the full path to clang++ and append a version number. If you need to specify a specific CUDA toolkit or have it installed in an uncommon location you can do that with

-DCUDAToolkit_ROOT=/some/path

Note: The CUDA Toolkit detection is done by FindCUDAToolkit and its documentation has more details on determining the CUDA Toolkit.
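
Putting the pieces together, a sketch of a full clang configuration (the toolkit path is a placeholder to adjust for your system):

cmake ../quda -DCMAKE_CUDA_COMPILER=clang++ -DCMAKE_CXX_COMPILER=clang++ -DCUDAToolkit_ROOT=/some/path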
