
Refactor CMake logic for the CUDA compiler #3582

Merged: 31 commits into python on Apr 20, 2020

Conversation


@jngrad jngrad commented Mar 12, 2020

Description of changes:

  • the CUDA compiler guessing mechanism was removed
  • the WITH_CUDA option is now opt-in
  • for non-standard CUDA compilers (Clang, HIP), the user has to provide an extra string option WITH_CUDA_COMPILER (any of "nvcc", "clang", "hip")
  • if the requested CUDA compiler is not found or is too old, CMake will fail
  • the minimal CUDA compiler version numbers are defined in the top-level CMakeLists.txt
  • the convoluted (and incorrect) CXX optimization flag deduction mechanism in build_cmake.sh was replaced with dedicated build types (Coverage and RelWithAssert)
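Under the new scheme, a typical configure invocation might look like this (a sketch based on the bullet points above; the build-directory layout is illustrative):

```cmake
# default: no CUDA support unless explicitly requested
# cmake ..

# opt in to CUDA with the default compiler (nvcc)
# cmake .. -D WITH_CUDA=ON

# opt in to CUDA with a non-standard compiler
# cmake .. -D WITH_CUDA=ON -D WITH_CUDA_COMPILER=clang
```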

@jngrad jngrad added this to the Espresso 4.2 milestone Mar 12, 2020

codecov bot commented Mar 12, 2020

Codecov Report

Merging #3582 into python will not change coverage.
The diff coverage is n/a.


@@          Coverage Diff           @@
##           python   #3582   +/-   ##
======================================
  Coverage      87%     87%           
======================================
  Files         533     533           
  Lines       22959   22959           
======================================
  Hits        20159   20159           
  Misses       2800    2800           


@jngrad jngrad changed the title Refactor CMake logic for the CUDA compiler WIP: Refactor CMake logic for the CUDA compiler Mar 12, 2020

set(CUDA_NVCC_FLAGS_DEBUG "${CUDA_NVCC_FLAGS_DEBUG} -g")
set(CUDA_NVCC_FLAGS_RELEASE "${CUDA_NVCC_FLAGS_RELEASE} -O3 -DNDEBUG")
set(CUDA_NVCC_FLAGS_MINSIZEREL "${CUDA_NVCC_FLAGS_MINSIZEREL} -Os -DNDEBUG")
jngrad (Member Author):
several issues here:

  • -Os is not a valid optimization level: nvcc fatal : 's': expected a number
  • there seems to be a difference between -O2, -Xptxas -O2, -Xcompiler -O2 (StackOverflow)
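For reference, the three spellings discussed above address different compilation stages (my reading of the nvcc documentation; worth double-checking):

```cmake
# nvcc -O2 kernel.cu             -- optimization level for host code
# nvcc -Xptxas -O2 kernel.cu     -- optimization level for the PTX assembler (device code)
# nvcc -Xcompiler -O2 kernel.cu  -- flag forwarded verbatim to the host compiler
```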

jngrad (Member Author):
progress:

  • CUDA compiler flags are actually quite complicated. Also, device and host compiler optimization flags don't mix well.
  • We don't need to touch the host and device flags in build_cmake.sh. The logic is wrong anyway, and it is partially overridden by CMake in Debug builds.

I'm currently removing a lot of useless code in build_cmake.sh, but getting the CMake logic right will take me a few more days, I'm afraid. After that, we'll be able to experiment with CMake pseudo-interfaces for GPU targets.

jngrad (Member Author):

I wasted two man-hours investigating a Clang compiler error, because when we do

add_gpu_library(EspressoCore SHARED ${EspressoCore_SRC} ${EspressoCuda_SRC})

we forward CUDA compiler flags to regular .cpp files:
set_source_files_properties(${ARG_UNPARSED_ARGUMENTS} PROPERTIES LANGUAGE "CXX" COMPILE_FLAGS "${CUDA_NVCC_FLAGS}")

even though a few lines later there's a mechanism to set those flags to .cu files only:
if(${file} MATCHES "\\.cu$")
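A minimal sketch of the intended per-file logic, assuming the same `ARG_UNPARSED_ARGUMENTS` list as in the snippet above (illustrative, not the actual PR code):

```cmake
foreach(file ${ARG_UNPARSED_ARGUMENTS})
  if(${file} MATCHES "\\.cu$")
    # only CUDA sources receive the NVCC flags
    set_source_files_properties(${file} PROPERTIES COMPILE_FLAGS "${CUDA_NVCC_FLAGS}")
  else()
    # regular sources are compiled as C++ with the default flags
    set_source_files_properties(${file} PROPERTIES LANGUAGE "CXX")
  endif()
endforeach()
```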

jngrad (Member Author):

I wasted one man-hour because neither CMakeLists.txt nor build_cmake.sh throws an error when using the incompatible combination of CMake flags -DCUDA_NVCC_EXECUTABLE=$(which clang++) -DWITH_COVERAGE=ON.

jngrad (Member Author):

My last PR introduced a regression: the boost version depends on the CUDA version, so CUDA must be loaded before boost...

jngrad (Member Author):

Why do we override C++ flags only for release builds and not for all other build types?

"$<$<AND:$<BOOL:${WITH_COVERAGE}>,$<CONFIG:Release>>:-g>"
"$<$<AND:$<BOOL:${WITH_COVERAGE}>,$<CONFIG:Release>>:-O0>"
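If the intent is to apply the coverage flags in every configuration, one option would be to drop the `$<CONFIG:Release>` condition (a sketch, untested):

```cmake
"$<$<BOOL:${WITH_COVERAGE}>:-g>"
"$<$<BOOL:${WITH_COVERAGE}>:-O0>"
```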

jngrad (Member Author):

Why -O1 specifically? And why only for release builds?

espresso/CMakeLists.txt

Lines 347 to 348 in f291cf4

if(WITH_ASAN)
set(CMAKE_CXX_FLAGS_RELEASE "${CMAKE_CXX_FLAGS_RELEASE} -g -O1")

Member:

FindCUDA didn't really have a concept of debug vs. release, so we manually set these flags to match the host compiler and get debuggable or optimized CUDA code as requested.

jngrad (Member Author):

> FindCUDA didn't really have a concept of debug vs. release

doesn't it since 3.0? CUDA_NVCC_FLAGS_<CONFIG>

HIP also does: HIP_<COMPILER>_FLAGS_<CONFIG>

mkuron (Member) commented Mar 16, 2020:

I believe that didn't set the flags for both host and device, but that may no longer be true. The -g -O1 for WITH_ASAN is a compromise between good sanitizability and reasonable performance. In Debug mode, you have -O0 -g anyway.

Fixes a regression from 1e3fc3b: the Boost version depends on
the CUDA version for Intel, therefore CUDA must be loaded first,
but the CUDA version depends on the C++ version, so C++14 must be
defined at the top of the file.
The user can now provide the name of the CUDA compiler as a CMake
flag instead of modifying CMake variables to bypass the guessing
mechanism. CMake fails if the selected compiler is not found. The
FindCUDACompiler helper file was split into separate .cmake files.
The minimum CUDA version is 9.0 for NVCC and 8.0 for Clang. For the
Clang compiler and the Clang-based HIP compiler, the minimum version
that is tested in CI is 6.0.0.
The ESPResSo CMake project now has proper dependencies for every
target in the source code, documentation and tests. It is no longer
necessary to delete the build directory to start a fresh build.
Explicitly toggle GPU code on/off with WITH_CUDA and select the
compiler with WITH_CUDA_COMPILER.
With the CUDA compiler guessing mechanism removed, the user must
explicitly state whether CUDA is available and which compiler to use.
In ROCm 3.0 and 3.1, environment variables for hipcc and hcc are
overridden by incorrect paths (espressomd/docker#156). This causes
CMake to generate an incorrect linking command for EspressoCore.so:
in `/opt/rocm/bin/hipcc_cmake_linker_helper /opt/rocm -fPIC ...`,
either path `/opt/rocm` is an empty string, or both the linker path
and path `/opt/rocm` are empty strings. Calling `find_package()`
twice with an overridden `HCC_PATH` fixes the linking command.
Knowing the CMake version is extremely useful when reviewing CMake
logs attached in bug reports.
The C++ standard used for CUDA code is set in `CMAKE_CUDA_STANDARD`.
Variable `CMAKE_CUDA_VERSION` was renamed to `MINIMAL_CUDA_VERSION`
for clarity.
The nvcc `-O<N>` optimization flag can only take a number.
Cannot compile CUDA code with coverage enabled using Clang.
@jngrad jngrad force-pushed the refactor-cmake-with_gpu branch 2 times, most recently from 66fe6cf to 69d4962 Compare March 14, 2020 23:26
Rewrite CUDA flags based on the CXX flags:

CMAKE_CXX_FLAGS_DEBUG = -O0 -g
CMAKE_CXX_FLAGS_RELEASE = -O3 -DNDEBUG
CMAKE_CXX_FLAGS_MINSIZEREL = -Os -DNDEBUG
CMAKE_CXX_FLAGS_RELWITHDEBINFO = -O2 -g -DNDEBUG

Add a COVERAGE build type that uses -O0 for host and -O3 for
device code. This replaces the logic in the CI script that had to touch
`CMAKE_CXX_FLAGS` and `CUDA_NVCC_FLAGS`. The -O0 optimization flag
for host code avoids ending up with source code lines in the gcov output
that are neither hit nor missed. According to `man gcov`:

> compile your code without optimization if you plan to use gcov
> because the optimization, by combining some lines of code into
> one function, may not give you as much information as you need
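Defining such a build type boils down to providing the per-config flag variables, following CMake's `<LANG>_FLAGS_<CONFIG>` naming convention (a sketch; the actual values used in the PR may differ):

```cmake
# host compiler: no optimization, so gcov can attribute every executed line
set(CMAKE_CXX_FLAGS_COVERAGE "-O0 -g" CACHE STRING
    "Flags used by the C++ compiler during coverage builds." FORCE)
# device compiler: keep optimizing, coverage is only collected on the host
set(CUDA_NVCC_FLAGS_COVERAGE "-O3" CACHE STRING
    "Flags used by nvcc during coverage builds." FORCE)
```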
Generate a warning for incorrect build types and override them with
'Release'. Move the `set(CMAKE_BUILD_TYPE CACHE)` declaration out
of the conditional such that its help message always gets displayed
in cmake/ccmake. List the possible values as properties to allow
cycling in ccmake and cmake-gui (instead of manually typing them).
Same thing for WITH_CUDA_COMPILER. This is achieved by creating a
wrapper around the `option()` function that accepts an enum value
(stored as a string) instead of a boolean value.
The same guard is used in the CMake logic for Python tests.
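Such a wrapper around `option()` can be sketched as follows (the function name and keywords here are hypothetical, not necessarily those used in the PR):

```cmake
function(option_enum)
  cmake_parse_arguments(OE "" "varname;help_text;default_value" "possible_values" ${ARGN})
  set(${OE_varname} "${OE_default_value}" CACHE STRING "${OE_help_text}")
  # listing the allowed values enables cycling in ccmake and cmake-gui
  set_property(CACHE ${OE_varname} PROPERTY STRINGS ${OE_possible_values})
  if(NOT ${OE_varname} IN_LIST OE_possible_values)
    message(FATAL_ERROR "${OE_varname} must be one of: ${OE_possible_values}")
  endif()
endfunction()

option_enum(varname "WITH_CUDA_COMPILER" help_text "CUDA compiler"
            default_value "nvcc" possible_values "nvcc;clang;hip")
```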
@jngrad (Member Author) commented Apr 4, 2020

Splitting the ${GPU_TARGET_NAME} = EspressoCore into two targets with different compiler flags doesn't seem to work even with PUBLIC: properties of EspressoCore are not inherited by EspressoCore_cxx and EspressoCore_cu:

  add_library(${GPU_TARGET_NAME}_cxx ${${GPU_TARGET_NAME}_sources_cxx})
  add_library(${GPU_TARGET_NAME}_cu  ${${GPU_TARGET_NAME}_sources_cu})
  target_link_libraries(${GPU_TARGET_NAME}_cu PUBLIC gpu_interface)
  add_library(${GPU_TARGET_NAME})
  set_target_properties(${GPU_TARGET_NAME} PROPERTIES LINKER_LANGUAGE "CXX")
  target_link_libraries(${GPU_TARGET_NAME} PUBLIC ${GPU_TARGET_NAME}_cxx ${GPU_TARGET_NAME}_cu)
  target_link_libraries(${GPU_TARGET_NAME} PRIVATE ${CUDA_LIBRARY} ${CUDART_LIBRARY})
You have called ADD_LIBRARY for library EspressoCore without any source files. This typically indicates a problem with your CMakeLists.txt file
[  1%] Building CXX object src/core/CMakeFiles/EspressoCore_cxx.dir/cells.cpp.o
In file included from /work/jgrad/espresso-fork-PR/src/core/cells.cpp:27:
In file included from /work/jgrad/espresso-fork-PR/src/core/cells.hpp:41:
In file included from /work/jgrad/espresso-fork-PR/src/core/CellStructure.hpp:25:
In file included from /work/jgrad/espresso-fork-PR/src/core/Cell.hpp:22:
/work/jgrad/espresso-fork-PR/src/core/Particle.hpp:22:10: fatal error: 'config.hpp' file not found
#include "config.hpp"
         ^~~~~~~~~~~~
[  3%] Building CXX object src/core/CMakeFiles/EspressoCore_cu.dir/actor/DipolarBarnesHut_cuda.cu.o
/work/jgrad/espresso-fork-PR/src/core/actor/DipolarBarnesHut_cuda.cu:24:10: fatal error: 'cuda_wrapper.hpp' file not found
#include "cuda_wrapper.hpp"
         ^~~~~~~~~~~~~~~~~~

@jngrad (Member Author) commented Apr 4, 2020

Oh right, EspressoConfig is PRIVATE. That explains everything. In fact, we can decouple the CUDA code from the C++ code using EspressoCore for C++ and EspressoCore_cu for CUDA, and only expose the bare minimum of libraries and header files to EspressoCore_cu:

target_link_libraries(EspressoCore_cu PRIVATE EspressoConfig shapes)
target_include_directories(EspressoCore_cu PUBLIC ${CMAKE_CURRENT_SOURCE_DIR})
target_link_libraries(EspressoCore PRIVATE EspressoCore_cu)

This actually compiles on Clang and passes the Python tests.

@jngrad (Member Author) commented Apr 4, 2020

Let me amend the previous statement: we can't separate CUDA from C++ source files using two targets. If EspressoCore_cu is:

  • a shared library (58bdbc8): there's a relink issue: can't install ESPResSo outside the build dir
    /usr/bin/python3: Relink `/lib/x86_64-linux-gnu/libsystemd.so.0' with `/lib/x86_64-linux-gnu/librt.so.1' for IFUNC symbol `clock_gettime'
    /usr/bin/python3: Relink `/lib/x86_64-linux-gnu/libudev.so.1' with `/lib/x86_64-linux-gnu/librt.so.1' for IFUNC symbol `clock_gettime'
    
  • a static library (71937be):
    • Clang fails if -fPIC isn't placed judiciously in the list of compiler flags for EspressoCore
      /usr/bin/ld: libEspressoCuda.a(DipolarBarnesHut_cuda.cu.o): relocation R_X86_64_32 against symbol `_Z20initializationKernelv' can not be used when making a shared object; recompile with -fPIC
      
    • in addition, there is a linking issue: can't install ESPResSo outside the build dir
      ImportError: libcudart.so.8.0: cannot open shared object file: No such file or directory
      
  • an object library (bc89a3e): can't link EspressoConfig shapes nor inherit them: compilation fails

The non-portable $<COMPILE_LANGUAGE:language> evaluates to something non-binary: both $<COMPILE_LANGUAGE:CUDA> and $<NOT:$<COMPILE_LANGUAGE:CXX>> evaluate to false on CUDA files.

I love CMake.

@KaiSzuttor Do you have any suggestion? I can't think of another way to introduce a GPU INTERFACE library that would only affect the .cu files and not the .cpp files in target EspressoCore.

@jngrad (Member Author) commented Apr 9, 2020

@KaiSzuttor Found the solution: installing thrust 1.9.5 to fix the va_printf issue and doing export LD_LIBRARY_PATH=/work/jgrad/cuda-10.0-thrust195/lib64:$LD_LIBRARY_PATH to get shared libraries to find the correct CUDA libraries. I'll resume work on this PR.

CMakeLists.txt Outdated
endif()

add_library(cxx_interface INTERFACE)
target_compile_options(cxx_interface INTERFACE ${cxx_interface_flags})
Contributor:

Out of curiosity, why do you keep the list in cxx_interface_flags instead of passing the flags directly to target_compile_options?

jngrad (Member Author):

We actually need to pass these flags to both the compiler and linker, otherwise we get unresolved symbols (e.g. ASAN in clang:6.0 logfile).

jngrad (Member Author):

In the line below, I pass the same list to target_link_libraries(), which defines linker flags. CMake 3.13 introduced target_link_options() to pass only linker flags, in an effort to make the intent clearer thanks to the name similarity with target_compile_options().
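With CMake >= 3.13, the two halves can be expressed separately; for a sanitizer flag that needs both a compile-time and a link-time component, a sketch would be:

```cmake
add_library(cxx_interface INTERFACE)
target_compile_options(cxx_interface INTERFACE -fsanitize=address)
# sanitizers ship a runtime library, so the flag is also needed at link time
target_link_options(cxx_interface INTERFACE -fsanitize=address)
```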

Contributor:

Only the -fsanitize= flags, no? (cf. https://gitlab.icp.uni-stuttgart.de/fweik/mdx/-/blob/master/CMakeLists.txt) Otherwise, passing compiler options to the linker doesn't do anything and is just confusing. Also, you still wouldn't need the list, right?

jngrad (Member Author):

Thanks for the hint! I wasn't sure if -fsanitize=... alone was sufficient, but it actually was.

fweik (Contributor) commented Apr 12, 2020:

Yeah, some of the sanitizers have a runtime component, which needs to be linked...

@jngrad (Member Author) commented Apr 12, 2020

The osx-cuda CI job is failing because of unresolved symbols in library EspressoCuda. These symbols are defined in EspressoCore. For example:

// lb_inertialess_tracers_cuda_interface.hpp
extern IBM_CUDA_ParticleDataInput *IBM_ParticleDataInput_host;
// lb_inertialess_tracers_cuda_interface.cpp
IBM_CUDA_ParticleDataInput *IBM_ParticleDataInput_host = nullptr;
// lb_inertialess_tracers_cuda.cu
    if (IBM_ParticleDataInput_host != NULL) {

Did I introduce a design flaw by creating a shared object EspressoCuda.so for the CUDA code and EspressoCore.so for the C++ code? None of the other GPU builds seem to mind. If it's not an issue, then we can ignore the CI failure: osx-cuda was removed in #3652.

The intel:19 CI failure is just a regression in the image that wasn't caught in docker CI.

@fweik (Contributor) commented Apr 12, 2020

@jngrad IIRC there are cyclic dependencies between the CUDA and the core code...

@jngrad (Member Author) commented Apr 16, 2020

@KaiSzuttor The PR is ready from my side. Let me know if you have additional changes to suggest. Otherwise, I'll rebase commits 58bdbc8 to 7025e65 to make the git history less chaotic and resolve the merge conflict. The slowdown issues mentioned two weeks ago have been resolved, and EspressoCuda and EspressoCore are now a single target again, which solves the cyclic dependency.

@KaiSzuttor (Member) left a comment:

Looks good to me besides the comments. Thanks @jngrad that looks much cleaner now.

Use interface libraries for compiler flags instead of populating
variable with global scope that are injected in the compilation and
linking commands of all libraries (e.g. CMAKE_CXX_FLAGS). Remove
duplicated compile flags. Give meaningful names to variables in
FindCUDA* CMake files. Document CMake policies and add CMP0025 to
distinguish between Clang and AppleClang. Replace simple if/else
blocks by generator expressions.
Move the ROCm path patching logic into FindCUDACompilerHIP.cmake
and check only the HIP version.
The double quotes were not removed by the shell interpreter.
@jngrad jngrad changed the title WIP: Refactor CMake logic for the CUDA compiler Refactor CMake logic for the CUDA compiler Apr 17, 2020
Clang returns "version unknown" for unsupported CUDA libraries,
or doesn't return a version string (depending on CMAKE_CXX_FLAGS),
causing the CMake regex to store the complete Clang stdout in the
CUDA_VERSION variable instead of a valid version number. This is
now fixed, and the CUDA version is now shown as <major>.<minor>.
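A version-extraction guard of the kind described might look like this (illustrative; `CLANG_OUTPUT` is a hypothetical variable holding the compiler's stdout):

```cmake
if(CLANG_OUTPUT MATCHES "CUDA [Vv]ersion ([0-9]+)\\.([0-9]+)")
  set(CUDA_VERSION "${CMAKE_MATCH_1}.${CMAKE_MATCH_2}")
else()
  message(FATAL_ERROR "Could not determine which CUDA version Clang supports")
endif()
```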
@fweik fweik dismissed KaiSzuttor’s stale review April 20, 2020 18:56

Points have been addressed

@fweik fweik added the automerge Merge with kodiak label Apr 20, 2020
@kodiakhq kodiakhq bot merged commit 7f90057 into espressomd:python Apr 20, 2020
@jngrad jngrad deleted the refactor-cmake-with_gpu branch January 18, 2022 12:02