
Refactor CMake logic for the CUDA compiler #3582

Merged: 31 commits into python on Apr 20, 2020

Conversation


@jngrad jngrad commented Mar 12, 2020

Description of changes:

  • the CUDA compiler guessing mechanism was removed
  • the WITH_CUDA option is now opt-in
  • for non-standard CUDA compilers (Clang, HIP), the user has to provide an extra string option WITH_CUDA_COMPILER (any of "nvcc", "clang", "hip")
  • if the requested CUDA compiler is not found or is too old, CMake will fail
  • the minimal CUDA compiler version numbers are defined in the top-level CMakeLists.txt
  • the convoluted (and incorrect) CXX optimization flag deduction mechanism in build_cmake.sh was replaced with dedicated build types (Coverage and RelWithAssert)
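Under the new scheme, a typical configure invocation might look like this (a sketch based on the bullet points above; the build-directory layout is illustrative):

```cmake
# default: no CUDA support unless explicitly requested
# cmake ..

# opt in to CUDA with the default compiler (nvcc)
# cmake .. -D WITH_CUDA=ON

# opt in to CUDA with a non-standard compiler
# cmake .. -D WITH_CUDA=ON -D WITH_CUDA_COMPILER=clang
```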

@jngrad jngrad added this to the Espresso 4.2 milestone Mar 12, 2020

codecov bot commented Mar 12, 2020

Codecov Report

Merging #3582 into python will not change coverage.
The diff coverage is n/a.


@@          Coverage Diff           @@
##           python   #3582   +/-   ##
======================================
  Coverage      87%     87%           
======================================
  Files         533     533           
  Lines       22959   22959           
======================================
  Hits        20159   20159           
  Misses       2800    2800           


@jngrad jngrad changed the title Refactor CMake logic for the CUDA compiler WIP: Refactor CMake logic for the CUDA compiler Mar 12, 2020

set(CUDA_NVCC_FLAGS_DEBUG "${CUDA_NVCC_FLAGS_DEBUG} -g")
set(CUDA_NVCC_FLAGS_RELEASE "${CUDA_NVCC_FLAGS_RELEASE} -O3 -DNDEBUG")
set(CUDA_NVCC_FLAGS_MINSIZEREL "${CUDA_NVCC_FLAGS_MINSIZEREL} -Os -DNDEBUG")
jngrad (Member Author):
several issues here:

  • -Os is not a valid optimization level: nvcc fatal : 's': expected a number
  • there seems to be a difference between -O2, -Xptxas -O2, -Xcompiler -O2 (StackOverflow)
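For reference, the three spellings discussed above address different compilation stages (my reading of the nvcc documentation; worth double-checking):

```cmake
# nvcc -O2 kernel.cu             -- optimization level for host code
# nvcc -Xptxas -O2 kernel.cu     -- optimization level for the PTX assembler (device code)
# nvcc -Xcompiler -O2 kernel.cu  -- flag forwarded verbatim to the host compiler
```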

jngrad (Member Author):
progress:

  • CUDA compiler flags are actually quite complicated. Also, device and host compiler optimization flags don't mix well.
  • We don't need to touch the host and device flags in build_cmake.sh. The logic is wrong anyway, and it is partially overridden by CMake in Debug builds.

I'm currently removing a lot of useless code in build_cmake.sh, but getting the CMake logic right will take me a few more days, I'm afraid. After that, we'll be able to experiment with CMake pseudo-interfaces for GPU targets.

jngrad (Member Author):

I wasted two man-hours investigating a Clang compiler error, because when we do

add_gpu_library(EspressoCore SHARED ${EspressoCore_SRC} ${EspressoCuda_SRC})

we forward CUDA compiler flags to regular .cpp files:
set_source_files_properties(${ARG_UNPARSED_ARGUMENTS} PROPERTIES LANGUAGE "CXX" COMPILE_FLAGS "${CUDA_NVCC_FLAGS}")

even though a few lines later there's a mechanism to set those flags to .cu files only:
if(${file} MATCHES "\\.cu$")
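A minimal sketch of the intended per-file logic, assuming the same `ARG_UNPARSED_ARGUMENTS` list as in the snippet above (illustrative, not the actual PR code):

```cmake
foreach(file ${ARG_UNPARSED_ARGUMENTS})
  if(${file} MATCHES "\\.cu$")
    # only CUDA sources receive the NVCC flags
    set_source_files_properties(${file} PROPERTIES COMPILE_FLAGS "${CUDA_NVCC_FLAGS}")
  else()
    # regular sources are compiled as C++ with the default flags
    set_source_files_properties(${file} PROPERTIES LANGUAGE "CXX")
  endif()
endforeach()
```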

jngrad (Member Author):

I wasted one man-hour because neither CMakeLists.txt nor build_cmake.sh throws an error when using the incompatible combination of CMake flags -DCUDA_NVCC_EXECUTABLE=$(which clang++) -DWITH_COVERAGE=ON.

jngrad (Member Author):

My last PR introduced a regression: the boost version depends on the CUDA version, so CUDA must be loaded before boost...

jngrad (Member Author):

Why do we override C++ flags only for release builds and not for all other build types?

"$<$<AND:$<BOOL:${WITH_COVERAGE}>,$<CONFIG:Release>>:-g>"
"$<$<AND:$<BOOL:${WITH_COVERAGE}>,$<CONFIG:Release>>:-O0>"
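If the intent is to apply the coverage flags in every configuration, one option would be to drop the `$<CONFIG:Release>` condition (a sketch, untested):

```cmake
"$<$<BOOL:${WITH_COVERAGE}>:-g>"
"$<$<BOOL:${WITH_COVERAGE}>:-O0>"
```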

jngrad (Member Author):

Why -O1 specifically? And why only for release builds?

espresso/CMakeLists.txt

Lines 347 to 348 in f291cf4

if(WITH_ASAN)
set(CMAKE_CXX_FLAGS_RELEASE "${CMAKE_CXX_FLAGS_RELEASE} -g -O1")

Member:

FindCUDA didn't really have a concept of debug vs. release, so we manually set these flags to match the host compiler and get debuggable or optimized CUDA code as requested.

jngrad (Member Author):

> FindCUDA didn't really have a concept of debug vs. release

doesn't it since 3.0? CUDA_NVCC_FLAGS_<CONFIG>

HIP also does: HIP_<COMPILER>_FLAGS_<CONFIG>

mkuron (Member) commented Mar 16, 2020:

I believe that didn't set the flags for both host and device, but that may no longer be true. The -g -O1 for WITH_ASAN is a compromise between good sanitizability and reasonable performance. In Debug mode, you have -O0 -g anyway.

Fixes a regression from 1e3fc3b: the Boost version depends on
the CUDA version for Intel, therefore CUDA must be loaded first,
but the CUDA version depends on the C++ version, so C++14 must be
defined at the top of the file.
The user can now provide the name of the CUDA compiler as a CMake
flag instead of modifying CMake variables to bypass the guessing
mechanism. CMake fails if the selected compiler is not found. The
FindCUDACompiler helper file was split into separate .cmake files.
The minimum CUDA version is 9.0 for NVCC and 8.0 for Clang. For the
Clang compiler and the Clang-based HIP compiler, the minimum version
that is tested in CI is 6.0.0.
The ESPResSo CMake project now has proper dependencies for every
target in the source code, documentation and tests. It is no longer
necessary to delete the build directory to start a fresh build.
Explicitly toggle GPU code on/off with WITH_CUDA and select the
compiler with WITH_CUDA_COMPILER.
With the CUDA compiler guessing mechanism removed, the user must
explicitly state whether CUDA is available and which compiler to use.
In ROCm 3.0 and 3.1, environment variables for hipcc and hcc are
overridden by incorrect paths (espressomd/docker#156). This causes
CMake to generate an incorrect linking command for EspressoCore.so:
in `/opt/rocm/bin/hipcc_cmake_linker_helper /opt/rocm -fPIC ...`,
either path `/opt/rocm` is an empty string, or both the linker path
and path `/opt/rocm` are empty strings. Calling `find_package()`
twice with an overridden `HCC_PATH` fixes the linking command.
Knowing the CMake version is extremely useful when reviewing CMake
logs attached in bug reports.
The C++ standard used for CUDA code is set in `CMAKE_CUDA_STANDARD`.
Variable `CMAKE_CUDA_VERSION` was renamed to `MINIMAL_CUDA_VERSION`
for clarity.
The nvcc `-O<N>` optimization flag can only take a number.
Cannot compile CUDA code with coverage enabled using Clang.
@jngrad jngrad force-pushed the refactor-cmake-with_gpu branch 2 times, most recently from 66fe6cf to 69d4962 Compare March 14, 2020 23:26
Rewrite CUDA flags based on the CXX flags:

CMAKE_CXX_FLAGS_DEBUG = -O0 -g
CMAKE_CXX_FLAGS_RELEASE = -O3 -DNDEBUG
CMAKE_CXX_FLAGS_MINSIZEREL = -Os -DNDEBUG
CMAKE_CXX_FLAGS_RELWITHDEBINFO = -O2 -g -DNDEBUG

Add a COVERAGE build type that uses -O0 for host and -O3 for
device code. This replaces the logic in the CI script that had to touch
`CMAKE_CXX_FLAGS` and `CUDA_NVCC_FLAGS`. The -O0 optimization flag
for host code avoids ending up with source code lines in the gcov output
that are neither hit nor missed. According to `man gcov`:

> compile your code without optimization if you plan to use gcov
> because the optimization, by combining some lines of code into
> one function, may not give you as much information as you need
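Defining such a build type boils down to providing the per-config flag variables, following CMake's `<LANG>_FLAGS_<CONFIG>` naming convention (a sketch; the actual values used in the PR may differ):

```cmake
# host compiler: no optimization, so gcov can attribute every executed line
set(CMAKE_CXX_FLAGS_COVERAGE "-O0 -g" CACHE STRING
    "Flags used by the C++ compiler during coverage builds." FORCE)
# device compiler: keep optimizing, coverage is only collected on the host
set(CUDA_NVCC_FLAGS_COVERAGE "-O3" CACHE STRING
    "Flags used by nvcc during coverage builds." FORCE)
```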
Generate a warning for incorrect build types and override them with
'Release'. Move the `set(CMAKE_BUILD_TYPE CACHE)` declaration out
of the conditional such that its help message always gets displayed
in cmake/ccmake. List the possible values as properties to allow
cycling in ccmake and cmake-gui (instead of manually typing them).
Same thing for WITH_CUDA_COMPILER. This is achieved by creating a
wrapper around the `option()` function that accepts an enum value
(stored as a string) instead of a boolean value.
The same guard is used in the CMake logic for Python tests.
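Such a wrapper around `option()` can be sketched as follows (the function name and keywords here are hypothetical, not necessarily those used in the PR):

```cmake
function(option_enum)
  cmake_parse_arguments(OE "" "varname;help_text;default_value" "possible_values" ${ARGN})
  set(${OE_varname} "${OE_default_value}" CACHE STRING "${OE_help_text}")
  # listing the allowed values enables cycling in ccmake and cmake-gui
  set_property(CACHE ${OE_varname} PROPERTY STRINGS ${OE_possible_values})
  if(NOT ${OE_varname} IN_LIST OE_possible_values)
    message(FATAL_ERROR "${OE_varname} must be one of: ${OE_possible_values}")
  endif()
endfunction()

option_enum(varname "WITH_CUDA_COMPILER" help_text "CUDA compiler"
            default_value "nvcc" possible_values "nvcc;clang;hip")
```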
@jngrad (Member Author) commented Apr 4, 2020

Splitting the ${GPU_TARGET_NAME} = EspressoCore into two targets with different compiler flags doesn't seem to work even with PUBLIC: properties of EspressoCore are not inherited by EspressoCore_cxx and EspressoCore_cu:

  add_library(${GPU_TARGET_NAME}_cxx ${${GPU_TARGET_NAME}_sources_cxx})
  add_library(${GPU_TARGET_NAME}_cu  ${${GPU_TARGET_NAME}_sources_cu})
  target_link_libraries(${GPU_TARGET_NAME}_cu PUBLIC gpu_interface)
  add_library(${GPU_TARGET_NAME})
  set_target_properties(${GPU_TARGET_NAME} PROPERTIES LINKER_LANGUAGE "CXX")
  target_link_libraries(${GPU_TARGET_NAME} PUBLIC ${GPU_TARGET_NAME}_cxx ${GPU_TARGET_NAME}_cu)
  target_link_libraries(${GPU_TARGET_NAME} PRIVATE ${CUDA_LIBRARY} ${CUDART_LIBRARY})
You have called ADD_LIBRARY for library EspressoCore without any source files. This typically indicates a problem with your CMakeLists.txt file
[  1%] Building CXX object src/core/CMakeFiles/EspressoCore_cxx.dir/cells.cpp.o
In file included from /work/jgrad/espresso-fork-PR/src/core/cells.cpp:27:
In file included from /work/jgrad/espresso-fork-PR/src/core/cells.hpp:41:
In file included from /work/jgrad/espresso-fork-PR/src/core/CellStructure.hpp:25:
In file included from /work/jgrad/espresso-fork-PR/src/core/Cell.hpp:22:
/work/jgrad/espresso-fork-PR/src/core/Particle.hpp:22:10: fatal error: 'config.hpp' file not found
#include "config.hpp"
         ^~~~~~~~~~~~
[  3%] Building CXX object src/core/CMakeFiles/EspressoCore_cu.dir/actor/DipolarBarnesHut_cuda.cu.o
/work/jgrad/espresso-fork-PR/src/core/actor/DipolarBarnesHut_cuda.cu:24:10: fatal error: 'cuda_wrapper.hpp' file not found
#include "cuda_wrapper.hpp"
         ^~~~~~~~~~~~~~~~~~

@jngrad (Member Author) commented Apr 4, 2020

Oh right, EspressoConfig is PRIVATE. That explains everything. In fact, we can decouple the CUDA code from the C++ code using EspressoCore for C++ and EspressoCore_cu for CUDA, and only expose the bare minimum of libraries and header files to EspressoCore_cu:

target_link_libraries(EspressoCore_cu PRIVATE EspressoConfig shapes)
target_include_directories(EspressoCore_cu PUBLIC ${CMAKE_CURRENT_SOURCE_DIR})
target_link_libraries(EspressoCore PRIVATE EspressoCore_cu)

This actually compiles on Clang and passes the Python tests.

@jngrad (Member Author) commented Apr 4, 2020

Let me amend the previous statement: we can't separate CUDA from C++ source files using two targets. If EspressoCore_cu is:

  • a shared library (58bdbc8): there's a relink issue: can't install ESPResSo outside the build dir
    /usr/bin/python3: Relink `/lib/x86_64-linux-gnu/libsystemd.so.0' with `/lib/x86_64-linux-gnu/librt.so.1' for IFUNC symbol `clock_gettime'
    /usr/bin/python3: Relink `/lib/x86_64-linux-gnu/libudev.so.1' with `/lib/x86_64-linux-gnu/librt.so.1' for IFUNC symbol `clock_gettime'
    
  • a static library (71937be):
    • Clang fails if -fPIC isn't placed judiciously in the list of compiler flags for EspressoCore
      /usr/bin/ld: libEspressoCuda.a(DipolarBarnesHut_cuda.cu.o): relocation R_X86_64_32 against symbol `_Z20initializationKernelv' can not be used when making a shared object; recompile with -fPIC
      
    • in addition, there is a linking issue: can't install ESPResSo outside the build dir
      ImportError: libcudart.so.8.0: cannot open shared object file: No such file or directory
      
  • an object library (bc89a3e): can't link EspressoConfig shapes nor inherit them: compilation fails

The non-portable $<COMPILE_LANGUAGE:language> evaluates to something non-binary: both $<COMPILE_LANGUAGE:CUDA> and $<NOT:$<COMPILE_LANGUAGE:CXX>> evaluate to false on CUDA files.

I love CMake.

@KaiSzuttor Do you have any suggestion? I can't think of another way to introduce a GPU INTERFACE library that would only affect the .cu files and not the .cpp files in target EspressoCore.

@jngrad (Member Author) commented Apr 9, 2020

@KaiSzuttor Found the solution: installing thrust 1.9.5 to fix the va_printf issue and doing export LD_LIBRARY_PATH=/work/jgrad/cuda-10.0-thrust195/lib64:$LD_LIBRARY_PATH to get shared libraries to find the correct CUDA libraries. I'll resume work on this PR.

CMakeLists.txt Outdated
endif()

add_library(cxx_interface INTERFACE)
target_compile_options(cxx_interface INTERFACE ${cxx_interface_flags})
Contributor:

Out of curiosity, why do you keep the list in cxx_interface_flags instead of passing the flags directly to target_compile_options?

jngrad (Member Author):

We actually need to pass these flags to both the compiler and linker, otherwise we get unresolved symbols (e.g. ASAN in clang:6.0 logfile).

jngrad (Member Author):

In the line below, I pass the same list to target_link_libraries(), which defines linker flags. CMake 3.13 introduced target_link_options() to pass only linker flags, in an effort to make the intent clearer thanks to the name similarity with target_compile_options().
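With CMake >= 3.13, the two halves can be expressed separately; for a sanitizer flag that needs both a compile-time and a link-time component, a sketch would be:

```cmake
add_library(cxx_interface INTERFACE)
target_compile_options(cxx_interface INTERFACE -fsanitize=address)
# sanitizers ship a runtime library, so the flag is also needed at link time
target_link_options(cxx_interface INTERFACE -fsanitize=address)
```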

Contributor:

Only the -fsanitize= flags, no? (cf. https://gitlab.icp.uni-stuttgart.de/fweik/mdx/-/blob/master/CMakeLists.txt) Otherwise, passing compiler options to the linker doesn't do anything and is just confusing. Also, you still wouldn't need the list, right?

jngrad (Member Author):

Thanks for the hint! I wasn't sure if -fsanitize=... alone was sufficient, but it actually was.

fweik (Contributor) commented Apr 12, 2020:

Yeah, some of the sanitizers have a runtime component, which needs to be linked...

@jngrad (Member Author) commented Apr 12, 2020

The osx-cuda CI job is failing because of unresolved symbols in library EspressoCuda. These symbols are defined in EspressoCore. For example:

// lb_inertialess_tracers_cuda_interface.hpp
extern IBM_CUDA_ParticleDataInput *IBM_ParticleDataInput_host;
// lb_inertialess_tracers_cuda_interface.cpp
IBM_CUDA_ParticleDataInput *IBM_ParticleDataInput_host = nullptr;
// lb_inertialess_tracers_cuda.cu
    if (IBM_ParticleDataInput_host != NULL) {

Did I introduce a design flaw by creating a shared object EspressoCuda.so for the CUDA code and EspressoCore.so for the C++ code? None of the other GPU builds seem to mind. If it's not an issue, then we can ignore the CI failure: osx-cuda was removed in #3652.

The intel:19 CI failure is just a regression in the image that wasn't caught in docker CI.

@fweik (Contributor) commented Apr 12, 2020

@jngrad IIRC there are cyclic dependencies between the CUDA and the core code...

@jngrad (Member Author) commented Apr 16, 2020

@KaiSzuttor The PR is ready from my side. Let me know if you have additional changes to suggest. Otherwise, I'll rebase commits 58bdbc8 to 7025e65 to make the git history less chaotic and resolve the merge conflict. The slowdown issues mentioned two weeks ago have been resolved, and EspressoCuda and EspressoCore are now a single target again, which solves the cyclic dependency.

@KaiSzuttor (Member) left a comment:

Looks good to me besides the comments. Thanks @jngrad that looks much cleaner now.

Use interface libraries for compiler flags instead of populating
variable with global scope that are injected in the compilation and
linking commands of all libraries (e.g. CMAKE_CXX_FLAGS). Remove
duplicated compile flags. Give meaningful names to variables in
FindCUDA* CMake files. Document CMake policies and add CMP0025 to
distinguish between Clang and AppleClang. Replace simple if/else
blocks by generator expressions.
Move the ROCm path patching logic into FindCUDACompilerHIP.cmake
and check only the HIP version.
The double quotes were not removed by the shell interpreter.
@jngrad jngrad changed the title WIP: Refactor CMake logic for the CUDA compiler Refactor CMake logic for the CUDA compiler Apr 17, 2020
Clang returns "version unknown" for unsupported CUDA libraries,
or doesn't return a version string (depending on CMAKE_CXX_FLAGS),
causing the CMake regex to store the complete Clang stdout in the
CUDA_VERSION variable instead of a valid version number. This is
now fixed, and the CUDA version is now shown as <major>.<minor>.
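A version-extraction guard of the kind described might look like this (illustrative; `CLANG_OUTPUT` is a hypothetical variable holding the compiler's stdout):

```cmake
if(CLANG_OUTPUT MATCHES "CUDA [Vv]ersion ([0-9]+)\\.([0-9]+)")
  set(CUDA_VERSION "${CMAKE_MATCH_1}.${CMAKE_MATCH_2}")
else()
  message(FATAL_ERROR "Could not determine which CUDA version Clang supports")
endif()
```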
@fweik fweik dismissed KaiSzuttor’s stale review April 20, 2020 18:56

Points have been addressed

@fweik fweik added the automerge Merge with kodiak label Apr 20, 2020
@kodiakhq kodiakhq bot merged commit 7f90057 into espressomd:python Apr 20, 2020
@jngrad jngrad deleted the refactor-cmake-with_gpu branch January 18, 2022 12:02