Fix: CUDA Host-Side -O3 with CMake #562

ax3l · 2021-07-12T22:08:51Z

-O3 was only set for CUDA if we ran CMake at least twice. This was
an intialization logic bug and should be set from the beginning, but
we initialized CUDA too late / applied the transform too early.

We now change the default build type to Release to avoid changing
defaults and avoid further surprises.

We will add optional -g for optimized builds again on a later PR.

Same as: ECP-WarpX/WarpX#2078 and similar implications as in #71. With the herein fixed issue for CUDA builds, the host-side default optimization was taken (GCC: -O0 but NVCC device-side is still defaulting to -O3). We saw ~30% performance regressions for GPU runs with WarpX with this unoptimized host code path for CUDA translation units.

Small enough (< few 100s of lines), otherwise it should probably be split into smaller PRs
Tested (describe the tests in the PR description)
Runs on GPU (basic: the code compiles and run well with the new module)
Contains an automated test (checksum and/or comparison with theory)
Documented: all elements (classes and their members, functions, namespaces, etc.) are documented
Constified (All that can be const is const)
Code is clean (no unwanted comments, )
Style and code conventions are respected at the bottom of https://github.com/Hi-PACE/hipace
Proper label and GitHub project, if applicable

ax3l · 2021-07-12T22:12:03Z

cmake/HiPACEFunctions.cmake

-        list(TRANSFORM CMAKE_C_FLAGS_RELWITHDEBINFO REPLACE "-O2" "-O3")
-        list(TRANSFORM CMAKE_CXX_FLAGS_RELWITHDEBINFO REPLACE "-O2" "-O3")
-        # FIXME: due to the "AMReX inits CUDA first" logic we will first see this with -O2 in output
-        list(TRANSFORM CMAKE_CUDA_FLAGS_RELWITHDEBINFO REPLACE "-O2" "-O3")


This line was called before we set enable_language(cuda).
The result was an empty CUDA flag list, which as a result compiled the host with -O0 (GCC) and the device with -O3 (NVCC) for CUDA language translation units (TUs):

nvcc [no -O flags] [includes] [file].cpp

When calling cmake -S <src_dir> or ccmake again on the build directory, the CUDA language was enabled and then this was initialized correctly to -O3 ... for nvcc-compiled TUs.

`-O3` was only set for CUDA if we ran CMake at least twice. This was an intialization logic bug and should be set from the beginning, but we initialized CUDA too late / applied the transform too early. We now change the default build type to `Release` to avoid changing defaults and avoid further surprises. We will add optional `-g` for optimized builds again on a later PR.

SeverinDiederichs

Thanks for this fix! On my laptop, I get something like a 5% speedup with this PR. To be tested on an A100

ax3l added bug Something isn't working GPU Related to GPU acceleration performance optimization, benchmark, profiling, etc. labels Jul 12, 2021

ax3l commented Jul 12, 2021

View reviewed changes

ax3l force-pushed the fix-cmakeCUDAO3 branch from 2d12ea7 to 7458428 Compare July 12, 2021 22:12

ax3l requested review from MaxThevenet and SeverinDiederichs July 12, 2021 22:29

SeverinDiederichs approved these changes Jul 13, 2021

View reviewed changes

SeverinDiederichs merged commit babbaa2 into Hi-PACE:development Jul 13, 2021

ax3l deleted the fix-cmakeCUDAO3 branch July 14, 2021 00:53

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix: CUDA Host-Side -O3 with CMake #562

Fix: CUDA Host-Side -O3 with CMake #562

ax3l commented Jul 12, 2021

ax3l Jul 12, 2021

SeverinDiederichs left a comment

Fix: CUDA Host-Side -O3 with CMake #562

Fix: CUDA Host-Side -O3 with CMake #562

Conversation

ax3l commented Jul 12, 2021

ax3l Jul 12, 2021

Choose a reason for hiding this comment

SeverinDiederichs left a comment

Choose a reason for hiding this comment