Fix: CUDA Host-Side -O3 with CMake #2078

ax3l · 2021-07-12T16:50:11Z

-O3 was only set for CUDA if we ran CMake at least twice. This was an intialization logic bug and should be set from the beginning, but we initialized CUDA too late / applied the transform too early.

We now change the default build type to Release to avoid changing defaults and avoid further surprises.

Similar implications as in #1203: here the host-side default optimization was taken (GCC: -O0 but NVCC device-side is still defaulting to -O3). We saw ~30% performance regressions for GPU runs with this unoptimized host code path for CUDA translation units.

Thanks to @NeilZaim for raising this! Thanks to @WeiqunZhang for debugging this with me!

To Do

test again: chicken-egg problem with AMReX summary print on first pass and need-to-update after CUDA was initialized
change default build type to Release to avoid any future issues
follow-up: add potentially an option that adds / controls -g explicitly for any build mode

`-O3` was only set for CUDA if we ran CMake at least twice. This was an intialization logic bug and should be set from the beginning, but we initialized CUDA too late / applied the transform too early. We now change the default build type to `Release` to avoid changing defaults and avoid further surprises.

ax3l · 2021-07-12T22:02:45Z

cmake/WarpXFunctions.cmake

-        list(TRANSFORM CMAKE_C_FLAGS_RELWITHDEBINFO REPLACE "-O2" "-O3")
-        list(TRANSFORM CMAKE_CXX_FLAGS_RELWITHDEBINFO REPLACE "-O2" "-O3")
-        # FIXME: due to the "AMReX inits CUDA first" logic we will first see this with -O2 in output
-        list(TRANSFORM CMAKE_CUDA_FLAGS_RELWITHDEBINFO REPLACE "-O2" "-O3")


This line was called before we set enable_language(cuda).
The result was an empty CUDA flag list, which as a result compiled the host with -O0 (GCC) and the device with -O3 (NVCC) for CUDA language translation units (TUs):

nvcc [no -O flags] [includes] [file].cpp

When calling cmake -S <src_dir> or ccmake again on the build directory, the CUDA language was enabled and then this was initialized correctly to -O3 ... for nvcc-compiled TUs.

ax3l added bug Something isn't working backend: cuda Specific to CUDA execution (GPUs) Performance optimization install bug: affects latest release Bug also exists in latest release version labels Jul 12, 2021

ax3l requested a review from WeiqunZhang July 12, 2021 16:50

ax3l changed the title ~~Fix: CUDA -O3 with CMake~~ [WIP] Fix: CUDA -O3 with CMake Jul 12, 2021

ax3l force-pushed the fix-cmakeCUDAO3 branch from 23d4aee to cb2ec52 Compare July 12, 2021 21:20

ax3l changed the title ~~[WIP] Fix: CUDA -O3 with CMake~~ Fix: CUDA -O3 with CMake Jul 12, 2021

ax3l changed the title ~~Fix: CUDA -O3 with CMake~~ Fix: CUDA Host-Side -O3 with CMake Jul 12, 2021

WeiqunZhang approved these changes Jul 12, 2021

View reviewed changes

ax3l commented Jul 12, 2021

View reviewed changes

ax3l mentioned this pull request Jul 12, 2021

Fix: CUDA Host-Side -O3 with CMake Hi-PACE/hipace#562

Merged

9 tasks

Docs: Update Default Build Type (Release)

6f53dbf

ax3l force-pushed the fix-cmakeCUDAO3 branch from df57212 to 6f53dbf Compare July 12, 2021 22:13

RemiLehe merged commit 0379aad into ECP-WarpX:development Jul 13, 2021

ax3l deleted the fix-cmakeCUDAO3 branch July 14, 2021 00:53

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix: CUDA Host-Side -O3 with CMake #2078

Fix: CUDA Host-Side -O3 with CMake #2078

ax3l commented Jul 12, 2021 •

edited

Loading

ax3l Jul 12, 2021 •

edited

Loading

Fix: CUDA Host-Side -O3 with CMake #2078

Fix: CUDA Host-Side -O3 with CMake #2078

Conversation

ax3l commented Jul 12, 2021 • edited Loading

To Do

ax3l Jul 12, 2021 • edited Loading

Choose a reason for hiding this comment

ax3l commented Jul 12, 2021 •

edited

Loading

ax3l Jul 12, 2021 •

edited

Loading