Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix: CUDA Host-Side -O3 with CMake #2078

Merged
merged 2 commits into from
Jul 13, 2021

Conversation

ax3l
Copy link
Member

@ax3l ax3l commented Jul 12, 2021

-O3 was only set for CUDA if we ran CMake at least twice. This was an intialization logic bug and should be set from the beginning, but we initialized CUDA too late / applied the transform too early.

We now change the default build type to Release to avoid changing defaults and avoid further surprises.

Similar implications as in #1203: here the host-side default optimization was taken (GCC: -O0 but NVCC device-side is still defaulting to -O3). We saw ~30% performance regressions for GPU runs with this unoptimized host code path for CUDA translation units.

Thanks to @NeilZaim for raising this! Thanks to @WeiqunZhang for debugging this with me!

To Do

  • test again: chicken-egg problem with AMReX summary print on first pass and need-to-update after CUDA was initialized
  • change default build type to Release to avoid any future issues
  • follow-up: add potentially an option that adds / controls -g explicitly for any build mode

@ax3l ax3l added bug Something isn't working backend: cuda Specific to CUDA execution (GPUs) Performance optimization install bug: affects latest release Bug also exists in latest release version labels Jul 12, 2021
@ax3l ax3l requested a review from WeiqunZhang July 12, 2021 16:50
@ax3l ax3l changed the title Fix: CUDA -O3 with CMake [WIP] Fix: CUDA -O3 with CMake Jul 12, 2021
`-O3` was only set for CUDA if we ran CMake at least twice. This was
an intialization logic bug and should be set from the beginning, but
we initialized CUDA too late / applied the transform too early.

We now change the default build type to `Release` to avoid changing
defaults and avoid further surprises.
@ax3l ax3l force-pushed the fix-cmakeCUDAO3 branch from 23d4aee to cb2ec52 Compare July 12, 2021 21:20
@ax3l ax3l changed the title [WIP] Fix: CUDA -O3 with CMake Fix: CUDA -O3 with CMake Jul 12, 2021
@ax3l ax3l changed the title Fix: CUDA -O3 with CMake Fix: CUDA Host-Side -O3 with CMake Jul 12, 2021
list(TRANSFORM CMAKE_C_FLAGS_RELWITHDEBINFO REPLACE "-O2" "-O3")
list(TRANSFORM CMAKE_CXX_FLAGS_RELWITHDEBINFO REPLACE "-O2" "-O3")
# FIXME: due to the "AMReX inits CUDA first" logic we will first see this with -O2 in output
list(TRANSFORM CMAKE_CUDA_FLAGS_RELWITHDEBINFO REPLACE "-O2" "-O3")
Copy link
Member Author

@ax3l ax3l Jul 12, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line was called before we set enable_language(cuda).
The result was an empty CUDA flag list, which as a result compiled the host with -O0 (GCC) and the device with -O3 (NVCC) for CUDA language translation units (TUs):

nvcc [no -O flags] [includes] [file].cpp

When calling cmake -S <src_dir> or ccmake again on the build directory, the CUDA language was enabled and then this was initialized correctly to -O3 ... for nvcc-compiled TUs.

@ax3l ax3l force-pushed the fix-cmakeCUDAO3 branch from df57212 to 6f53dbf Compare July 12, 2021 22:13
@RemiLehe RemiLehe merged commit 0379aad into ECP-WarpX:development Jul 13, 2021
@ax3l ax3l deleted the fix-cmakeCUDAO3 branch July 14, 2021 00:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backend: cuda Specific to CUDA execution (GPUs) bug: affects latest release Bug also exists in latest release version bug Something isn't working install Performance optimization
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants