Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix: CUDA Host-Side -O3 with CMake #562

Merged
merged 1 commit into from
Jul 13, 2021

Conversation

ax3l
Copy link
Member

@ax3l ax3l commented Jul 12, 2021

-O3 was only set for CUDA if we ran CMake at least twice. This was
an intialization logic bug and should be set from the beginning, but
we initialized CUDA too late / applied the transform too early.

We now change the default build type to Release to avoid changing
defaults and avoid further surprises.

We will add optional -g for optimized builds again on a later PR.

Same as: ECP-WarpX/WarpX#2078 and similar implications as in #71. With the herein fixed issue for CUDA builds, the host-side default optimization was taken (GCC: -O0 but NVCC device-side is still defaulting to -O3). We saw ~30% performance regressions for GPU runs with WarpX with this unoptimized host code path for CUDA translation units.

  • Small enough (< few 100s of lines), otherwise it should probably be split into smaller PRs
  • Tested (describe the tests in the PR description)
  • Runs on GPU (basic: the code compiles and run well with the new module)
  • Contains an automated test (checksum and/or comparison with theory)
  • Documented: all elements (classes and their members, functions, namespaces, etc.) are documented
  • Constified (All that can be const is const)
  • Code is clean (no unwanted comments, )
  • Style and code conventions are respected at the bottom of https://github.com/Hi-PACE/hipace
  • Proper label and GitHub project, if applicable

@ax3l ax3l added bug Something isn't working GPU Related to GPU acceleration performance optimization, benchmark, profiling, etc. labels Jul 12, 2021
list(TRANSFORM CMAKE_C_FLAGS_RELWITHDEBINFO REPLACE "-O2" "-O3")
list(TRANSFORM CMAKE_CXX_FLAGS_RELWITHDEBINFO REPLACE "-O2" "-O3")
# FIXME: due to the "AMReX inits CUDA first" logic we will first see this with -O2 in output
list(TRANSFORM CMAKE_CUDA_FLAGS_RELWITHDEBINFO REPLACE "-O2" "-O3")
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line was called before we set enable_language(cuda).
The result was an empty CUDA flag list, which as a result compiled the host with -O0 (GCC) and the device with -O3 (NVCC) for CUDA language translation units (TUs):

nvcc [no -O flags] [includes] [file].cpp

When calling cmake -S <src_dir> or ccmake again on the build directory, the CUDA language was enabled and then this was initialized correctly to -O3 ... for nvcc-compiled TUs.

`-O3` was only set for CUDA if we ran CMake at least twice. This was
an intialization logic bug and should be set from the beginning, but
we initialized CUDA too late / applied the transform too early.

We now change the default build type to `Release` to avoid changing
defaults and avoid further surprises.

We will add optional `-g` for optimized builds again on a later PR.
Copy link
Member

@SeverinDiederichs SeverinDiederichs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this fix! On my laptop, I get something like a 5% speedup with this PR. To be tested on an A100

@SeverinDiederichs SeverinDiederichs merged commit babbaa2 into Hi-PACE:development Jul 13, 2021
@ax3l ax3l deleted the fix-cmakeCUDAO3 branch July 14, 2021 00:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working GPU Related to GPU acceleration performance optimization, benchmark, profiling, etc.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants