Fix: CUDA Host-Side -O3 with CMake #562
Merged
`-O3` was only set for CUDA if we ran CMake at least twice. This was an initialization logic bug: the flag should be set from the beginning, but we initialized CUDA too late / applied the transform too early.

We now change the default build type to `Release` to avoid changing defaults and to avoid further surprises. We will add optional `-g` for optimized builds again in a later PR.

Same as: ECP-WarpX/WarpX#2078 and similar implications as in #71. With the issue fixed here, CUDA builds picked up the host-side default optimization level (GCC: `-O0`), while NVCC still defaulted to `-O3` on the device side. We saw ~30% performance regressions for GPU runs with WarpX with this unoptimized host code path for CUDA translation units.
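The ordering problem described above can be illustrated with a minimal sketch. This is not the diff from this PR. It only assumes the common CMake idiom of choosing the default build type before any language is enabled, so that the first configure run already sees it. The project name `example` is hypothetical.

```cmake
cmake_minimum_required(VERSION 3.18)

# Assumption: single-config generator (Makefiles, Ninja).
# Pick the default build type BEFORE project()/enable_language(), so the
# per-configuration flags are already in effect on the first CMake run.
if(NOT CMAKE_BUILD_TYPE)
    set(CMAKE_BUILD_TYPE "Release" CACHE STRING
        "Choose the build type" FORCE)
    set_property(CACHE CMAKE_BUILD_TYPE PROPERTY STRINGS
        "Release" "Debug" "MinSizeRel" "RelWithDebInfo")
endif()

# Hypothetical project name, for illustration only.
project(example LANGUAGES CXX)

# CUDA is enabled after the build type is known.
enable_language(CUDA)

# Print the flags that will be used, to verify them on a fresh build directory.
message(STATUS "Build type:         ${CMAKE_BUILD_TYPE}")
message(STATUS "CXX release flags:  ${CMAKE_CXX_FLAGS_RELEASE}")
message(STATUS "CUDA release flags: ${CMAKE_CUDA_FLAGS_RELEASE}")
```

Configuring once in an empty build directory and inspecting the printed flags is a quick way to check that the optimization level no longer depends on running CMake a second time.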