Reconstruction Kernel Fusion 3: Fusing PCM with the Riemann Solvers #377
Merged
bcaddy force-pushed the dev-pcmFusion-3 branch 2 times, most recently from ca333fa to b0a50d4, April 29, 2024 17:50
Moved the struct to avoid circular includes: the reconstructor files included reconstruction.h, which in turn included them.
The element-by-element test was brittle to small changes
This simple macro is used to wrap kernel names that have more than 1 template argument since the comma in the template arguments otherwise messes with the kernel launch syntax. It is part of the HIP runtime already.
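The comma problem this commit describes can be reproduced on the host with any function-like macro. The sketch below is illustrative only: `LAUNCH_SKETCH` and `Fake_Kernel` are hypothetical stand-ins for a macro-based kernel launcher and a kernel with two template arguments, and the `HIP_KERNEL_NAME` definition is assumed to be the variadic pass-through used by the HIP runtime.

```cpp
#include <cassert>

// Assumed to match the HIP runtime's definition: a variadic pass-through.
#ifndef HIP_KERNEL_NAME
  #define HIP_KERNEL_NAME(...) __VA_ARGS__
#endif

// Hypothetical stand-in for a macro-based kernel launcher.
#define LAUNCH_SKETCH(kernel, arg) kernel(arg)

// Hypothetical "kernel" with two template arguments.
template <int reconstruction, int direction>
int Fake_Kernel(int x)
{
  return x + 10 * reconstruction + direction;
}

int Launch_Wrapped()
{
  // LAUNCH_SKETCH(Fake_Kernel<1, 2>, 5) does not compile: the preprocessor
  // splits on the comma inside <1, 2> and sees three macro arguments.
  // The parentheses of HIP_KERNEL_NAME(...) protect that comma.
  return LAUNCH_SKETCH(HIP_KERNEL_NAME(Fake_Kernel<1, 2>), 5);
}
```

Because the wrapper expands to its arguments unchanged, it has no effect on kernels with zero or one template argument.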
This overload takes a hydro_utilities::Vector object
Added a version of PCM that can be called as a device function from within a Riemann solver
- Fuse the PCM reconstruction and HLLD Riemann solver kernels.
- Make direction and reconstructor template arguments in the HLLD solver. This should improve performance vs. having them as runtime arguments since the compiler can discard many paths at compile time.
- Update the HLLD tests for the new fused version of HLLD. Since the fusion also changed how data is loaded, some of the tests (the negative density tests) are no longer useful and have been removed. The fiducial data for one other test was changed as well since it was borderline between two states and this change caused it to switch states.
- Add the reconstruction::Riemann_Thread_Guard function for Riemann solvers with fused reconstruction, along with a test for it.
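As a rough illustration of why a template argument lets the compiler discard paths, the sketch below compares a compile-time direction with a runtime one. `Neighbor_Index` and `Neighbor_Index_Runtime` are hypothetical helpers, not functions from this PR, and the flattened `nx * ny * nz` indexing is an assumption made for the example. It needs C++17 for `if constexpr`.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: index of the next cell along `direction`
// (0 = x, 1 = y, 2 = z) in a flattened nx * ny * nz grid.
// Only the branch matching `direction` is instantiated.
template <int direction>
std::size_t Neighbor_Index(std::size_t id, [[maybe_unused]] std::size_t nx, [[maybe_unused]] std::size_t ny)
{
  static_assert(direction >= 0 && direction <= 2, "direction must be 0, 1, or 2");
  if constexpr (direction == 0) {
    return id + 1;
  } else if constexpr (direction == 1) {
    return id + nx;
  } else {
    return id + nx * ny;
  }
}

// Runtime-argument version for comparison: every call carries the branch.
std::size_t Neighbor_Index_Runtime(int direction, std::size_t id, std::size_t nx, std::size_t ny)
{
  switch (direction) {
    case 0:
      return id + 1;
    case 1:
      return id + nx;
    default:
      return id + nx * ny;
  }
}
```

Both versions return the same indices; the templated one just resolves the choice once per instantiation instead of once per call.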
Previously this function only worked in 3D. Now it works in 1D, 2D, and 3D.
Fusing the PCM reconstructor introduced a few scope-based bugs in the Riemann solvers. This fixes them.
The half time step riemann solve in VL3D was done in direction 0 three times instead of in all three directions. Fixed.
Also, fixed a bug in AutomaticLaunchParams that would let it set a threads per block number higher than what __launch_bounds__() specified. Now it queries the kernel for that number.
Thread guard was too lenient and caused out of bounds reads
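To make the "too lenient" failure mode concrete, here is a minimal sketch of a guard for a solver in which each thread reads its own cell and the next cell along the solve direction. `Thread_Guard_Sketch` is hypothetical and is not the actual `reconstruction::Riemann_Thread_Guard`; the argument list and the one-cell reach are assumptions for illustration.

```cpp
#include <cassert>

// Hypothetical sketch: returns true when the thread at (xid, yid, zid)
// should exit without doing any work. Each thread reads its own cell and
// the next cell along `direction`, so it needs one extra cell on that axis.
template <int direction>
bool Thread_Guard_Sketch(int nx, int ny, int nz, int xid, int yid, int zid)
{
  int const x_reach = xid + (direction == 0 ? 1 : 0);
  int const y_reach = yid + (direction == 1 ? 1 : 0);
  int const z_reach = zid + (direction == 2 ? 1 : 0);

  // A lenient guard such as `xid >= nx` would let xid == nx - 1 through,
  // and that thread would then read cell nx, which is out of bounds.
  return x_reach >= nx || y_reach >= ny || z_reach >= nz;
}
```

Since unused dimensions have an extent of 1 and an index of 0, the same check covers 1D, 2D, and 3D grids without special cases.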
evaneschneider approved these changes May 22, 2024
Summary
The primary purpose of this PR is to remove the PCM kernel in favor of a device function called within the Riemann solver kernels. This builds off of PRs #371 and #375 and will show changes from both those PRs and this PR until those PRs are merged into dev. Most of the relevant changes in this PR are in the following files:
There is some extra machinery in the Riemann solvers at the moment to deal with the fact that one reconstruction is fused and the rest aren't, but that will go away once all reconstructions are fused.
The performance gain from fusing the PCM kernel into the Riemann solvers is ~7% in hydro builds and ~12.6% in MHD builds compared to the version of Cholla in PR #375 (timings in run_timing.log).
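The general shape of the fusion can be sketched on the host with a toy 1D problem. Everything here is hypothetical: `Toy_Riemann_Flux` is an upwind flux for linear advection standing in for a real Riemann solver, and `PCM_Sketch` stands in for the PCM device function. The only property taken from PCM itself is that the interface states are the adjacent cell values.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical toy Riemann solver: upwind flux for linear advection.
inline double Toy_Riemann_Flux(double left, double right, double speed = 1.0)
{
  return speed > 0.0 ? speed * left : speed * right;
}

// Unfused: one pass writes the PCM interface states to memory,
// a second pass reads them back and solves the Riemann problem.
std::vector<double> Fluxes_Unfused(std::vector<double> const &cells)
{
  if (cells.size() < 2) return {};
  std::size_t const n_interfaces = cells.size() - 1;
  std::vector<double> left(n_interfaces), right(n_interfaces), flux(n_interfaces);
  for (std::size_t i = 0; i < n_interfaces; i++) {
    left[i]  = cells[i];
    right[i] = cells[i + 1];
  }
  for (std::size_t i = 0; i < n_interfaces; i++) {
    flux[i] = Toy_Riemann_Flux(left[i], right[i]);
  }
  return flux;
}

struct Interface_States {
  double left, right;
};

// PCM as a plain function callable from inside the solver loop.
inline Interface_States PCM_Sketch(std::vector<double> const &cells, std::size_t i)
{
  return {cells[i], cells[i + 1]};
}

// Fused: the interface states never leave local variables.
std::vector<double> Fluxes_Fused(std::vector<double> const &cells)
{
  if (cells.size() < 2) return {};
  std::vector<double> flux(cells.size() - 1);
  for (std::size_t i = 0; i < flux.size(); i++) {
    Interface_States const states = PCM_Sketch(cells, i);
    flux[i] = Toy_Riemann_Flux(states.left, states.right);
  }
  return flux;
}
```

The fused version produces the same fluxes while skipping the intermediate `left` and `right` arrays, which on a GPU corresponds to one fewer kernel launch and one fewer round trip through global memory.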
Other Changes
- Fixed an `ifndef` that should have been an `ifdef` when warning that CUDA error checking was disabled.
- Added the `HIP_KERNEL_NAME` macro to CUDA builds. This macro is part of the HIP runtime but is not present in the CUDA runtime. It's used in kernel launches to wrap kernel names that have more than one template parameter, since the comma in the template arguments plays havoc with some internals of the HIP runtime.
- `OUTPUT_ALWAYS` build macro
- Fixed a bug in `AutomaticLaunchParams` that would let it set a threads per block number higher than what `__launch_bounds__()` specified. Now it queries the kernel for that number and sets that as the maximum threads per block. This isn't causing any bugs at the moment, but it did during some intermediate testing I was doing.
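The clamping behind the launch parameter fix can be sketched without a GPU. The types and function below are hypothetical stand-ins, not the actual `AutomaticLaunchParams` interface. In CUDA the per-kernel limit is available through `cudaFuncGetAttributes` (the `maxThreadsPerBlock` field), though this PR does not say which query Cholla uses.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical stand-in for the per-kernel attributes a GPU runtime reports.
struct Kernel_Attributes_Sketch {
  int max_threads_per_block;  // reflects any launch bounds on the kernel
};

struct Launch_Params_Sketch {
  int threads_per_block;
  int num_blocks;
};

// Clamp the suggested block size to what the kernel supports, then size
// the grid so that every cell is covered by at least one thread.
Launch_Params_Sketch Choose_Launch_Params(Kernel_Attributes_Sketch attributes, int suggested_threads, int n_cells)
{
  int const threads = std::max(1, std::min(suggested_threads, attributes.max_threads_per_block));
  int const blocks  = (n_cells + threads - 1) / threads;
  return {threads, blocks};
}
```

Without the clamp, a suggested block size of 1024 on a kernel limited to 256 threads per block would produce an invalid launch configuration.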