Reconstruction Kernel Fusion 3: Fusing PCM with the Riemann Solvers #377

Merged 30 commits on May 23, 2024

@bcaddy bcaddy commented Feb 27, 2024

Summary

The primary purpose of this PR is to remove the PCM kernel in favor of a device function called within the Riemann solver kernels. This builds off of PRs #371 and #375 and will show changes from both those PRs and this PR until those PRs are merged into dev. Most of the relevant changes in this PR are in the following files:

  • pcm_cuda.cu and .h
  • reconstruction.h
  • All Riemann solvers

There is some extra machinery in the Riemann solvers at the moment to handle the fact that one reconstruction is fused and the rest are not, but that will go away once all reconstructions are fused.
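
As an illustration of what fusing means here, the following is a minimal host-side C++ sketch, not Cholla's code. All names (`InterfaceState`, `pcm_reconstruct`, `upwind_flux`, `fused_fluxes`) are hypothetical, and a scalar upwind flux for linear advection stands in for the real hydro and MHD Riemann solvers. In a GPU build the reconstruction would be a `__device__` function and the loop body would be the body of the Riemann solver kernel.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical pair of states on either side of the interface i+1/2.
struct InterfaceState {
  double left;
  double right;
};

// Piecewise constant reconstruction: the interface states are simply the
// cell averages of the two neighboring cells. In a GPU build this would be
// a __device__ function rather than a separate kernel.
inline InterfaceState pcm_reconstruct(const std::vector<double>& u, std::size_t i)
{
  assert(i + 1 < u.size());
  return {u[i], u[i + 1]};
}

// Stand-in "Riemann solver": the upwind flux for linear advection,
// u_t + a u_x = 0. This replaces the real solvers purely for brevity.
inline double upwind_flux(const InterfaceState& s, double a)
{
  return a > 0.0 ? a * s.left : a * s.right;
}

// Fused loop: each iteration reconstructs its own interface states and
// immediately computes the flux, so no intermediate interface arrays are
// written to memory and read back.
inline std::vector<double> fused_fluxes(const std::vector<double>& u, double a)
{
  std::vector<double> flux(u.empty() ? 0 : u.size() - 1);
  for (std::size_t i = 0; i + 1 < u.size(); ++i) {
    flux[i] = upwind_flux(pcm_reconstruct(u, i), a);
  }
  return flux;
}
```

The point of the sketch is the data flow: the unfused version would need one pass to fill left and right interface arrays and a second pass to consume them, while the fused version keeps the interface states in local variables.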

The performance gain from fusing the PCM kernel into the Riemann solvers is ~7% in hydro builds and ~12.6% in MHD builds compared to the version of Cholla in PR #375 (see run_timing.log).

Other Changes

  • Fixed an ifndef that should have been an ifdef when warning that CUDA error checking was disabled
  • Added a HIP_KERNEL_NAME macro to CUDA builds. This macro is part of the HIP runtime but is not present in the CUDA runtime. It's used in kernel launches to wrap kernel names that have more than one template parameter since the comma in the template arguments plays havoc with some internals of the HIP runtime.
  • Removed the reference to the deprecated OUTPUT_ALWAYS build macro
  • Fixed a bug in AutomaticLaunchParams that would let it set a threads per block number higher than what __launch_bounds__() specified. Now it queries the kernel for that number and uses it as the maximum threads per block. This isn't causing any bugs at the moment, but it did during some intermediate testing I was doing.
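
The launch-bounds fix in the last bullet amounts to clamping the chosen block size to the per-kernel limit. The following host-side C++ sketch shows that clamp under stated assumptions: `KernelAttributes`, `LaunchParams`, and `choose_launch_params` are hypothetical names and are not the interface of AutomaticLaunchParams. In CUDA, the per-kernel limit can be read from the `maxThreadsPerBlock` field filled in by `cudaFuncGetAttributes`; here it is passed in as a plain integer so the example runs without a GPU.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical stand-in for the per-kernel attributes a GPU runtime reports.
struct KernelAttributes {
  int max_threads_per_block;  // limit implied by __launch_bounds__()
};

// Hypothetical result type holding a 1D launch configuration.
struct LaunchParams {
  int threads_per_block;
  int num_blocks;
};

// Pick a block size no larger than the kernel allows, then compute how many
// blocks are needed to cover every work item.
inline LaunchParams choose_launch_params(int suggested_threads, int num_work_items,
                                         const KernelAttributes& attrs)
{
  assert(suggested_threads > 0 && attrs.max_threads_per_block > 0);
  const int threads = std::min(suggested_threads, attrs.max_threads_per_block);
  const int blocks  = (num_work_items + threads - 1) / threads;  // ceiling division
  return {threads, blocks};
}
```

For example, a suggestion of 1024 threads for a kernel limited to 256 threads per block is reduced to 256, and 1000 work items then need 4 blocks.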

@bcaddy bcaddy force-pushed the dev-pcmFusion-3 branch from efab027 to 98822cb Compare March 8, 2024 16:37
@bcaddy bcaddy changed the base branch from dev to dev-fusedReconstruction March 13, 2024 15:23
@bcaddy bcaddy changed the base branch from dev-fusedReconstruction to dev March 13, 2024 18:52
@bcaddy bcaddy force-pushed the dev-pcmFusion-3 branch 2 times, most recently from ca333fa to b0a50d4 Compare April 29, 2024 17:50
bcaddy added 22 commits April 29, 2024 14:30
Moved the struct to avoid circular includes with reconstructor files
including reconstruction.h which included them.
The element-by-element test was brittle to small changes
This simple macro is used to wrap kernel names that have more than 1
template argument since the comma in the template arguments otherwise
messes with the kernel launch syntax. It is part of the HIP runtime
already.
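
The comma problem this macro works around can be reproduced in plain C++. This is a minimal sketch that assumes the macro simply expands to its arguments unchanged; `LAUNCH`, `Example_Kernel`, and `launch_example` are hypothetical names standing in for a macro-based kernel launcher and a templated kernel.

```cpp
#include <cassert>

// Assumed definition for builds without the HIP runtime: expand to the
// arguments unchanged. The guard avoids redefining it when HIP provides it.
#ifndef HIP_KERNEL_NAME
#define HIP_KERNEL_NAME(...) __VA_ARGS__
#endif

// Hypothetical macro-based launcher. The preprocessor splits macro arguments
// at every comma that is not nested inside parentheses; angle brackets do
// not protect a comma.
#define LAUNCH(kernel, value) kernel(value)

// Hypothetical "kernel" with two template parameters.
template <int direction, int order>
int Example_Kernel(int value)
{
  return value * 100 + direction * 10 + order;
}

inline int launch_example()
{
  // LAUNCH(Example_Kernel<1, 2>, 7) would fail to compile because the
  // preprocessor would see three arguments: "Example_Kernel<1", "2>", "7".
  // Wrapping the name puts the comma inside parentheses, so LAUNCH receives
  // exactly two arguments.
  return LAUNCH(HIP_KERNEL_NAME(Example_Kernel<1, 2>), 7);
}
```

Kernels with a single template parameter contain no comma in their name and do not need the wrapper.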
This overload takes a hydro_utilities::Vector object
Added a version of PCM that can be called as a device function from
within a Riemann solver
- Fuse PCM reconstruction and the HLLD riemann solver kernels
- Make direction and reconstructor template arguments in HLLD solver.
  This should improve performance vs. having them as runtime arguments
  since the compiler can discard many paths at compile time
- Update HLLD tests for new fused version of HLLD. Since the fusion
  also changed how data is loaded some of the tests (the negative
  density tests) are no longer useful and have been removed. The
  fiducial data for one other test was changed as well since it was
  borderline between two states and this change caused it to switch
  states.
- reconstruction::Riemann_Thread_Guard function for riemann solvers
  with fused reconstruction. Added a test for same function.
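
The benefit of making the direction a template argument can be sketched in a few lines of C++17. The helper below is hypothetical and is not taken from the HLLD solver; it only shows how a compile-time direction lets the compiler keep a single branch per instantiation instead of testing the direction at run time.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: index of the neighboring cell in the positive
// `direction` for a flattened 3D grid with x as the fastest index.
template <int direction>
constexpr std::size_t neighbor_index(std::size_t id, std::size_t nx, std::size_t ny)
{
  static_assert(direction >= 0 && direction <= 2, "direction must be 0, 1, or 2");
  // Only the branch matching `direction` is instantiated; the other two are
  // discarded at compile time.
  if constexpr (direction == 0) {
    return id + 1;
  } else if constexpr (direction == 1) {
    return id + nx;
  } else {
    return id + nx * ny;
  }
}

// Runtime-argument version for comparison: every call carries the branches.
inline std::size_t neighbor_index_runtime(int direction, std::size_t id, std::size_t nx,
                                          std::size_t ny)
{
  assert(direction >= 0 && direction <= 2);
  if (direction == 0) return id + 1;
  if (direction == 1) return id + nx;
  return id + nx * ny;
}
```

A call such as `neighbor_index<1>(id, nx, ny)` compiles down to a single addition, whereas the runtime version must select the stride on every call. The same idea applies to a reconstructor template argument.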
Previously this function only worked in 3D. Now it works in 1D, 2D, and
3D.
Fusing the PCM reconstructor introduced a few scope-based bugs in the
Riemann solvers. This fixes them.
The half time step riemann solve in VL3D was done in direction 0 three
times instead of in all three directions. Fixed.
bcaddy added 5 commits April 29, 2024 14:30
Also, fixed a bug in AutomaticLaunchParams that would let it set a
threads per block number higher than what __launch_bounds__() specified.
Now it queries the kernel for that number.
Thread guard was too lenient and caused out of bounds reads
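
To show how a thread guard can be too lenient, here is a hypothetical host-side C++ sketch. It is not the implementation of reconstruction::Riemann_Thread_Guard. It assumes that the thread for a cell reads that cell and its +1 neighbor in every active dimension, and that a dimension with a single cell is inactive; `out_of_range` and `thread_guard` are illustrative names.

```cpp
#include <cassert>

// Returns true if index `id` cannot be used in a dimension of `n` cells.
inline bool out_of_range(int n, int id)
{
  // An inactive dimension (n == 1) holds a single cell with index 0.
  if (n == 1) {
    return id != 0;
  }
  // An active dimension needs both cell `id` and its neighbor `id + 1` to
  // exist. Testing only `id >= n` here would be too lenient: the last cell
  // would pass the guard and then read one element past the end.
  return id < 0 || id + 1 >= n;
}

// Returns true when the thread should exit without doing any work. Works
// for 1D (ny == nz == 1), 2D (nz == 1), and 3D grids.
inline bool thread_guard(int nx, int ny, int nz, int xid, int yid, int zid)
{
  return out_of_range(nx, xid) || out_of_range(ny, yid) || out_of_range(nz, zid);
}
```

A kernel would call such a guard first and return immediately when it reports true, before any global memory access.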
@evaneschneider evaneschneider marked this pull request as ready for review May 22, 2024 18:52
@evaneschneider evaneschneider merged commit 1e71914 into cholla-hydro:dev May 23, 2024
10 checks passed
2 participants