New Discrete Sine Transformation #534

AlexanderSinn · 2021-06-16T18:46:18Z

This PR introduces an alternative GPU Discrete Sine Transformation for PoissonSolverDirichlet. Instead of R2C fft that is applied to an array 4 times the original problem size, a C2R fft is used that has about the same size as the problem, however in exchange some pre- and postprocessing is required. This makes the new dst much faster for large resolutions, but also slightly slower for low resolutions as more GPU kernels have to be launched.

Use hipace.use_small_dst = 1 to use the new dst.

old.txt
new.txt

This test is from a fp64 GPU simulation with n_cell = 2047 2047 4096. The new AnyDST::Execute() only takes about half the time compared to the old one while using 500MB less GPU memory. The new dst calculation is very accrued as can be seen by the relative B field error being almost the same as with the old dst.

The E-Field result along z of another Simulation with 1023 1023 10000 Cells shows again that between the old and new DSTs there is no observable difference in fp64, while in fp32 a small deviation can be seen when zooming in, however both transformations are still about equally distant from the fp64 result.

Currently there seems to be a performance issue with ToComplex() where for low resolutions its first call in Execute() takes significantly more time than the second one.

Small enough (< few 100s of lines), otherwise it should probably be split into smaller PRs
Tested (describe the tests in the PR description)
Runs on GPU (basic: the code compiles and run well with the new module)
Contains an automated test (checksum and/or comparison with theory)
Documented: all elements (classes and their members, functions, namespaces, etc.) are documented
Constified (All that can be const is const)
Code is clean (no unwanted comments, )
Style and code conventions are respected at the bottom of https://github.com/Hi-PACE/hipace
Proper label and GitHub project, if applicable

SeverinDiederichs

Great PR, well done! In my tests I get a speedup > 2 for the DST.

See minor comments below

SeverinDiederichs · 2021-06-22T09:54:14Z

src/fields/fft_poisson_solver/fft/WrapCuDST.cpp

@@ -1,13 +1,16 @@
 #include "AnyDST.H"
 #include "CuFFTUtils.H"
 #include "utils/HipaceProfilerWrapper.H"
+#include "utils/Constants.H"


Suggested change

#include "utils/Constants.H"

I think this may have been used at some point, but is not used in the final implementation. Please correct me, if I am wrong.

SeverinDiederichs · 2021-06-22T09:58:23Z

src/fields/fft_poisson_solver/fft/WrapCuDST.cpp

-        if ( result != CUFFT_SUCCESS ) {
-            amrex::Print() << " cufftplan failed! Error: " <<
-                CuFFTUtils::cufftErrorToString(result) << "\n";
+        amrex::ParmParse pp("fields");


Suggested change

amrex::ParmParse pp("fields");

amrex::ParmParse pp("hipace");

I would prefer to parse for "hipace". We do this for other field related things like the bxby_solver as well and this would be the only parsing for fields. Maybe we can separate these at some point, but for convenience I would stick with hipace for now.

SeverinDiederichs · 2021-06-22T09:59:11Z

docs/source/run/parameters.rst

@@ -92,6 +92,10 @@ Modeling ion motion is not yet supported by the explicit solver
    Which solver to use.
    Possible values: ``predictor-corrector`` and ``explicit``.

+* ``fields.use_small_dst`` (`bool`) optional (default `0`)


Suggested change

* ``fields.use_small_dst`` (`bool`) optional (default `0`)

* ``hipace.use_small_dst`` (`bool`) optional (default `0`)

see comment at the parsing call

MaxThevenet

Looks great, thanks a lot for this PR! You will find a few minor comments below.
As discussed offline, it would be great if the code could propose a good default behaviour for parameter hipace.use_small_dst, so users don't have to exactly understand this point and still get the best performance. A solution would be to run a simulation with n=5 to n=12 and calculate the time spent per call to the Poisson solver with the following options:

box size nx = ny = 2^n or 2^n-1
single or double precision

and compare the small and large dst. Then you could store, for each configuration, the minimal number of grid points for which the small dst is faster, and use it to decide at runtime whether to use the small dst. Of course the user could still specify hipace.use_small_dst, which would override the code defaults. Of course we can add any relevant parameter to this list. Note that this should probably be in another PR, not in this one.

MaxThevenet · 2021-06-28T12:02:42Z

src/fields/fft_poisson_solver/fft/WrapCuDST.cpp

+                    const int n_data, const int n_batch)
+    {
+        HIPACE_PROFILE("AnyDST::Transpose()");
+        constexpr int tile_dim = 32;


Just a suggestion to make it a bit safer, in case we need to play with this parameter in the future.

Suggested change

constexpr int tile_dim = 32;

constexpr int tile_dim_power = 5;

const int tile_dim = std::pow(2, tile_dim_power);

You would probably need to add

#include <cmath>

at the end of the includes at the beginning of this file.

Looking at Compiler Explorer, I found that with constexpr int i = 32; num>>5 and num/i are exactly equivalent for gcc, clang and nvcc, as long as num is an unsigned int.
With default settings, gcc was also able to do the same thing with const int i = std::pow(2,5);, but not clang which would actually call std::pow.
I think it’s best to replace the >>5 with a divide.

src/fields/fft_poisson_solver/fft/WrapCuDST.cpp

src/fields/fft_poisson_solver/fft/AnyDST.H

MaxThevenet

Looks great to me, thanks for this great PR!

AlexanderSinn and others added 8 commits June 2, 2021 22:58

new dst wrong result

8c94fdb

fix awesome nvidia backwards indexing

77a9d9a

speed test

b3ca1d8

cleaning

5e21245

cleaning

c3f30c2

more cleaning

d9c8ab8

Merge branch 'development' into small_sine_fft

832901d

desynchronize

9e432ed

AlexanderSinn added component: fields About 3D fields and slices, field solvers etc. GPU Related to GPU acceleration labels Jun 16, 2021

AlexanderSinn requested review from MaxThevenet and SeverinDiederichs June 16, 2021 19:56

SeverinDiederichs requested changes Jun 22, 2021

View reviewed changes

add suggestions

c4036e1

SeverinDiederichs approved these changes Jun 22, 2021

View reviewed changes

AlexanderSinn mentioned this pull request Jun 23, 2021

Unify forward and backward dst plans to save GPU memory #537

Merged

9 tasks

MaxThevenet reviewed Jun 28, 2021

View reviewed changes

add suggestions

a174a8f

AlexanderSinn requested a review from MaxThevenet June 30, 2021 11:47

MaxThevenet approved these changes Jun 30, 2021

View reviewed changes

MaxThevenet merged commit 768b098 into Hi-PACE:development Jun 30, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

New Discrete Sine Transformation #534

New Discrete Sine Transformation #534

AlexanderSinn commented Jun 16, 2021 •

edited

Loading

SeverinDiederichs left a comment

SeverinDiederichs Jun 22, 2021

SeverinDiederichs Jun 22, 2021

SeverinDiederichs Jun 22, 2021

MaxThevenet left a comment •

edited

Loading

MaxThevenet Jun 28, 2021

AlexanderSinn Jun 30, 2021 •

edited

Loading

MaxThevenet Jun 30, 2021

MaxThevenet left a comment

	amrex::ParmParse pp("fields");
	amrex::ParmParse pp("hipace");

	* ``fields.use_small_dst`` (`bool`) optional (default `0`)
	* ``hipace.use_small_dst`` (`bool`) optional (default `0`)

	constexpr int tile_dim = 32;
	constexpr int tile_dim_power = 5;
	const int tile_dim = std::pow(2, tile_dim_power);

New Discrete Sine Transformation #534

New Discrete Sine Transformation #534

Conversation

AlexanderSinn commented Jun 16, 2021 • edited Loading

SeverinDiederichs left a comment

Choose a reason for hiding this comment

SeverinDiederichs Jun 22, 2021

Choose a reason for hiding this comment

SeverinDiederichs Jun 22, 2021

Choose a reason for hiding this comment

SeverinDiederichs Jun 22, 2021

Choose a reason for hiding this comment

MaxThevenet left a comment • edited Loading

Choose a reason for hiding this comment

MaxThevenet Jun 28, 2021

Choose a reason for hiding this comment

AlexanderSinn Jun 30, 2021 • edited Loading

Choose a reason for hiding this comment

MaxThevenet Jun 30, 2021

Choose a reason for hiding this comment

MaxThevenet left a comment

Choose a reason for hiding this comment

AlexanderSinn commented Jun 16, 2021 •

edited

Loading

MaxThevenet left a comment •

edited

Loading

AlexanderSinn Jun 30, 2021 •

edited

Loading