ROCm & rocFFT #583

ax3l · 2021-08-10T19:45:32Z

Implement rocFFT from equivalent WarpX routines.
Add HIP CI.

Note: due to an upstream ROCm bug in 4.2.0, please build with either ROCm 4.1.0 or ROCm 4.3.0 (and the corresponding rocFFT release).

ROCm is still a moving target in AMReX, so don't hesitate to reach out on the AMReX/WarpX channels.
For Cray/HPE software environments with MPI (e.g., OLCF Spock), also take note of this improvement for CMake 3.22+: https://gitlab.kitware.com/cmake/cmake/-/merge_requests/6264 and the work-around posted therein to build in the meantime.

Small enough (< few 100s of lines), otherwise it should probably be split into smaller PRs
Tested (describe the tests in the PR description)
Runs on GPU (basic: the code compiles and runs well with the new module)
Contains an automated test (checksum and/or comparison with theory)
Documented: all elements (classes and their members, functions, namespaces, etc.) are documented
Constified (All that can be const is const)
Code is clean (no unwanted comments)
Style and code conventions (listed at the bottom of https://github.com/Hi-PACE/hipace) are respected
Proper label and GitHub project, if applicable

src/fields/fft_poisson_solver/fft/WrapRocDST.cpp

src/fields/fft_poisson_solver/fft/RocFFTUtils.cpp

ax3l · 2021-08-11T17:20:37Z

@MaxThevenet @wetzel-desy I updated the WrapRocDST implementation with the content from #584

Please feel free to review. I saw there are a few inline comments left in Execute that might need double-checking.
Also compare this to WrapCuDST if needed, where we seem to handle the forward/backward transformation slightly differently.

.github/workflows/hip.yml


src/fields/fft_poisson_solver/fft/WrapRocDST.cpp


ax3l · 2021-08-13T16:02:48Z

@SeverinDiederichs via Slack:
For testing, using hipace.do_small_dst = 0 is a good first step, which only uses the ShrinkC2R and ExpandR2C functions.
That way we do a 2D FFT and don't need the transpose, while the default does 1D DSTs (which need transposes).
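To make the expand/FFT/shrink idea concrete, here is a minimal 1D sketch. It assumes that ExpandR2C and ShrinkC2R follow the usual odd-extension trick for computing a DST-I through a plain FFT; that is an assumption about their role, not a copy of the HiPACE code. All function names below (`dst1_direct`, `expand_odd`, `naive_dft`, `dst1_via_dft`) are hypothetical, and a naive O(N^2) DFT stands in for rocFFT so the example runs on any host.

```cpp
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

// Reference DST-I by direct summation:
// X[k] = sum_j x[j] * sin(pi * (j+1) * (k+1) / (N+1)).
inline std::vector<double> dst1_direct(const std::vector<double>& x) {
    const std::size_t n = x.size();
    const double pi = std::acos(-1.0);
    std::vector<double> out(n, 0.0);
    for (std::size_t k = 0; k < n; ++k) {
        for (std::size_t j = 0; j < n; ++j) {
            out[k] += x[j] * std::sin(pi * double((j + 1) * (k + 1)) / double(n + 1));
        }
    }
    return out;
}

// "Expand" step (hypothetical helper): build the odd-symmetric sequence
// of length 2N+2, i.e. [0, x0, ..., x(N-1), 0, -x(N-1), ..., -x0].
inline std::vector<double> expand_odd(const std::vector<double>& x) {
    const std::size_t n = x.size();
    std::vector<double> y(2 * n + 2, 0.0);
    for (std::size_t j = 0; j < n; ++j) {
        y[j + 1] = x[j];
        y[2 * n + 1 - j] = -x[j];
    }
    return y;
}

// Naive real-to-complex DFT, standing in for the FFT library call.
inline std::vector<std::complex<double>> naive_dft(const std::vector<double>& y) {
    const std::size_t m = y.size();
    const double pi = std::acos(-1.0);
    std::vector<std::complex<double>> out(m);
    for (std::size_t k = 0; k < m; ++k) {
        for (std::size_t n = 0; n < m; ++n) {
            out[k] += y[n] * std::polar(1.0, -2.0 * pi * double(k * n) / double(m));
        }
    }
    return out;
}

// "Shrink" step (hypothetical helper): the DFT of the odd extension is purely
// imaginary, and the DST-I coefficients are -Im(Y[k+1]) / 2.
inline std::vector<double> dst1_via_dft(const std::vector<double>& x) {
    const auto spectrum = naive_dft(expand_odd(x));
    std::vector<double> out(x.size());
    for (std::size_t k = 0; k < x.size(); ++k) {
        out[k] = -0.5 * spectrum[k + 1].imag();
    }
    return out;
}
```

Calling `dst1_via_dft(x)` and `dst1_direct(x)` on the same input should agree to rounding error. In 2D the same expansion would be applied along both axes before a single 2D FFT, which is presumably why no transpose is needed on that path.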

MaxThevenet · 2021-08-13T17:12:10Z

src/fields/fft_poisson_solver/fft/WrapRocDST.cpp

+        result = rocfft_execute(dst_plan.m_plan,
+                                (void**)&(dst_plan.m_expanded_position_array),
+                                (void**)&(dst_plan.m_expanded_fourier_array),
+                                execinfo);


The code does compile well, but when trying with this input file I get a segfault in Execute. The segfault disappears when commenting out these lines. It could also be coming from rocfft_plan_create. Note that this run has no I/O; I think I get another segfault when I/O is on.

ax3l · 2021-08-13

Thanks for testing!
Looking at the in-code comments by @wetzel-desy before these lines, I guess we need to double-check the API contract for rocfft here.

MaxThevenet · 2021-09-01

@wetzel-desy do you have any idea of what could go wrong here, and how to fix it?

wetzel-desy · 2021-09-03

I honestly took inspiration from the WarpX implementation, which calls rocfft_execute. So there might be a difference between the FFT and DST with regard to the underlying data types of the two arrays that are supplied as input. But as I am not involved with the overall structure, I wouldn't know how to fix this without testing... Sorry.

AlexanderSinn · 2021-09-03

Does this work?

        void* in[2] = {dst_plan.m_expanded_position_array->dataPtr(), nullptr};
        void* out[2] = {dst_plan.m_expanded_fourier_array->dataPtr(), nullptr};
        result = rocfft_execute(dst_plan.m_plan, in, out, execinfo);

MaxThevenet · 2021-09-11

Excellent, that did it indeed!! With the correct profile and submission script, I unfortunately didn't manage to get openPMD working yet (I get a runtime error), but by adding some in-situ diags with

        auto& ptile = m_multi_beam.getBeam(0);
        const auto& aos = ptile.GetArrayOfStructs();
        const auto& pos_structs = aos.begin();
        amrex::Gpu::DeviceVector<amrex::Real> dV(1);
        amrex::Vector<amrex::Real> hV(1);
        auto pdV = dV.begin();
        // accumulate the sum of x^2 over all beam particles on the device
        amrex::ParallelFor(
            ptile.numParticles(),
            [=] AMREX_GPU_DEVICE (long ip) {
                amrex::Real x = pos_structs[ip].pos(0);
                amrex::Gpu::Atomic::Add(pdV, x*x);
            });
        // copy the result back to the host and print it
        amrex::Gpu::copy(amrex::Gpu::deviceToHost, dV.begin(), dV.end(), hV.begin());
        amrex::Print()<<hV[0]<<'\n';

at line 400 in Evolve, I could check that the evolution was the same on Spock as on Summit. Thank you all!
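For readers wondering why the original cast segfaulted: rocfft_execute takes its input and output as arrays of buffer pointers (`void* in_buffer[]`, `void* out_buffer[]`). The sketch below illustrates the difference on the host with stand-in types. It assumes, as the `->dataPtr()` in the fix suggests, that the plan members are pointers to array objects; `FakeFab`, `fake_execute`, `first_buffer_seen_with_cast` and `run_fixed` are hypothetical names, not rocFFT or AMReX API.

```cpp
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical stand-in for an array class that owns a separate data buffer.
struct FakeFab {
    std::vector<double> storage;
    explicit FakeFab(std::size_t n) : storage(n, 0.0) {}
    double* dataPtr() { return storage.data(); }
};

// Hypothetical stand-in with the same buffer-argument shape as rocfft_execute:
// each argument is an array of buffer pointers, and element 0 is the data buffer.
inline void fake_execute(void* in_buffer[], void* out_buffer[], std::size_t n) {
    auto* in = static_cast<double*>(in_buffer[0]);
    auto* out = static_cast<double*>(out_buffer[0]);
    for (std::size_t i = 0; i < n; ++i) {
        out[i] = 2.0 * in[i];
    }
}

// What a callee would read as "buffer 0" if it were handed (void**)&member,
// where member is a pointer to the array object: the address of the object
// itself, not the address of its data.
inline void* first_buffer_seen_with_cast(FakeFab* const& member) {
    void* seen = nullptr;
    std::memcpy(&seen, &member, sizeof(void*));
    return seen;
}

// The pattern from the fix: build explicit arrays of data pointers.
inline void run_fixed(FakeFab& in_fab, FakeFab& out_fab) {
    void* in[2] = {in_fab.dataPtr(), nullptr};
    void* out[2] = {out_fab.dataPtr(), nullptr};
    fake_execute(in, out, in_fab.storage.size());
}
```

With the cast, the library would treat the array object (host memory holding metadata) as if it were the data buffer, which is consistent with the segfault reported above; with explicit arrays, element 0 is the actual data pointer.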

MaxThevenet · 2021-09-12T13:27:20Z

This PR is now ready for review IMO. Compilation and execution work well, and the results (single GPU, 20 time steps, comparing the evolution of the beam width) agree with a simulation on an NVIDIA GPU. The main missing piece is openPMD I/O: I currently get a segfault when running this input file with I/O, with the following backtrace:

Hipace::Evolve() at ??:?
OpenPMDWriter::InitDiagnostics(int, int, int, int) at ??:?
openPMD::Series::Series(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, openPMD::Access, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) at ??:?
openPMD::internal::SeriesInternal::SeriesInternal(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, openPMD::Access, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) at ??:?
openPMD::SeriesInterface::init(std::shared_ptr<openPMD::AbstractIOHandler>, std::unique_ptr<openPMD::SeriesInterface::ParsedInput, std::default_delete<openPMD::SeriesInterface::ParsedInput> >) at ??:?
openPMD::SeriesInterface::initDefaults(openPMD::IterationEncoding) at ??:?
bool openPMD::AttributableInterface::setAttribute<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) at ??:?
decltype(auto) mpark::detail::visitation::alt::visit_alt<mpark::detail::dtor, mpark::detail::destructor<mpark::detail::traits<char, unsigned char, short, int, long, long long, unsigned short, unsigned int, unsigned long, unsigned long long, float, double, long double, std::complex<float>, std::complex<double>, std::complex<long double>, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::vector<char, std::allocator<char> >, std::vector<short, std::allocator<short> >, std::vector<int, std::allocator<int> >, std::vector<long, std::allocator<long> >, std::vector<long long, std::allocator<long long> >, std::vector<unsigned char, std::allocator<unsigned char> >, std::vector<unsigned short, std::allocator<unsigned short> >, std::vector<unsigned int, std::allocator<unsigned int> >, std::vector<unsigned long, std::allocator<unsigned long> >, std::vector<unsigned long long, std::allocator<unsigned long long> >, std::vector<float, std::allocator<float> >, std::vector<double, std::allocator<double> >, std::vector<long double, std::allocator<long double> >, std::vector<std::complex<float>, std::allocator<std::complex<float> > >, std::vector<std::complex<double>, std::allocator<std::complex<double> > >, std::vector<std::complex<long double>, std::allocator<std::complex<long double> > >, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >, std::array<double, 7ul>, bool>, (mpark::detail::Trait)1>&>(mpark::detail::dtor&&, mpark::detail::destructor<mpark::detail::traits<char, unsigned char, short, int, long, long long, unsigned short, unsigned int, unsigned long, unsigned long long, float, double, long double, std::complex<float>, std::complex<double>, std::complex<long double>, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::vector<char, std::allocator<char> >, 
std::vector<short, std::allocator<short> >, std::vector<int, std::allocator<int> >, std::vector<long, std::allocator<long> >, std::vector<long long, std::allocator<long long> >, std::vector<unsigned char, std::allocator<unsigned char> >, std::vector<unsigned short, std::allocator<unsigned short> >, std::vector<unsigned int, std::allocator<unsigned int> >, std::vector<unsigned long, std::allocator<unsigned long> >, std::vector<unsigned long long, std::allocator<unsigned long long> >, std::vector<float, std::allocator<float> >, std::vector<double, std::allocator<double> >, std::vector<long double, std::allocator<long double> >, std::vector<std::complex<float>, std::allocator<std::complex<float> > >, std::vector<std::complex<double>, std::allocator<std::complex<double> > >, std::vector<std::complex<long double>, std::allocator<std::complex<long double> > >, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >, std::array<double, 7ul>, bool>, (mpark::detail::Trait)1>&) at ??:?

However, there is no rush; I/O can be fixed in a subsequent PR.

MaxThevenet

Looks all good to me, I think this can be merged. Thank you!

ax3l added labels on Aug 10, 2021: component: fields (About 3D fields and slices, field solvers etc.), GPU (Related to GPU acceleration), CI (Continuous integration, checksum and analysis tests, GitHub Actions, etc.)

ax3l force-pushed the topic-rocFFT branch 5 times, most recently from 96c4e19 to 4217cf3, on August 10, 2021 20:47

ax3l commented Aug 10, 2021

src/fields/fft_poisson_solver/fft/WrapRocDST.cpp (outdated, resolved)

ax3l requested a review from MaxThevenet August 10, 2021 20:49

ax3l commented Aug 11, 2021

src/fields/fft_poisson_solver/fft/RocFFTUtils.cpp (resolved)

src/fields/fft_poisson_solver/fft/RocFFTUtils.cpp (outdated, resolved)

ax3l mentioned this pull request Aug 11, 2021

[WIP] Rocm support #584

Closed

9 tasks

SeverinDiederichs reviewed Aug 11, 2021

.github/workflows/hip.yml (outdated, resolved)

ax3l changed the title ~~[WIP] rocFFT~~ rocFFT Aug 11, 2021

ax3l changed the title ~~rocFFT~~ ROCm & rocFFT Aug 11, 2021

ax3l and others added 4 commits August 11, 2021 10:27

AMD: Use rocFFT

723c1c5

Implement rocFFT from equivalent WarpX routines.

CI: HIP 4.1.1

c2b4693

Work-Around: ROCm/rocFFT <=4.3.0

7437a5b

C++ inside `extern C` does not work: https://github.com/ROCmSoftwarePlatform/rocFFT/blob/rocm-4.3.0/library/include/rocfft.h#L36-L42 Fixed in `develop` of rocFFT, but did not land in 4.3.0

Update WrapRocDST with Tim's Impl.

e307d79

Co-authored-by: Maxence Thevenet <maxence.thevenet@desy.de>

ax3l force-pushed the topic-rocFFT branch from c467b2d to e307d79 on August 11, 2021 17:27

ax3l commented Aug 11, 2021

src/fields/fft_poisson_solver/fft/WrapRocDST.cpp (outdated, resolved)

ax3l commented Aug 11, 2021

src/fields/fft_poisson_solver/fft/WrapRocDST.cpp (outdated, resolved)

ax3l commented Aug 11, 2021

src/fields/fft_poisson_solver/fft/WrapRocDST.cpp (resolved)

WrapRocDST: Status & Cleanup

d728639

ax3l force-pushed the topic-rocFFT branch from af9365d to d728639 on August 11, 2021 18:06

ax3l commented Aug 11, 2021

src/fields/fft_poisson_solver/fft/WrapRocDST.cpp (outdated, resolved)

ax3l commented Aug 11, 2021

src/fields/fft_poisson_solver/fft/WrapRocDST.cpp (outdated, resolved)

MaxThevenet and others added 2 commits August 11, 2021 14:59

Template Execute for RocDST as in CuDST

9c22341

WrapRocDST: Implement More ToDo's

8ed7982

`Transpose` all but Transpose

WrapRocDST: Copy Transpose From CUDA

13238cc

MaxThevenet reviewed Aug 13, 2021

MaxThevenet added 4 commits September 11, 2021 15:55

fix call to rocfft_execute

f85ea07

add doc to compile and run on Spock

7f2230c

Merge branch 'development' into topic-rocFFT

7e466b7

eol

a364e0c

MaxThevenet approved these changes Sep 21, 2021

MaxThevenet approved these changes Sep 27, 2021

MaxThevenet merged commit f933188 into Hi-PACE:development Sep 27, 2021

ax3l deleted the topic-rocFFT branch November 30, 2021 06:17


MaxThevenet Aug 13, 2021

ax3l Aug 13, 2021

MaxThevenet Sep 1, 2021

wetzel-desy Sep 3, 2021

AlexanderSinn Sep 3, 2021

MaxThevenet Sep 11, 2021
