ROCm & rocFFT #583
@MaxThevenet @wetzel-desy I updated the Please feel free to review. I saw there are a few inline comments left in
Implement rocFFT from equivalent WarpX routines.
C++ inside `extern C` does not work: https://github.com/ROCmSoftwarePlatform/rocFFT/blob/rocm-4.3.0/library/include/rocfft.h#L36-L42 Fixed in `develop` of rocFFT, but did not land in 4.3.0
Co-authored-by: Maxence Thevenet <maxence.thevenet@desy.de>
`Transpose` all but Transpose
@SeverinDiederichs via Slack:
result = rocfft_execute(dst_plan.m_plan,
(void**)&(dst_plan.m_expanded_position_array),
(void**)&(dst_plan.m_expanded_fourier_array),
execinfo);
The code compiles well indeed, but when trying with this input file I get a segfault in Execute. The segfault disappears when commenting out these lines. It could also be coming from rocfft_plan_create. Note that this has no I/O; I think I get another segfault when I/O is on.
Thanks for testing!
Looking at the in-code comments by @wetzel-desy before these lines, I guess we need to double-check the API contract for rocfft here.
@wetzel-desy do you have any idea of what could go wrong here, and how to fix it?
I honestly took inspiration from the WarpX implementation, which calls rocfft_execute. So there might be a difference between the FFT and DST with regard to the underlying data types of the two arrays that are supplied as input. But as I am not involved with the overall structure, I wouldn't know how to fix this without testing... Sorry.
Does this work?
void* in[2] = {dst_plan.m_expanded_position_array->dataPtr(), nullptr};
void* out[2] = {dst_plan.m_expanded_fourier_array->dataPtr(), nullptr};
result = rocfft_execute(dst_plan.m_plan,
in,
out,
execinfo);
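A possible reading of why this form differs from the original call, sketched without rocFFT: the execute call receives an array of buffer pointers, so entry 0 has to be the data buffer itself. Everything below is a hypothetical stand-in (FakeFab, first_buffer_seen and the two check functions are not part of rocFFT or AMReX), and it assumes the plan members are pointer-like handles to an object that owns the data, as the `->dataPtr()` call suggests.

```cpp
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical stand-in for an array object that owns a data buffer.
struct FakeFab {
    std::vector<double> data;
    explicit FakeFab(std::size_t n) : data(n, 0.0) {}
    double* dataPtr() { return data.data(); }
};

// Hypothetical stand-in for the execute call: it takes an array of buffer
// pointers and treats entry 0 as the data buffer. Here it only reports
// which address it would operate on.
void* first_buffer_seen(void* buffers[]) {
    return buffers[0];
}

// Pattern from the suggestion: an explicit array whose first entry is the
// data buffer. Returns true if the callee would see the data buffer.
bool explicit_array_reaches_data() {
    auto fab = std::make_unique<FakeFab>(8);
    void* in[2] = {fab->dataPtr(), nullptr};
    return first_buffer_seen(in) == static_cast<void*>(fab->dataPtr());
}

// Pattern resembling the original call: the address of the handle is passed
// as the array, so entry 0 is the owning object, not its data buffer.
bool handle_address_reaches_data() {
    auto fab = std::make_unique<FakeFab>(8);
    void* handle = fab.get();  // the handle, viewed as a plain pointer
    return first_buffer_seen(&handle) == static_cast<void*>(fab->dataPtr());
}
```

In this sketch explicit_array_reaches_data() returns true and handle_address_reaches_data() returns false, which would be consistent with the segfault going away once the data pointers are passed explicitly.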
Excellent, that did it indeed!! With the correct profile and submission script, I unfortunately didn't manage to get openPMD working yet (I get a runtime error), but by adding some in-situ diagnostics with
auto& ptile = m_multi_beam.getBeam(0);
const auto& aos = ptile.GetArrayOfStructs();
const auto& pos_structs = aos.begin();
amrex::Gpu::DeviceVector<amrex::Real> dV(1);
amrex::Vector<amrex::Real> hV(1);
auto pdV = dV.begin();
amrex::ParallelFor(
ptile.numParticles(),
[=] AMREX_GPU_DEVICE (long ip) {
amrex::Real x = pos_structs[ip].pos(0);
amrex::Gpu::Atomic::Add(pdV, x*x);
});
amrex::Gpu::copy(amrex::Gpu::deviceToHost, dV.begin(), dV.end(), hV.begin());
amrex::Print()<<hV[0]<<'\n';
at line 400 in Evolve, I could check that the evolution was the same on Spock as on Summit. Thank you all!
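For cross-checking such a number on the host, a minimal sketch of the same reduction in plain C++ might look as follows. The helper names sum_of_squares and rms_width are hypothetical, and the RMS step is an assumption about how the printed sum relates to the beam width. The sketch starts its accumulator at zero explicitly; the thread does not say whether the device vector in the snippet above starts at zero.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: sum of x*x over all particle positions, i.e. the
// quantity the device-side atomic add accumulates.
double sum_of_squares(const std::vector<double>& x) {
    double acc = 0.0;  // explicit zero start for the accumulator
    for (double xi : x) {
        acc += xi * xi;
    }
    return acc;
}

// Hypothetical helper: RMS width derived from the sum of squares.
// Returns 0 for an empty beam to avoid dividing by zero.
double rms_width(const std::vector<double>& x) {
    if (x.empty()) {
        return 0.0;
    }
    return std::sqrt(sum_of_squares(x) / static_cast<double>(x.size()));
}
```

Comparing the printed sum (or the derived width) step by step between two machines is one way to do the kind of check described here.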
This PR is now ready for review IMO. Compilation and execution work well, and the results (single GPU, 20 time steps, comparing the evolution of the beam width) agree with a simulation on an NVIDIA GPU. The main piece missing is openPMD I/O: I currently get a segfault when running this input file with I/O, with the following Backtrace:
However, there is no rush; I/O can be fixed in a subsequent PR.
Looks all good to me, I think this can be merged. Thank you!
Implement rocFFT from equivalent WarpX routines.
Add HIP CI.
Note: due to an upstream ROCm bug in 4.2.0, please build with either ROCm 4.1.0 or ROCm 4.3.0 (and the corresponding rocFFT release).
ROCm is still a moving target in AMReX, so don't hesitate to reach out on the AMReX/WarpX channels.
For Cray/HPE software environments with MPI (e.g., OLCF Spock), also take note of this improvement for CMake 3.22+: https://gitlab.kitware.com/cmake/cmake/-/merge_requests/6264 and the work-around posted therein to build in the meantime.