
Reduce the number of Hankel transforms per iteration #161

Merged
3 commits merged into fbpic:dev on Dec 8, 2017

Conversation

@RemiLehe (Member) commented Dec 6, 2017

In the current dev branch, we perform 16 Hankel transforms and 16 Fourier transforms per azimuthal mode and per iteration. Of those 16 Hankel transforms, 12 are due to the back-and-forth spect2interp and interp2spect of the fields E and B (2 directions * 2 fields * 3 components).

The reason for performing these back-and-forth transformations is that we have to perform some operations on the fields in spectral space (Maxwell push) and some operations in real space (MPI exchanges and damping of the open boundaries).

However, because the MPI exchanges and the damping of the open boundaries act purely along z, the spectral fields can be updated with just a succession of an inverse and a forward Fourier transform (no Hankel transform involved). Updating the fields in interpolation space then requires an additional Hankel + Fourier transform.

This effectively replaces the 16 Hankel and 16 Fourier transforms by 10 Hankel and 22 Fourier transforms (per mode, per iteration). Because the Fourier transforms are almost always much faster than the Hankel transforms, this results in a speedup of the simulation.
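To make the trick concrete, here is a minimal numpy sketch, not fbpic's actual API (`spectral_field`, `damp_profile` and the function name are illustrative), of how a purely-z operation can be applied to a spectral field with two 1D FFTs and no Hankel transform:

```python
import numpy as np

def apply_z_operation_in_spectral_space(spectral_field, damp_profile):
    """Apply a z-only operation (e.g. open-boundary damping) to one
    spectral field component of shape (Nz, Nr), without any Hankel
    transform.

    Since the operation acts purely along z, an inverse FFT along
    axis 0 brings the field to real space in z (while staying in the
    Hankel representation along r); a forward FFT then returns it to
    full spectral space.
    """
    partial = np.fft.ifft(spectral_field, axis=0)  # (kz, r-spectral) -> (z, r-spectral)
    partial *= damp_profile[:, np.newaxis]         # damping / exchange, along z only
    return np.fft.fft(partial, axis=0)             # (z, r-spectral) -> (kz, r-spectral)
```

Only when the particles need the fields does a Hankel + Fourier transform per component bring E and B back to the interpolation grid; the return trip (interp2spect of E and B) is eliminated, which is where the 6 saved Hankel transforms come from.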

Note: I am definitely not the first to use this trick! @Hightower has been using it for a while in his code chimera.

@MKirchen (Contributor) commented Dec 7, 2017

Nice, looking forward to this PR!

@RemiLehe changed the title from "[WIP] Reduce the number of Hankel transforms per iteration" to "Reduce the number of Hankel transforms per iteration" on Dec 8, 2017
@@ -705,10 +705,10 @@ def exchange_particles_aperiodic_subdomain(self, species, fld, time ):
add_buffers_to_particles( species, float_recv_left, float_recv_right,
uint_recv_left, uint_recv_right )

def damp_guard_EB( self, interp ):
@RemiLehe (Member, Author) commented on this change:

Note: I renamed this function to be more explicit about what it does (especially because the guard cells in between two MPI processes are not damped by this function)
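A hypothetical sketch of the distinction (not fbpic's actual implementation; the function signature and names are illustrative): only guard cells at *open* boundaries get damped, while guard cells shared with a neighboring MPI rank are left alone, since the exchange overwrites them anyway.

```python
import numpy as np

def damp_open_boundary_guards(fields, damp_array, left_is_open, right_is_open):
    """fields: list of 2D (z, r) arrays; damp_array: 1D profile
    over the guard cells, rising from 0 to 1 towards the interior."""
    n_guard = len(damp_array)
    for f in fields:
        if left_is_open:   # physical open boundary on the left
            f[:n_guard, :] *= damp_array[:, np.newaxis]
        if right_is_open:  # physical open boundary on the right
            f[-n_guard:, :] *= damp_array[::-1, np.newaxis]
        # guard cells facing another MPI rank are deliberately untouched
```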

@RemiLehe (Member, Author) commented Dec 8, 2017

I performed the usual 2000 x 400 benchmark (timing the complete PIC loop, not just the transforms) on a GTX 1080 Ti GPU:

  • Current dev: 325 ms/iteration
  • This PR: 275 ms/iteration
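(That is a factor of 325/275 ≈ 1.18 in throughput, i.e. roughly an 18% speedup.)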

@MKirchen (Contributor) commented Dec 8, 2017

Nice!

@MKirchen merged commit 1690d84 into fbpic:dev on Dec 8, 2017
@hightower8083 (Contributor) commented Dec 8, 2017

Wow, 18 percent, cool!
But I'm not sure I get how you got it down to 10 DHTs: 6 to get in J and rho (I guess each mode of rho also needs the two + and - components) + 6 to get out E and B after the Maxwell push..?

UPD: oops, my bad -- I'd forgotten that you don't need the + and - components for the scalars.
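Spelled out, the corrected tally (a reconstruction from the numbers in this thread, assuming one Hankel transform per field component per direction):

```python
# Per-mode, per-iteration Hankel-transform count implied by the
# discussion above (illustrative arithmetic only):
J_in   = 3      # interp2spect of J: its three components
rho_in = 1      # interp2spect of rho: a scalar, so no +/- components
EB_out = 2 * 3  # spect2interp of E and B after the Maxwell push
print(J_in + rho_in + EB_out)  # 10 -- versus 16 when E and B also made
                               # the return trip (the 2 * 2 * 3 = 12
                               # counted in the opening comment, plus
                               # the same 4 source transforms)
```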

@RemiLehe deleted the partial_transforms branch on December 12, 2017