Particle tiling and OpenMP threading #551
This PR is ready for review.
I think we want to use a different algorithm there for the OpenMP case. Instead of using parallel prefix sum, we can use data duplication. It will probably be easier to explain this via a quick zoom meeting.
Awesome, thanks for this PR!
This PR proposes to add logical tiling for plasma particles when running on CPU. The option is controlled with `hipace.do_tiling` (default `true`), and affects field gather (FG) + push and current deposition (CD). The tile size is controlled with `plasmas.sort_bin_size` (default `32`). When activated, plasma particles are sorted logically (they are not re-ordered in memory, but an index mask is built to allow accessing particles in each tile) and particle operations are done on a per-tile basis:

Effect on performance:
In practice, the main changes are:
- `TileSort.H/cpp` that does the logical tile sort (similar to `BinSort.H/cpp`, renamed `SliceSort.H/cpp`)
- … `Fields`) for the current on 1 tile
- … `Advance` and `Deposition`)
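To make the idea of a logical tile sort concrete, here is a minimal standalone sketch. It is not the code from this PR: `Particle`, `TileIndex`, `tile_sort` and `deposit` are hypothetical names, the grid is assumed uniform and square-celled, and the deposition is plain nearest-grid-point. It only illustrates the two ingredients described above: an index list that groups particles by tile without moving them in memory, and a small tile-local array that collects the current for one tile before it is added to the full grid.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical minimal particle type (position and weight only).
struct Particle { double x, y, w; };

// Result of the logical sort: particles are NOT moved. `indices` lists particle
// ids grouped by tile; tile t owns indices[offsets[t]] .. indices[offsets[t+1]-1].
struct TileIndex {
    std::vector<std::size_t> offsets;
    std::vector<std::size_t> indices;
};

// Counting sort over tile ids. Assumes all particles lie in [0, nx*dx) x [0, ny*dx).
TileIndex tile_sort(const std::vector<Particle>& p, int nx, int ny,
                    double dx, int tile_size)
{
    const int ntx = (nx + tile_size - 1) / tile_size;  // tiles along x
    const int nty = (ny + tile_size - 1) / tile_size;  // tiles along y
    auto tile_of = [&](const Particle& q) {
        const int tx = static_cast<int>(q.x / dx) / tile_size;
        const int ty = static_cast<int>(q.y / dx) / tile_size;
        return static_cast<std::size_t>(ty * ntx + tx);
    };

    TileIndex t;
    t.offsets.assign(static_cast<std::size_t>(ntx) * nty + 1, 0);
    for (const auto& q : p) ++t.offsets[tile_of(q) + 1];   // count per tile
    for (std::size_t i = 1; i < t.offsets.size(); ++i)     // serial prefix sum
        t.offsets[i] += t.offsets[i - 1];

    t.indices.resize(p.size());
    std::vector<std::size_t> cursor(t.offsets.begin(), t.offsets.end() - 1);
    for (std::size_t i = 0; i < p.size(); ++i)
        t.indices[cursor[tile_of(p[i])]++] = i;            // scatter ids by tile
    return t;
}

// Nearest-grid-point deposition, one tile at a time, through a tile-local buffer.
std::vector<double> deposit(const std::vector<Particle>& p, const TileIndex& t,
                            int nx, int ny, double dx, int tile_size)
{
    std::vector<double> grid(static_cast<std::size_t>(nx) * ny, 0.0);
    const int ntx = (nx + tile_size - 1) / tile_size;
    const int ntiles = static_cast<int>(t.offsets.size()) - 1;
    std::vector<double> local(static_cast<std::size_t>(tile_size) * tile_size);

    for (int tile = 0; tile < ntiles; ++tile) {
        const int x0 = (tile % ntx) * tile_size;  // first cell of this tile
        const int y0 = (tile / ntx) * tile_size;
        std::fill(local.begin(), local.end(), 0.0);
        for (std::size_t k = t.offsets[tile]; k < t.offsets[tile + 1]; ++k) {
            const Particle& q = p[t.indices[k]];  // indirect access via the index list
            const int lx = static_cast<int>(q.x / dx) - x0;
            const int ly = static_cast<int>(q.y / dx) - y0;
            local[ly * tile_size + lx] += q.w;
        }
        // Add the tile-local result to the full grid (clipped at the domain edge).
        for (int ly = 0; ly < tile_size && y0 + ly < ny; ++ly)
            for (int lx = 0; lx < tile_size && x0 + lx < nx; ++lx)
                grid[static_cast<std::size_t>(y0 + ly) * nx + x0 + lx] +=
                    local[ly * tile_size + lx];
    }
    return grid;
}
```

In this simplified version tiles write to disjoint cells, so the loop over tiles could be distributed over threads directly. A higher-order deposition shape would need guard cells around each tile-local array, and neighbouring tiles would then overlap when added back to the full grid; how the PR handles that case is not shown here.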