
DepositCurrent: atomicAdd -> lockAdd #1059

Merged · 1 commit merged into Hi-PACE:development on Jan 23, 2024

Conversation

WeiqunZhang (Member)

The new amrex::BaseFab::lockAdd function is an optimized version of atomicAdd for OpenMP. In my testing on Frontier CPUs, it is up to 10x faster. I did not test the new function with HiPACE++, but I used https://github.com/WeiqunZhang/amrex-devtests/tree/main/fab_atomicAdd for testing.

  • Small enough (< few 100s of lines), otherwise it should probably be split into smaller PRs
  • Tested (describe the tests in the PR description)
  • Runs on GPU (basic: the code compiles and runs well with the new module)
  • Contains an automated test (checksum and/or comparison with theory)
  • Documented: all elements (classes and their members, functions, namespaces, etc.) are documented
  • Constified (All that can be const is const)
  • Code is clean (no unwanted comments)
  • Style and code conventions (see the bottom of https://github.com/Hi-PACE/hipace) are respected
  • Proper label and GitHub project, if applicable

AlexanderSinn (Member)

HiPACE++ has AMREX_SPACEDIM == 3, but the array for DepositCurrent is only 2D, with a length of one in the z direction. I think lockAdd would then have only one lock, which would likely result in poor performance.

WeiqunZhang (Member, Author)

Yes, I overlooked that. Let me try to handle that in amrex.

AlexanderSinn (Member) left a comment:

I tested HiPACE++ running on two 48-core CPUs with 4095×4095×100 cells and a tile size of 256². While the lockAdd version is faster, particularly at lower thread counts, it does not fix the poor scaling beyond 48 threads.
[attached image: benchmark results]

WeiqunZhang added a commit to AMReX-Codes/amrex that referenced this pull request Jan 22, 2024
## Summary

In HiPACE++, atomicAdd is used on 2D x-y planes even though
AMREX_SPACEDIM is 3. In that case, all threads would compete for a
single lock in the previous implementation of lockAdd. This PR fixes
this use case by associating the locks with the y-direction when the
number of cells in the z-direction is 1.

## Additional background

Hi-PACE/hipace#1059

## Checklist

The proposed changes:
- [x] fix a bug or incorrect behavior in AMReX
- [ ] add new capabilities to AMReX
- [ ] changes answers in the test suite to more than roundoff level
- [ ] are likely to significantly affect the results of downstream AMReX
users
- [ ] include documentation in the code and/or rst files, if appropriate
AlexanderSinn added the component: plasma (About the plasma species) and performance (optimization, benchmark, profiling, etc.) labels on Jan 22, 2024
AlexanderSinn merged commit 982c75d into Hi-PACE:development on Jan 23, 2024
10 checks passed
Labels: component: plasma, performance