
DepositCurrent: atomicAdd -> lockAdd #1059

Merged · 1 commit merged into Hi-PACE:development on Jan 23, 2024

Conversation

WeiqunZhang (Member)

The new amrex::BaseFab::lockAdd function is an optimized version of atomicAdd for OpenMP. In my testing on Frontier CPUs, it is up to 10x faster. I did not test the new function with HiPACE++, but I used https://github.com/WeiqunZhang/amrex-devtests/tree/main/fab_atomicAdd for testing.

  • Small enough (< few 100s of lines), otherwise it should probably be split into smaller PRs
  • Tested (describe the tests in the PR description)
  • Runs on GPU (basic: the code compiles and runs well with the new module)
  • Contains an automated test (checksum and/or comparison with theory)
  • Documented: all elements (classes and their members, functions, namespaces, etc.) are documented
  • Constified (All that can be const is const)
  • Code is clean (no unwanted comments)
  • Style and code conventions (see the bottom of https://github.com/Hi-PACE/hipace) are respected
  • Proper label and GitHub project, if applicable

AlexanderSinn (Member)

HiPACE++ has AMREX_SPACEDIM == 3, but the array for DepositCurrent is only 2D, with a length of one in the z direction. I think lockAdd would then have only one lock, which would likely result in poor performance.

WeiqunZhang (Member, Author)

Yes, I overlooked that. Let me try to handle that in amrex.

AlexanderSinn (Member) left a comment:

I tested HiPACE++ running on two 48-core CPUs with 4095×4095×100 cells and a tile size of 256². While the lockAdd version is faster, particularly at lower thread counts, it does not fix the poor scaling beyond 48 threads.
[attached image: benchmark results]

WeiqunZhang added a commit to AMReX-Codes/amrex that referenced this pull request Jan 22, 2024
## Summary

In HiPACE++, atomicAdd is used on 2D x-y planes even though
AMREX_SPACEDIM is 3. In that case, all threads would compete for a
single lock in the previous implementation of lockAdd. This PR fixes
this use case by associating the locks with the y-direction when the
number of cells in the z-direction is 1.

## Additional background

Hi-PACE/hipace#1059

## Checklist

The proposed changes:
- [x] fix a bug or incorrect behavior in AMReX
- [ ] add new capabilities to AMReX
- [ ] changes answers in the test suite to more than roundoff level
- [ ] are likely to significantly affect the results of downstream AMReX
users
- [ ] include documentation in the code and/or rst files, if appropriate
AlexanderSinn added the component: plasma (About the plasma species) and performance (optimization, benchmark, profiling, etc.) labels on Jan 22, 2024
AlexanderSinn merged commit 982c75d into Hi-PACE:development on Jan 23, 2024
10 checks passed
Labels: component: plasma, performance