[Draft] Refactor GPU reductions and add unsafe atomic tunings #247
Use atomic functions directly in the Base and Lambda variants instead of RAJA atomics. Add a variety of utility functions.
Use a single macro for all single reduction kernels instead of duplicating the code.
GPU kernels that use atomics now have a default tuning called atomic, and their unsafe-atomic counterparts have a tuning called unsafeAtomics.
This makes it clear whether or not atomics are being used as part of a reduction.
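To make the role of the atomic in a reduction concrete, here is a minimal host-side sketch in plain C++. It is not the code from this PR: `atomic_add`, `block_reduce_sum`, and `reduce_sum` are hypothetical names, and `std::atomic` stands in for the device atomics. Each "block" sums its own chunk locally and then performs one atomic add into the shared result.

```cpp
#include <atomic>
#include <cstddef>
#include <vector>

// Hypothetical helper: add `value` to `target` atomically with a CAS loop.
// (std::atomic<double>::fetch_add is only available from C++20 on.)
inline void atomic_add(std::atomic<double>& target, double value) {
  double expected = target.load(std::memory_order_relaxed);
  while (!target.compare_exchange_weak(expected, expected + value,
                                       std::memory_order_relaxed)) {
    // On failure `expected` is refreshed with the current value; retry.
  }
}

// Hypothetical stand-in for one GPU block: reduce a chunk locally,
// then issue a single atomic add into the shared result.
inline void block_reduce_sum(const std::vector<double>& x,
                             std::size_t begin, std::size_t end,
                             std::atomic<double>& sum) {
  double partial = 0.0;
  for (std::size_t i = begin; i < end; ++i) {
    partial += x[i];
  }
  atomic_add(sum, partial);
}

// Walk the "blocks" one after another. On a GPU they would run concurrently,
// which is why the final combine has to be atomic.
inline double reduce_sum(const std::vector<double>& x, double sum_init,
                         std::size_t block_size) {
  if (block_size == 0) {
    block_size = 1;  // assumption: treat a zero block size as one element
  }
  std::atomic<double> sum(sum_init);
  for (std::size_t begin = 0; begin < x.size(); begin += block_size) {
    const std::size_t end =
        begin + block_size < x.size() ? begin + block_size : x.size();
    block_reduce_sum(x, begin, end, sum);
  }
  return sum.load();
}
```

Calling `reduce_sum(x, 0.0, 256)` issues one atomic add per 256-element chunk rather than one per element, which is the usual reason for reducing within a block before touching the shared result.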
A lot of changes in this PR. I think I understand them all.
I kept finding things to change...
#endif
template < size_t block_size >
__launch_bounds__(block_size)
__global__ void reduce_sum_unsafe(Real_ptr x, Real_ptr dsum, Real_type sum_init,
If we start to add more tests with atomics, we might not want to split these up into separate kernels because of compile-time concerns, but it should be OK for now.
return devProp.gcnArchName;
}

#if defined(__gfx90a__)
We will be able to do away with the gfx90a arch check in a future ROCm release.
I'm closing this PR for a number of reasons. It's worth differentiating between safe and unsafe atomics as that is a temporary issue. The reducer implementation is difficult to put in a macro and is not identically duplicated, so it's not a huge gain to abstract it. The lambda variants are not necessary.
Refactor GPU reductions and add unsafe atomic tunings
This adds an "unsafeAtomic" tuning of each of the kernels with a HIP variant using atomics.
This also refactors GPU reductions so the implementation is not duplicated in each kernel with a reduction.
This also adds lambda variants of GPU reduction kernels.