Add scoped atomic_thread_fence #1644
base: master
Conversation
Codecov Report: Base: 61.68% // Head: 60.08% // Decreases project coverage by -1.61%.
Additional details and impacted files:

@@            Coverage Diff             @@
##           master    #1644      +/-   ##
==========================================
- Coverage   61.68%   60.08%    -1.61%
==========================================
  Files         151      151
  Lines       11349    10833     -516
==========================================
- Hits         7001     6509     -492
+ Misses       4348     4324      -24
☔ View full report at Codecov.
Does this need anything else before it can be merged as an intermediate solution? I am finding it useful to get …
This shouldn't work with Enzyme, since Enzyme won't understand the inserted assembly; that's one of the reasons I haven't pushed on this further.
It seems to work with EnzymeAD/Enzyme.jl#511, and in another context I tried, unless I am getting something mixed up or you mean it will only work in specific cases.
As noticed in EnzymeAD/Enzyme.jl#511, CUDA C++ supports a wider selection of memory orders and emits different assembly for SM_70 and above: https://godbolt.org/z/Y7Pj5G7sK

For now I have just added the memory fences necessary to implement the rest.
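For reference, here is a minimal sketch (not the code in this PR) of how such scoped fences can be expressed in CUDA.jl device code. The NVVM membar intrinsics are what CUDA.jl already uses for its `threadfence*` functions; the `fence.sc` inline-assembly variant and the `fence_*` names are assumptions based on the Godbolt output linked above.

```julia
using LLVM.Interop: @asmcall   # inline-assembly helper from LLVM.jl

# Pre-sm_70 fences via NVVM membar intrinsics (scope: block, device, system).
@inline fence_block()  = ccall("llvm.nvvm.membar.cta", llvmcall, Cvoid, ())
@inline fence_device() = ccall("llvm.nvvm.membar.gl",  llvmcall, Cvoid, ())
@inline fence_system() = ccall("llvm.nvvm.membar.sys", llvmcall, Cvoid, ())

# On sm_70 and above, a seq_cst fence instead lowers to `fence.sc.{cta,gpu,sys}`;
# sketched here as inline PTX with a memory clobber. Hypothetical name.
@inline fence_sc_device() = @asmcall("fence.sc.gpu;", "~{memory}", true, Cvoid, Tuple{})
```

Note that the inline-assembly path is exactly what the Enzyme concern above is about: a named intrinsic stays visible to LLVM and its passes, whereas an opaque asm string does not.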
@tkf @maleadt: over the long term I would be in favor of moving to Atomix.jl instead of `CUDA.@atomic`. Is there any shared infrastructure we can use? As you can see, I am defining scope and order here again; a sketch of what sharing could look like follows below.
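To make the duplication concrete, here is a purely hypothetical sketch of the kind of shared vocabulary a common package could provide; none of these type names exist today in CUDA.jl or Atomix.jl.

```julia
# Hypothetical shared scope/order vocabulary (assumed names, not a real API).
abstract type MemoryOrder end
struct Monotonic <: MemoryOrder end
struct Acquire   <: MemoryOrder end
struct Release   <: MemoryOrder end
struct AcqRel    <: MemoryOrder end
struct SeqCst    <: MemoryOrder end

abstract type SyncScope end
struct BlockScope  <: SyncScope end   # PTX .cta
struct DeviceScope <: SyncScope end   # PTX .gpu
struct SystemScope <: SyncScope end   # PTX .sys

# Each backend would then implement a single generic entry point, e.g.:
# atomic_thread_fence(order::MemoryOrder, scope::SyncScope)
```

With singleton types like these, each backend can dispatch on order and scope at compile time instead of redefining the same enumerations independently.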