
Add scoped atomic_thread_fence #1644

Draft · vchuravy wants to merge 9 commits into master
Conversation

@vchuravy (Member) commented Oct 23, 2022

As noticed in EnzymeAD/Enzyme.jl#511, CUDA C++ supports a wider selection of memory orders
and emits different assembly for SM_70 and above: https://godbolt.org/z/Y7Pj5G7sK

For now I have just added the memory fences necessary to implement the rest.

@tkf @maleadt over the long term I would be in favor of moving to Atomix.jl instead of CUDA.@atomic.
Is there any shared infrastructure we can use? As you can see, I am defining scope and order here again.
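
For reference, a minimal sketch of how such fences can be emitted from Julia (illustrative only; the helper names are hypothetical and not necessarily what this PR implements). The `llvm.nvvm.membar.*` intrinsics give the pre-Volta, sequentially consistent fences, while the scoped `fence.{sc,acq_rel}.{cta,gpu,sys}` PTX instructions from the godbolt link above can be emitted on SM_70+ via inline assembly:

```julia
# Illustrative sketch only; helper names are hypothetical.

# Pre-SM_70 style fences: NVVM membar intrinsics (always sequentially consistent).
@inline fence_block()  = ccall("llvm.nvvm.membar.cta", llvmcall, Cvoid, ())
@inline fence_device() = ccall("llvm.nvvm.membar.gl",  llvmcall, Cvoid, ())
@inline fence_system() = ccall("llvm.nvvm.membar.sys", llvmcall, Cvoid, ())

# SM_70+ scoped fences with an explicit memory order, emitted as inline PTX
# (requires sm_70 and a sufficiently recent PTX ISA).
@inline fence_sc_gpu() = Base.llvmcall("""
    call void asm sideeffect "fence.sc.gpu;", "~{memory}"()
    ret void""", Cvoid, Tuple{})

@inline fence_acq_rel_block() = Base.llvmcall("""
    call void asm sideeffect "fence.acq_rel.cta;", "~{memory}"()
    ret void""", Cvoid, Tuple{})
```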

@maleadt added the `enhancement` (New feature or request) and `cuda kernels` (Stuff about writing CUDA kernels) labels on Oct 24, 2022
codecov bot commented Nov 5, 2022
Codecov Report

Base: 61.68% // Head: 60.08% // Decreases project coverage by 1.60% ⚠️

Coverage data is based on head (654870d) compared to base (dcb175e).
Patch has no changes to coverable lines.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1644      +/-   ##
==========================================
- Coverage   61.68%   60.08%   -1.61%     
==========================================
  Files         151      151              
  Lines       11349    10833     -516     
==========================================
- Hits         7001     6509     -492     
+ Misses       4348     4324      -24     
| Impacted Files | Coverage Δ |
|---|---|
| lib/cusolver/linalg.jl | 49.71% <0.00%> (-36.73%) ⬇️ |
| lib/cublas/CUBLAS.jl | 52.25% <0.00%> (-24.07%) ⬇️ |
| lib/cusparse/conversions.jl | 79.77% <0.00%> (-14.31%) ⬇️ |
| lib/cusparse/interfaces.jl | 58.55% <0.00%> (-13.77%) ⬇️ |
| src/compiler/gpucompiler.jl | 82.14% <0.00%> (-11.20%) ⬇️ |
| lib/cusparse/level3.jl | 63.85% <0.00%> (-10.85%) ⬇️ |
| lib/cusparse/broadcast.jl | 25.92% <0.00%> (-10.35%) ⬇️ |
| lib/cusparse/types.jl | 41.89% <0.00%> (-8.11%) ⬇️ |
| lib/cusparse/generic.jl | 89.30% <0.00%> (-7.87%) ⬇️ |
| src/utilities.jl | 68.57% <0.00%> (-7.75%) ⬇️ |
| ... and 71 more | |


☔ View full report at Codecov.

@jgreener64

Does this need something else before it can be merged as an intermediate solution? I am finding it useful for getting Atomix.@atomic :monotonic working with Enzyme on GPU (JuliaConcurrent/Atomix.jl#33 and EnzymeAD/Enzyme.jl#511).
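
For context, a hedged sketch of that use case; the kernel and variable names are illustrative, and depending on package versions the GPU backend for Atomix may additionally need `using UnsafeAtomicsLLVM`:

```julia
using CUDA, Atomix
# Depending on versions, the GPU backend may also require:
# using UnsafeAtomicsLLVM

# Illustrative kernel: relaxed (monotonic) atomic accumulation into out[1].
function accumulate_kernel!(out, xs)
    i = (blockIdx().x - 1) * blockDim().x + threadIdx().x
    if i <= length(xs)
        Atomix.@atomic :monotonic out[1] += xs[i]
    end
    return nothing
end

xs  = CUDA.rand(Float32, 1024)
out = CUDA.zeros(Float32, 1)
@cuda threads=256 blocks=cld(length(xs), 256) accumulate_kernel!(out, xs)
```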

@vchuravy
Member Author

This shouldn't work with Enzyme, since Enzyme won't understand the inserted assembly; that's one of the reasons I haven't pushed on this further.

@jgreener64

It seems to work with EnzymeAD/Enzyme.jl#511 and in another context I tried it in, unless I am getting something mixed up or you mean it will only work in specific cases.

Labels: cuda kernels (Stuff about writing CUDA kernels), enhancement (New feature or request)
Projects: none yet
3 participants