Add scoped atomic_thread_fence #1644
base: master
Conversation
Codecov Report: Base: 61.68% // Head: 60.08% // Decreases project coverage by -1.61%.
Additional details and impacted files:

@@            Coverage Diff             @@
##           master    #1644      +/-   ##
==========================================
- Coverage   61.68%   60.08%    -1.61%
==========================================
  Files         151      151
  Lines       11349    10833     -516
==========================================
- Hits         7001     6509     -492
+ Misses       4348     4324      -24
☔ View full report at Codecov.
Does this need anything else before it can be merged as an intermediate solution? I am finding it useful to get …
This shouldn't work with Enzyme, since Enzyme won't understand the inserted assembly; that's one of the reasons I haven't pushed on this further.
It seems to work with EnzymeAD/Enzyme.jl#511, and in another context I tried, unless I am getting something mixed up or you mean it will only work in specific cases.
As noticed in EnzymeAD/Enzyme.jl#511, CUDA C++ supports a wider selection of memory orders and emits different assembly for SM_70 and above: https://godbolt.org/z/Y7Pj5G7sK

For now I have just added the memory fences necessary to implement the rest.
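For reference, here is a minimal sketch (not the code in this PR) of how such scoped fences can be expressed in CUDA.jl device code. The NVVM membar intrinsics are what CUDA.jl already uses for its `threadfence*` functions; the `fence.sc` inline-assembly variant and the `fence_*` names are assumptions based on the Godbolt output linked above.

```julia
using LLVM.Interop: @asmcall   # inline-assembly helper from LLVM.jl

# Pre-sm_70 fences via NVVM membar intrinsics (scope: block, device, system).
@inline fence_block()  = ccall("llvm.nvvm.membar.cta", llvmcall, Cvoid, ())
@inline fence_device() = ccall("llvm.nvvm.membar.gl",  llvmcall, Cvoid, ())
@inline fence_system() = ccall("llvm.nvvm.membar.sys", llvmcall, Cvoid, ())

# On sm_70 and above, a seq_cst fence instead lowers to `fence.sc.{cta,gpu,sys}`;
# sketched here as inline PTX with a memory clobber. Hypothetical name.
@inline fence_sc_device() = @asmcall("fence.sc.gpu;", "~{memory}", true, Cvoid, Tuple{})
```

Note that the inline-assembly path is exactly what the Enzyme concern above is about: a named intrinsic stays visible to LLVM and its passes, whereas an opaque asm string does not.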
@tkf @maleadt: over the long term I would be in favor of moving to Atomix.jl instead of `CUDA.@atomic`. Is there any shared infrastructure we can use? As you can see, I am defining scope and order here again; a sketch of what sharing could look like follows below.
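To make the duplication concrete, here is a purely hypothetical sketch of the kind of shared vocabulary a common package could provide; none of these type names exist today in CUDA.jl or Atomix.jl.

```julia
# Hypothetical shared scope/order vocabulary (assumed names, not a real API).
abstract type MemoryOrder end
struct Monotonic <: MemoryOrder end
struct Acquire   <: MemoryOrder end
struct Release   <: MemoryOrder end
struct AcqRel    <: MemoryOrder end
struct SeqCst    <: MemoryOrder end

abstract type SyncScope end
struct BlockScope  <: SyncScope end   # PTX .cta
struct DeviceScope <: SyncScope end   # PTX .gpu
struct SystemScope <: SyncScope end   # PTX .sys

# Each backend would then implement a single generic entry point, e.g.:
# atomic_thread_fence(order::MemoryOrder, scope::SyncScope)
```

With singleton types like these, each backend can dispatch on order and scope at compile time instead of redefining the same enumerations independently.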