Add Metal atomics backend following the AtomixCUDA package - almost verbatim. #39

anicusan · 2024-10-30T14:41:32Z

Tests pass on my Mac M3. Only changes to AtomixCUDA is the use of Int32 instead of Int - are there any plans to add support for 64-bit atomics, at least when natively supported on the M2 and up?

…batim

maleadt

Thanks!

maleadt · 2024-11-06T13:22:55Z

Tests pass on my Mac M3.

Did you also verify this against KernelAbstractions?

are there any plans to add support for 64-bit atomics, at least when natively supported on the M2 and up?

No concrete plans, but it shouldn't be too hard to generalize the functionality for that in Metal.jl

I've also bumped the version numbers here to v1.0 to get out of the dreaded v0.x regime, so it will take until KA.jl bumps compat for this to actually be installable.

anicusan · 2024-11-06T15:07:02Z

Thanks a lot for going over this! We're waiting on atomics in ImplicitBVH.jl to make it work across the JuliaGPU stacks.

I made a local copy of KernelAbstractions and bumped the [compat] to Atomix = "1.0", and dev-ed Atomix, AtomixMetal, and KernelAbstractions. I tried running the following code:

using KernelAbstractions
using Atomix
using Metal

@kernel cpu=false function atomic_add_ka!(v)
    i = @index(Global)
    Atomix.@atomic v[i] += eltype(v)(1)
end

v = Metal.zeros(Int32, 1000)
atomic_add_ka!(get_backend(v), 128)(v, ndrange=length(v))
@assert all(Array(v) .== 1)

Which gives me the following error:

ERROR: LoadError: Compilation to native code failed; see below for details.
If you think this is a bug, please file an issue and attach /var/folders/gk/pdh0y2f100s3z_kkb9wv50tr0000gn/T/jl_wR4DG9gR0D.metallib
Stacktrace:
  [1] error(s::String)
    @ Base ./error.jl:35
  [2] macro expansion
    @ ~/.julia/packages/Metal/JtmpJ/src/compiler/compilation.jl:195 [inlined]
  [3] macro expansion
    @ ~/.julia/packages/ObjectiveC/C7BVt/src/os.jl:264 [inlined]
  [4] macro expansion
    @ ~/.julia/packages/Metal/JtmpJ/src/compiler/compilation.jl:178 [inlined]
  [5] (::Metal.var"#171#172"{Bool, GPUCompiler.CompilerJob{…}, @NamedTuple{…}})()
    @ Metal ~/.julia/packages/ObjectiveC/C7BVt/src/foundation.jl:637
  [6] macro expansion
    @ ~/.julia/packages/ObjectiveC/C7BVt/src/foundation.jl:565 [inlined]
  [7] macro expansion
    @ ./lock.jl:273 [inlined]
  [8] ObjectiveC.Foundation.NSAutoreleasePool(f::Metal.var"#171#172"{Bool, GPUCompiler.CompilerJob{…}, @NamedTuple{…}})
    @ ObjectiveC.Foundation ~/.julia/packages/ObjectiveC/C7BVt/src/foundation.jl:557
  [9] link(job::GPUCompiler.CompilerJob, compiled::@NamedTuple{image::Vector{UInt8}, entry::String}; return_function::Bool)
    @ Metal ~/.julia/packages/ObjectiveC/C7BVt/src/foundation.jl:636
 [10] actual_compilation(cache::Dict{…}, src::Core.MethodInstance, world::UInt64, cfg::GPUCompiler.CompilerConfig{…}, compiler::typeof(Metal.compile), linker::typeof(Metal.link))
    @ GPUCompiler ~/.julia/packages/GPUCompiler/2CW9L/src/execution.jl:262
 [11] cached_compilation(cache::Dict{…}, src::Core.MethodInstance, cfg::GPUCompiler.CompilerConfig{…}, compiler::Function, linker::Function)
    @ GPUCompiler ~/.julia/packages/GPUCompiler/2CW9L/src/execution.jl:151
 [12] macro expansion
    @ ~/.julia/packages/Metal/JtmpJ/src/compiler/execution.jl:189 [inlined]
 [13] macro expansion
    @ ./lock.jl:273 [inlined]
 [14] mtlfunction(f::typeof(gpu_atomic_add_ka!), tt::Type{Tuple{…}}; name::Nothing, kwargs::@Kwargs{})
    @ Metal ~/.julia/packages/Metal/JtmpJ/src/compiler/execution.jl:184
 [15] mtlfunction
    @ ~/.julia/packages/Metal/JtmpJ/src/compiler/execution.jl:182 [inlined]
 [16] macro expansion
    @ ~/.julia/packages/Metal/JtmpJ/src/compiler/execution.jl:85 [inlined]
 [17] (::KernelAbstractions.Kernel{…})(args::MtlVector{…}; ndrange::Int64, workgroupsize::Nothing)
    @ Metal.MetalKernels ~/.julia/packages/Metal/JtmpJ/src/MetalKernels.jl:110
 [18] top-level scope
    @ ~/Prog/Julia/Packages/Atomix.jl-fork/prototype/atomic_add_test.jl:14
 [19] include(fname::String)
    @ Main ./sysimg.jl:38
 [20] top-level scope
    @ REPL[12]:1
 [21] top-level scope
    @ ~/.julia/packages/Metal/JtmpJ/src/initialization.jl:72
in expression starting at /Users/anicusan/Prog/Julia/Packages/Atomix.jl-fork/prototype/atomic_add_test.jl:14

caused by: NSError: Failed to materializeAll. (AGXMetalG15X_M1, code 3)
Stacktrace:
  [1] Metal.MTL.MTLComputePipelineState(dev::Metal.MTL.MTLDeviceInstance, fun::Metal.MTL.MTLFunctionInstance)
    @ Metal.MTL ~/.julia/packages/Metal/JtmpJ/lib/mtl/compute_pipeline.jl:60
  [2] macro expansion
    @ ~/.julia/packages/Metal/JtmpJ/src/compiler/compilation.jl:183 [inlined]
  [3] macro expansion
    @ ~/.julia/packages/ObjectiveC/C7BVt/src/os.jl:264 [inlined]
  [4] macro expansion
    @ ~/.julia/packages/Metal/JtmpJ/src/compiler/compilation.jl:178 [inlined]
  [5] (::Metal.var"#171#172"{Bool, GPUCompiler.CompilerJob{…}, @NamedTuple{…}})()
    @ Metal ~/.julia/packages/ObjectiveC/C7BVt/src/foundation.jl:637
  [6] macro expansion
    @ ~/.julia/packages/ObjectiveC/C7BVt/src/foundation.jl:565 [inlined]
  [7] macro expansion
    @ ./lock.jl:273 [inlined]
  [8] ObjectiveC.Foundation.NSAutoreleasePool(f::Metal.var"#171#172"{Bool, GPUCompiler.CompilerJob{…}, @NamedTuple{…}})
    @ ObjectiveC.Foundation ~/.julia/packages/ObjectiveC/C7BVt/src/foundation.jl:557
  [9] link(job::GPUCompiler.CompilerJob, compiled::@NamedTuple{image::Vector{UInt8}, entry::String}; return_function::Bool)
    @ Metal ~/.julia/packages/ObjectiveC/C7BVt/src/foundation.jl:636
 [10] actual_compilation(cache::Dict{…}, src::Core.MethodInstance, world::UInt64, cfg::GPUCompiler.CompilerConfig{…}, compiler::typeof(Metal.compile), linker::typeof(Metal.link))
    @ GPUCompiler ~/.julia/packages/GPUCompiler/2CW9L/src/execution.jl:262
 [11] cached_compilation(cache::Dict{…}, src::Core.MethodInstance, cfg::GPUCompiler.CompilerConfig{…}, compiler::Function, linker::Function)
    @ GPUCompiler ~/.julia/packages/GPUCompiler/2CW9L/src/execution.jl:151
 [12] macro expansion
    @ ~/.julia/packages/Metal/JtmpJ/src/compiler/execution.jl:189 [inlined]
 [13] macro expansion
    @ ./lock.jl:273 [inlined]
 [14] mtlfunction(f::typeof(gpu_atomic_add_ka!), tt::Type{Tuple{…}}; name::Nothing, kwargs::@Kwargs{})
    @ Metal ~/.julia/packages/Metal/JtmpJ/src/compiler/execution.jl:184
 [15] mtlfunction
    @ ~/.julia/packages/Metal/JtmpJ/src/compiler/execution.jl:182 [inlined]
 [16] macro expansion
    @ ~/.julia/packages/Metal/JtmpJ/src/compiler/execution.jl:85 [inlined]
 [17] (::KernelAbstractions.Kernel{…})(args::MtlVector{…}; ndrange::Int64, workgroupsize::Nothing)
    @ Metal.MetalKernels ~/.julia/packages/Metal/JtmpJ/src/MetalKernels.jl:110
 [18] top-level scope
    @ ~/Prog/Julia/Packages/Atomix.jl-fork/prototype/atomic_add_test.jl:14
 [19] include(fname::String)
    @ Main ./sysimg.jl:38
 [20] top-level scope
    @ REPL[12]:1
 [21] top-level scope
    @ ~/.julia/packages/Metal/JtmpJ/src/initialization.jl:72
Some type information was truncated. Use `show(err)` to see complete types.

When running @macroexpand on the kernel I see:

julia> @macroexpand @kernel cpu=false function atomic_add_ka!(v)
           i = @index(Global)
           Atomix.@atomic v[i] += eltype(v)(1)
       end
quote
    function gpu_atomic_add_ka!(__ctx__, v; )
        let
            #= /Users/anicusan/Prog/Julia/Packages/Atomix.jl-fork/prototype/KernelAbstractions.jl/src/macros.jl:96 =#
            if (KernelAbstractions.__validindex)(__ctx__)
                #= /Users/anicusan/Prog/Julia/Packages/Atomix.jl-fork/prototype/KernelAbstractions.jl/src/macros.jl:97 =#
                begin
                    #= REPL[15]:1 =#
                    #= REPL[15]:2 =#
                    i = KernelAbstractions.__index_Global_Linear(__ctx__)
                    #= REPL[15]:3 =#
                    ((Atomix.Internal.Atomix).modify!((Atomix.Internal.referenceable(v))[i], +, (eltype(v))(1), UnsafeAtomics.seq_cst))[2]
                end
            end
            #= /Users/anicusan/Prog/Julia/Packages/Atomix.jl-fork/prototype/KernelAbstractions.jl/src/macros.jl:99 =#
            return nothing
        end
    end
    begin
        #= /Users/anicusan/Prog/Julia/Packages/Atomix.jl-fork/prototype/KernelAbstractions.jl/src/macros.jl:54 =#
        if !($(Expr(:isdefined, :atomic_add_ka!)))
            #= /Users/anicusan/Prog/Julia/Packages/Atomix.jl-fork/prototype/KernelAbstractions.jl/src/macros.jl:55 =#
            begin
                $(Expr(:meta, :doc))
                atomic_add_ka!(dev) = begin
                        #= /Users/anicusan/Prog/Julia/Packages/Atomix.jl-fork/prototype/KernelAbstractions.jl/src/macros.jl:55 =#
                        atomic_add_ka!(dev, (KernelAbstractions.NDIteration.DynamicSize)(), (KernelAbstractions.NDIteration.DynamicSize)())
                    end
            end
            #= /Users/anicusan/Prog/Julia/Packages/Atomix.jl-fork/prototype/KernelAbstractions.jl/src/macros.jl:56 =#
            atomic_add_ka!(dev, size) = begin
                    #= /Users/anicusan/Prog/Julia/Packages/Atomix.jl-fork/prototype/KernelAbstractions.jl/src/macros.jl:56 =#
                    atomic_add_ka!(dev, (KernelAbstractions.NDIteration.StaticSize)(size), (KernelAbstractions.NDIteration.DynamicSize)())
                end
            #= /Users/anicusan/Prog/Julia/Packages/Atomix.jl-fork/prototype/KernelAbstractions.jl/src/macros.jl:57 =#
            atomic_add_ka!(dev, size, range) = begin
                    #= /Users/anicusan/Prog/Julia/Packages/Atomix.jl-fork/prototype/KernelAbstractions.jl/src/macros.jl:57 =#
                    atomic_add_ka!(dev, (KernelAbstractions.NDIteration.StaticSize)(size), (KernelAbstractions.NDIteration.StaticSize)(range))
                end
            #= /Users/anicusan/Prog/Julia/Packages/Atomix.jl-fork/prototype/KernelAbstractions.jl/src/macros.jl:58 =#
            function atomic_add_ka!(dev::Dev, sz::S, range::NDRange) where {Dev, S <: KernelAbstractions.NDIteration._Size, NDRange <: KernelAbstractions.NDIteration._Size}
                #= /Users/anicusan/Prog/Julia/Packages/Atomix.jl-fork/prototype/KernelAbstractions.jl/src/macros.jl:58 =#
                #= /Users/anicusan/Prog/Julia/Packages/Atomix.jl-fork/prototype/KernelAbstractions.jl/src/macros.jl:59 =#
                if (KernelAbstractions.isgpu)(dev)
                    #= /Users/anicusan/Prog/Julia/Packages/Atomix.jl-fork/prototype/KernelAbstractions.jl/src/macros.jl:60 =#
                    return (KernelAbstractions.construct)(dev, sz, range, gpu_atomic_add_ka!)
                else
                    #= /Users/anicusan/Prog/Julia/Packages/Atomix.jl-fork/prototype/KernelAbstractions.jl/src/macros.jl:62 =#
                    if false
                        #= /Users/anicusan/Prog/Julia/Packages/Atomix.jl-fork/prototype/KernelAbstractions.jl/src/macros.jl:63 =#
                        return (KernelAbstractions.construct)(dev, sz, range, cpu_atomic_add_ka!)
                    else
                        #= /Users/anicusan/Prog/Julia/Packages/Atomix.jl-fork/prototype/KernelAbstractions.jl/src/macros.jl:65 =#
                        error("This kernel is unavailable for backend CPU")
                    end
                end
            end
        end
    end
end

Metal.versioninfo() gives me:

julia> Metal.versioninfo()
macOS 15.0.1, Darwin 24.0.0

Toolchain:
- Julia: 1.11.1
- LLVM: 16.0.6

Julia packages: 
- Metal.jl: 1.4.2
- GPUArrays: 10.3.1
- GPUCompiler: 0.27.8
- KernelAbstractions: 0.9.29
- ObjectiveC: 3.1.0
- LLVM: 9.1.3
- LLVMDowngrader_jll: 0.3.0+2

1 device:
- Apple M3 Max (464.000 KiB allocated)

And versioninfo() gives me:

julia> versioninfo()
Julia Version 1.11.1
Commit 8f5b7ca12ad (2024-10-16 10:53 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: macOS (arm64-apple-darwin22.4.0)
  CPU: 14 × Apple M3 Max
  WORD_SIZE: 64
  LLVM: libLLVM-16.0.6 (ORCJIT, apple-m3)
Threads: 1 default, 0 interactive, 1 GC (on 10 virtual cores)

I do not know enough about the KernelAbstractions/JuliaGPU compilation pipeline to debug this further - @vchuravy or @maleadt , would you have any pointers?

anicusan · 2024-11-06T15:11:26Z

If we make this work, I will implement AtomixoneAPI too, so Atomix v1.0 can work on all KernelAbstractions backends. Is AMDGPU using the same Atomix backend for atomics? If so, I can write AtomixAMDGPU in a similar way, but I don't have an AMD GPU on hand to test it.

maleadt · 2024-11-06T15:16:55Z

Hmm, that doesn't bode well. Some bad LLVM IR is being generated here, causing the Metal back-end compiler to abort. Can you upload the metallib from the error message here?

anicusan · 2024-11-06T15:21:50Z

Here's a zip with the jl_tgfgHYNGNa.metallib generated:
jl_tgfgHYNGNa.zip

tgymnich · 2024-11-06T21:08:30Z

are there any plans to add support for 64-bit atomics, at least when natively supported on the M2 and up?

added an issue to track progress on this: JuliaGPU/Metal.jl#477

maleadt · 2024-11-07T11:06:44Z

Here's a zip with the jl_tgfgHYNGNa.metallib generated:
jl_tgfgHYNGNa.zip

Invalid record (Producer: 'LLVM16.0.6' Reader: 'LLVM 16.0.6jl')

Are you using a non-official version of Julia? If so, it's strongly recommended to use juliaup with official builds.

anicusan · 2024-11-07T11:09:47Z

Hi, no, I am using the official Julia distribution installed via Juliaup - see my versioninfo in the previous comment. Then again, normal Metal kernels do work - it's just atomics that produce the error above.

maleadt · 2024-11-07T11:25:38Z

It looks like the IR is actually corrupt -- the producer/reader mismatch is only a red herring. Probably an issue with our IR downgrader. I'll take a closer look.

maleadt · 2024-11-07T12:14:59Z

The invalid IR comes from atomicrmw not being supported by the downgrader. I can fix that, but it won't help you, as Metal doesn't support LLVM's native atomics. The fact that an atomicrmw is being emitted by Atomix.jl, indicates that this PR is incomplete. I guess it should use atomic_fetch_OP_explicit instead, see https://github.com/JuliaGPU/Metal.jl/blob/9019b56c05055db3e5dcd93ca0d08bf264c908cd/src/device/intrinsics/atomics.jl#L205-L238 for how Metal.jl implements this for Metal.@atomic.

EDIT: The unsupported IR in question:

define void @kernel(ptr %ptr) {
  %1 = atomicrmw add ptr %ptr, i32 0 monotonic, align 4
  %2 = cmpxchg ptr %ptr, i32 0, i32 1 monotonic monotonic
  ret void
}

anicusan · 2024-11-07T12:22:38Z

That's odd - why did the tests (copied verbatim from AtomixCUDA) work then? I'm asking to 1) understand it, and 2) then write a test covering this failed case.

The operations in Atomix.modify! already use atomic_fetch_OP_explicit as defined in Metal.jl - this is what I wrote:

@inline function Atomix.modify!(ref::MtlIndexableRef, op::OP, x, order) where {OP}
    x = convert(eltype(ref), x)
    ptr = Atomix.pointer(ref)
    begin
        old = if op === (+)
            Metal.atomic_fetch_add_explicit(ptr, x)
        elseif op === (-)
            Metal.atomic_fetch_sub_explicit(ptr, x)
        elseif op === (&)
            Metal.atomic_fetch_and_explicit(ptr, x)
        elseif op === (|)
            Metal.atomic_fetch_or_explicit(ptr, x)
        elseif op === xor
            Metal.atomic_fetch_xor_explicit(ptr, x)
        elseif op === min
            Metal.atomic_fetch_min_explicit(ptr, x)
        elseif op === max
            Metal.atomic_fetch_max_explicit(ptr, x)
        else
            error("not implemented")
        end
    end
    return old => op(old, x)
end

maleadt · 2024-11-07T13:30:23Z

That's odd - why did the tests (copied verbatim from AtomixCUDA) work then?

Presumably because those tests didn't trigger atomicrmw emission? In CUDA, the LLVM atomics are partially supported, explaining why this isn't needed for CUDA.jl.

Here's where the atomicrmw comes from (you can see this using @device_code_llvm):

; │┌ @ /Users/tim/Julia/pkg/Atomix/src/core.jl:30 within `modify!`
; ││┌ @ /Users/tim/Julia/pkg/Atomix/src/references.jl:99 within `pointer` @ /Users/tim/Julia/pkg/Metal/src/device/array.jl:64
; │││┌ @ abstractarray.jl:1236 within `_memory_offset`
; ││││┌ @ int.jl:88 within `*`
       %16 = shl nuw nsw i64 %14, 2
       %17 = add nsw i64 %16, -4
; │││└└
; │││┌ @ /Users/tim/.julia/packages/LLVM/wMjUU/src/interop/pointer.jl:147 within `+`
; ││││┌ @ /Users/tim/.julia/packages/LLVM/wMjUU/src/interop/pointer.jl:114 within `add_ptr`
; │││││┌ @ /Users/tim/.julia/packages/LLVM/wMjUU/src/interop/pointer.jl:114 within `macro expansion` @ /Users/tim/.julia/packages/LLVM/wMjUU/src/interop/base.jl:39
        %18 = getelementptr i8, i8 addrspace(1)* %.unpack, i64 %17
; ││└└└└
; ││ @ /Users/tim/Julia/pkg/Atomix/src/core.jl:33 within `modify!` @ /Users/tim/.julia/packages/UnsafeAtomicsLLVM/LPqS5/src/internal.jl:23 @ /Users/tim/.julia/packages/UnsafeAtomicsLLVM/LPqS5/src/internal.jl:23
; ││┌ @ /Users/tim/.julia/packages/UnsafeAtomicsLLVM/LPqS5/src/atomics.jl:399 within `atomic_pointermodify`
; │││┌ @ /Users/tim/.julia/packages/UnsafeAtomicsLLVM/LPqS5/src/atomics.jl:260 within `llvm_atomic_op`
; ││││┌ @ /Users/tim/.julia/packages/UnsafeAtomicsLLVM/LPqS5/src/atomics.jl:260 within `macro expansion` @ /Users/tim/.julia/packages/LLVM/wMjUU/src/interop/base.jl:39
       %19 = bitcast i8 addrspace(1)* %18 to i32 addrspace(1)*
       %20 = atomicrmw add i32 addrspace(1)* %19, i32 1 seq_cst, align 4

Interestingly though, after fixing the downgrader your example does just work. I guess Apple has recently added some support for native LLVM atomics to Metal? Something to look into, but if you want to be sure I'd try to use the explicit AIR atomic intrinsics where possible for now.

anicusan · 2024-11-07T14:49:55Z

First, thank you again for taking the time to investigate all this.

The AtomixMetal tests do end up using our implementation of Atomix.modify!(ref::MtlIndexableRef, ...), which forwards the calls to Metal.atomic_fetch_OP_explicit (these are the explicit AIR intrinsics you mentioned, right?).

The odd part is that KernelAbstractions kernels using atomics end up emitting LLVM IR for them, and not anything via AtomixMetal - as seen in your call stack, the IR comes from UnsafeAtomicsLLVM, completely circumventing AtomixMetal. If CUDA does support LLVM IR, then that explains why it was working in CUDA, even if it was not actually using AtomixCUDA, just UnsafeAtomicsLLVM directly.

But wasn't KernelAbstractions use of Atomix supposed to use the right Atomix{backend} package? Or are they not needed anymore?

If it seems we now accidentally have Metal atomics in KA because of LLVM IR I'm happy, but I'm still curious about the AtomixMetal / AtomixCUDA stacks, which may still be needed for other backends in the future.

maleadt · 2024-11-07T14:53:33Z

But wasn't KernelAbstractions use of Atomix supposed to use the right Atomix{backend} package?

I thought so as well; cc @vchuravy.

Note that a better design would be to use LLVM atomics everywhere and do the lowering to backend-specific intrinsics (like AIR's) in GPUCompiler.jl, but that's a redesign I don't have the time for (JuliaGPU/GPUCompiler.jl#479).

pxl-th · 2024-11-07T16:27:10Z

Is AMDGPU using the same Atomix backend for atomics?

AMDGPU.jl already uses Atomix for atomics and it does not need any special handling, since we rely directly on LLVM atomics for this.
The only special bit is that we specify syncscope to enable hardware FP atomics.

christiangnrd · 2024-11-07T18:55:49Z

using KernelAbstractions
using Atomix
using Metal

@kernel cpu=false function atomic_add_ka!(v)
    i = @index(Global)
    Atomix.@atomic v[i] += eltype(v)(1)
end

v = Metal.zeros(Int32, 1000)
atomic_add_ka!(get_backend(v), 128)(v, ndrange=length(v))
@assert all(Array(v) .== 1)

I believe this fails because you forgot to using AtomixMetal, whereas the tests pass because they do load it.

I think both AtomixCUDA and AtomixMetal should be deprecated and converted to extensions. It might even make more sense to have those extensions live in their respective repositories instead of here. Although it may be easier for CI if they live here.

maleadt · 2024-11-07T19:03:15Z

I believe this fails because you forgot to using AtomixMetal, whereas the tests pass because they do load it.

That seems to be correct.

; │┌ @ /Users/tim/Julia/pkg/Atomix/lib/AtomixMetal/src/AtomixMetal.jl:35 within `modify!`
; ││┌ @ /Users/tim/Julia/pkg/Atomix/src/references.jl:99 within `pointer` @ /Users/tim/Julia/pkg/Metal/src/device/array.jl:64
; │││┌ @ abstractarray.jl:1236 within `_memory_offset`
; ││││┌ @ int.jl:88 within `*`
       %16 = shl nuw nsw i64 %14, 2
       %17 = add nsw i64 %16, -4
; │││└└
; │││┌ @ /Users/tim/.julia/packages/LLVM/wMjUU/src/interop/pointer.jl:147 within `+`
; ││││┌ @ /Users/tim/.julia/packages/LLVM/wMjUU/src/interop/pointer.jl:114 within `add_ptr`
; │││││┌ @ /Users/tim/.julia/packages/LLVM/wMjUU/src/interop/pointer.jl:114 within `macro expansion` @ /Users/tim/.julia/packages/LLVM/wMjUU/src/interop/base.jl:39
        %18 = getelementptr i8, i8 addrspace(1)* %.unpack, i64 %17
; ││└└└└
; ││ @ /Users/tim/Julia/pkg/Atomix/lib/AtomixMetal/src/AtomixMetal.jl:38 within `modify!`
; ││┌ @ /Users/tim/Julia/pkg/Metal/src/device/intrinsics/atomics.jl:84 within `atomic_fetch_add_explicit`
; │││┌ @ /Users/tim/.julia/packages/LLVM/wMjUU/src/interop/pointer.jl:344 within `macro expansion`
; ││││┌ @ /Users/tim/.julia/packages/LLVM/wMjUU/src/interop/pointer.jl:182 within `_typed_llvmcall`
; │││││┌ @ /Users/tim/.julia/packages/LLVM/wMjUU/src/interop/pointer.jl:182 within `macro expansion` @ /Users/tim/.julia/packages/LLVM/wMjUU/src/interop/base.jl:39
        %19 = bitcast i8 addrspace(1)* %18 to i32 addrspace(1)*
        %20 = call i32 @air.atomic.global.add.s.i32(i32 addrspace(1)* %19, i32 1, i32 0, i32 2, i1 true)

christiangnrd · 2024-11-07T23:39:15Z

I opened JuliaGPU/CUDA.jl#2549. If it works out, I think we should do the same with the Metal backend instead of a new package.

maleadt · 2024-11-08T12:47:44Z

I suggest we merge this, and then focus on converting the current subpackages to extensions (as proposed by @christiangnrd) before releasing v1.0.

christiangnrd · 2024-11-08T14:29:15Z

I'll have a PR for extensions within a couple hours.

Unless it's already being worked on. Then I won't bother finishing it.

maleadt · 2024-11-08T15:40:28Z

I'll have a PR for extensions within a couple hours.

Awesome! I won't be able to get to it this week, so feel free 🙂

anicusan · 2024-11-08T22:05:29Z

Thanks for all the help in this conversation! Yes, adding using AtomixMetal explicitly fixes it:

using KernelAbstractions
using Atomix
using AtomixMetal
using Metal

# Have two threads concurrently increment each element
@kernel cpu=false function atomic_add_ka!(v)
    i = @index(Global)
    Atomix.@atomic v[(i - 1) ÷ 2 + 1] += eltype(v)(1)
end

v = Metal.zeros(Int32, 1000)
atomic_add_ka!(get_backend(v), 128)(v, ndrange=length(v) * 2)
@assert all(Array(v) .== 2)

This is a bit surprising - @christiangnrd 's PR will be very useful in codebases using KernelAbstractions with atomics.

Finally - does the oneAPI backend support UnsafeAtomicsLLVM directly, or do we need a similar AtomixoneAPI? (my ancient Intel machine decided to kick the bucket today and I can't test it)

maleadt · 2024-11-11T07:59:14Z

Finally - does the oneAPI backend support UnsafeAtomicsLLVM directly, or do we need a similar AtomixoneAPI? (my ancient Intel machine decided to kick the bucket today and I can't test it)

oneAPI.jl and OpenCL.jl use SPIRVIntrinsics.jl, which (contrary to its name) currently rely on OpenCL-style atomics, which are explicit function calls detected by the back-end: https://github.com/JuliaGPU/OpenCL.jl/blob/master/lib/intrinsics/src/atomic.jl
So we'll need a specific Atomix extension as well.

christiangnrd · 2024-11-12T12:52:37Z

I'll have a PR for extensions within a couple hours.

Unless it's already being worked on. Then I won't bother finishing it.

Ok didn't have the free time I expected to work on this and I don't know when I'll be able to get to it, so someone else should probably pick this up if they need this.

anicusan · 2024-11-12T13:56:38Z

@christiangnrd I made a pull request for this: #42
Refactored the libs into extensions and added a oneAPI backend; the GPU CI passes.

added Metal atomics backend following AtomixCUDA package - almost ver…

3f5b2c4

…batim

anicusan mentioned this pull request Oct 30, 2024

KernelAbstractions: add Atomix back-end JuliaGPU/Metal.jl#218

Closed

vchuravy requested a review from maleadt November 4, 2024 14:24

maleadt added 6 commits November 6, 2024 13:56

Drop nightly.

ff668f6

Improve CI.

afe889d

Add Buildkite pipeline.

847fde1

Ignore coverage files.

213c2a1

Fix GH:A.

12db8dd

Fix docs.

0cd4e56

maleadt added the enhancement New feature or request label Nov 6, 2024

Bump version.

5c6dda8

maleadt force-pushed the main branch from f916f0b to 5c6dda8 Compare November 6, 2024 13:20

maleadt approved these changes Nov 6, 2024

View reviewed changes

This was referenced Nov 7, 2024

Native LLVM atomics JuliaGPU/Metal.jl#478

Open

LLVM: Bump downgrader to v0.4. JuliaPackaging/Yggdrasil#9754

Merged

christiangnrd mentioned this pull request Nov 7, 2024

Atomix Extension JuliaGPU/CUDA.jl#2549

Closed

maleadt merged commit bb8edda into JuliaConcurrent:main Nov 8, 2024
5 of 8 checks passed

CarloLucibello mentioned this pull request Nov 10, 2024

Support Metal JuliaGraphs/GraphNeuralNetworks.jl#344

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add Metal atomics backend following the AtomixCUDA package - almost verbatim. #39

Add Metal atomics backend following the AtomixCUDA package - almost verbatim. #39

anicusan commented Oct 30, 2024 •

edited

Loading

maleadt left a comment

maleadt commented Nov 6, 2024

anicusan commented Nov 6, 2024

anicusan commented Nov 6, 2024

maleadt commented Nov 6, 2024

anicusan commented Nov 6, 2024

tgymnich commented Nov 6, 2024 •

edited by maleadt

Loading

maleadt commented Nov 7, 2024

anicusan commented Nov 7, 2024

maleadt commented Nov 7, 2024

maleadt commented Nov 7, 2024 •

edited

Loading

anicusan commented Nov 7, 2024

maleadt commented Nov 7, 2024

anicusan commented Nov 7, 2024

maleadt commented Nov 7, 2024

pxl-th commented Nov 7, 2024 •

edited

Loading

christiangnrd commented Nov 7, 2024

maleadt commented Nov 7, 2024

christiangnrd commented Nov 7, 2024

maleadt commented Nov 8, 2024

christiangnrd commented Nov 8, 2024 •

edited

Loading

maleadt commented Nov 8, 2024

anicusan commented Nov 8, 2024

maleadt commented Nov 11, 2024

christiangnrd commented Nov 12, 2024 •

edited

Loading

anicusan commented Nov 12, 2024

Add Metal atomics backend following the AtomixCUDA package - almost verbatim. #39

Add Metal atomics backend following the AtomixCUDA package - almost verbatim. #39

Conversation

anicusan commented Oct 30, 2024 • edited Loading

maleadt left a comment

Choose a reason for hiding this comment

maleadt commented Nov 6, 2024

anicusan commented Nov 6, 2024

anicusan commented Nov 6, 2024

maleadt commented Nov 6, 2024

anicusan commented Nov 6, 2024

tgymnich commented Nov 6, 2024 • edited by maleadt Loading

maleadt commented Nov 7, 2024

anicusan commented Nov 7, 2024

maleadt commented Nov 7, 2024

maleadt commented Nov 7, 2024 • edited Loading

anicusan commented Nov 7, 2024

maleadt commented Nov 7, 2024

anicusan commented Nov 7, 2024

maleadt commented Nov 7, 2024

pxl-th commented Nov 7, 2024 • edited Loading

christiangnrd commented Nov 7, 2024

maleadt commented Nov 7, 2024

christiangnrd commented Nov 7, 2024

maleadt commented Nov 8, 2024

christiangnrd commented Nov 8, 2024 • edited Loading

maleadt commented Nov 8, 2024

anicusan commented Nov 8, 2024

maleadt commented Nov 11, 2024

christiangnrd commented Nov 12, 2024 • edited Loading

anicusan commented Nov 12, 2024

anicusan commented Oct 30, 2024 •

edited

Loading

tgymnich commented Nov 6, 2024 •

edited by maleadt

Loading

maleadt commented Nov 7, 2024 •

edited

Loading

pxl-th commented Nov 7, 2024 •

edited

Loading

christiangnrd commented Nov 8, 2024 •

edited

Loading

christiangnrd commented Nov 12, 2024 •

edited

Loading