
Adapt to GPUCompiler 0.18 #673

Merged: 10 commits from vc/gpucompiler_0.18 into main, Apr 4, 2023

Conversation

Opened by @vchuravy (Member).

@vchuravy force-pushed the vc/gpucompiler_0.18 branch from c1d0ba9 to db587cf on March 15, 2023 00:53
@wsmoses force-pushed the vc/gpucompiler_0.18 branch 2 times, most recently from ccb3047 to 2fba470 on March 27, 2023 04:34
@wsmoses force-pushed the vc/gpucompiler_0.18 branch from 2fba470 to b4085a1 on March 27, 2023 04:36
@wsmoses force-pushed the vc/gpucompiler_0.18 branch 5 times, most recently from e04cf32 to c506bd0 on March 27, 2023 05:27
@wsmoses force-pushed the vc/gpucompiler_0.18 branch from c506bd0 to 6e2e1d7 on March 27, 2023 05:29
@wsmoses (Member) commented Mar 27, 2023:

This is now mostly functioning, but some performance issues remain:


```julia
julia> @code_typed Enzyme.autodiff(Reverse, sin, Active(2.0))
CodeInfo(
1 ─ %1 = Core.getfield(args, 1)::Active{Float64}
└──      goto #3 if not true
2 ─      nothing::Nothing
3 ┄ %4 = φ (#2 => 0x0000000000008294)::UInt64
│   %5 = (Core.Compiler.return_type)(Tuple{typeof(sin), Float64}, %4)::Type
│   %6 = Enzyme.guess_activity(%5, $(QuoteNode(EnzymeCore.ReverseMode{false}())))::Any
│   %7 = Enzyme.autodiff::typeof(autodiff)
│   %8 = Core.kwcall($(QuoteNode((world = 0x0000000000008294,))), %7, $(QuoteNode(EnzymeCore.ReverseMode{false}())), $(QuoteNode(Const{typeof(sin)}(sin))), %6, %1)::Any
└──      goto #4
4 ─      goto #5
5 ─      goto #6
6 ─      return %8
) => Any
```
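
As a quick check of this instability (not part of the PR, just a minimal sketch): `Test.@inferred` errors while the call above infers to `Any`, and passes once the dispatch becomes type-stable.

```julia
using Enzyme, Test

# @inferred compares the runtime return type against the inferred one and
# errors on a mismatch; with the dispatch above inferring to Any, this throws.
@inferred Enzyme.autodiff(Reverse, sin, Active(2.0))
```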


@wsmoses (Member) commented Mar 27, 2023:

@vtjnash, Valentin tells me that `return_type` with a world argument can only be inferred on Julia `main`, which is causing breakages as we adapt to worlds.

Any ideas (and/or any possibility of getting that inferred elsewhere)?
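
For reference, the call shape in question, as a minimal sketch mirroring `%4`/`%5` in the IR above (the concrete world value here is only illustrative):

```julia
# The generated code queries the return type in an explicit world. On released
# Julia, inference does not constant-fold this, so the result stays `Type` and
# the following autodiff call becomes a dynamic dispatch.
world = Base.get_world_counter()  # illustrative; the real value comes from the generator
rt = Core.Compiler.return_type(Tuple{typeof(sin), Float64}, world)
```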

@vtjnash commented Mar 27, 2023:

Oh, I had already written that comment, and github didn't post it: JuliaGPU/GPUCompiler.jl#394 (comment)

@wsmoses (Member) commented Mar 27, 2023 via email.

@vtjnash commented Mar 27, 2023:

Yeah, the compiler is forbidden from propagating anything about world values, since those are not constants.
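
A minimal illustration of why a captured world value is not a constant: the counter advances whenever a new method is defined.

```julia
# Defining a method bumps the global world counter, so any value captured
# earlier no longer describes the current set of visible methods.
w1 = Base.get_world_counter()
f_new(x) = x + 1   # hypothetical definition, just to advance the world
w2 = Base.get_world_counter()
@assert w2 > w1
```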

@wsmoses (Member) commented Mar 27, 2023:

I don't understand; `code_typed` shows a constant int, no?

@vtjnash commented Mar 27, 2023:

Only because the generator lied to inference

@wsmoses (Member) commented Mar 27, 2023:

I'm trying this out in the EnzymeInterpreter; @vtjnash, is there a reason why this approach is considered harmful?

It partially resolves the issue (at least in nested AD, where we can use the enforced interpreter).

```julia
# Imports so the overload below is self-contained.
using Core.Compiler: AbstractInterpreter, ArgInfo, StmtInfo, InferenceState,
    CallMeta, MethodResultPure, EFFECTS_TOTAL

# call where the function is known exactly
function Core.Compiler.abstract_call_known(interp::EnzymeInterpreter, @nospecialize(f),
        arginfo::ArgInfo, si::StmtInfo, sv::Union{InferenceState, Core.Compiler.IRCode},
        max_methods::Int = isa(sv, InferenceState) ? Core.Compiler.get_max_methods(f, sv.mod, interp) : 0)
    (; fargs, argtypes) = arginfo
    la = length(argtypes)

    if Core.Compiler.is_return_type(f)
        wc = Base.get_world_counter()
        @show f, argtypes, interp.world, wc  # debug output
        # Only constant-fold when every argument is a known constant.
        if all(x -> isa(x, Core.Const), argtypes)
            # return_type(f, tt, world)
            if length(argtypes) == 4 && isa(argtypes[4].val, UInt64)
                world = argtypes[4].val
                if world <= wc
                    res = Core.Compiler.return_type(argtypes[2].val, argtypes[3].val, world)
                    @show res  # debug output
                    # NoCallInfo stands in for the original `call.info`, which
                    # is not defined in this scope.
                    info = Core.Compiler.verbose_stmt_info(interp) ?
                        MethodResultPure(Core.Compiler.ReturnTypeCallInfo(Core.Compiler.NoCallInfo())) :
                        MethodResultPure()
                    return CallMeta(Core.Const(res), EFFECTS_TOTAL, info)
                end
            end
            # return_type(tt, world)
            if length(argtypes) == 3 && isa(argtypes[3].val, UInt64)
                world = argtypes[3].val
                if world <= wc
                    res = Core.Compiler.return_type(argtypes[2].val, world)
                    @show res  # debug output
                    info = Core.Compiler.verbose_stmt_info(interp) ?
                        MethodResultPure(Core.Compiler.ReturnTypeCallInfo(Core.Compiler.NoCallInfo())) :
                        MethodResultPure()
                    return CallMeta(Core.Const(res), EFFECTS_TOTAL, info)
                end
            end
        end
    end

    return Base.@invoke Core.Compiler.abstract_call_known(interp::AbstractInterpreter,
        f::Any, arginfo::ArgInfo, si::StmtInfo,
        sv::Union{InferenceState, Core.Compiler.IRCode}, max_methods::Int)
end
```

@vchuravy (Member, PR author) commented:

At least `wc = Base.get_world_counter()` must come from the AbstractInterpreter, no?
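
A minimal sketch of that suggestion, assuming the EnzymeInterpreter carries its inference world in `interp.world` (as the `@show` above already reads) and implements the standard accessor:

```julia
# Hypothetical: take the world bound from the interpreter rather than the
# mutable global counter, so the constant fold matches the inference world.
wc = Core.Compiler.get_world_counter(interp)  # AbstractInterpreter interface
# equivalently, for this interpreter: wc = interp.world
```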

@vtjnash commented Mar 27, 2023:

The information conveyed by world is not consistent, so it cannot be propagated in inference without causing mistakes in the results. As @vchuravy correctly pointed out, it must come from the AbstractInterpreter context.

@wsmoses (Member) commented Apr 4, 2023:

This now works, except for the following:

```
mul_kernel: Error During Test at /home/wmoses/git/Enzyme.jl/test/cuda.jl:18
  Got exception outside of a @test
  MethodError: no method matching Value(::Nothing; ctx::Context)
  
  Closest candidates are:
    Value(::Metadata; ctx)
     @ LLVM ~/.julia/packages/LLVM/HykgZ/src/core/metadata.jl:53
    Value(::Ptr{LLVM.API.LLVMOpaqueValue}) got unsupported keyword argument "ctx"
     @ LLVM ~/.julia/packages/LLVM/HykgZ/src/core/value.jl:33
  
  Stacktrace:
    [1] add_kernel_state!(mod::LLVM.Module)
      @ GPUCompiler ~/.julia/packages/GPUCompiler/anMCs/src/irgen.jl:562
    [2] module_pass_callback(ptr::Ptr{Nothing}, data::Ptr{Nothing})
      @ LLVM ~/.julia/packages/LLVM/HykgZ/src/pass.jl:19
    [3] LLVMRunPassManager
      @ ~/.julia/packages/LLVM/HykgZ/lib/13/libLLVM_h.jl:4898 [inlined]
    [4] run!
      @ ~/.julia/packages/LLVM/HykgZ/src/passmanager.jl:39 [inlined]
    [5] macro expansion
      @ ~/.julia/packages/GPUCompiler/anMCs/src/optim.jl:241 [inlined]
    [6] macro expansion
      @ ~/.julia/packages/LLVM/HykgZ/src/base.jl:102 [inlined]
    [7] optimize!(job::CompilerJob, mod::LLVM.Module)
      @ GPUCompiler ~/.julia/packages/GPUCompiler/anMCs/src/optim.jl:185
    [8] macro expansion
      @ ~/.julia/packages/GPUCompiler/anMCs/src/driver.jl:366 [inlined]
    [9] macro expansion
      @ ~/.julia/packages/TimerOutputs/LHjFw/src/TimerOutput.jl:253 [inlined]
   [10] macro expansion
      @ ~/.julia/packages/GPUCompiler/anMCs/src/driver.jl:365 [inlined]
   [11] macro expansion
      @ ~/.julia/packages/TimerOutputs/LHjFw/src/TimerOutput.jl:253 [inlined]
   [12] macro expansion
      @ ~/.julia/packages/GPUCompiler/anMCs/src/driver.jl:355 [inlined]
   [13] emit_llvm(job::CompilerJob, method_instance::Any; libraries::Bool, deferred_codegen::Bool, optimize::Bool, cleanup::Bool, only_entry::Bool, validate::Bool, ctx::ThreadSafeContext)
      @ GPUCompiler ~/.julia/packages/GPUCompiler/anMCs/src/utils.jl:83
   [14] emit_llvm
      @ ~/.julia/packages/GPUCompiler/anMCs/src/utils.jl:77 [inlined]
   [15] compile(job::CompilerJob, ctx::ThreadSafeContext)
      @ CUDA ~/.julia/packages/CUDA/q3GG0/src/compiler/compilation.jl:106
   [16] #203
      @ ~/.julia/packages/CUDA/q3GG0/src/compiler/compilation.jl:100 [inlined]
   [17] ThreadSafeContext(f::CUDA.var"#203#204"{CompilerJob{PTXCompilerTarget, CUDA.CUDACompilerParams}})
      @ LLVM ~/.julia/packages/LLVM/HykgZ/src/executionengine/ts_module.jl:14
   [18] JuliaContext(f::CUDA.var"#203#204"{CompilerJob{PTXCompilerTarget, CUDA.CUDACompilerParams}})
      @ GPUCompiler ~/.julia/packages/GPUCompiler/anMCs/src/driver.jl:74
   [19] compile
      @ ~/.julia/packages/CUDA/q3GG0/src/compiler/compilation.jl:99 [inlined]
   [20] actual_compilation(cache::Dict{UInt64, Any}, key::UInt64, cfg::CompilerConfig{PTXCompilerTarget, CUDA.CUDACompilerParams}, ft::Type, tt::Type, world::UInt64, compiler::typeof(CUDA.compile), linker::typeof(CUDA.link))
      @ GPUCompiler ~/.julia/packages/GPUCompiler/anMCs/src/cache.jl:184
   [21] cached_compilation(cache::Dict{UInt64, Any}, cfg::CompilerConfig{PTXCompilerTarget, CUDA.CUDACompilerParams}, ft::Type, tt::Type, compiler::Function, linker::Function)
      @ GPUCompiler ~/.julia/packages/GPUCompiler/anMCs/src/cache.jl:163
   [22] macro expansion
      @ ~/.julia/packages/CUDA/q3GG0/src/compiler/execution.jl:310 [inlined]
   [23] macro expansion
      @ ./lock.jl:267 [inlined]
   [24] cufunction(f::typeof(grad_mul_kernel), tt::Type{Tuple{CuDeviceVector{Float32, 1}, CuDeviceVector{Float32, 1}}}; kwargs::Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
      @ CUDA ~/.julia/packages/CUDA/q3GG0/src/compiler/execution.jl:306
   [25] cufunction(f::typeof(grad_mul_kernel), tt::Type{Tuple{CuDeviceVector{Float32, 1}, CuDeviceVector{Float32, 1}}})
      @ CUDA ~/.julia/packages/CUDA/q3GG0/src/compiler/execution.jl:303
   [26] macro expansion
      @ ~/.julia/packages/CUDA/q3GG0/src/compiler/execution.jl:104 [inlined]
   [27] macro expansion
      @ ~/git/Enzyme.jl/test/cuda.jl:24 [inlined]
   [28] macro expansion
      @ ~/git/Enzyme.jl/julia-1.9.0-rc1/share/julia/stdlib/v1.9/Test/src/Test.jl:1498 [inlined]
   [29] top-level scope
      @ ~/git/Enzyme.jl/test/cuda.jl:19
   [30] include(fname::String)
      @ Base.MainInclude ./client.jl:478
   [31] top-level scope
      @ ~/git/Enzyme.jl/test/runtests.jl:1909
   [32] include(fname::String)
      @ Base.MainInclude ./client.jl:478
   [33] top-level scope
      @ none:6
   [34] eval
      @ ./boot.jl:370 [inlined]
   [35] exec_options(opts::Base.JLOptions)
      @ Base ./client.jl:280
   [36] _start()
      @ Base ./client.jl:522
Test Summary: | Error  Total   Time
mul_kernel    |     1      1  10.4s
```
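
For context, a rough reconstruction of what the failing test exercises, pieced together from the stacktrace above; the kernel bodies are assumptions, not the actual contents of test/cuda.jl:

```julia
using CUDA, Enzyme

# Primal kernel being differentiated (assumed body).
function mul_kernel(A)
    i = threadIdx().x
    if i <= length(A)
        A[i] *= A[i]
    end
    return nothing
end

# Device-side reverse-mode gradient kernel, matching the `grad_mul_kernel`
# name in the stacktrace (assumed body).
function grad_mul_kernel(A, dA)
    Enzyme.autodiff_deferred(Reverse, mul_kernel, Const, Duplicated(A, dA))
    return nothing
end

A  = CUDA.ones(Float32, 64)
dA = CUDA.ones(Float32, 64)
# Compiling this launch is what hits the MethodError in GPUCompiler's add_kernel_state!.
@cuda threads=length(A) grad_mul_kernel(A, dA)
```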

@wsmoses marked this pull request as ready for review April 4, 2023 15:40
@wsmoses merged commit 939f9b4 into main Apr 4, 2023
@wsmoses deleted the vc/gpucompiler_0.18 branch April 4, 2023 15:41