permutedims errors on high dimensional tensors #333

Closed
GiggleLiu opened this issue Nov 9, 2020 · 4 comments · Fixed by JuliaLang/julia#40468

GiggleLiu commented Nov 9, 2020

This line errors:

function LinearAlgebra.permutedims!(dest::AbstractGPUArray, src::AbstractGPUArray, perm) where N

julia> CUDA.permutedims(CuArray(randn(fill(2, 20)...)), randperm(20));
ERROR: InvalidIRError: compiling kernel #41(CUDA.CuKernelContext, CuDeviceArray{Float64,20,1}, CuDeviceArray{Float64,20,1}, NTuple{20,Int64}) resulted in invalid LLVM IR
Reason: unsupported dynamic function invocation (call to getindex)
Stacktrace:
 [1] macro expansion at /home/jgliu/.julia/packages/GPUArrays/ZxsKE/src/device/indexing.jl:81
 [2] #41 at /home/jgliu/.julia/packages/GPUArrays/ZxsKE/src/host/linalg.jl:203
Reason: unsupported dynamic function invocation (call to getindex)
Stacktrace:
 [1] #41 at /home/jgliu/.julia/packages/GPUArrays/ZxsKE/src/host/linalg.jl:204
Reason: unsupported dynamic function invocation (call to genperm)
Stacktrace:
 [1] #41 at /home/jgliu/.julia/packages/GPUArrays/ZxsKE/src/host/linalg.jl:204
Reason: unsupported dynamic function invocation (call to setindex!(A::AbstractArray, v, I...) in Base at abstractarray.jl:1150)
Stacktrace:
 [1] #41 at /home/jgliu/.julia/packages/GPUArrays/ZxsKE/src/host/linalg.jl:204
Reason: unsupported dynamic function invocation (call to setindex!)
Stacktrace:
 [1] #41 at /home/jgliu/.julia/packages/GPUArrays/ZxsKE/src/host/linalg.jl:204
Reason: unsupported call through a literal pointer (call to jl_alloc_array_1d)
Stacktrace:
 [1] Array at boot.jl:406
 [2] map at tuple.jl:168
 [3] axes at abstractarray.jl:75
 [4] CartesianIndices at multidimensional.jl:264
 [5] macro expansion at /home/jgliu/.julia/packages/GPUArrays/ZxsKE/src/device/indexing.jl:81
 [6] #41 at /home/jgliu/.julia/packages/GPUArrays/ZxsKE/src/host/linalg.jl:203
Reason: unsupported call to the Julia runtime (call to jl_f__apply_iterate)
Stacktrace:
 [1] map at tuple.jl:172
 [2] axes at abstractarray.jl:75
 [3] CartesianIndices at multidimensional.jl:264
 [4] macro expansion at /home/jgliu/.julia/packages/GPUArrays/ZxsKE/src/device/indexing.jl:81
 [5] #41 at /home/jgliu/.julia/packages/GPUArrays/ZxsKE/src/host/linalg.jl:203
Reason: unsupported dynamic function invocation (call to CartesianIndices)
Stacktrace:
 [1] CartesianIndices at multidimensional.jl:264
 [2] macro expansion at /home/jgliu/.julia/packages/GPUArrays/ZxsKE/src/device/indexing.jl:81
 [3] #41 at /home/jgliu/.julia/packages/GPUArrays/ZxsKE/src/host/linalg.jl:203
Stacktrace:
 [1] check_ir(::GPUCompiler.CompilerJob{GPUCompiler.PTXCompilerTarget,CUDA.CUDACompilerParams}, ::LLVM.Module) at /home/jgliu/.julia/packages/GPUCompiler/uTpNx/src/validation.jl:123
 [2] macro expansion at /home/jgliu/.julia/packages/GPUCompiler/uTpNx/src/driver.jl:239 [inlined]
 [3] macro expansion at /home/jgliu/.julia/packages/TimerOutputs/ZmKD7/src/TimerOutput.jl:206 [inlined]
 [4] codegen(::Symbol, ::GPUCompiler.CompilerJob; libraries::Bool, deferred_codegen::Bool, optimize::Bool, strip::Bool, validate::Bool, only_entry::Bool) at /home/jgliu/.julia/packages/GPUCompiler/uTpNx/src/driver.jl:237
 [5] compile(::Symbol, ::GPUCompiler.CompilerJob; libraries::Bool, deferred_codegen::Bool, optimize::Bool, strip::Bool, validate::Bool, only_entry::Bool) at /home/jgliu/.julia/packages/GPUCompiler/uTpNx/src/driver.jl:39
 [6] compile at /home/jgliu/.julia/packages/GPUCompiler/uTpNx/src/driver.jl:35 [inlined]
 [7] cufunction_compile(::GPUCompiler.FunctionSpec; kwargs::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}) at /home/jgliu/.julia/packages/CUDA/0p5fn/src/compiler/execution.jl:310
 [8] cufunction_compile(::GPUCompiler.FunctionSpec) at /home/jgliu/.julia/packages/CUDA/0p5fn/src/compiler/execution.jl:305
 [9] check_cache(::Dict{UInt64,Any}, ::Any, ::Any, ::GPUCompiler.FunctionSpec{GPUArrays.var"#41#42",Tuple{CUDA.CuKernelContext,CuDeviceArray{Float64,20,1},CuDeviceArray{Float64,20,1},NTuple{20,Int64}}}, ::UInt64; kwargs::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}) at /home/jgliu/.julia/packages/GPUCompiler/uTpNx/src/cache.jl:40
 [10] #41 at /home/jgliu/.julia/packages/GPUArrays/ZxsKE/src/host/linalg.jl:203 [inlined]
 [11] cached_compilation at /home/jgliu/.julia/packages/GPUCompiler/uTpNx/src/cache.jl:65 [inlined]
 [12] cufunction(::GPUArrays.var"#41#42", ::Type{Tuple{CUDA.CuKernelContext,CuDeviceArray{Float64,20,1},CuDeviceArray{Float64,20,1},NTuple{20,Int64}}}; name::Nothing, kwargs::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}) at /home/jgliu/.julia/packages/CUDA/0p5fn/src/compiler/execution.jl:297
 [13] cufunction(::GPUArrays.var"#41#42", ::Type{Tuple{CUDA.CuKernelContext,CuDeviceArray{Float64,20,1},CuDeviceArray{Float64,20,1},NTuple{20,Int64}}}) at /home/jgliu/.julia/packages/CUDA/0p5fn/src/compiler/execution.jl:294
 [14] #launch_heuristic#852 at /home/jgliu/.julia/packages/CUDA/0p5fn/src/gpuarrays.jl:19 [inlined]
 [15] launch_heuristic at /home/jgliu/.julia/packages/CUDA/0p5fn/src/gpuarrays.jl:17 [inlined]
 [16] gpu_call(::GPUArrays.var"#41#42", ::CuArray{Float64,20}, ::CuArray{Float64,20}, ::NTuple{20,Int64}; target::CuArray{Float64,20}, total_threads::Nothing, threads::Nothing, blocks::Nothing, name::String) at /home/jgliu/.julia/packages/GPUArrays/ZxsKE/src/device/execution.jl:61
 [17] permutedims!(::CuArray{Float64,20}, ::CuArray{Float64,20}, ::Array{Int64,1}) at /home/jgliu/.julia/packages/GPUArrays/ZxsKE/src/host/linalg.jl:202
 [18] permutedims(::CuArray{Float64,20}, ::Array{Int64,1}) at ./multidimensional.jl:1381
 [19] top-level scope at REPL[12]:1

I think this is the 16-element tuple issue: Base only specializes tuple operations such as map up to a fixed length, and longer tuples fall back to a generic path that allocates an Array, which is not allowed on the device. Is there an easy approach to circumvent this error?
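A small CPU-side check of that hypothesis (my own sketch, not code from the issue): `map` over short tuples stays on a tuple-only code path, while tuples beyond Base's specialization limit (16 elements via `Base.Any16` on Julia 1.5, 32 via `Base.Any32` on recent versions) go through an intermediate `Vector{Any}`. That allocation is exactly the `jl_alloc_array_1d` call the GPU compiler rejects above.

```julia
# Measure allocations inside the call so global-scope dispatch is excluded.
allocs(t) = @allocated map(x -> x + 1, t)

short = ntuple(identity, 4)   # within Base's tuple specialization limit
long  = ntuple(identity, 40)  # beyond the limit on all Julia versions

allocs(short); allocs(long)   # warm up: the first call includes compilation

allocs(short)  # tuple-only path, no heap allocation
allocs(long)   # falls back through an intermediate Vector{Any}, so > 0
```

On the CPU the fallback is merely slower; inside a GPU kernel it produces the "unsupported call through a literal pointer (call to jl_alloc_array_1d)" failure seen in the trace.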


GiggleLiu commented Nov 9, 2020

I fixed the above error. The good news is that 16-18 dimensional arrays no longer error; the bad news is that we now get a new error:

julia> CUDA.permutedims(CuArray(randn(fill(2, 18)...)), randperm(18));

julia> CUDA.permutedims(CuArray(randn(fill(2, 20)...)), randperm(20));

julia> CUDA.permutedims(CuArray(randn(fill(2, 20)...)), randperm(20));
ERROR: CUDA error: an illegal memory access was encountered (code 700, ERROR_ILLEGAL_ADDRESS)
Stacktrace:
 [1] throw_api_error(::CUDA.cudaError_enum) at /home/jgliu/.julia/dev/CUDA/lib/cudadrv/error.jl:97
 [2] macro expansion at /home/jgliu/.julia/dev/CUDA/lib/cudadrv/error.jl:104 [inlined]
 [3] cuMemcpyHtoD_v2(::CuPtr{Float64}, ::Ptr{Float64}, ::Int64) at /home/jgliu/.julia/dev/CUDA/lib/utils/call.jl:93
 [4] #unsafe_copyto!#7 at /home/jgliu/.julia/dev/CUDA/lib/cudadrv/memory.jl:395 [inlined]
 [5] unsafe_copyto! at /home/jgliu/.julia/dev/CUDA/lib/cudadrv/memory.jl:388 [inlined]
 [6] unsafe_copyto!(::CuArray{Float64,20}, ::Int64, ::Array{Float64,20}, ::Int64, ::Int64) at /home/jgliu/.julia/dev/CUDA/src/array.jl:290
 [7] copyto!(::CuArray{Float64,20}, ::Int64, ::Array{Float64,20}, ::Int64, ::Int64) at /home/jgliu/.julia/dev/CUDA/src/array.jl:254
 [8] copyto! at /home/jgliu/.julia/dev/CUDA/src/array.jl:258 [inlined]
 [9] CuArray at /home/jgliu/.julia/dev/CUDA/src/array.jl:191 [inlined]
 [10] CuArray(::Array{Float64,20}) at /home/jgliu/.julia/dev/CUDA/src/array.jl:198
 [11] top-level scope at REPL[5]:1

When exiting the Julia REPL, I see the following error:

error in running finalizer: CUDA.CuError(code=CUDA.cudaError_enum(0x000002bc), meta=nothing)

This error message is very similar to
JuliaGPU/CUDA.jl#94

I have submitted a WIP PR #334 to investigate this issue further.

@GiggleLiu

The cuda-memcheck output:

(base) ➜  project git:(random-tensor) ✗ cuda-memcheck julia -e "using TropicalTensors, CuYao, CUDA, Random; CUDA.permutedims(CuArray(randn(fill(2, 20)...)), randperm(20));"
========= CUDA-MEMCHECK
========= Out-of-range Shared or Local Address
=========     at 0x000000e0 in __cuda_syscall_mc_dyn_globallock_check
=========     by thread (192,0,0) in block (120,0,0)
=========     Device Frame:julia_permutedims_(CuKernelContext, CuDeviceArray<Float64, int=20, int=1>, CuDeviceArray<Float64, int=20, int=1>, Tuple<Int64, CuDeviceArray<Float64, int=20, int=1>, CuDeviceArray<Float64, int=20, int=1>, CuDeviceArray<Float64, int=20, int=1>, CuDeviceArray<Float64, int=20, int=1>, CuDeviceArray<Float64, int=20, int=1>, CuDeviceArray<Float64, int=20, int=1>, CuDeviceArray<Float64, int=20, int=1>, CuDeviceArray<Float64, int=20, int=1>, CuDeviceArray<Float64, int=20, int=1>, CuDeviceArray<Float64, int=20, int=1>, CuDeviceArray<Float64, int=20, int=1>, CuDeviceArray<Float64, int=20, int=1>, CuDeviceArray<Float64, int=20, int=1>, CuDeviceArray<Float64, int=20, int=1>, CuDeviceArray<Float64, int=20, int=1>, CuDeviceArray<Float64, int=20, int=1>, CuDeviceArray<Float64, int=20, int=1>, CuDeviceArray<Float64, int=20, int=1>, CuDeviceArray<Float64, int=20, int=1>>) ($_Z18julia_permutedims_15CuKernelContext13CuDeviceArrayI7Float64Li20ELi1EES0_IS1_Li20ELi1EE5TupleI5Int64S3_S3_S3_S3_S3_S3_S3_S3_S3_S3_S3_S3_S3_S3_S3_S3_S
=========     Saved host backtrace up to driver entry point at kernel launch time
=========     Host Frame:/usr/lib/x86_64-linux-gnu/libcuda.so.1 (cuLaunchKernel_ptsz + 0x346) [0x2bcb86]
=========     Host Frame:[0x7f0200e1642d]
=========     Host Frame:[0x7f0200e1673d]
=========     Host Frame:[0x7f0200e16b0f]
=========     Host Frame:[0x7f0200e16bc2]
=========     Host Frame:/home/jgliu/packages/julias/julia-1.5/bin/../lib/libjulia.so.1 (jl_apply_generic + 0x228) [0xb7c58]
=========     Host Frame:[0x7f022b0c8542]
=========     Host Frame:[0x7f022b0c8d1c]
=========     Host Frame:[0x7f022b0c8f39]
=========     Host Frame:[0x7f022b0c8fc9]
=========     Host Frame:/home/jgliu/packages/julias/julia-1.5/bin/../lib/libjulia.so.1 (jl_apply_generic + 0x228) [0xb7c58]
=========     Host Frame:[0x7f022b0c7cec]
=========     Host Frame:/home/jgliu/packages/julias/julia-1.5/bin/../lib/libjulia.so.1 (jl_apply_generic + 0x228) [0xb7c58]
=========     Host Frame:[0x7f022b0c76d6]
=========     Host Frame:/home/jgliu/packages/julias/julia-1.5/bin/../lib/libjulia.so.1 (jl_apply_generic + 0x228) [0xb7c58]
=========     Host Frame:/home/jgliu/packages/julias/julia-1.5/bin/../lib/libjulia.so.1 [0xd1426]
=========     Host Frame:/home/jgliu/packages/julias/julia-1.5/bin/../lib/libjulia.so.1 [0xd1070]
=========     Host Frame:/home/jgliu/packages/julias/julia-1.5/bin/../lib/libjulia.so.1 [0xd1a74]
=========     Host Frame:/home/jgliu/packages/julias/julia-1.5/bin/../lib/libjulia.so.1 [0xd2cd8]
=========     Host Frame:/home/jgliu/packages/julias/julia-1.5/bin/../lib/libjulia.so.1 [0xefeea]
=========     Host Frame:/home/jgliu/packages/julias/julia-1.5/bin/../lib/libjulia.so.1 [0xef9ea]
=========     Host Frame:/home/jgliu/packages/julias/julia-1.5/bin/../lib/libjulia.so.1 [0xef9ea]
=========     Host Frame:/home/jgliu/packages/julias/julia-1.5/bin/../lib/libjulia.so.1 (jl_toplevel_eval_in + 0xb1) [0xf1161]
=========     Host Frame:/home/jgliu/packages/julias/julia-1.5/lib/julia/sys.so [0xb3ef52]
=========     Host Frame:/home/jgliu/packages/julias/julia-1.5/bin/../lib/libjulia.so.1 (jl_apply_generic + 0x6f3) [0xb8123]
=========     Host Frame:/home/jgliu/packages/julias/julia-1.5/lib/julia/sys.so [0x1039b0b]
=========     Host Frame:/home/jgliu/packages/julias/julia-1.5/lib/julia/sys.so [0xa1a9d1]
=========     Host Frame:/home/jgliu/packages/julias/julia-1.5/lib/julia/sys.so [0xa1ab26]
=========     Host Frame:/home/jgliu/packages/julias/julia-1.5/bin/../lib/libjulia.so.1 (jl_apply_generic + 0x6f3) [0xb8123]
=========     Host Frame:julia [0x1932]
=========     Host Frame:julia [0x1534]
=========     Host Frame:/lib/x86_64-linux-gnu/libc.so.6 (__libc_start_main + 0xe7) [0x21b97]
=========     Host Frame:julia [0x15d5]
=========
========= Program hit CUDA_ERROR_LAUNCH_FAILED (error 719) due to "unspecified launch failure" on CUDA API call to cuModuleUnload.
=========     Saved host backtrace up to driver entry point at error
=========     Host Frame:/usr/lib/x86_64-linux-gnu/libcuda.so.1 (cuModuleUnload + 0x184) [0x2ab584]
=========     Host Frame:[0x7f0200e16f73]
=========     Host Frame:/home/jgliu/packages/julias/julia-1.5/bin/../lib/libjulia.so.1 (jl_apply_generic + 0x6f3) [0xb8123]
=========     Host Frame:/home/jgliu/packages/julias/julia-1.5/bin/../lib/libjulia.so.1 [0x100d7f]
=========     Host Frame:/home/jgliu/packages/julias/julia-1.5/bin/../lib/libjulia.so.1 [0x1018c0]
=========     Host Frame:/home/jgliu/packages/julias/julia-1.5/bin/../lib/libjulia.so.1 [0x10312d]
=========     Host Frame:/home/jgliu/packages/julias/julia-1.5/bin/../lib/libjulia.so.1 (jl_atexit_hook + 0x12b) [0xd4d8b]
=========     Host Frame:julia [0x153d]
=========     Host Frame:/lib/x86_64-linux-gnu/libc.so.6 (__libc_start_main + 0xe7) [0x21b97]
=========     Host Frame:julia [0x15d5]
=========
error in running finalizer: CUDA.CuError(code=CUDA.cudaError_enum(0x000002cf), meta=nothing)
========= ERROR SUMMARY: 2 errors

@GiggleLiu
Copy link
Contributor Author

GiggleLiu commented Nov 9, 2020

I defined something like the following: it vectorizes the arrays first and performs the permutation via explicit index arithmetic on the flat buffers.

function LinearAlgebra.permutedims!(dest::GPUArrays.AbstractGPUArray, src::GPUArrays.AbstractGPUArray, perm)
    perm isa Tuple || (perm = Tuple(perm))
    size_dest = size(dest)
    size_src = size(src)
    CUDA.gpu_call(vec(dest), vec(src), perm; name="permutedims!") do ctx, dest, src, perm
        i = @linearidx src
        # convert the linear index to a Cartesian index tuple, permute it,
        # and convert back, so no CartesianIndices object is built on device
        I = l2c(size_src, i)
        @inbounds dest[c2l(size_dest, GPUArrays.genperm(I, perm))] = src[i]
        return
    end
    return dest  # the vec'd buffer aliases dest, so dest is already permuted
end
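The l2c and c2l helpers are not shown in the snippet; the sketch below is one plausible implementation (my own, using the hypothetical names from the snippet). It converts a 1-based linear index to a column-major Cartesian index tuple and back using only integer and tuple arithmetic, so nothing allocates inside the kernel.

```julia
# Hypothetical index helpers for the workaround above. `l2c` maps a linear
# index to a column-major Cartesian index tuple; `c2l` maps it back. Only
# tuple/integer arithmetic is used, so both compile to allocation-free code.
@inline function l2c(sz::NTuple{N,Int}, i::Int) where N
    ntuple(Val(N)) do k
        # product of the sizes of all dimensions before dimension k
        stride = 1
        for j in 1:k-1
            stride *= sz[j]
        end
        mod(div(i - 1, stride), sz[k]) + 1
    end
end

@inline function c2l(sz::NTuple{N,Int}, I::NTuple{N,Int}) where N
    # Horner-style accumulation from the last dimension down to the first
    i = I[N] - 1
    for k in N-1:-1:1
        i = i * sz[k] + (I[k] - 1)
    end
    return i + 1
end
```

With these, the device code indexes the flat vectorized arrays directly and never constructs a high-dimensional CartesianIndices, which is what triggered the dynamic dispatch in the first place.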


kshyatt commented Apr 13, 2021

Just to note that I'm hitting the same problem. MWE:

sizing = [2 for ii in 1:20];
a = CUDA.rand(sizing...);
b = similar(a);
permutedims!(b, a, collect(reverse(1:20)))

leads to:

ERROR: InvalidIRError: compiling kernel permutedims_kernel(CUDA.CuKernelContext, CuDeviceArray{Float32, 20, 1}, CuDeviceArray{Float32, 20, 1}, Val{(20, 19, 18, 17, 16, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1)}) resulted in invalid LLVM IR
Reason: unsupported call through a literal pointer (call to )
Stacktrace:
 [1] Array
   @ boot.jl:448
 [2] map
   @ tuple.jl:224
 [3] axes
   @ abstractarray.jl:89
 [4] CartesianIndices
   @ multidimensional.jl:279
 [5] macro expansion
   @ ~/.julia/packages/GPUArrays/bjw3g/src/device/indexing.jl:81
 [6] permutedims_kernel
   @ ~/.julia/packages/GPUArrays/bjw3g/src/host/linalg.jl:200
Reason: unsupported call to the Julia runtime (call to jl_f__apply_iterate)
Stacktrace:
 [1] map
   @ tuple.jl:228
 [2] axes
   @ abstractarray.jl:89
 [3] CartesianIndices
   @ multidimensional.jl:279
 [4] macro expansion
   @ ~/.julia/packages/GPUArrays/bjw3g/src/device/indexing.jl:81
 [5] permutedims_kernel
   @ ~/.julia/packages/GPUArrays/bjw3g/src/host/linalg.jl:200
Reason: unsupported dynamic function invocation (call to CartesianIndices)
Stacktrace:
 [1] CartesianIndices
   @ multidimensional.jl:279
 [2] macro expansion
   @ ~/.julia/packages/GPUArrays/bjw3g/src/device/indexing.jl:81
 [3] permutedims_kernel
   @ ~/.julia/packages/GPUArrays/bjw3g/src/host/linalg.jl:200
Reason: unsupported dynamic function invocation (call to afoldl(op, a, b, c, d, e, f, g, h, i, j, k, l, m, n, o, p, qs...) in Base at operators.jl:529)
Stacktrace:
 [1] *
   @ operators.jl:540
 [2] prod
   @ tuple.jl:480
 [3] length
   @ ~/.julia/dev/CUDA/src/device/array.jl:60
 [4] macro expansion
   @ ~/.julia/packages/GPUArrays/bjw3g/src/device/indexing.jl:67
 [5] permutedims_kernel
   @ ~/.julia/packages/GPUArrays/bjw3g/src/host/linalg.jl:200
Reason: unsupported dynamic function invocation (call to getindex)
Stacktrace:
 [1] macro expansion
   @ ~/.julia/packages/GPUArrays/bjw3g/src/device/indexing.jl:81
 [2] permutedims_kernel
   @ ~/.julia/packages/GPUArrays/bjw3g/src/host/linalg.jl:200
Reason: unsupported call to the Julia runtime (call to jl_f_apply_type)
Stacktrace:
 [1] permutedims_kernel
   @ ~/.julia/packages/GPUArrays/bjw3g/src/host/linalg.jl:202
Reason: unsupported call to the Julia runtime (call to jl_new_structv)
Stacktrace:
 [1] permutedims_kernel
   @ ~/.julia/packages/GPUArrays/bjw3g/src/host/linalg.jl:202
Reason: unsupported dynamic function invocation (call to map)
Stacktrace:
 [1] permutedims_kernel
   @ ~/.julia/packages/GPUArrays/bjw3g/src/host/linalg.jl:202
Reason: unsupported dynamic function invocation (call to CartesianIndex)
Stacktrace:
 [1] permutedims_kernel
   @ ~/.julia/packages/GPUArrays/bjw3g/src/host/linalg.jl:202
Reason: unsupported dynamic function invocation (call to getindex)
Stacktrace:
 [1] permutedims_kernel
   @ ~/.julia/packages/GPUArrays/bjw3g/src/host/linalg.jl:203
Reason: unsupported dynamic function invocation (call to setindex!)
Stacktrace:
 [1] permutedims_kernel
   @ ~/.julia/packages/GPUArrays/bjw3g/src/host/linalg.jl:203
Stacktrace:
  [1] check_ir(job::GPUCompiler.CompilerJob{GPUCompiler.PTXCompilerTarget, CUDA.CUDACompilerParams, GPUCompiler.FunctionSpec{GPUArrays.var"#permutedims_kernel#44", Tuple{CUDA.CuKernelContext, CuDeviceArray{Float32, 20, 1}, CuDeviceArray{Float32, 20, 1}, Val{(20, 19, 18, 17, 16, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1)}}}}, args::LLVM.Module)
    @ GPUCompiler ~/.julia/dev/GPUCompiler/src/validation.jl:108
  [2] macro expansion
    @ ~/.julia/dev/GPUCompiler/src/driver.jl:298 [inlined]
  [3] macro expansion
    @ ~/.julia/packages/TimerOutputs/4QAIk/src/TimerOutput.jl:206 [inlined]
  [4] macro expansion
    @ ~/.julia/dev/GPUCompiler/src/driver.jl:296 [inlined]
  [5] emit_asm(job::GPUCompiler.CompilerJob, ir::LLVM.Module, kernel::LLVM.Function; strip::Bool, validate::Bool, format::LLVM.API.LLVMCodeGenFileType)
    @ GPUCompiler ~/.julia/dev/GPUCompiler/src/utils.jl:61
  [6] cufunction_compile(job::GPUCompiler.CompilerJob)
    @ CUDA ~/.julia/dev/CUDA/src/compiler/execution.jl:304
  [7] check_cache
    @ ~/.julia/dev/GPUCompiler/src/cache.jl:44 [inlined]
  [8] cached_compilation
    @ ~/.julia/packages/GPUArrays/bjw3g/src/host/linalg.jl:200 [inlined]
  [9] cached_compilation(cache::Dict{UInt64, Any}, job::GPUCompiler.CompilerJob{GPUCompiler.PTXCompilerTarget, CUDA.CUDACompilerParams, GPUCompiler.FunctionSpec{GPUArrays.var"#permutedims_kernel#44", Tuple{CUDA.CuKernelContext, CuDeviceArray{Float32, 20, 1}, CuDeviceArray{Float32, 20, 1}, Val{(20, 19, 18, 17, 16, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1)}}}}, compiler::typeof(CUDA.cufunction_compile), linker::typeof(CUDA.cufunction_link))
    @ GPUCompiler ~/.julia/dev/GPUCompiler/src/cache.jl:0
 [10] cufunction(f::GPUArrays.var"#permutedims_kernel#44", tt::Type{Tuple{CUDA.CuKernelContext, CuDeviceArray{Float32, 20, 1}, CuDeviceArray{Float32, 20, 1}, Val{(20, 19, 18, 17, 16, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1)}}}; name::Nothing, kwargs::Base.Iterators.Pairs{Union{}, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
    @ CUDA ~/.julia/dev/CUDA/src/compiler/execution.jl:292
 [11] cufunction
    @ ~/.julia/dev/CUDA/src/compiler/execution.jl:286 [inlined]
 [12] macro expansion
    @ ~/.julia/dev/CUDA/src/compiler/execution.jl:102 [inlined]
 [13] #launch_heuristic#303
    @ ~/.julia/dev/CUDA/src/gpuarrays.jl:17 [inlined]
 [14] launch_heuristic
    @ ~/.julia/dev/CUDA/src/gpuarrays.jl:17 [inlined]
 [15] gpu_call(::GPUArrays.var"#permutedims_kernel#44", ::CuArray{Float32, 20}, ::CuArray{Float32, 20}, ::Val{(20, 19, 18, 17, 16, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1)}; target::CuArray{Float32, 20}, total_threads::Nothing, threads::Nothing, blocks::Nothing, name::Nothing)
    @ GPUArrays ~/.julia/packages/GPUArrays/bjw3g/src/device/execution.jl:61
 [16] gpu_call(::GPUArrays.var"#permutedims_kernel#44", ::CuArray{Float32, 20}, ::CuArray{Float32, 20}, ::Val{(20, 19, 18, 17, 16, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1)})
    @ GPUArrays ~/.julia/packages/GPUArrays/bjw3g/src/device/execution.jl:46
 [17] permutedims!(dest::CuArray{Float32, 20}, src::CuArray{Float32, 20}, perm::NTuple{20, Int64})
    @ GPUArrays ~/.julia/packages/GPUArrays/bjw3g/src/host/linalg.jl:207
 [18] permutedims!(dest::CuArray{Float32, 20}, src::CuArray{Float32, 20}, perm::Vector{Int64})
    @ GPUArrays ~/.julia/packages/GPUArrays/bjw3g/src/host/linalg.jl:212
 [19] top-level scope
    @ REPL[9]:1
