GPU example throws errors #608

Closed
ccasert opened this issue Aug 19, 2021 · 11 comments
Comments

@ccasert

ccasert commented Aug 19, 2021

I'm running into issues when trying out the following GPU example: https://diffeqflux.sciml.ai/dev/GPUs/.
At the line

# Make the data into a GPU-based array if the user has a GPU  
ode_data = gpu(solve(prob_trueode, Tsit5(), saveat = tsteps))

I get this error:

ERROR: MethodError: Cannot `convert` an object of type ODESolution{Float32, 2, Vector{Vector{Float32}}, Nothing, Nothing, Vector{Float32}, Vector{Vector{Vector{Float32}}}, ODEProblem{Vector{Float32}, Tuple{Float32, Float32}, true, SciMLBase.NullParameters, ODEFunction{true, typeof(trueODEfunc), LinearAlgebra.UniformScaling{Bool}, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, typeof(SciMLBase.DEFAULT_OBSERVED), Nothing}, Base.Iterators.Pairs{Union{}, Union{}, Tuple{}, NamedTuple{(), Tuple{}}}, SciMLBase.StandardODEProblem}, Tsit5, OrdinaryDiffEq.InterpolationData{ODEFunction{true, typeof(trueODEfunc), LinearAlgebra.UniformScaling{Bool}, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, typeof(SciMLBase.DEFAULT_OBSERVED), Nothing}, Vector{Vector{Float32}}, Vector{Float32}, Vector{Vector{Vector{Float32}}}, OrdinaryDiffEq.Tsit5Cache{Vector{Float32}, Vector{Float32}, Vector{Float32}, OrdinaryDiffEq.Tsit5ConstantCache{Float32, Float32}}}, DiffEqBase.DEStats} to an object of type Float32
Closest candidates are:
  convert(::Type{T}, ::Base.TwicePrecision) where T<:Number at twiceprecision.jl:250
  convert(::Type{T}, ::AbstractChar) where T<:Number at char.jl:180
  convert(::Type{T}, ::CartesianIndex{1}) where T<:Number at multidimensional.jl:136
  ...
Stacktrace:
  [1] setindex!(A::Matrix{Float32}, x::ODESolution{Float32, 2, Vector{Vector{Float32}}, Nothing, Nothing, Vector{Float32}, Vector{Vector{Vector{Float32}}}, ODEProblem{Vector{Float32}, Tuple{Float32, Float32}, true, SciMLBase.NullParameters, ODEFunction{true, typeof(trueODEfunc), LinearAlgebra.UniformScaling{Bool}, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, typeof(SciMLBase.DEFAULT_OBSERVED), Nothing}, Base.Iterators.Pairs{Union{}, Union{}, Tuple{}, NamedTuple{(), Tuple{}}}, SciMLBase.StandardODEProblem}, Tsit5, OrdinaryDiffEq.InterpolationData{ODEFunction{true, typeof(trueODEfunc), LinearAlgebra.UniformScaling{Bool}, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, typeof(SciMLBase.DEFAULT_OBSERVED), Nothing}, Vector{Vector{Float32}}, Vector{Float32}, Vector{Vector{Vector{Float32}}}, OrdinaryDiffEq.Tsit5Cache{Vector{Float32}, Vector{Float32}, Vector{Float32}, OrdinaryDiffEq.Tsit5ConstantCache{Float32, Float32}}}, DiffEqBase.DEStats}, i1::Int64)
    @ Base ./array.jl:839
  [2] copyto_unaliased!
    @ ./abstractarray.jl:976 [inlined]
  [3] copyto!(dest::Matrix{Float32}, src::ODESolution{Float32, 2, Vector{Vector{Float32}}, Nothing, Nothing, Vector{Float32}, Vector{Vector{Vector{Float32}}}, ODEProblem{Vector{Float32}, Tuple{Float32, Float32}, true, SciMLBase.NullParameters, ODEFunction{true, typeof(trueODEfunc), LinearAlgebra.UniformScaling{Bool}, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, typeof(SciMLBase.DEFAULT_OBSERVED), Nothing}, Base.Iterators.Pairs{Union{}, Union{}, Tuple{}, NamedTuple{(), Tuple{}}}, SciMLBase.StandardODEProblem}, Tsit5, OrdinaryDiffEq.InterpolationData{ODEFunction{true, typeof(trueODEfunc), LinearAlgebra.UniformScaling{Bool}, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, typeof(SciMLBase.DEFAULT_OBSERVED), Nothing}, Vector{Vector{Float32}}, Vector{Float32}, Vector{Vector{Vector{Float32}}}, OrdinaryDiffEq.Tsit5Cache{Vector{Float32}, Vector{Float32}, Vector{Float32}, OrdinaryDiffEq.Tsit5ConstantCache{Float32, Float32}}}, DiffEqBase.DEStats})
    @ Base ./abstractarray.jl:950
  [4] copyto_axcheck!
    @ ./abstractarray.jl:1056 [inlined]
  [5] Array
    @ ./array.jl:540 [inlined]
  [6] Array
    @ ./boot.jl:473 [inlined]
  [7] convert
    @ ./array.jl:532 [inlined]
  [8] CuArray
    @ /data/packages/CUDA/VGl9W/src/array.jl:271 [inlined]
  [9] adapt_storage(#unused#::CUDA.CuArrayAdaptor{CUDA.Mem.DeviceBuffer}, xs::ODESolution{Float32, 2, Vector{Vector{Float32}}, Nothing, Nothing, Vector{Float32}, Vector{Vector{Vector{Float32}}}, ODEProblem{Vector{Float32}, Tuple{Float32, Float32}, true, SciMLBase.NullParameters, ODEFunction{true, typeof(trueODEfunc), LinearAlgebra.UniformScaling{Bool}, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, typeof(SciMLBase.DEFAULT_OBSERVED), Nothing}, Base.Iterators.Pairs{Union{}, Union{}, Tuple{}, NamedTuple{(), Tuple{}}}, SciMLBase.StandardODEProblem}, Tsit5, OrdinaryDiffEq.InterpolationData{ODEFunction{true, typeof(trueODEfunc), LinearAlgebra.UniformScaling{Bool}, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, typeof(SciMLBase.DEFAULT_OBSERVED), Nothing}, Vector{Vector{Float32}}, Vector{Float32}, Vector{Vector{Vector{Float32}}}, OrdinaryDiffEq.Tsit5Cache{Vector{Float32}, Vector{Float32}, Vector{Float32}, OrdinaryDiffEq.Tsit5ConstantCache{Float32, Float32}}}, DiffEqBase.DEStats})
    @ CUDA /data/packages/CUDA/VGl9W/src/array.jl:429
 [10] adapt_structure
    @ /data/packages/Adapt/RGNRk/src/Adapt.jl:42 [inlined]
 [11] adapt
    @ /data/packages/Adapt/RGNRk/src/Adapt.jl:40 [inlined]
 [12] #cu#184
    @ /data/packages/CUDA/VGl9W/src/array.jl:439 [inlined]
 [13] cu
    @ /data/packages/CUDA/VGl9W/src/array.jl:439 [inlined]
 [14] fmap(f::typeof(cu), x::ODESolution{Float32, 2, Vector{Vector{Float32}}, Nothing, Nothing, Vector{Float32}, Vector{Vector{Vector{Float32}}}, ODEProblem{Vector{Float32}, Tuple{Float32, Float32}, true, SciMLBase.NullParameters, ODEFunction{true, typeof(trueODEfunc), LinearAlgebra.UniformScaling{Bool}, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, typeof(SciMLBase.DEFAULT_OBSERVED), Nothing}, Base.Iterators.Pairs{Union{}, Union{}, Tuple{}, NamedTuple{(), Tuple{}}}, SciMLBase.StandardODEProblem}, Tsit5, OrdinaryDiffEq.InterpolationData{ODEFunction{true, typeof(trueODEfunc), LinearAlgebra.UniformScaling{Bool}, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, typeof(SciMLBase.DEFAULT_OBSERVED), Nothing}, Vector{Vector{Float32}}, Vector{Float32}, Vector{Vector{Vector{Float32}}}, OrdinaryDiffEq.Tsit5Cache{Vector{Float32}, Vector{Float32}, Vector{Float32}, OrdinaryDiffEq.Tsit5ConstantCache{Float32, Float32}}}, DiffEqBase.DEStats}; exclude::typeof(Flux._isbitsarray), walk::typeof(Functors._default_walk), cache::IdDict{Any, Any})
    @ Functors /data/packages/Functors/l7uZ3/src/functor.jl:90
 [15] gpu(x::ODESolution{Float32, 2, Vector{Vector{Float32}}, Nothing, Nothing, Vector{Float32}, Vector{Vector{Vector{Float32}}}, ODEProblem{Vector{Float32}, Tuple{Float32, Float32}, true, SciMLBase.NullParameters, ODEFunction{true, typeof(trueODEfunc), LinearAlgebra.UniformScaling{Bool}, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, typeof(SciMLBase.DEFAULT_OBSERVED), Nothing}, Base.Iterators.Pairs{Union{}, Union{}, Tuple{}, NamedTuple{(), Tuple{}}}, SciMLBase.StandardODEProblem}, Tsit5, OrdinaryDiffEq.InterpolationData{ODEFunction{true, typeof(trueODEfunc), LinearAlgebra.UniformScaling{Bool}, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, typeof(SciMLBase.DEFAULT_OBSERVED), Nothing}, Vector{Vector{Float32}}, Vector{Float32}, Vector{Vector{Vector{Float32}}}, OrdinaryDiffEq.Tsit5Cache{Vector{Float32}, Vector{Float32}, Vector{Float32}, OrdinaryDiffEq.Tsit5ConstantCache{Float32, Float32}}}, DiffEqBase.DEStats})
    @ Flux /data/packages/Flux/Zz9RI/src/functor.jl:71
 [16] top-level scope
    @ REPL[11]:2
 [17] top-level scope
    @ /data/packages/CUDA/VGl9W/src/initialization.jl:66

These are my package versions:

      Status `/data/environments/v1.6/Project.toml`
  [6e4b80f9] BenchmarkTools v1.1.3
  [052768ef] CUDA v3.4.1
  [aae7a2af] DiffEqFlux v1.43.0
  [41bf760c] DiffEqSensitivity v6.57.0
  [0c46a032] DifferentialEquations v6.19.0
  [31c24e10] Distributions v0.25.11
  [587475ba] Flux v0.12.6
  [a75be94c] GalacticOptim v2.0.3
  [429524aa] Optim v1.4.1
  [1dea7af3] OrdinaryDiffEq v5.63.1
  [91a5bcdd] Plots v1.20.1

and this is my CUDA version:

CUDA toolkit 10.2.89, artifact installation
CUDA driver 10.2.0
NVIDIA driver 440.82.0

Libraries: 
- CUBLAS: 10.2.2
- CURAND: 10.1.2
- CUFFT: 10.1.2
- CUSOLVER: 10.3.0
- CUSPARSE: 10.3.1
- CUPTI: 12.0.0
- NVML: 10.0.0+440.82
- CUDNN: 8.20.2 (for CUDA 10.2.0)
- CUTENSOR: 1.3.0 (for CUDA 10.2.0)

Toolchain:
- Julia: 1.6.2
- LLVM: 11.0.1
- PTX ISA support: 3.2, 4.0, 4.1, 4.2, 4.3, 5.0, 6.0, 6.1, 6.3, 6.4, 6.5
- Device capability support: sm_30, sm_32, sm_35, sm_37, sm_50, sm_52, sm_53, sm_60, sm_61, sm_62, sm_70, sm_72, sm_75

1 device:
  0: TITAN V (sm_70, 11.472 GiB / 11.784 GiB available)

The code runs without any errors on the CPU. What could be the problem here?
Thanks!

@ChrisRackauckas
Member

The workaround, of course, is to convert directly to a CuArray:

using DiffEqFlux, OrdinaryDiffEq, Flux, Optim, Plots, CUDA, DiffEqSensitivity
CUDA.allowscalar(false) # Makes sure no slow scalar operations are occurring

# Generate Data
u0 = Float32[2.0; 0.0]
datasize = 30
tspan = (0.0f0, 1.5f0)
tsteps = range(tspan[1], tspan[2], length = datasize)
function trueODEfunc(du, u, p, t)
    true_A = [-0.1 2.0; -2.0 -0.1]
    du .= ((u.^3)'true_A)'
end
prob_trueode = ODEProblem(trueODEfunc, u0, tspan)
# Make the data into a GPU-based array if the user has a GPU
ode_data = CuArray(solve(prob_trueode, Tsit5(), saveat = tsteps))
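# (Hedged aside: CuArray(sol) succeeds where gpu(sol) fails because this
#  constructor goes through the untyped Array(sol) conversion, whereas gpu/cu
#  recurse via Adapt and hit the element-typed constructor; see the
#  discussion further down in this thread.)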


dudt2 = FastChain((x, p) -> x.^3,
                  FastDense(2, 50, tanh),
                  FastDense(50, 2))
u0 = Float32[2.0; 0.0] |> gpu
p = initial_params(dudt2) |> gpu
prob_neuralode = NeuralODE(dudt2, tspan, Tsit5(), saveat = tsteps)

function predict_neuralode(p)
  CuArray(prob_neuralode(u0,p))
end
function loss_neuralode(p)
    pred = predict_neuralode(p)
    loss = sum(abs2, ode_data .- pred)
    return loss, pred
end
# Callback function to observe training
list_plots = []
iter = 0
callback = function (p, l, pred; doplot = false)
  global list_plots, iter
  if iter == 0
    list_plots = []
  end
  iter += 1
  display(l)
  # plot current prediction against data
  plt = scatter(tsteps, Array(ode_data[1,:]), label = "data")
  scatter!(plt, tsteps, Array(pred[1,:]), label = "prediction")
  push!(list_plots, plt)
  if doplot
    display(plot(plt))
  end
  return false
end
result_neuralode = DiffEqFlux.sciml_train(loss_neuralode, p,
                                          ADAM(0.05), cb = callback,
                                          maxiters = 300)

@DhairyaLGandhi @mcabbott do you know why gpu is now going through a bunch of Functors machinery and is no longer AbstractArray-generic? The MWE is:

using RecursiveArrayTools, CUDA
x = VectorOfArray([rand(Float32,4),rand(Float32,4)])
CuArray(x) # works
gpu(x) # fails

and it used to just call CuArray?
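A minimal sketch of why the two calls diverge (based on the Adapt explanation maleadt gives further down; the commented lines reproduce the behavior reported above):

using RecursiveArrayTools
x = VectorOfArray([rand(Float32, 4), rand(Float32, 4)])
Array(x)             # works: VectorOfArray supports the untyped Array constructor
# Array{Float32}(x)  # MethodError: the element-typed constructor is missing
# CuArray(x)         # works, because it funnels through the untyped Array(x)
# gpu(x) / cu(x)     # fail, because Adapt calls CuArray{Float32}(x) on the leaves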

@DhairyaLGandhi
Member

DhairyaLGandhi commented Aug 23, 2021

What's the Zygote version? We haven't done a Functors release, but Zygote was released recently. I'm not sure that would explain it entirely, but this is surely a bug. Is it a recent RecursiveArrayTools release?

@DhairyaLGandhi
Member

Found the issue: it stems from the new CUDA v3.4 release series.

With CUDA v3.4.1:

x = VectorOfArray([rand(Float32,4),rand(Float32,4)]);

julia> Flux.CUDA.cu(x)
ERROR: MethodError: Cannot `convert` an object of type Vector{Float32} to an object of type Float32
Closest candidates are:
  convert(::Type{T}, ::ColorTypes.Gray24) where T<:Real at /home/dhairyalgandhi/.julia/packages/ColorTypes/6m8P7/src/conversions.jl:114
  convert(::Type{T}, ::ColorTypes.Gray) where T<:Real at /home/dhairyalgandhi/.julia/packages/ColorTypes/6m8P7/src/conversions.jl:113
  convert(::Type{T}, ::Static.StaticFloat64{N}) where {N, T<:AbstractFloat} at /home/dhairyalgandhi/.julia/packages/Static/lCOFN/src/float.jl:26

With CUDA v3.3.6:

julia> x = VectorOfArray([rand(Float32,4),rand(Float32,4)]);

julia> Flux.CUDA.cu(x)
4×2 CUDA.CuArray{Float32, 2}:
 0.840366   0.391562
 0.682369   0.146678
 0.15754    0.320327
 0.0135089  0.0619746

@maleadt have we changed something in CUDA.cu?

@ccasert
Author

ccasert commented Aug 24, 2021

Thanks! Reverting to CUDA v3.3.6 allows me to run that example on the GPU. However, there are still issues with other functions, e.g. normalizing flows on the GPU:

using DiffEqFlux, DifferentialEquations, GalacticOptim, Distributions

nn = Chain(
    Dense(1, 3, tanh),
    Dense(3, 1, tanh),
) |> gpu

tspan = (0.0f0, 10.0f0)
ffjord_mdl = FFJORD(nn, tspan, Tsit5())

data_dist = Normal(6.0f0, 0.7f0)
train_data = gpu(rand(data_dist, 1, 100))

function loss(θ)
    logpx, λ₁, λ₂ = ffjord_mdl(train_data, θ)
    -mean(logpx)
end

loss(ffjord_mdl.p)

gives the following error:

ERROR: GPU compilation of kernel broadcast_kernel(CUDA.CuKernelContext, CuDeviceMatrix{Float32, 1}, Base.Broadcast.Broadcasted{Nothing, Tuple{Base.OneTo{Int64}, Base.OneTo{Int64}}, typeof(*), Tuple{Base.Broadcast.Extruded{Matrix{Float32}, Tuple{Bool, Bool}, Tuple{Int64, Int64}}, Base.Broadcast.Broadcasted{CUDA.CuArrayStyle{2}, Nothing, typeof(conj), Tuple{Base.Broadcast.Broadcasted{CUDA.CuArrayStyle{2}, Nothing, typeof(-), Tuple{Int64, Base.Broadcast.Broadcasted{CUDA.CuArrayStyle{2}, Nothing, typeof(Base.literal_pow), Tuple{CUDA.CuRefValue{typeof(^)}, Base.Broadcast.Extruded{CuDeviceMatrix{Float32, 1}, Tuple{Bool, Bool}, Tuple{Int64, Int64}}, CUDA.CuRefValue{Val{2}}}}}}}}}}, Int64) failed
KernelError: passing and using non-bitstype argument

Argument 4 to your kernel function is of type Base.Broadcast.Broadcasted{Nothing, Tuple{Base.OneTo{Int64}, Base.OneTo{Int64}}, typeof(*), Tuple{Base.Broadcast.Extruded{Matrix{Float32}, Tuple{Bool, Bool}, Tuple{Int64, Int64}}, Base.Broadcast.Broadcasted{CUDA.CuArrayStyle{2}, Nothing, typeof(conj), Tuple{Base.Broadcast.Broadcasted{CUDA.CuArrayStyle{2}, Nothing, typeof(-), Tuple{Int64, Base.Broadcast.Broadcasted{CUDA.CuArrayStyle{2}, Nothing, typeof(Base.literal_pow), Tuple{CUDA.CuRefValue{typeof(^)}, Base.Broadcast.Extruded{CuDeviceMatrix{Float32, 1}, Tuple{Bool, Bool}, Tuple{Int64, Int64}}, CUDA.CuRefValue{Val{2}}}}}}}}}}, which is not isbits:
  .args is of type Tuple{Base.Broadcast.Extruded{Matrix{Float32}, Tuple{Bool, Bool}, Tuple{Int64, Int64}}, Base.Broadcast.Broadcasted{CUDA.CuArrayStyle{2}, Nothing, typeof(conj), Tuple{Base.Broadcast.Broadcasted{CUDA.CuArrayStyle{2}, Nothing, typeof(-), Tuple{Int64, Base.Broadcast.Broadcasted{CUDA.CuArrayStyle{2}, Nothing, typeof(Base.literal_pow), Tuple{CUDA.CuRefValue{typeof(^)}, Base.Broadcast.Extruded{CuDeviceMatrix{Float32, 1}, Tuple{Bool, Bool}, Tuple{Int64, Int64}}, CUDA.CuRefValue{Val{2}}}}}}}}} which is not isbits.
    .1 is of type Base.Broadcast.Extruded{Matrix{Float32}, Tuple{Bool, Bool}, Tuple{Int64, Int64}} which is not isbits.
      .x is of type Matrix{Float32} which is not isbits.


Stacktrace:
  [1] check_invocation(job::GPUCompiler.CompilerJob)
    @ GPUCompiler /data/packages/GPUCompiler/fG3xK/src/validation.jl:66
  [2] macro expansion
    @ /data/packages/GPUCompiler/fG3xK/src/driver.jl:318 [inlined]
  [3] macro expansion
    @ /data/packages/TimerOutputs/ZQ0rt/src/TimerOutput.jl:236 [inlined]
  [4] macro expansion
    @ /data/packages/GPUCompiler/fG3xK/src/driver.jl:317 [inlined]
  [5] emit_asm(job::GPUCompiler.CompilerJob, ir::LLVM.Module; strip::Bool, validate::Bool, format::LLVM.API.LLVMCodeGenFileType)
    @ GPUCompiler /data/packages/GPUCompiler/fG3xK/src/utils.jl:62
  [6] cufunction_compile(job::GPUCompiler.CompilerJob)
    @ CUDA /data/packages/CUDA/DL5Zo/src/compiler/execution.jl:317
  [7] cached_compilation(cache::Dict{UInt64, Any}, job::GPUCompiler.CompilerJob, compiler::typeof(CUDA.cufunction_compile), linker::typeof(CUDA.cufunction_link))
    @ GPUCompiler /data/packages/GPUCompiler/fG3xK/src/cache.jl:89
  [8] cufunction(f::GPUArrays.var"#broadcast_kernel#17", tt::Type{Tuple{CUDA.CuKernelContext, CuDeviceMatrix{Float32, 1}, Base.Broadcast.Broadcasted{Nothing, Tuple{Base.OneTo{Int64}, Base.OneTo{Int64}}, typeof(*), Tuple{Base.Broadcast.Extruded{Matrix{Float32}, Tuple{Bool, Bool}, Tuple{Int64, Int64}}, Base.Broadcast.Broadcasted{CUDA.CuArrayStyle{2}, Nothing, typeof(conj), Tuple{Base.Broadcast.Broadcasted{CUDA.CuArrayStyle{2}, Nothing, typeof(-), Tuple{Int64, Base.Broadcast.Broadcasted{CUDA.CuArrayStyle{2}, Nothing, typeof(Base.literal_pow), Tuple{CUDA.CuRefValue{typeof(^)}, Base.Broadcast.Extruded{CuDeviceMatrix{Float32, 1}, Tuple{Bool, Bool}, Tuple{Int64, Int64}}, CUDA.CuRefValue{Val{2}}}}}}}}}}, Int64}}; name::Nothing, kwargs::Base.Iterators.Pairs{Union{}, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
    @ CUDA /data/packages/CUDA/DL5Zo/src/compiler/execution.jl:288
  [9] cufunction(f::GPUArrays.var"#broadcast_kernel#17", tt::Type{Tuple{CUDA.CuKernelContext, CuDeviceMatrix{Float32, 1}, Base.Broadcast.Broadcasted{Nothing, Tuple{Base.OneTo{Int64}, Base.OneTo{Int64}}, typeof(*), Tuple{Base.Broadcast.Extruded{Matrix{Float32}, Tuple{Bool, Bool}, Tuple{Int64, Int64}}, Base.Broadcast.Broadcasted{CUDA.CuArrayStyle{2}, Nothing, typeof(conj), Tuple{Base.Broadcast.Broadcasted{CUDA.CuArrayStyle{2}, Nothing, typeof(-), Tuple{Int64, Base.Broadcast.Broadcasted{CUDA.CuArrayStyle{2}, Nothing, typeof(Base.literal_pow), Tuple{CUDA.CuRefValue{typeof(^)}, Base.Broadcast.Extruded{CuDeviceMatrix{Float32, 1}, Tuple{Bool, Bool}, Tuple{Int64, Int64}}, CUDA.CuRefValue{Val{2}}}}}}}}}}, Int64}})
    @ CUDA /data/packages/CUDA/DL5Zo/src/compiler/execution.jl:282
 [10] macro expansion
    @ /data/packages/CUDA/DL5Zo/src/compiler/execution.jl:102 [inlined]
 [11] #launch_heuristic#241
    @ /data/packages/CUDA/DL5Zo/src/gpuarrays.jl:17 [inlined]
 [12] copyto!
    @ /data/packages/GPUArrays/UBzTm/src/host/broadcast.jl:65 [inlined]
 [13] copyto!
    @ ./broadcast.jl:936 [inlined]
 [14] copy
    @ /data/packages/GPUArrays/UBzTm/src/host/broadcast.jl:47 [inlined]
 [15] materialize
    @ ./broadcast.jl:883 [inlined]
 [16] (::Zygote.var"#1076#1077"{CuArray{Float32, 2}})(ȳ::Matrix{Float32})
    @ Zygote /data/packages/Zygote/TaBlo/src/lib/broadcast.jl:105
 [17] #3920#back
    @ /data/packages/ZygoteRules/OjfTt/src/adjoint.jl:59 [inlined]
 [18] Pullback
    @ /data/packages/Flux/Zz9RI/src/layers/basic.jl:148 [inlined]
 [19] (::typeof(∂(λ)))(Δ::Matrix{Float32})
    @ Zygote /data/packages/Zygote/TaBlo/src/compiler/interface2.jl:0
 [20] Pullback
    @ /data/packages/Flux/Zz9RI/src/layers/basic.jl:37 [inlined]
 [21] (::typeof(∂(applychain)))(Δ::Matrix{Float32})
    @ Zygote /data/packages/Zygote/TaBlo/src/compiler/interface2.jl:0
 [22] Pullback
    @ /data/packages/Flux/Zz9RI/src/layers/basic.jl:37 [inlined]
 [23] (::typeof(∂(applychain)))(Δ::Matrix{Float32})
    @ Zygote /data/packages/Zygote/TaBlo/src/compiler/interface2.jl:0
 [24] Pullback
    @ /data/packages/Flux/Zz9RI/src/layers/basic.jl:39 [inlined]
 [25] (::typeof(∂(λ)))(Δ::Matrix{Float32})
    @ Zygote /data/packages/Zygote/TaBlo/src/compiler/interface2.jl:0
 [26] (::Zygote.var"#46#47"{typeof(∂(λ))})(Δ::Matrix{Float32})
    @ Zygote /data/packages/Zygote/TaBlo/src/compiler/interface.jl:41
 [27] ffjord(u::CuArray{Float32, 2}, p::CuArray{Float32, 1}, t::Float32, re::Flux.var"#60#62"{Chain{Tuple{Dense{typeof(tanh), CuArray{Float32, 2}, CuArray{Float32, 1}}, Dense{typeof(tanh), CuArray{Float32, 2}, CuArray{Float32, 1}}}}}, e::Matrix{Float32}; regularize::Bool, monte_carlo::Bool)
    @ DiffEqFlux /data/packages/DiffEqFlux/N7blG/src/ffjord.jl:179
 [28] #59
    @ /data/packages/DiffEqFlux/N7blG/src/ffjord.jl:194 [inlined]
 [29] ODEFunction
    @ /data/packages/SciMLBase/UIp7W/src/scimlfunctions.jl:334 [inlined]
 [30] initialize!(integrator::OrdinaryDiffEq.ODEIntegrator{Tsit5, false, CuArray{Float32, 2}, Nothing, Float32, CuArray{Float32, 1}, Float32, Float32, Float32, Float32, Vector{CuArray{Float32, 2}}, ODESolution{Float32, 3, Vector{CuArray{Float32, 2}}, Nothing, Nothing, Vector{Float32}, Vector{Vector{CuArray{Float32, 2}}}, ODEProblem{CuArray{Float32, 2}, Tuple{Float32, Float32}, false, CuArray{Float32, 1}, ODEFunction{false, DiffEqFlux.var"#59#64"{Bool, Bool, FFJORD{Chain{Tuple{Dense{typeof(tanh), CuArray{Float32, 2}, CuArray{Float32, 1}}, Dense{typeof(tanh), CuArray{Float32, 2}, CuArray{Float32, 1}}}}, CuArray{Float32, 1}, Flux.var"#60#62"{Chain{Tuple{Dense{typeof(tanh), CuArray{Float32, 2}, CuArray{Float32, 1}}, Dense{typeof(tanh), CuArray{Float32, 2}, CuArray{Float32, 1}}}}}, FullNormal, Tuple{Float32, Float32}, Tuple{Tsit5}, Base.Iterators.Pairs{Union{}, Union{}, Tuple{}, NamedTuple{(), Tuple{}}}}, Matrix{Float32}}, LinearAlgebra.UniformScaling{Bool}, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, typeof(SciMLBase.DEFAULT_OBSERVED), Nothing}, Base.Iterators.Pairs{Union{}, Union{}, Tuple{}, NamedTuple{(), Tuple{}}}, SciMLBase.StandardODEProblem}, Tsit5, OrdinaryDiffEq.InterpolationData{ODEFunction{false, DiffEqFlux.var"#59#64"{Bool, Bool, FFJORD{Chain{Tuple{Dense{typeof(tanh), CuArray{Float32, 2}, CuArray{Float32, 1}}, Dense{typeof(tanh), CuArray{Float32, 2}, CuArray{Float32, 1}}}}, CuArray{Float32, 1}, Flux.var"#60#62"{Chain{Tuple{Dense{typeof(tanh), CuArray{Float32, 2}, CuArray{Float32, 1}}, Dense{typeof(tanh), CuArray{Float32, 2}, CuArray{Float32, 1}}}}}, FullNormal, Tuple{Float32, Float32}, Tuple{Tsit5}, Base.Iterators.Pairs{Union{}, Union{}, Tuple{}, NamedTuple{(), Tuple{}}}}, Matrix{Float32}}, LinearAlgebra.UniformScaling{Bool}, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, typeof(SciMLBase.DEFAULT_OBSERVED), Nothing}, Vector{CuArray{Float32, 2}}, Vector{Float32}, Vector{Vector{CuArray{Float32, 2}}}, OrdinaryDiffEq.Tsit5ConstantCache{Float32, Float32}}, DiffEqBase.DEStats}, ODEFunction{false, DiffEqFlux.var"#59#64"{Bool, Bool, FFJORD{Chain{Tuple{Dense{typeof(tanh), CuArray{Float32, 2}, CuArray{Float32, 1}}, Dense{typeof(tanh), CuArray{Float32, 2}, CuArray{Float32, 1}}}}, CuArray{Float32, 1}, Flux.var"#60#62"{Chain{Tuple{Dense{typeof(tanh), CuArray{Float32, 2}, CuArray{Float32, 1}}, Dense{typeof(tanh), CuArray{Float32, 2}, CuArray{Float32, 1}}}}}, FullNormal, Tuple{Float32, Float32}, Tuple{Tsit5}, Base.Iterators.Pairs{Union{}, Union{}, Tuple{}, NamedTuple{(), Tuple{}}}}, Matrix{Float32}}, LinearAlgebra.UniformScaling{Bool}, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, typeof(SciMLBase.DEFAULT_OBSERVED), Nothing}, OrdinaryDiffEq.Tsit5ConstantCache{Float32, Float32}, OrdinaryDiffEq.DEOptions{Float32, Float32, Float32, Float32, PIController{Rational{Int64}}, typeof(DiffEqBase.ODE_DEFAULT_NORM), typeof(LinearAlgebra.opnorm), Nothing, CallbackSet{Tuple{}, Tuple{}}, typeof(DiffEqBase.ODE_DEFAULT_ISOUTOFDOMAIN), typeof(DiffEqBase.ODE_DEFAULT_PROG_MESSAGE), typeof(DiffEqBase.ODE_DEFAULT_UNSTABLE_CHECK), DataStructures.BinaryHeap{Float32, DataStructures.FasterForward}, DataStructures.BinaryHeap{Float32, DataStructures.FasterForward}, Nothing, Nothing, Int64, Tuple{}, Tuple{}, Tuple{}}, CuArray{Float32, 2}, Float32, Nothing, OrdinaryDiffEq.DefaultInit}, cache::OrdinaryDiffEq.Tsit5ConstantCache{Float32, Float32})
    @ OrdinaryDiffEq /data/packages/OrdinaryDiffEq/PZbGY/src/perform_step/low_order_rk_perform_step.jl:565
 [31] __init(prob::ODEProblem{CuArray{Float32, 2}, Tuple{Float32, Float32}, false, CuArray{Float32, 1}, ODEFunction{false, DiffEqFlux.var"#59#64"{Bool, Bool, FFJORD{Chain{Tuple{Dense{typeof(tanh), CuArray{Float32, 2}, CuArray{Float32, 1}}, Dense{typeof(tanh), CuArray{Float32, 2}, CuArray{Float32, 1}}}}, CuArray{Float32, 1}, Flux.var"#60#62"{Chain{Tuple{Dense{typeof(tanh), CuArray{Float32, 2}, CuArray{Float32, 1}}, Dense{typeof(tanh), CuArray{Float32, 2}, CuArray{Float32, 1}}}}}, FullNormal, Tuple{Float32, Float32}, Tuple{Tsit5}, Base.Iterators.Pairs{Union{}, Union{}, Tuple{}, NamedTuple{(), Tuple{}}}}, Matrix{Float32}}, LinearAlgebra.UniformScaling{Bool}, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, typeof(SciMLBase.DEFAULT_OBSERVED), Nothing}, Base.Iterators.Pairs{Union{}, Union{}, Tuple{}, NamedTuple{(), Tuple{}}}, SciMLBase.StandardODEProblem}, alg::Tsit5, timeseries_init::Tuple{}, ts_init::Tuple{}, ks_init::Tuple{}, recompile::Type{Val{true}}; saveat::Tuple{}, tstops::Tuple{}, d_discontinuities::Tuple{}, save_idxs::Nothing, save_everystep::Bool, save_on::Bool, save_start::Bool, save_end::Nothing, callback::Nothing, dense::Bool, calck::Bool, dt::Float32, dtmin::Nothing, dtmax::Float32, force_dtmin::Bool, adaptive::Bool, gamma::Rational{Int64}, abstol::Nothing, reltol::Nothing, qmin::Rational{Int64}, qmax::Int64, qsteady_min::Int64, qsteady_max::Int64, beta1::Nothing, beta2::Nothing, qoldinit::Rational{Int64}, controller::Nothing, fullnormalize::Bool, failfactor::Int64, maxiters::Int64, internalnorm::typeof(DiffEqBase.ODE_DEFAULT_NORM), internalopnorm::typeof(LinearAlgebra.opnorm), isoutofdomain::typeof(DiffEqBase.ODE_DEFAULT_ISOUTOFDOMAIN), unstable_check::typeof(DiffEqBase.ODE_DEFAULT_UNSTABLE_CHECK), verbose::Bool, timeseries_errors::Bool, dense_errors::Bool, advance_to_tstop::Bool, stop_at_next_tstop::Bool, initialize_save::Bool, progress::Bool, progress_steps::Int64, progress_name::String, progress_message::typeof(DiffEqBase.ODE_DEFAULT_PROG_MESSAGE), userdata::Nothing, allow_extrapolation::Bool, initialize_integrator::Bool, alias_u0::Bool, alias_du0::Bool, initializealg::OrdinaryDiffEq.DefaultInit, kwargs::Base.Iterators.Pairs{Union{}, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
    @ OrdinaryDiffEq /data/packages/OrdinaryDiffEq/PZbGY/src/solve.jl:456
 [32] __init(prob::ODEProblem{CuArray{Float32, 2}, Tuple{Float32, Float32}, false, CuArray{Float32, 1}, ODEFunction{false, DiffEqFlux.var"#59#64"{Bool, Bool, FFJORD{Chain{Tuple{Dense{typeof(tanh), CuArray{Float32, 2}, CuArray{Float32, 1}}, Dense{typeof(tanh), CuArray{Float32, 2}, CuArray{Float32, 1}}}}, CuArray{Float32, 1}, Flux.var"#60#62"{Chain{Tuple{Dense{typeof(tanh), CuArray{Float32, 2}, CuArray{Float32, 1}}, Dense{typeof(tanh), CuArray{Float32, 2}, CuArray{Float32, 1}}}}}, FullNormal, Tuple{Float32, Float32}, Tuple{Tsit5}, Base.Iterators.Pairs{Union{}, Union{}, Tuple{}, NamedTuple{(), Tuple{}}}}, Matrix{Float32}}, LinearAlgebra.UniformScaling{Bool}, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, typeof(SciMLBase.DEFAULT_OBSERVED), Nothing}, Base.Iterators.Pairs{Union{}, Union{}, Tuple{}, NamedTuple{(), Tuple{}}}, SciMLBase.StandardODEProblem}, alg::Tsit5, timeseries_init::Tuple{}, ts_init::Tuple{}, ks_init::Tuple{}, recompile::Type{Val{true}}) (repeats 5 times)
    @ OrdinaryDiffEq /data/packages/OrdinaryDiffEq/PZbGY/src/solve.jl:67
 [33] #__solve#471
    @ /data/packages/OrdinaryDiffEq/PZbGY/src/solve.jl:4 [inlined]
 [34] __solve
    @ /data/packages/OrdinaryDiffEq/PZbGY/src/solve.jl:4 [inlined]
 [35] #solve_call#42
    @ /data/packages/DiffEqBase/Rmj4o/src/solve.jl:61 [inlined]
 [36] solve_call
    @ /data/packages/DiffEqBase/Rmj4o/src/solve.jl:48 [inlined]
 [37] #solve_up#44
    @ /data/packages/DiffEqBase/Rmj4o/src/solve.jl:87 [inlined]
 [38] solve_up
    @ /data/packages/DiffEqBase/Rmj4o/src/solve.jl:78 [inlined]
 [39] #solve#43
    @ /data/packages/DiffEqBase/Rmj4o/src/solve.jl:73 [inlined]
 [40] (::FFJORD{Chain{Tuple{Dense{typeof(tanh), CuArray{Float32, 2}, CuArray{Float32, 1}}, Dense{typeof(tanh), CuArray{Float32, 2}, CuArray{Float32, 1}}}}, CuArray{Float32, 1}, Flux.var"#60#62"{Chain{Tuple{Dense{typeof(tanh), CuArray{Float32, 2}, CuArray{Float32, 1}}, Dense{typeof(tanh), CuArray{Float32, 2}, CuArray{Float32, 1}}}}}, FullNormal, Tuple{Float32, Float32}, Tuple{Tsit5}, Base.Iterators.Pairs{Union{}, Union{}, Tuple{}, NamedTuple{(), Tuple{}}}})(x::CuArray{Float32, 2}, p::CuArray{Float32, 1}, e::Matrix{Float32}; regularize::Bool, monte_carlo::Bool)
    @ DiffEqFlux /data/packages/DiffEqFlux/N7blG/src/ffjord.jl:208
 [41] FFJORD (repeats 2 times)
    @ /data/packages/DiffEqFlux/N7blG/src/ffjord.jl:192 [inlined]
 [42] loss(θ::CuArray{Float32, 1})
    @ Main ./REPL[22]:2
 [43] top-level scope
    @ REPL[23]:1
 [44] top-level scope
    @ /data/packages/CUDA/DL5Zo/src/initialization.jl:52

@DhairyaLGandhi
Member

That trace suggests that something isn't on the GPU; notice the Matrix{Float32} in there.
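
A hedged illustration of what the trace points at: frames [27] and [40] show the Monte-Carlo noise e arriving as a CPU Matrix{Float32} while everything else is a CuArray. Frame [40] also shows that the FFJORD model accepts e as a third positional argument, so one untested workaround sketch is to draw the noise on the GPU and pass it in explicitly (this may or may not resolve the failure; the FFJORD problem is kept separate below):

using CUDA  # for CUDA.randn
# Untested sketch; ffjord_mdl and train_data are from the snippet above.
e = CUDA.randn(Float32, size(train_data)...)  # noise sampled on the GPU
logpx, λ₁, λ₂ = ffjord_mdl(train_data, ffjord_mdl.p, e)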

@ChrisRackauckas
Member

Yeah, any issue with FFJORD is probably separate. Let's solve this one in this issue ASAP and keep FFJORD for another day.

@maleadt
Contributor

maleadt commented Aug 24, 2021

Regarding the VectorOfArray issue: calling cu(x) calls adapt(CuArray{Float32}, x), which descends into types (thanks to Adapt.jl) and ultimately calls CuArray{Float32}(...) on the leaves: https://github.com/JuliaGPU/CUDA.jl/blob/78379e1786dba80e396ca362a7546fc6d7b488e1/src/array.jl#L429-L430. And VectorOfArray doesn't support being converted to an array with an element type:

julia> Array(VectorOfArray([rand(Float32,4),rand(Float32,4)]))
4×2 Matrix{Float32}:
 0.412813  0.752041
 0.642892  0.894659
 0.477681  0.466676
 0.082933  0.0945987

julia> Array{Float32}(VectorOfArray([rand(Float32,4),rand(Float32,4)]))
ERROR: MethodError: Cannot `convert` an object of type Vector{Float32} to an object of type Float32
Closest candidates are:
  convert(::Type{T}, ::ColorTypes.Gray24) where T<:Real at /home/tim/Julia/depot/packages/ColorTypes/6m8P7/src/conversions.jl:114
  convert(::Type{T}, ::ColorTypes.Gray) where T<:Real at /home/tim/Julia/depot/packages/ColorTypes/6m8P7/src/conversions.jl:113
  convert(::Type{T}, ::LLVM.GenericValue, ::LLVM.LLVMType) where T<:AbstractFloat at /home/tim/Julia/depot/packages/LLVM/FrlPu/src/execution.jl:39

@ChrisRackauckas
Member

How did it work until the update though? Was cu calling CuArray instead of adapt before?

https://github.com/SciML/RecursiveArrayTools.jl/blob/master/src/init.jl#L23-L30

@maleadt
Contributor

maleadt commented Aug 24, 2021

cu has been using Adapt for ages. Not sure what changed, as CuArray(....) was calling Array on 3.3.6 too: https://github.com/JuliaGPU/CUDA.jl/blob/964893c8cd2c7c8f73de0df1c48d3237d2e07414/src/array.jl#L240-L244
Anyway, there's clearly a method missing for VectorOfArray; can we not add it there? This should do it:

Base.Array{U}(VA::AbstractVectorOfArray{T, N, A}) where {T, U, N, A <: AbstractVector{<:AbstractVector}} =
    reduce(hcat, map(x -> U.(x), VA.u))
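
A quick sanity check of that overload (a sketch, defining the method at the REPL for illustration; the real fix belongs in a RecursiveArrayTools patch):

using RecursiveArrayTools

Base.Array{U}(VA::AbstractVectorOfArray{T, N, A}) where {T, U, N, A <: AbstractVector{<:AbstractVector}} =
    reduce(hcat, map(x -> U.(x), VA.u))

x = VectorOfArray([rand(Float32, 4), rand(Float32, 4)])
Array{Float32}(x)  # now returns a 4×2 Matrix{Float32} instead of a MethodError
# cu(x)            # with CUDA available, this should now reach CuArray{Float32}(...)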

@ChrisRackauckas
Member

Yeah, I was just having trouble figuring out which overload was missing, so I was trying to track down what the change was. That seems to fix it all locally, so the new patch should handle this fine. Thanks @ccasert!

@ToucheSir

Commenting here from the Flux issue: could it be that JuliaGPU/GPUArrays.jl#368 (which was in GPUArrays 8 / CUDA 3.4 but not GPUArrays 7 / CUDA 3.3) changed things?
