
GPU and Zygote in a posteriori training #123

Closed
luisaforozco opened this issue Nov 22, 2024 · 1 comment

Comments

@luisaforozco
Contributor

In the branch gpu_in_hpc I've added code to move elements to the GPU.
I get this error from Zygote's AD through the a posteriori loss function. It appears to come from a foreign call, but the stacktrace doesn't make clear where exactly Zygote fails. This doesn't happen on the CPU.

ERROR: LoadError: Can't differentiate foreigncall expression $(Expr(:foreigncall, :(:jl_eqtable_get), Any, svec(Any, Any, Any), 0, :(:ccall), %5, %3, %4)).
You might want to check the Zygote limitations documentation.
https://fluxml.ai/Zygote.jl/latest/limitations
Stacktrace:
  [1] error(s::String)
    @ Base ./error.jl:35
  [2] get
    @ ./iddict.jl:102 [inlined]
  [3] (::Zygote.Pullback{Tuple{typeof(get), IdDict{Any, Any}, Symbol, Nothing}, Any})(Δ::Nothing)
    @ Zygote /var/scratch/lorozco/.julia/packages/Zygote/nyzjS/src/compiler/interface2.jl:0
  [4] loss_function
    @ /var/scratch/lorozco/.julia/packages/GPUArraysCore/GMsgk/src/GPUArraysCore.jl:208 [inlined]
  [5] (::Zygote.Pullback{Tuple{CoupledNODE.var"#loss_function#92"{Tsit5{typeof(OrdinaryDiffEqCore.trivial_limiter!), typeof(OrdinaryDiffEqCore.trivial_limiter!), Static.False}, @Kwargs{}, CoupledNODE.NavierStokes.var"#right_hand_side#2"{@NamedTuple{grid::@NamedTuple{xlims::Tuple{Tuple{Float32, Float32}, Tuple{Float32, Float32}}, dimension::IncompressibleNavierStokes.Dimension{2}, N::Tuple{Int64, Int64}, Nu::Tuple{Tuple{Int64, Int64}, Tuple{Int64, Int64}}, Np::Tuple{Int64, Int64}, Iu::Tuple{CartesianIndices{2, Tuple{UnitRange{Int64}, UnitRange{Int64}}}, CartesianIndices{2, Tuple{UnitRange{Int64}, UnitRange{Int64}}}}, Ip::CartesianIndices{2, Tuple{UnitRange{Int64}, UnitRange{Int64}}}, x::Tuple{CuArray{Float32, 1, CUDA.DeviceMemory}, CuArray{Float32, 1, CUDA.DeviceMemory}}, xu::Tuple{Tuple{CuArray{Float32, 1, CUDA.DeviceMemory}, CuArray{Float32, 1, CUDA.DeviceMemory}}, Tuple{CuArray{Float32, 1, CUDA.DeviceMemory}, CuArray{Float32, 1, CUDA.DeviceMemory}}}, xp::Tuple{CuArray{Float32, 1, CUDA.DeviceMemory}, CuArray{Float32, 1, CUDA.DeviceMemory}}, Δ::Tuple{CuArray{Float32, 1, CUDA.DeviceMemory}, CuArray{Float32, 1, CUDA.DeviceMemory}}, Δu::Tuple{CuArray{Float32, 1, CUDA.DeviceMemory}, CuArray{Float32, 1, CUDA.DeviceMemory}}, A::Tuple{Tuple{Tuple{CuArray{Float32, 1, CUDA.DeviceMemory}, CuArray{Float32, 1, CUDA.DeviceMemory}}, Tuple{CuArray{Float32, 1, CUDA.DeviceMemory}, CuArray{Float32, 1, CUDA.DeviceMemory}}}, Tuple{Tuple{CuArray{Float32, 1, CUDA.DeviceMemory}, CuArray{Float32, 1, CUDA.DeviceMemory}}, Tuple{CuArray{Float32, 1, CUDA.DeviceMemory}, CuArray{Float32, 1, CUDA.DeviceMemory}}}}}, boundary_conditions::Tuple{Tuple{IncompressibleNavierStokes.PeriodicBC, IncompressibleNavierStokes.PeriodicBC}, Tuple{IncompressibleNavierStokes.PeriodicBC, IncompressibleNavierStokes.PeriodicBC}}, Re::Float32, bodyforce::Nothing, issteadybodyforce::Bool, closure_model::Nothing, backend::CUDABackend, workgroupsize::Int64, temperature::Nothing}, 
IncompressibleNavierStokes.var"#psolve!#123"{CUDA.CUFFT.CuFFTPlan{ComplexF32, Float32, -1, false, 2}, CuArray{Float32, 2, CUDA.DeviceMemory}, CuArray{ComplexF32, 2, CUDA.DeviceMemory}, Int64, CartesianIndices{2, Tuple{UnitRange{Int64}, UnitRange{Int64}}}}, Lux.Chain{@NamedTuple{layer_1::Lux.WrappedFunction{typeof(CoupledNODE.collocate)}, layer_2::Lux.Conv{typeof(tanh), Int64, Int64, Tuple{Int64, Int64}, Tuple{Int64, Int64}, Tuple{Int64, Int64}, Tuple{Int64, Int64}, Int64, CoupledNODE.var"#glorot_uniform_T#5"{DataType}, Nothing, Static.False, Static.False}, layer_3::Lux.Conv{typeof(identity), Int64, Int64, Tuple{Int64, Int64}, Tuple{Int64, Int64}, Tuple{Int64, Int64}, Tuple{Int64, Int64}, Int64, CoupledNODE.var"#glorot_uniform_T#5"{DataType}, Nothing, Static.False, Static.False}, layer_4::Lux.WrappedFunction{typeof(CoupledNODE.decollocate)}}, Nothing}, @NamedTuple{layer_1::@NamedTuple{}, layer_2::@NamedTuple{}, layer_3::@NamedTuple{}, layer_4::@NamedTuple{}}}, MLDataDevices.CUDADevice{Nothing}}, Lux.Chain{@NamedTuple{layer_1::Lux.WrappedFunction{typeof(CoupledNODE.collocate)}, layer_2::Lux.Conv{typeof(tanh), Int64, Int64, Tuple{Int64, Int64}, Tuple{Int64, Int64}, Tuple{Int64, Int64}, Tuple{Int64, Int64}, Int64, CoupledNODE.var"#glorot_uniform_T#5"{DataType}, Nothing, Static.False, Static.False}, layer_3::Lux.Conv{typeof(identity), Int64, Int64, Tuple{Int64, Int64}, Tuple{Int64, Int64}, Tuple{Int64, Int64}, Tuple{Int64, Int64}, Int64, CoupledNODE.var"#glorot_uniform_T#5"{DataType}, Nothing, Static.False, Static.False}, layer_4::Lux.WrappedFunction{typeof(CoupledNODE.decollocate)}}, Nothing}, ComponentArrays.ComponentVector{Float32, CuArray{Float32, 1, CUDA.DeviceMemory}, Tuple{ComponentArrays.Axis{(layer_1 = 1:0, layer_2 = ViewAxis(1:196, Axis(weight = ViewAxis(1:196, ShapedAxis((7, 7, 2, 2))),)), layer_3 = ViewAxis(197:392, Axis(weight = ViewAxis(1:196, ShapedAxis((7, 7, 2, 2))),)), layer_4 = 393:392)}}}, @NamedTuple{layer_1::@NamedTuple{}, layer_2::@NamedTuple{}, 
layer_3::@NamedTuple{}, layer_4::@NamedTuple{}}, @NamedTuple{u::SubArray{Float32, 4, Array{Float32, 5}, Tuple{Base.OneTo{Int64}, Base.OneTo{Int64}, Base.OneTo{Int64}, Int64, UnitRange{Int64}}, false}, t::CuArray{Float32, 1, CUDA.DeviceMemory}}}, Any})(Δ::Tuple{Float32, Nothing, Nothing})
    @ Zygote /var/scratch/lorozco/.julia/packages/Zygote/nyzjS/src/compiler/interface2.jl:0
  [6] (::Zygote.var"#78#79"{Zygote.Pullback{Tuple{CoupledNODE.var"#loss_function#92"{Tsit5{typeof(OrdinaryDiffEqCore.trivial_limiter!), typeof(OrdinaryDiffEqCore.trivial_limiter!), Static.False}, @Kwargs{}, CoupledNODE.NavierStokes.var"#right_hand_side#2"{@NamedTuple{grid::@NamedTuple{xlims::Tuple{Tuple{Float32, Float32}, Tuple{Float32, Float32}}, dimension::IncompressibleNavierStokes.Dimension{2}, N::Tuple{Int64, Int64}, Nu::Tuple{Tuple{Int64, Int64}, Tuple{Int64, Int64}}, Np::Tuple{Int64, Int64}, Iu::Tuple{CartesianIndices{2, Tuple{UnitRange{Int64}, UnitRange{Int64}}}, CartesianIndices{2, Tuple{UnitRange{Int64}, UnitRange{Int64}}}}, Ip::CartesianIndices{2, Tuple{UnitRange{Int64}, UnitRange{Int64}}}, x::Tuple{CuArray{Float32, 1, CUDA.DeviceMemory}, CuArray{Float32, 1, CUDA.DeviceMemory}}, xu::Tuple{Tuple{CuArray{Float32, 1, CUDA.DeviceMemory}, CuArray{Float32, 1, CUDA.DeviceMemory}}, Tuple{CuArray{Float32, 1, CUDA.DeviceMemory}, CuArray{Float32, 1, CUDA.DeviceMemory}}}, xp::Tuple{CuArray{Float32, 1, CUDA.DeviceMemory}, CuArray{Float32, 1, CUDA.DeviceMemory}}, Δ::Tuple{CuArray{Float32, 1, CUDA.DeviceMemory}, CuArray{Float32, 1, CUDA.DeviceMemory}}, Δu::Tuple{CuArray{Float32, 1, CUDA.DeviceMemory}, CuArray{Float32, 1, CUDA.DeviceMemory}}, A::Tuple{Tuple{Tuple{CuArray{Float32, 1, CUDA.DeviceMemory}, CuArray{Float32, 1, CUDA.DeviceMemory}}, Tuple{CuArray{Float32, 1, CUDA.DeviceMemory}, CuArray{Float32, 1, CUDA.DeviceMemory}}}, Tuple{Tuple{CuArray{Float32, 1, CUDA.DeviceMemory}, CuArray{Float32, 1, CUDA.DeviceMemory}}, Tuple{CuArray{Float32, 1, CUDA.DeviceMemory}, CuArray{Float32, 1, CUDA.DeviceMemory}}}}}, boundary_conditions::Tuple{Tuple{IncompressibleNavierStokes.PeriodicBC, IncompressibleNavierStokes.PeriodicBC}, Tuple{IncompressibleNavierStokes.PeriodicBC, IncompressibleNavierStokes.PeriodicBC}}, Re::Float32, bodyforce::Nothing, issteadybodyforce::Bool, closure_model::Nothing, backend::CUDABackend, workgroupsize::Int64, temperature::Nothing}, 
IncompressibleNavierStokes.var"#psolve!#123"{CUDA.CUFFT.CuFFTPlan{ComplexF32, Float32, -1, false, 2}, CuArray{Float32, 2, CUDA.DeviceMemory}, CuArray{ComplexF32, 2, CUDA.DeviceMemory}, Int64, CartesianIndices{2, Tuple{UnitRange{Int64}, UnitRange{Int64}}}}, Lux.Chain{@NamedTuple{layer_1::Lux.WrappedFunction{typeof(CoupledNODE.collocate)}, layer_2::Lux.Conv{typeof(tanh), Int64, Int64, Tuple{Int64, Int64}, Tuple{Int64, Int64}, Tuple{Int64, Int64}, Tuple{Int64, Int64}, Int64, CoupledNODE.var"#glorot_uniform_T#5"{DataType}, Nothing, Static.False, Static.False}, layer_3::Lux.Conv{typeof(identity), Int64, Int64, Tuple{Int64, Int64}, Tuple{Int64, Int64}, Tuple{Int64, Int64}, Tuple{Int64, Int64}, Int64, CoupledNODE.var"#glorot_uniform_T#5"{DataType}, Nothing, Static.False, Static.False}, layer_4::Lux.WrappedFunction{typeof(CoupledNODE.decollocate)}}, Nothing}, @NamedTuple{layer_1::@NamedTuple{}, layer_2::@NamedTuple{}, layer_3::@NamedTuple{}, layer_4::@NamedTuple{}}}, MLDataDevices.CUDADevice{Nothing}}, Lux.Chain{@NamedTuple{layer_1::Lux.WrappedFunction{typeof(CoupledNODE.collocate)}, layer_2::Lux.Conv{typeof(tanh), Int64, Int64, Tuple{Int64, Int64}, Tuple{Int64, Int64}, Tuple{Int64, Int64}, Tuple{Int64, Int64}, Int64, CoupledNODE.var"#glorot_uniform_T#5"{DataType}, Nothing, Static.False, Static.False}, layer_3::Lux.Conv{typeof(identity), Int64, Int64, Tuple{Int64, Int64}, Tuple{Int64, Int64}, Tuple{Int64, Int64}, Tuple{Int64, Int64}, Int64, CoupledNODE.var"#glorot_uniform_T#5"{DataType}, Nothing, Static.False, Static.False}, layer_4::Lux.WrappedFunction{typeof(CoupledNODE.decollocate)}}, Nothing}, ComponentArrays.ComponentVector{Float32, CuArray{Float32, 1, CUDA.DeviceMemory}, Tuple{ComponentArrays.Axis{(layer_1 = 1:0, layer_2 = ViewAxis(1:196, Axis(weight = ViewAxis(1:196, ShapedAxis((7, 7, 2, 2))),)), layer_3 = ViewAxis(197:392, Axis(weight = ViewAxis(1:196, ShapedAxis((7, 7, 2, 2))),)), layer_4 = 393:392)}}}, @NamedTuple{layer_1::@NamedTuple{}, layer_2::@NamedTuple{}, 
layer_3::@NamedTuple{}, layer_4::@NamedTuple{}}, @NamedTuple{u::SubArray{Float32, 4, Array{Float32, 5}, Tuple{Base.OneTo{Int64}, Base.OneTo{Int64}, Base.OneTo{Int64}, Int64, UnitRange{Int64}}, false}, t::CuArray{Float32, 1, CUDA.DeviceMemory}}}, Any}})(Δ::Tuple{Float32, Nothing, Nothing})
    @ Zygote /var/scratch/lorozco/.julia/packages/Zygote/nyzjS/src/compiler/interface.jl:91
  [7] compute_gradients_impl(::ADTypes.AutoZygote, objective_function::CoupledNODE.var"#loss_function#92"{Tsit5{typeof(OrdinaryDiffEqCore.trivial_limiter!), typeof(OrdinaryDiffEqCore.trivial_limiter!), Static.False}, @Kwargs{}, CoupledNODE.NavierStokes.var"#right_hand_side#2"{@NamedTuple{grid::@NamedTuple{xlims::Tuple{Tuple{Float32, Float32}, Tuple{Float32, Float32}}, dimension::IncompressibleNavierStokes.Dimension{2}, N::Tuple{Int64, Int64}, Nu::Tuple{Tuple{Int64, Int64}, Tuple{Int64, Int64}}, Np::Tuple{Int64, Int64}, Iu::Tuple{CartesianIndices{2, Tuple{UnitRange{Int64}, UnitRange{Int64}}}, CartesianIndices{2, Tuple{UnitRange{Int64}, UnitRange{Int64}}}}, Ip::CartesianIndices{2, Tuple{UnitRange{Int64}, UnitRange{Int64}}}, x::Tuple{CuArray{Float32, 1, CUDA.DeviceMemory}, CuArray{Float32, 1, CUDA.DeviceMemory}}, xu::Tuple{Tuple{CuArray{Float32, 1, CUDA.DeviceMemory}, CuArray{Float32, 1, CUDA.DeviceMemory}}, Tuple{CuArray{Float32, 1, CUDA.DeviceMemory}, CuArray{Float32, 1, CUDA.DeviceMemory}}}, xp::Tuple{CuArray{Float32, 1, CUDA.DeviceMemory}, CuArray{Float32, 1, CUDA.DeviceMemory}}, Δ::Tuple{CuArray{Float32, 1, CUDA.DeviceMemory}, CuArray{Float32, 1, CUDA.DeviceMemory}}, Δu::Tuple{CuArray{Float32, 1, CUDA.DeviceMemory}, CuArray{Float32, 1, CUDA.DeviceMemory}}, A::Tuple{Tuple{Tuple{CuArray{Float32, 1, CUDA.DeviceMemory}, CuArray{Float32, 1, CUDA.DeviceMemory}}, Tuple{CuArray{Float32, 1, CUDA.DeviceMemory}, CuArray{Float32, 1, CUDA.DeviceMemory}}}, Tuple{Tuple{CuArray{Float32, 1, CUDA.DeviceMemory}, CuArray{Float32, 1, CUDA.DeviceMemory}}, Tuple{CuArray{Float32, 1, CUDA.DeviceMemory}, CuArray{Float32, 1, CUDA.DeviceMemory}}}}}, boundary_conditions::Tuple{Tuple{IncompressibleNavierStokes.PeriodicBC, IncompressibleNavierStokes.PeriodicBC}, Tuple{IncompressibleNavierStokes.PeriodicBC, IncompressibleNavierStokes.PeriodicBC}}, Re::Float32, bodyforce::Nothing, issteadybodyforce::Bool, closure_model::Nothing, backend::CUDABackend, workgroupsize::Int64, temperature::Nothing}, 
IncompressibleNavierStokes.var"#psolve!#123"{CUDA.CUFFT.CuFFTPlan{ComplexF32, Float32, -1, false, 2}, CuArray{Float32, 2, CUDA.DeviceMemory}, CuArray{ComplexF32, 2, CUDA.DeviceMemory}, Int64, CartesianIndices{2, Tuple{UnitRange{Int64}, UnitRange{Int64}}}}, Lux.Chain{@NamedTuple{layer_1::Lux.WrappedFunction{typeof(CoupledNODE.collocate)}, layer_2::Lux.Conv{typeof(tanh), Int64, Int64, Tuple{Int64, Int64}, Tuple{Int64, Int64}, Tuple{Int64, Int64}, Tuple{Int64, Int64}, Int64, CoupledNODE.var"#glorot_uniform_T#5"{DataType}, Nothing, Static.False, Static.False}, layer_3::Lux.Conv{typeof(identity), Int64, Int64, Tuple{Int64, Int64}, Tuple{Int64, Int64}, Tuple{Int64, Int64}, Tuple{Int64, Int64}, Int64, CoupledNODE.var"#glorot_uniform_T#5"{DataType}, Nothing, Static.False, Static.False}, layer_4::Lux.WrappedFunction{typeof(CoupledNODE.decollocate)}}, Nothing}, @NamedTuple{layer_1::@NamedTuple{}, layer_2::@NamedTuple{}, layer_3::@NamedTuple{}, layer_4::@NamedTuple{}}}, MLDataDevices.CUDADevice{Nothing}}, data::@NamedTuple{u::SubArray{Float32, 4, Array{Float32, 5}, Tuple{Base.OneTo{Int64}, Base.OneTo{Int64}, Base.OneTo{Int64}, Int64, UnitRange{Int64}}, false}, t::CuArray{Float32, 1, CUDA.DeviceMemory}}, ts::Lux.Training.TrainState{Nothing, Nothing, Lux.Chain{@NamedTuple{layer_1::Lux.WrappedFunction{typeof(CoupledNODE.collocate)}, layer_2::Lux.Conv{typeof(tanh), Int64, Int64, Tuple{Int64, Int64}, Tuple{Int64, Int64}, Tuple{Int64, Int64}, Tuple{Int64, Int64}, Int64, CoupledNODE.var"#glorot_uniform_T#5"{DataType}, Nothing, Static.False, Static.False}, layer_3::Lux.Conv{typeof(identity), Int64, Int64, Tuple{Int64, Int64}, Tuple{Int64, Int64}, Tuple{Int64, Int64}, Tuple{Int64, Int64}, Int64, CoupledNODE.var"#glorot_uniform_T#5"{DataType}, Nothing, Static.False, Static.False}, layer_4::Lux.WrappedFunction{typeof(CoupledNODE.decollocate)}}, Nothing}, ComponentArrays.ComponentVector{Float32, CuArray{Float32, 1, CUDA.DeviceMemory}, Tuple{ComponentArrays.Axis{(layer_1 = 1:0, layer_2 = 
ViewAxis(1:196, Axis(weight = ViewAxis(1:196, ShapedAxis((7, 7, 2, 2))),)), layer_3 = ViewAxis(197:392, Axis(weight = ViewAxis(1:196, ShapedAxis((7, 7, 2, 2))),)), layer_4 = 393:392)}}}, @NamedTuple{layer_1::@NamedTuple{}, layer_2::@NamedTuple{}, layer_3::@NamedTuple{}, layer_4::@NamedTuple{}}, Optimisers.Adam, Optimisers.Leaf{Optimisers.Adam, Tuple{CuArray{Float32, 1, CUDA.DeviceMemory}, CuArray{Float32, 1, CUDA.DeviceMemory}, Tuple{Float32, Float32}}}})
    @ LuxZygoteExt /var/scratch/lorozco/.julia/packages/Lux/JbRSn/ext/LuxZygoteExt/training.jl:5
  [8] compute_gradients
    @ /var/scratch/lorozco/.julia/packages/Lux/JbRSn/src/helpers/training.jl:198 [inlined]
  [9] single_train_step_impl!(backend::ADTypes.AutoZygote, obj_fn::CoupledNODE.var"#loss_function#92"{Tsit5{typeof(OrdinaryDiffEqCore.trivial_limiter!), typeof(OrdinaryDiffEqCore.trivial_limiter!), Static.False}, @Kwargs{}, CoupledNODE.NavierStokes.var"#right_hand_side#2"{@NamedTuple{grid::@NamedTuple{xlims::Tuple{Tuple{Float32, Float32}, Tuple{Float32, Float32}}, dimension::IncompressibleNavierStokes.Dimension{2}, N::Tuple{Int64, Int64}, Nu::Tuple{Tuple{Int64, Int64}, Tuple{Int64, Int64}}, Np::Tuple{Int64, Int64}, Iu::Tuple{CartesianIndices{2, Tuple{UnitRange{Int64}, UnitRange{Int64}}}, CartesianIndices{2, Tuple{UnitRange{Int64}, UnitRange{Int64}}}}, Ip::CartesianIndices{2, Tuple{UnitRange{Int64}, UnitRange{Int64}}}, x::Tuple{CuArray{Float32, 1, CUDA.DeviceMemory}, CuArray{Float32, 1, CUDA.DeviceMemory}}, xu::Tuple{Tuple{CuArray{Float32, 1, CUDA.DeviceMemory}, CuArray{Float32, 1, CUDA.DeviceMemory}}, Tuple{CuArray{Float32, 1, CUDA.DeviceMemory}, CuArray{Float32, 1, CUDA.DeviceMemory}}}, xp::Tuple{CuArray{Float32, 1, CUDA.DeviceMemory}, CuArray{Float32, 1, CUDA.DeviceMemory}}, Δ::Tuple{CuArray{Float32, 1, CUDA.DeviceMemory}, CuArray{Float32, 1, CUDA.DeviceMemory}}, Δu::Tuple{CuArray{Float32, 1, CUDA.DeviceMemory}, CuArray{Float32, 1, CUDA.DeviceMemory}}, A::Tuple{Tuple{Tuple{CuArray{Float32, 1, CUDA.DeviceMemory}, CuArray{Float32, 1, CUDA.DeviceMemory}}, Tuple{CuArray{Float32, 1, CUDA.DeviceMemory}, CuArray{Float32, 1, CUDA.DeviceMemory}}}, Tuple{Tuple{CuArray{Float32, 1, CUDA.DeviceMemory}, CuArray{Float32, 1, CUDA.DeviceMemory}}, Tuple{CuArray{Float32, 1, CUDA.DeviceMemory}, CuArray{Float32, 1, CUDA.DeviceMemory}}}}}, boundary_conditions::Tuple{Tuple{IncompressibleNavierStokes.PeriodicBC, IncompressibleNavierStokes.PeriodicBC}, Tuple{IncompressibleNavierStokes.PeriodicBC, IncompressibleNavierStokes.PeriodicBC}}, Re::Float32, bodyforce::Nothing, issteadybodyforce::Bool, closure_model::Nothing, backend::CUDABackend, workgroupsize::Int64, temperature::Nothing}, 
IncompressibleNavierStokes.var"#psolve!#123"{CUDA.CUFFT.CuFFTPlan{ComplexF32, Float32, -1, false, 2}, CuArray{Float32, 2, CUDA.DeviceMemory}, CuArray{ComplexF32, 2, CUDA.DeviceMemory}, Int64, CartesianIndices{2, Tuple{UnitRange{Int64}, UnitRange{Int64}}}}, Lux.Chain{@NamedTuple{layer_1::Lux.WrappedFunction{typeof(CoupledNODE.collocate)}, layer_2::Lux.Conv{typeof(tanh), Int64, Int64, Tuple{Int64, Int64}, Tuple{Int64, Int64}, Tuple{Int64, Int64}, Tuple{Int64, Int64}, Int64, CoupledNODE.var"#glorot_uniform_T#5"{DataType}, Nothing, Static.False, Static.False}, layer_3::Lux.Conv{typeof(identity), Int64, Int64, Tuple{Int64, Int64}, Tuple{Int64, Int64}, Tuple{Int64, Int64}, Tuple{Int64, Int64}, Int64, CoupledNODE.var"#glorot_uniform_T#5"{DataType}, Nothing, Static.False, Static.False}, layer_4::Lux.WrappedFunction{typeof(CoupledNODE.decollocate)}}, Nothing}, @NamedTuple{layer_1::@NamedTuple{}, layer_2::@NamedTuple{}, layer_3::@NamedTuple{}, layer_4::@NamedTuple{}}}, MLDataDevices.CUDADevice{Nothing}}, data::@NamedTuple{u::SubArray{Float32, 4, Array{Float32, 5}, Tuple{Base.OneTo{Int64}, Base.OneTo{Int64}, Base.OneTo{Int64}, Int64, UnitRange{Int64}}, false}, t::CuArray{Float32, 1, CUDA.DeviceMemory}}, ts::Lux.Training.TrainState{Nothing, Nothing, Lux.Chain{@NamedTuple{layer_1::Lux.WrappedFunction{typeof(CoupledNODE.collocate)}, layer_2::Lux.Conv{typeof(tanh), Int64, Int64, Tuple{Int64, Int64}, Tuple{Int64, Int64}, Tuple{Int64, Int64}, Tuple{Int64, Int64}, Int64, CoupledNODE.var"#glorot_uniform_T#5"{DataType}, Nothing, Static.False, Static.False}, layer_3::Lux.Conv{typeof(identity), Int64, Int64, Tuple{Int64, Int64}, Tuple{Int64, Int64}, Tuple{Int64, Int64}, Tuple{Int64, Int64}, Int64, CoupledNODE.var"#glorot_uniform_T#5"{DataType}, Nothing, Static.False, Static.False}, layer_4::Lux.WrappedFunction{typeof(CoupledNODE.decollocate)}}, Nothing}, ComponentArrays.ComponentVector{Float32, CuArray{Float32, 1, CUDA.DeviceMemory}, Tuple{ComponentArrays.Axis{(layer_1 = 1:0, layer_2 = 
ViewAxis(1:196, Axis(weight = ViewAxis(1:196, ShapedAxis((7, 7, 2, 2))),)), layer_3 = ViewAxis(197:392, Axis(weight = ViewAxis(1:196, ShapedAxis((7, 7, 2, 2))),)), layer_4 = 393:392)}}}, @NamedTuple{layer_1::@NamedTuple{}, layer_2::@NamedTuple{}, layer_3::@NamedTuple{}, layer_4::@NamedTuple{}}, Optimisers.Adam, Optimisers.Leaf{Optimisers.Adam, Tuple{CuArray{Float32, 1, CUDA.DeviceMemory}, CuArray{Float32, 1, CUDA.DeviceMemory}, Tuple{Float32, Float32}}}})
    @ Lux.Training /var/scratch/lorozco/.julia/packages/Lux/JbRSn/src/helpers/training.jl:301
 [10] single_train_step!(backend::ADTypes.AutoZygote, obj_fn::CoupledNODE.var"#loss_function#92"{Tsit5{typeof(OrdinaryDiffEqCore.trivial_limiter!), typeof(OrdinaryDiffEqCore.trivial_limiter!), Static.False}, @Kwargs{}, CoupledNODE.NavierStokes.var"#right_hand_side#2"{@NamedTuple{grid::@NamedTuple{xlims::Tuple{Tuple{Float32, Float32}, Tuple{Float32, Float32}}, dimension::IncompressibleNavierStokes.Dimension{2}, N::Tuple{Int64, Int64}, Nu::Tuple{Tuple{Int64, Int64}, Tuple{Int64, Int64}}, Np::Tuple{Int64, Int64}, Iu::Tuple{CartesianIndices{2, Tuple{UnitRange{Int64}, UnitRange{Int64}}}, CartesianIndices{2, Tuple{UnitRange{Int64}, UnitRange{Int64}}}}, Ip::CartesianIndices{2, Tuple{UnitRange{Int64}, UnitRange{Int64}}}, x::Tuple{CuArray{Float32, 1, CUDA.DeviceMemory}, CuArray{Float32, 1, CUDA.DeviceMemory}}, xu::Tuple{Tuple{CuArray{Float32, 1, CUDA.DeviceMemory}, CuArray{Float32, 1, CUDA.DeviceMemory}}, Tuple{CuArray{Float32, 1, CUDA.DeviceMemory}, CuArray{Float32, 1, CUDA.DeviceMemory}}}, xp::Tuple{CuArray{Float32, 1, CUDA.DeviceMemory}, CuArray{Float32, 1, CUDA.DeviceMemory}}, Δ::Tuple{CuArray{Float32, 1, CUDA.DeviceMemory}, CuArray{Float32, 1, CUDA.DeviceMemory}}, Δu::Tuple{CuArray{Float32, 1, CUDA.DeviceMemory}, CuArray{Float32, 1, CUDA.DeviceMemory}}, A::Tuple{Tuple{Tuple{CuArray{Float32, 1, CUDA.DeviceMemory}, CuArray{Float32, 1, CUDA.DeviceMemory}}, Tuple{CuArray{Float32, 1, CUDA.DeviceMemory}, CuArray{Float32, 1, CUDA.DeviceMemory}}}, Tuple{Tuple{CuArray{Float32, 1, CUDA.DeviceMemory}, CuArray{Float32, 1, CUDA.DeviceMemory}}, Tuple{CuArray{Float32, 1, CUDA.DeviceMemory}, CuArray{Float32, 1, CUDA.DeviceMemory}}}}}, boundary_conditions::Tuple{Tuple{IncompressibleNavierStokes.PeriodicBC, IncompressibleNavierStokes.PeriodicBC}, Tuple{IncompressibleNavierStokes.PeriodicBC, IncompressibleNavierStokes.PeriodicBC}}, Re::Float32, bodyforce::Nothing, issteadybodyforce::Bool, closure_model::Nothing, backend::CUDABackend, workgroupsize::Int64, temperature::Nothing}, 
IncompressibleNavierStokes.var"#psolve!#123"{CUDA.CUFFT.CuFFTPlan{ComplexF32, Float32, -1, false, 2}, CuArray{Float32, 2, CUDA.DeviceMemory}, CuArray{ComplexF32, 2, CUDA.DeviceMemory}, Int64, CartesianIndices{2, Tuple{UnitRange{Int64}, UnitRange{Int64}}}}, Lux.Chain{@NamedTuple{layer_1::Lux.WrappedFunction{typeof(CoupledNODE.collocate)}, layer_2::Lux.Conv{typeof(tanh), Int64, Int64, Tuple{Int64, Int64}, Tuple{Int64, Int64}, Tuple{Int64, Int64}, Tuple{Int64, Int64}, Int64, CoupledNODE.var"#glorot_uniform_T#5"{DataType}, Nothing, Static.False, Static.False}, layer_3::Lux.Conv{typeof(identity), Int64, Int64, Tuple{Int64, Int64}, Tuple{Int64, Int64}, Tuple{Int64, Int64}, Tuple{Int64, Int64}, Int64, CoupledNODE.var"#glorot_uniform_T#5"{DataType}, Nothing, Static.False, Static.False}, layer_4::Lux.WrappedFunction{typeof(CoupledNODE.decollocate)}}, Nothing}, @NamedTuple{layer_1::@NamedTuple{}, layer_2::@NamedTuple{}, layer_3::@NamedTuple{}, layer_4::@NamedTuple{}}}, MLDataDevices.CUDADevice{Nothing}}, data::@NamedTuple{u::SubArray{Float32, 4, Array{Float32, 5}, Tuple{Base.OneTo{Int64}, Base.OneTo{Int64}, Base.OneTo{Int64}, Int64, UnitRange{Int64}}, false}, t::CuArray{Float32, 1, CUDA.DeviceMemory}}, ts::Lux.Training.TrainState{Nothing, Nothing, Lux.Chain{@NamedTuple{layer_1::Lux.WrappedFunction{typeof(CoupledNODE.collocate)}, layer_2::Lux.Conv{typeof(tanh), Int64, Int64, Tuple{Int64, Int64}, Tuple{Int64, Int64}, Tuple{Int64, Int64}, Tuple{Int64, Int64}, Int64, CoupledNODE.var"#glorot_uniform_T#5"{DataType}, Nothing, Static.False, Static.False}, layer_3::Lux.Conv{typeof(identity), Int64, Int64, Tuple{Int64, Int64}, Tuple{Int64, Int64}, Tuple{Int64, Int64}, Tuple{Int64, Int64}, Int64, CoupledNODE.var"#glorot_uniform_T#5"{DataType}, Nothing, Static.False, Static.False}, layer_4::Lux.WrappedFunction{typeof(CoupledNODE.decollocate)}}, Nothing}, ComponentArrays.ComponentVector{Float32, CuArray{Float32, 1, CUDA.DeviceMemory}, Tuple{ComponentArrays.Axis{(layer_1 = 1:0, layer_2 = 
ViewAxis(1:196, Axis(weight = ViewAxis(1:196, ShapedAxis((7, 7, 2, 2))),)), layer_3 = ViewAxis(197:392, Axis(weight = ViewAxis(1:196, ShapedAxis((7, 7, 2, 2))),)), layer_4 = 393:392)}}}, @NamedTuple{layer_1::@NamedTuple{}, layer_2::@NamedTuple{}, layer_3::@NamedTuple{}, layer_4::@NamedTuple{}}, Optimisers.Adam, Optimisers.Leaf{Optimisers.Adam, Tuple{CuArray{Float32, 1, CUDA.DeviceMemory}, CuArray{Float32, 1, CUDA.DeviceMemory}, Tuple{Float32, Float32}}}})
    @ Lux.Training /var/scratch/lorozco/.julia/packages/Lux/JbRSn/src/helpers/training.jl:276
 [11] train(model::Lux.Chain{@NamedTuple{layer_1::Lux.WrappedFunction{typeof(CoupledNODE.collocate)}, layer_2::Lux.Conv{typeof(tanh), Int64, Int64, Tuple{Int64, Int64}, Tuple{Int64, Int64}, Tuple{Int64, Int64}, Tuple{Int64, Int64}, Int64, CoupledNODE.var"#glorot_uniform_T#5"{DataType}, Nothing, Static.False, Static.False}, layer_3::Lux.Conv{typeof(identity), Int64, Int64, Tuple{Int64, Int64}, Tuple{Int64, Int64}, Tuple{Int64, Int64}, Tuple{Int64, Int64}, Int64, CoupledNODE.var"#glorot_uniform_T#5"{DataType}, Nothing, Static.False, Static.False}, layer_4::Lux.WrappedFunction{typeof(CoupledNODE.decollocate)}}, Nothing}, ps::ComponentArrays.ComponentVector{Float32, CuArray{Float32, 1, CUDA.DeviceMemory}, Tuple{ComponentArrays.Axis{(layer_1 = 1:0, layer_2 = ViewAxis(1:196, Axis(weight = ViewAxis(1:196, ShapedAxis((7, 7, 2, 2))),)), layer_3 = ViewAxis(197:392, Axis(weight = ViewAxis(1:196, ShapedAxis((7, 7, 2, 2))),)), layer_4 = 393:392)}}}, st::@NamedTuple{layer_1::@NamedTuple{}, layer_2::@NamedTuple{}, layer_3::@NamedTuple{}, layer_4::@NamedTuple{}}, train_dataloader::CoupledNODE.NavierStokes.var"#dataloader#13"{Int64, Random.Xoshiro, @NamedTuple{u::Array{Float32, 5}, t::Matrix{Float32}}}, loss_function::CoupledNODE.var"#loss_function#92"{Tsit5{typeof(OrdinaryDiffEqCore.trivial_limiter!), typeof(OrdinaryDiffEqCore.trivial_limiter!), Static.False}, @Kwargs{}, CoupledNODE.NavierStokes.var"#right_hand_side#2"{@NamedTuple{grid::@NamedTuple{xlims::Tuple{Tuple{Float32, Float32}, Tuple{Float32, Float32}}, dimension::IncompressibleNavierStokes.Dimension{2}, N::Tuple{Int64, Int64}, Nu::Tuple{Tuple{Int64, Int64}, Tuple{Int64, Int64}}, Np::Tuple{Int64, Int64}, Iu::Tuple{CartesianIndices{2, Tuple{UnitRange{Int64}, UnitRange{Int64}}}, CartesianIndices{2, Tuple{UnitRange{Int64}, UnitRange{Int64}}}}, Ip::CartesianIndices{2, Tuple{UnitRange{Int64}, UnitRange{Int64}}}, x::Tuple{CuArray{Float32, 1, CUDA.DeviceMemory}, CuArray{Float32, 1, CUDA.DeviceMemory}}, 
xu::Tuple{Tuple{CuArray{Float32, 1, CUDA.DeviceMemory}, CuArray{Float32, 1, CUDA.DeviceMemory}}, Tuple{CuArray{Float32, 1, CUDA.DeviceMemory}, CuArray{Float32, 1, CUDA.DeviceMemory}}}, xp::Tuple{CuArray{Float32, 1, CUDA.DeviceMemory}, CuArray{Float32, 1, CUDA.DeviceMemory}}, Δ::Tuple{CuArray{Float32, 1, CUDA.DeviceMemory}, CuArray{Float32, 1, CUDA.DeviceMemory}}, Δu::Tuple{CuArray{Float32, 1, CUDA.DeviceMemory}, CuArray{Float32, 1, CUDA.DeviceMemory}}, A::Tuple{Tuple{Tuple{CuArray{Float32, 1, CUDA.DeviceMemory}, CuArray{Float32, 1, CUDA.DeviceMemory}}, Tuple{CuArray{Float32, 1, CUDA.DeviceMemory}, CuArray{Float32, 1, CUDA.DeviceMemory}}}, Tuple{Tuple{CuArray{Float32, 1, CUDA.DeviceMemory}, CuArray{Float32, 1, CUDA.DeviceMemory}}, Tuple{CuArray{Float32, 1, CUDA.DeviceMemory}, CuArray{Float32, 1, CUDA.DeviceMemory}}}}}, boundary_conditions::Tuple{Tuple{IncompressibleNavierStokes.PeriodicBC, IncompressibleNavierStokes.PeriodicBC}, Tuple{IncompressibleNavierStokes.PeriodicBC, IncompressibleNavierStokes.PeriodicBC}}, Re::Float32, bodyforce::Nothing, issteadybodyforce::Bool, closure_model::Nothing, backend::CUDABackend, workgroupsize::Int64, temperature::Nothing}, IncompressibleNavierStokes.var"#psolve!#123"{CUDA.CUFFT.CuFFTPlan{ComplexF32, Float32, -1, false, 2}, CuArray{Float32, 2, CUDA.DeviceMemory}, CuArray{ComplexF32, 2, CUDA.DeviceMemory}, Int64, CartesianIndices{2, Tuple{UnitRange{Int64}, UnitRange{Int64}}}}, Lux.Chain{@NamedTuple{layer_1::Lux.WrappedFunction{typeof(CoupledNODE.collocate)}, layer_2::Lux.Conv{typeof(tanh), Int64, Int64, Tuple{Int64, Int64}, Tuple{Int64, Int64}, Tuple{Int64, Int64}, Tuple{Int64, Int64}, Int64, CoupledNODE.var"#glorot_uniform_T#5"{DataType}, Nothing, Static.False, Static.False}, layer_3::Lux.Conv{typeof(identity), Int64, Int64, Tuple{Int64, Int64}, Tuple{Int64, Int64}, Tuple{Int64, Int64}, Tuple{Int64, Int64}, Int64, CoupledNODE.var"#glorot_uniform_T#5"{DataType}, Nothing, Static.False, Static.False}, 
layer_4::Lux.WrappedFunction{typeof(CoupledNODE.decollocate)}}, Nothing}, @NamedTuple{layer_1::@NamedTuple{}, layer_2::@NamedTuple{}, layer_3::@NamedTuple{}, layer_4::@NamedTuple{}}}, MLDataDevices.CUDADevice{Nothing}}; nepochs::Int64, ad_type::ADTypes.AutoZygote, alg::Optimisers.Adam, cpu::Bool, kwargs::@Kwargs{callback::CoupledNODE.var"#callback#98"{Bool, Bool, Int64, CoupledNODE.NavierStokes.var"#right_hand_side#2"{@NamedTuple{grid::@NamedTuple{xlims::Tuple{Tuple{Float32, Float32}, Tuple{Float32, Float32}}, dimension::IncompressibleNavierStokes.Dimension{2}, N::Tuple{Int64, Int64}, Nu::Tuple{Tuple{Int64, Int64}, Tuple{Int64, Int64}}, Np::Tuple{Int64, Int64}, Iu::Tuple{CartesianIndices{2, Tuple{UnitRange{Int64}, UnitRange{Int64}}}, CartesianIndices{2, Tuple{UnitRange{Int64}, UnitRange{Int64}}}}, Ip::CartesianIndices{2, Tuple{UnitRange{Int64}, UnitRange{Int64}}}, x::Tuple{CuArray{Float32, 1, CUDA.DeviceMemory}, CuArray{Float32, 1, CUDA.DeviceMemory}}, xu::Tuple{Tuple{CuArray{Float32, 1, CUDA.DeviceMemory}, CuArray{Float32, 1, CUDA.DeviceMemory}}, Tuple{CuArray{Float32, 1, CUDA.DeviceMemory}, CuArray{Float32, 1, CUDA.DeviceMemory}}}, xp::Tuple{CuArray{Float32, 1, CUDA.DeviceMemory}, CuArray{Float32, 1, CUDA.DeviceMemory}}, Δ::Tuple{CuArray{Float32, 1, CUDA.DeviceMemory}, CuArray{Float32, 1, CUDA.DeviceMemory}}, Δu::Tuple{CuArray{Float32, 1, CUDA.DeviceMemory}, CuArray{Float32, 1, CUDA.DeviceMemory}}, A::Tuple{Tuple{Tuple{CuArray{Float32, 1, CUDA.DeviceMemory}, CuArray{Float32, 1, CUDA.DeviceMemory}}, Tuple{CuArray{Float32, 1, CUDA.DeviceMemory}, CuArray{Float32, 1, CUDA.DeviceMemory}}}, Tuple{Tuple{CuArray{Float32, 1, CUDA.DeviceMemory}, CuArray{Float32, 1, CUDA.DeviceMemory}}, Tuple{CuArray{Float32, 1, CUDA.DeviceMemory}, CuArray{Float32, 1, CUDA.DeviceMemory}}}}}, boundary_conditions::Tuple{Tuple{IncompressibleNavierStokes.PeriodicBC, IncompressibleNavierStokes.PeriodicBC}, Tuple{IncompressibleNavierStokes.PeriodicBC, IncompressibleNavierStokes.PeriodicBC}}, 
Re::Float32, bodyforce::Nothing, issteadybodyforce::Bool, closure_model::Nothing, backend::CUDABackend, workgroupsize::Int64, temperature::Nothing}, IncompressibleNavierStokes.var"#psolve!#123"{CUDA.CUFFT.CuFFTPlan{ComplexF32, Float32, -1, false, 2}, CuArray{Float32, 2, CUDA.DeviceMemory}, CuArray{ComplexF32, 2, CUDA.DeviceMemory}, Int64, CartesianIndices{2, Tuple{UnitRange{Int64}, UnitRange{Int64}}}}, Lux.Chain{@NamedTuple{layer_1::Lux.WrappedFunction{typeof(CoupledNODE.collocate)}, layer_2::Lux.Conv{typeof(tanh), Int64, Int64, Tuple{Int64, Int64}, Tuple{Int64, Int64}, Tuple{Int64, Int64}, Tuple{Int64, Int64}, Int64, CoupledNODE.var"#glorot_uniform_T#5"{DataType}, Nothing, Static.False, Static.False}, layer_3::Lux.Conv{typeof(identity), Int64, Int64, Tuple{Int64, Int64}, Tuple{Int64, Int64}, Tuple{Int64, Int64}, Tuple{Int64, Int64}, Int64, CoupledNODE.var"#glorot_uniform_T#5"{DataType}, Nothing, Static.False, Static.False}, layer_4::Lux.WrappedFunction{typeof(CoupledNODE.decollocate)}}, Nothing}, @NamedTuple{layer_1::@NamedTuple{}, layer_2::@NamedTuple{}, layer_3::@NamedTuple{}, layer_4::@NamedTuple{}}}, ComponentArrays.ComponentVector{Float32, CuArray{Float32, 1, CUDA.DeviceMemory}, Tuple{ComponentArrays.Axis{(layer_1 = 1:0, layer_2 = ViewAxis(1:196, Axis(weight = ViewAxis(1:196, ShapedAxis((7, 7, 2, 2))),)), layer_3 = ViewAxis(197:392, Axis(weight = ViewAxis(1:196, ShapedAxis((7, 7, 2, 2))),)), layer_4 = 393:392)}}}, CoupledNODE.var"#loss_function#92"{Tsit5{typeof(OrdinaryDiffEqCore.trivial_limiter!), typeof(OrdinaryDiffEqCore.trivial_limiter!), Static.False}, @Kwargs{}, CoupledNODE.NavierStokes.var"#right_hand_side#2"{@NamedTuple{grid::@NamedTuple{xlims::Tuple{Tuple{Float32, Float32}, Tuple{Float32, Float32}}, dimension::IncompressibleNavierStokes.Dimension{2}, N::Tuple{Int64, Int64}, Nu::Tuple{Tuple{Int64, Int64}, Tuple{Int64, Int64}}, Np::Tuple{Int64, Int64}, Iu::Tuple{CartesianIndices{2, Tuple{UnitRange{Int64}, UnitRange{Int64}}}, CartesianIndices{2, 
Tuple{UnitRange{Int64}, UnitRange{Int64}}}}, Ip::CartesianIndices{2, Tuple{UnitRange{Int64}, UnitRange{Int64}}}, x::Tuple{CuArray{Float32, 1, CUDA.DeviceMemory}, CuArray{Float32, 1, CUDA.DeviceMemory}}, xu::Tuple{Tuple{CuArray{Float32, 1, CUDA.DeviceMemory}, CuArray{Float32, 1, CUDA.DeviceMemory}}, Tuple{CuArray{Float32, 1, CUDA.DeviceMemory}, CuArray{Float32, 1, CUDA.DeviceMemory}}}, xp::Tuple{CuArray{Float32, 1, CUDA.DeviceMemory}, CuArray{Float32, 1, CUDA.DeviceMemory}}, Δ::Tuple{CuArray{Float32, 1, CUDA.DeviceMemory}, CuArray{Float32, 1, CUDA.DeviceMemory}}, Δu::Tuple{CuArray{Float32, 1, CUDA.DeviceMemory}, CuArray{Float32, 1, CUDA.DeviceMemory}}, A::Tuple{Tuple{Tuple{CuArray{Float32, 1, CUDA.DeviceMemory}, CuArray{Float32, 1, CUDA.DeviceMemory}}, Tuple{CuArray{Float32, 1, CUDA.DeviceMemory}, CuArray{Float32, 1, CUDA.DeviceMemory}}}, Tuple{Tuple{CuArray{Float32, 1, CUDA.DeviceMemory}, CuArray{Float32, 1, CUDA.DeviceMemory}}, Tuple{CuArray{Float32, 1, CUDA.DeviceMemory}, CuArray{Float32, 1, CUDA.DeviceMemory}}}}}, boundary_conditions::Tuple{Tuple{IncompressibleNavierStokes.PeriodicBC, IncompressibleNavierStokes.PeriodicBC}, Tuple{IncompressibleNavierStokes.PeriodicBC, IncompressibleNavierStokes.PeriodicBC}}, Re::Float32, bodyforce::Nothing, issteadybodyforce::Bool, closure_model::Nothing, backend::CUDABackend, workgroupsize::Int64, temperature::Nothing}, IncompressibleNavierStokes.var"#psolve!#123"{CUDA.CUFFT.CuFFTPlan{ComplexF32, Float32, -1, false, 2}, CuArray{Float32, 2, CUDA.DeviceMemory}, CuArray{ComplexF32, 2, CUDA.DeviceMemory}, Int64, CartesianIndices{2, Tuple{UnitRange{Int64}, UnitRange{Int64}}}}, Lux.Chain{@NamedTuple{layer_1::Lux.WrappedFunction{typeof(CoupledNODE.collocate)}, layer_2::Lux.Conv{typeof(tanh), Int64, Int64, Tuple{Int64, Int64}, Tuple{Int64, Int64}, Tuple{Int64, Int64}, Tuple{Int64, Int64}, Int64, CoupledNODE.var"#glorot_uniform_T#5"{DataType}, Nothing, Static.False, Static.False}, layer_3::Lux.Conv{typeof(identity), Int64, Int64, 
Tuple{Int64, Int64}, Tuple{Int64, Int64}, Tuple{Int64, Int64}, Tuple{Int64, Int64}, Int64, CoupledNODE.var"#glorot_uniform_T#5"{DataType}, Nothing, Static.False, Static.False}, layer_4::Lux.WrappedFunction{typeof(CoupledNODE.decollocate)}}, Nothing}, @NamedTuple{layer_1::@NamedTuple{}, layer_2::@NamedTuple{}, layer_3::@NamedTuple{}, layer_4::@NamedTuple{}}}, MLDataDevices.CUDADevice{Nothing}}, @NamedTuple{layer_1::@NamedTuple{}, layer_2::@NamedTuple{}, layer_3::@NamedTuple{}, layer_4::@NamedTuple{}}, Float32, CuArray{Float32, 1, CUDA.DeviceMemory}, SubArray{Float32, 4, Array{Float32, 5}, Tuple{Base.OneTo{Int64}, Base.OneTo{Int64}, Base.OneTo{Int64}, Int64, UnitRange{Int64}}, false}}})
    @ CoupledNODE /var/scratch/lorozco/CoupledNODE.jl/src/train.jl:19
 [12] top-level scope
    @ ./timing.jl:503 [inlined]
 [13] top-level scope
    @ /var/scratch/lorozco/CoupledNODE.jl/simulations/NavierStokes_2D/scripts/train_posteriori.jl:0
in expression starting at /var/scratch/lorozco/CoupledNODE.jl/simulations/NavierStokes_2D/scripts/train_posteriori.jl:74
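
For context, here is a minimal sketch (hypothetical, not from the issue) of the pattern this stacktrace points at: the pullback for `get` on an `IdDict` fails because `IdDict` lookup lowers to `ccall(:jl_eqtable_get, ...)`, which is exactly the foreign call named in the error.

```julia
using Zygote

const cache = IdDict{Symbol, Float64}(:scale => 2.0)

# Zygote traces every call in the function body. The IdDict `get` goes
# through ccall(:jl_eqtable_get, ...), which Zygote cannot differentiate,
# so the pullback raises the "Can't differentiate foreigncall" error.
f(x) = x * get(cache, :scale, 1.0)

Zygote.gradient(f, 3.0)  # expected to raise the foreigncall error above
```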

I'll be complementing this info with new findings.

Other notes:
I've modified INS.Setup to take a backend that can be either CUDA.CUDABackend() or INS.CPU() (exposed from KernelAbstractions). In version v2.0.1 of INS the backend is defined in INS.Setup, so that other functions we use in the rhs, like project or apply_pressure, can use the GPU; otherwise the operations cannot take place, because the variables are not on the same device.
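
A rough sketch of that setup (keyword names illustrative, based on the INS v2 API as described above):

```julia
import IncompressibleNavierStokes as INS
using CUDA
using KernelAbstractions: CPU

# Pick the compute backend once. INS.Setup stores it so that downstream
# operators used in the rhs (project, apply_pressure, ...) allocate and
# run on the same device as the solution fields.
backend = CUDA.functional() ? CUDA.CUDABackend() : CPU()

setup = INS.Setup(;
    x = (range(0f0, 1f0, 65), range(0f0, 1f0, 65)),
    Re = 1f3,
    backend,
)
```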

@luisaforozco
Contributor Author

Solved! It was caused by the use of the macro CUDA.@allowscalar: Zygote doesn't allow differentiating through it.
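
For reference, a hypothetical sketch of the pattern and the fix. `CUDA.@allowscalar` reads its flag from task-local storage, which is an `IdDict`, so its lookup is the `jl_eqtable_get` foreign call in the stacktrace; keeping the differentiated path to whole-array operations avoids it entirely:

```julia
using CUDA, Zygote

u = CUDA.rand(Float32, 16)

# Problematic: scalar indexing inside the differentiated code path.
# @allowscalar consults task_local_storage (an IdDict), and Zygote hits
# the jl_eqtable_get foreigncall when building the pullback.
loss_bad(p) = CUDA.@allowscalar p[1] * u[1]

# Differentiable alternative: stay with whole-array operations.
loss_ok(p) = sum(p .* u)

Zygote.gradient(loss_ok, CUDA.rand(Float32, 16))
```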
