Can't load a FluxML-trained & saved model: ERROR: CUDA error: invalid device context (code 201, ERROR_INVALID_CONTEXT) #2461

Closed
georgegrosu1 opened this issue Jun 19, 2024 · 1 comment

georgegrosu1 commented Jun 19, 2024

The context is simple: I train a model in Flux.jl using my GPU (NVIDIA GeForce RTX 3080 laptop), and I save the model state using JLD2; all good up to here. But when I try to load the model from the saved state as follows, I get this error:

julia> using Flux, JLD2, CUDA

julia> include("src/utilities/cfg_parse.jl")
parse_terminal_args (generic function with 1 method)

julia> include("src/nets/net_build.jl")
admm_restoration_model (generic function with 1 method)

julia> cfg = fetch_json_data("train_cfg.json")
Dict{String, Any} with 9 entries:
  "epochs"          => 130
  "lr_rate"         => 0.0004
  "im_shape"        => Any[256, 256]
  "use_iso"         => true
  "branches"        => 2
  "model_save_path" => "/models_weights"
  "train_data"      => Dict{String, Any}("x_path"=>"D:/Projects/ISETC2022/dcnn-deblur/dataset/GOPRO_Large/xt_256_0p8blur_10noise", "y_path"=>"D:/Projects/ISETC2022/dcnn-deblur/dataset/GOPRO_Large/xt_256_0p8blur_10noise")
  "batch_size"      => 3
  "eval_data"       => Dict{String, Any}("x_path"=>"D:/Projects/ISETC2022/dcnn-deblur/dataset/GOPRO_Large/xt_256_0p8blur_10noise", "y_path"=>"D:/Projects/ISETC2022/dcnn-deblur/dataset/GOPRO_Large/xt_256_0p8blur_10noise")

julia> model = admm_restoration_model(cfg)


MODEL SIZE (#parameters): 3581088
Chain(
  Parallel(
    chcat,
    Chain(
      ADMMDeconv{typeof(relu6), Array{Float32, 4}, Vector{Float32}, Bool, Vector{Float32}, Int64, Bool, Float32}(NNlib.relu6, Float32[0.035917997 0.079589315 … 0.06321434 -0.08105426; 0.010757283 -0.061283972 … 0.040873725 -0.11001465; … ; 0.07982061 -0.09194836 … 0.091350466 -0.14958367; -0.053058878 0.097258545 … 0.1495896 -0.14328365;;;;], false, Float32[0.0006751783], Float32[0.5695928], 50, true, 0.0f0),  # 102 parameters
      ConvTranspose((38, 38), 3 => 18),  # 77_994 parameters
      Conv((19, 19), 18 => 18),         # 116_982 parameters
      AdaptiveMaxPool((256, 256)),
      BatchNorm(18, relu6),             # 36 parameters, plus 36
      ConvTranspose((20, 20), 18 => 32),  # 230_432 parameters
      Conv((10, 10), 32 => 32),         # 102_432 parameters
      AdaptiveMaxPool((256, 256)),
      BatchNorm(32, relu6),             # 64 parameters, plus 64
      ConvTranspose((16, 16), 32 => 64),  # 524_352 parameters
      Conv((8, 8), 64 => 64),           # 262_208 parameters
      AdaptiveMaxPool((256, 256)),
      BatchNorm(64, relu6),             # 128 parameters, plus 128
      ConvTranspose((16, 16), 64 => 64),  # 1_048_640 parameters
      Conv((8, 8), 64 => 64),           # 262_208 parameters
      AdaptiveMaxPool((256, 256)),
      BatchNorm(64, relu6),             # 128 parameters, plus 128
    ),
    Chain(
      ADMMDeconv{typeof(relu6), Array{Float32, 4}, Vector{Float32}, Bool, Vector{Float32}, Int64, Bool, Float32}(NNlib.relu6, Float32[0.075822905 -0.050852973 … 0.08122373 -0.039612506; 0.026294839 -0.009715072 … 0.03403802 0.015126286; … ; 0.052702498 -0.0404368 … 0.037942544 -0.005757671; 0.08515987 -0.02476077 … 0.06367684 -0.004382413;;;;], false, Float32[1.0818578], Float32[0.14859931], 50, true, 0.0f0),  # 402 parameters
      ConvTranspose((38, 38), 3 => 3),  # 12_999 parameters
      BatchNorm(3, relu6),              # 6 parameters, plus 6
      ADMMDeconv{typeof(relu6), Array{Float32, 4}, Vector{Float32}, Bool, Vector{Float32}, Int64, Bool, Float32}(NNlib.relu6, Float32[-0.020457862 0.124111876 … -0.096539654 0.029231917; 0.13135242 0.052027464 … 0.024933446 -0.14350384; … ; 0.15128526 0.010382508 … -0.050241567 -0.096333385; -0.030062137 0.0784706 … -0.029577373 0.13084307;;;;], false, Float32[0.034583375], Float32[1.2101591], 50, true, 0.0f0),  # 102 parameters
      Conv((19, 19), 3 => 18),          # 19_512 parameters
      BatchNorm(18, relu6),             # 36 parameters, plus 36
      Conv((10, 10), 18 => 18),         # 32_418 parameters
      BatchNorm(18, relu6),             # 36 parameters, plus 36
      Conv((8, 8), 18 => 18),           # 20_754 parameters
      BatchNorm(18, relu6),             # 36 parameters, plus 36
      AdaptiveMaxPool((256, 256)),
      ConvTranspose((16, 16), 18 => 18),  # 82_962 parameters
      BatchNorm(18, relu6),             # 36 parameters, plus 36
      ConvTranspose((20, 20), 18 => 32),  # 230_432 parameters
      BatchNorm(32, relu6),             # 64 parameters, plus 64
      ConvTranspose((16, 16), 32 => 64),  # 524_352 parameters
      BatchNorm(64, relu6),             # 128 parameters, plus 128
      AdaptiveMaxPool((256, 256)),
    ),
  ),
  ConvTranspose((9, 9), 128 => 3, relu6),  # 31_107 parameters
  AdaptiveMaxPool((256, 256)),
)         # Total: 63 trainable arrays, 3_581_088 parameters,
          # plus 22 non-trainable, 698 parameters, summarysize 13.679 MiB.

julia> model_state = JLD2.load("D:/Projects/admm-deconv/trained_models/plm/plm-ep_4-vloss_0.4733-psnr_5.8388-mse_0.2607.jld2", "model_state");

julia> Flux.loadmodel!(model, model_state)
ERROR: CUDA error: invalid device context (code 201, ERROR_INVALID_CONTEXT)
Stacktrace:
  [1] throw_api_error(res::CUDA.cudaError_enum)
    @ CUDA C:\Users\George\.julia\packages\CUDA\75aiI\lib\cudadrv\libcuda.jl:30
  [2] isvalid(ctx::CuContext)
    @ CUDA C:\Users\George\.julia\packages\CUDA\75aiI\lib\cudadrv\context.jl:75
  [3] #context!#990
    @ C:\Users\George\.julia\packages\CUDA\75aiI\lib\cudadrv\state.jl:165 [inlined]
  [4] context!
    @ C:\Users\George\.julia\packages\CUDA\75aiI\lib\cudadrv\state.jl:163 [inlined]
  [5] unsafe_copyto!(dest::Vector{Float32}, doffs::Int64, src::CuArray{Float32, 1, CUDA.DeviceMemory}, soffs::Int64, n::Int64)
    @ CUDA C:\Users\George\.julia\packages\CUDA\75aiI\src\array.jl:550
  [6] copyto!
    @ C:\Users\George\.julia\packages\CUDA\75aiI\src\array.jl:503 [inlined]
  [7] copyto!
    @ C:\Users\George\.julia\packages\CUDA\75aiI\src\array.jl:507 [inlined]
  [8] loadleaf!(dst::Vector{Float32}, src::CuArray{Float32, 1, CUDA.DeviceMemory})
    @ Flux C:\Users\George\.julia\packages\Flux\CUn7U\src\loading.jl:22
  [9] loadmodel!(dst::ADMMDeconv{typeof(relu6), Array{…}, Vector{…}, Bool, Vector{…}, Int64, Bool, Float32}, src::@NamedTuple{weight::CuArray{…}, bias::Bool, λ::CuArray{…}, ρ::CuArray{…}}; filter::Function, cache::Base.IdSet{Any})
    @ Flux C:\Users\George\.julia\packages\Flux\CUn7U\src\loading.jl:103
 [10] loadmodel!(dst::Tuple{…}, src::Tuple{…}; filter::Function, cache::Base.IdSet{…})
    @ Flux C:\Users\George\.julia\packages\Flux\CUn7U\src\loading.jl:105
 [11] loadmodel!(dst::Chain{Tuple{…}}, src::@NamedTuple{layers::Tuple{…}}; filter::Function, cache::Base.IdSet{Any})
    @ Flux C:\Users\George\.julia\packages\Flux\CUn7U\src\loading.jl:105
 [12] loadmodel!(dst::Tuple{Chain{Tuple{…}}, Chain{Tuple{…}}}, src::Tuple{@NamedTuple{layers::Tuple{…}}, @NamedTuple{layers::Tuple{…}}}; filter::Function, cache::Base.IdSet{Any})
    @ Flux C:\Users\George\.julia\packages\Flux\CUn7U\src\loading.jl:105
 [13] loadmodel!(dst::Parallel{typeof(chcat), Tuple{Chain{Tuple{…}}, Chain{Tuple{…}}}}, src::@NamedTuple{connection::Tuple{}, layers::Tuple{@NamedTuple{layers::Tuple{…}}, @NamedTuple{layers::Tuple{…}}}}; filter::Function, cache::Base.IdSet{Any})
    @ Flux C:\Users\George\.julia\packages\Flux\CUn7U\src\loading.jl:105
 [14] loadmodel!(dst::Tuple{Parallel{…}, ConvTranspose{…}, AdaptiveMaxPool{…}}, src::Tuple{@NamedTuple{…}, @NamedTuple{…}, Tuple{}}; filter::Function, cache::Base.IdSet{Any})
    @ Flux C:\Users\George\.julia\packages\Flux\CUn7U\src\loading.jl:105
 [15] loadmodel!(dst::Chain{Tuple{Parallel{…}, ConvTranspose{…}, AdaptiveMaxPool{…}}}, src::@NamedTuple{layers::Tuple{@NamedTuple{…}, @NamedTuple{…}, Tuple{}}}; filter::Function, cache::Base.IdSet{Any})
    @ Flux C:\Users\George\.julia\packages\Flux\CUn7U\src\loading.jl:105
 [16] loadmodel!(dst::Chain{Tuple{Parallel{…}, ConvTranspose{…}, AdaptiveMaxPool{…}}}, src::@NamedTuple{layers::Tuple{@NamedTuple{…}, @NamedTuple{…}, Tuple{}}})
    @ Flux C:\Users\George\.julia\packages\Flux\CUn7U\src\loading.jl:90
 [17] top-level scope
    @ REPL[7]:1
Some type information was truncated. Use `show(err)` to see complete types.

Note that I use a custom-made layer here, and I thought that might be the cause. However, I also trained a model without this custom layer, using only built-in ones, and the same error persists.
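
For reference, Flux.loadmodel! recurses into a custom layer the same way as into built-in ones, as long as the layer is registered as a functor. Below is a minimal, hypothetical sketch of such a registration; the real ADMMDeconv definition is not shown in this issue, and it has more fields than this (iteration count, flags, etc.), as the printout above indicates.

using Flux, Functors

# Hypothetical reduced version of the custom layer from this issue.
struct ADMMDeconv{F,W,V}
    σ::F        # activation (relu6 in the model above)
    weight::W   # deconvolution kernel
    bias::Bool  # whether a bias is used
    λ::V        # first ADMM parameter vector
    ρ::V        # second ADMM parameter vector
end

# Registering the struct as a functor lets gpu/cpu movement and
# Flux.loadmodel! recurse into its array fields.
Functors.@functor ADMMDeconv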

georgegrosu1 (Author) commented

When I was training the model on the GPU, I saved the model state directly from the GPU instead of moving it to the CPU first, as recommended in the GPU Support section of the documentation. The saved state therefore contained CuArrays tied to the old session's device context, which is why loading it in a new session failed with ERROR_INVALID_CONTEXT. Changing how the state is saved solved the problem.
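
For anyone hitting the same thing, here is a minimal sketch of the before/after, assuming the Flux.state/loadmodel! workflow from the Flux docs and a hypothetical checkpoint path. Saving while the parameters are still CuArrays serializes references to the current GPU context into the JLD2 file; moving the model to the CPU first stores plain Arrays.

using Flux, JLD2, CUDA

# Problematic: the state still contains CuArrays bound to this session's
# device context, so it cannot be restored in a later session.
# jldsave("checkpoint.jld2"; model_state = Flux.state(model))

# Recommended: move the model to the CPU before extracting its state.
model_state = Flux.state(cpu(model))
jldsave("checkpoint.jld2"; model_state)

# In a fresh session: rebuild the architecture, load the state, then
# move the model back to the GPU if needed.
model = admm_restoration_model(cfg)
Flux.loadmodel!(model, JLD2.load("checkpoint.jld2", "model_state"))
model = gpu(model)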
