Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Front page example broken #17

Closed
maxfreu opened this issue May 9, 2022 · 5 comments
Closed

Front page example broken #17

maxfreu opened this issue May 9, 2022 · 5 comments

Comments

@maxfreu
Copy link

maxfreu commented May 9, 2022

Hi! Thanks for this interesting work! I just tried the front page example and it turned out not to work for me. Taking the gradient fails with:

julia> gs = gradient(p -> sum(Lux.apply(model, x, p, st)[1]), ps)[1]
ERROR: Compiling Tuple{NNlibCUDA.var"##cudnnBNForward!#87", Nothing, Float32, Float32, Float32, Bool, Bool, Bool, typeof(NNlibCUDA.cudnnBNForward!), CUDA.CuArray{Float32, 4, CUDA.Mem.DeviceBuffer}, CUDA.CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, CUDA.CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, CUDA.CuArray{Float32, 4, CUDA.Mem.DeviceBuffer}, CUDA.CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, CUDA.CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, Float32}: try/catch is not supported.
Stacktrace:
  [1] error(s::String)
    @ Base ./error.jl:33
  [2] instrument(ir::IRTools.Inner.IR)
    @ Zygote ~/.julia/packages/Zygote/Y6SC4/src/compiler/reverse.jl:121
  [3] #Primal#19
    @ ~/.julia/packages/Zygote/Y6SC4/src/compiler/reverse.jl:202 [inlined]
  [4] Zygote.Adjoint(ir::IRTools.Inner.IR; varargs::Nothing, normalise::Bool)
    @ Zygote ~/.julia/packages/Zygote/Y6SC4/src/compiler/reverse.jl:315
  [5] _generate_pullback_via_decomposition(T::Type)
    @ Zygote ~/.julia/packages/Zygote/Y6SC4/src/compiler/emit.jl:101
  [6] #s3043#1206
    @ ~/.julia/packages/Zygote/Y6SC4/src/compiler/interface2.jl:28 [inlined]
  [7] var"#s3043#1206"(::Any, ctx::Any, f::Any, args::Any)
    @ Zygote ./none:0
  [8] (::Core.GeneratedFunctionStub)(::Any, ::Vararg{Any})
    @ Core ./boot.jl:580
  [9] _pullback
    @ ~/.julia/packages/NNlibCUDA/i1IW9/src/cudnn/batchnorm.jl:48 [inlined]
 [10] _pullback(::Zygote.Context, ::NNlibCUDA.var"#cudnnBNForward!##kw", ::NamedTuple{(:eps, :training), Tuple{Float32, Bool}}, ::typeof(NNlibCUDA.cudnnBNForward!), ::CUDA.CuArray{Float32, 4, CUDA.Mem.DeviceBuffer}, ::CUDA.CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, ::CUDA.CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, ::CUDA.CuArray{Float32, 4, CUDA.Mem.DeviceBuffer}, ::CUDA.CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, ::CUDA.CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, ::Float32)
    @ Zygote ~/.julia/packages/Zygote/Y6SC4/src/compiler/interface2.jl:0
 [11] _pullback (repeats 2 times)
    @ ~/.julia/packages/NNlibCUDA/i1IW9/src/cudnn/batchnorm.jl:37 [inlined]
 [12] _pullback
    @ ~/.julia/packages/NNlibCUDA/i1IW9/src/cudnn/batchnorm.jl:31 [inlined]
 [13] _pullback(::Zygote.Context, ::NNlibCUDA.var"##batchnorm#85", ::Base.Pairs{Symbol, Real, Tuple{Symbol, Symbol}, NamedTuple{(:eps, :training), Tuple{Float32, Bool}}}, ::typeof(NNlibCUDA.batchnorm), ::CUDA.CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, ::CUDA.CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, ::CUDA.CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, ::CUDA.CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, ::CUDA.CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, ::Float32)
    @ Zygote ~/.julia/packages/Zygote/Y6SC4/src/compiler/interface2.jl:0
 [14] _pullback
    @ ~/.julia/packages/NNlibCUDA/i1IW9/src/cudnn/batchnorm.jl:30 [inlined]
 [15] _pullback
    @ ~/.julia/packages/Lux/HkXlk/src/layers/normalize.jl:114 [inlined]
 [16] _pullback(::Zygote.Context, ::BatchNorm{true, true, typeof(identity), typeof(Lux.zeros32), typeof(Lux.ones32), Float32}, ::CUDA.CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, ::NamedTuple{(:γ, :β), Tuple{CUDA.CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, CUDA.CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}}}, ::NamedTuple{(:μ, :σ², :training), Tuple{CUDA.CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, CUDA.CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, Bool}})
    @ Zygote ~/.julia/packages/Zygote/Y6SC4/src/compiler/interface2.jl:0
 [17] macro expansion
    @ ~/.julia/packages/Lux/HkXlk/src/layers/basic.jl:0 [inlined]
 [18] _pullback
    @ ~/.julia/packages/Lux/HkXlk/src/layers/basic.jl:330 [inlined]
 [19] _pullback(::Zygote.Context, ::typeof(Lux.applychain), ::NamedTuple{(:layer_1, :layer_2, :layer_3, :layer_4, :layer_5), Tuple{BatchNorm{true, true, typeof(identity), typeof(Lux.zeros32), typeof(Lux.ones32), Float32}, Dense{true, typeof(NNlib.tanh_fast), typeof(Lux.glorot_uniform), typeof(Lux.zeros32)}, BatchNorm{true, true, typeof(identity), typeof(Lux.zeros32), typeof(Lux.ones32), Float32}, Dense{true, typeof(NNlib.tanh_fast), typeof(Lux.glorot_uniform), typeof(Lux.zeros32)}, Dense{true, typeof(identity), typeof(Lux.glorot_uniform), typeof(Lux.zeros32)}}}, ::CUDA.CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, ::NamedTuple{(:layer_1, :layer_2, :layer_3, :layer_4, :layer_5), Tuple{NamedTuple{(:γ, :β), Tuple{CUDA.CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, CUDA.CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}}}, NamedTuple{(:weight, :bias), Tuple{CUDA.CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CUDA.CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}}, NamedTuple{(:γ, :β), Tuple{CUDA.CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, CUDA.CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}}}, NamedTuple{(:weight, :bias), Tuple{CUDA.CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CUDA.CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}}, NamedTuple{(:weight, :bias), Tuple{CUDA.CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CUDA.CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}}}}, ::NamedTuple{(:layer_1, :layer_2, :layer_3, :layer_4, :layer_5), Tuple{NamedTuple{(:μ, :σ², :training), Tuple{CUDA.CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, CUDA.CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, Bool}}, NamedTuple{(), Tuple{}}, NamedTuple{(:μ, :σ², :training), Tuple{CUDA.CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, CUDA.CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, Bool}}, NamedTuple{(), Tuple{}}, NamedTuple{(), Tuple{}}}})
    @ Zygote ~/.julia/packages/Zygote/Y6SC4/src/compiler/interface2.jl:0
 [20] _pullback
    @ ~/.julia/packages/Lux/HkXlk/src/layers/basic.jl:328 [inlined]
 [21] _pullback(::Zygote.Context, ::Chain{NamedTuple{(:layer_1, :layer_2, :layer_3, :layer_4, :layer_5), Tuple{BatchNorm{true, true, typeof(identity), typeof(Lux.zeros32), typeof(Lux.ones32), Float32}, Dense{true, typeof(NNlib.tanh_fast), typeof(Lux.glorot_uniform), typeof(Lux.zeros32)}, BatchNorm{true, true, typeof(identity), typeof(Lux.zeros32), typeof(Lux.ones32), Float32}, Dense{true, typeof(NNlib.tanh_fast), typeof(Lux.glorot_uniform), typeof(Lux.zeros32)}, Dense{true, typeof(identity), typeof(Lux.glorot_uniform), typeof(Lux.zeros32)}}}}, ::CUDA.CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, ::NamedTuple{(:layer_1, :layer_2, :layer_3, :layer_4, :layer_5), Tuple{NamedTuple{(:γ, :β), Tuple{CUDA.CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, CUDA.CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}}}, NamedTuple{(:weight, :bias), Tuple{CUDA.CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CUDA.CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}}, NamedTuple{(:γ, :β), Tuple{CUDA.CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, CUDA.CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}}}, NamedTuple{(:weight, :bias), Tuple{CUDA.CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CUDA.CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}}, NamedTuple{(:weight, :bias), Tuple{CUDA.CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CUDA.CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}}}}, ::NamedTuple{(:layer_1, :layer_2, :layer_3, :layer_4, :layer_5), Tuple{NamedTuple{(:μ, :σ², :training), Tuple{CUDA.CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, CUDA.CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, Bool}}, NamedTuple{(), Tuple{}}, NamedTuple{(:μ, :σ², :training), Tuple{CUDA.CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, CUDA.CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, Bool}}, NamedTuple{(), Tuple{}}, NamedTuple{(), Tuple{}}}})
    @ Zygote ~/.julia/packages/Zygote/Y6SC4/src/compiler/interface2.jl:0
 [22] _pullback
    @ ~/.julia/packages/Lux/HkXlk/src/core.jl:61 [inlined]
 [23] _pullback(::Zygote.Context, ::typeof(Lux.apply), ::Chain{NamedTuple{(:layer_1, :layer_2, :layer_3, :layer_4, :layer_5), Tuple{BatchNorm{true, true, typeof(identity), typeof(Lux.zeros32), typeof(Lux.ones32), Float32}, Dense{true, typeof(NNlib.tanh_fast), typeof(Lux.glorot_uniform), typeof(Lux.zeros32)}, BatchNorm{true, true, typeof(identity), typeof(Lux.zeros32), typeof(Lux.ones32), Float32}, Dense{true, typeof(NNlib.tanh_fast), typeof(Lux.glorot_uniform), typeof(Lux.zeros32)}, Dense{true, typeof(identity), typeof(Lux.glorot_uniform), typeof(Lux.zeros32)}}}}, ::CUDA.CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, ::NamedTuple{(:layer_1, :layer_2, :layer_3, :layer_4, :layer_5), Tuple{NamedTuple{(:γ, :β), Tuple{CUDA.CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, CUDA.CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}}}, NamedTuple{(:weight, :bias), Tuple{CUDA.CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CUDA.CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}}, NamedTuple{(:γ, :β), Tuple{CUDA.CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, CUDA.CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}}}, NamedTuple{(:weight, :bias), Tuple{CUDA.CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CUDA.CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}}, NamedTuple{(:weight, :bias), Tuple{CUDA.CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CUDA.CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}}}}, ::NamedTuple{(:layer_1, :layer_2, :layer_3, :layer_4, :layer_5), Tuple{NamedTuple{(:μ, :σ², :training), Tuple{CUDA.CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, CUDA.CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, Bool}}, NamedTuple{(), Tuple{}}, NamedTuple{(:μ, :σ², :training), Tuple{CUDA.CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, CUDA.CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, Bool}}, NamedTuple{(), Tuple{}}, NamedTuple{(), Tuple{}}}})
    @ Zygote ~/.julia/packages/Zygote/Y6SC4/src/compiler/interface2.jl:0
 [24] _pullback
    @ ./REPL[10]:1 [inlined]
 [25] _pullback(ctx::Zygote.Context, f::var"#1#2", args::NamedTuple{(:layer_1, :layer_2, :layer_3, :layer_4, :layer_5), Tuple{NamedTuple{(, ), Tuple{CUDA.CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, CUDA.CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}}}, NamedTuple{(:weight, :bias), Tuple{CUDA.CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CUDA.CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}}, NamedTuple{(, ), Tuple{CUDA.CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, CUDA.CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}}}, NamedTuple{(:weight, :bias), Tuple{CUDA.CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CUDA.CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}}, NamedTuple{(:weight, :bias), Tuple{CUDA.CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CUDA.CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}}}})
    @ Zygote ~/.julia/packages/Zygote/Y6SC4/src/compiler/interface2.jl:0
 [26] _pullback(f::Function, args::NamedTuple{(:layer_1, :layer_2, :layer_3, :layer_4, :layer_5), Tuple{NamedTuple{(, ), Tuple{CUDA.CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, CUDA.CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}}}, NamedTuple{(:weight, :bias), Tuple{CUDA.CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CUDA.CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}}, NamedTuple{(, ), Tuple{CUDA.CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, CUDA.CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}}}, NamedTuple{(:weight, :bias), Tuple{CUDA.CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CUDA.CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}}, NamedTuple{(:weight, :bias), Tuple{CUDA.CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CUDA.CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}}}})
    @ Zygote ~/.julia/packages/Zygote/Y6SC4/src/compiler/interface.jl:34
 [27] pullback(f::Function, args::NamedTuple{(:layer_1, :layer_2, :layer_3, :layer_4, :layer_5), Tuple{NamedTuple{(, ), Tuple{CUDA.CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, CUDA.CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}}}, NamedTuple{(:weight, :bias), Tuple{CUDA.CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CUDA.CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}}, NamedTuple{(, ), Tuple{CUDA.CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, CUDA.CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}}}, NamedTuple{(:weight, :bias), Tuple{CUDA.CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CUDA.CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}}, NamedTuple{(:weight, :bias), Tuple{CUDA.CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CUDA.CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}}}})
    @ Zygote ~/.julia/packages/Zygote/Y6SC4/src/compiler/interface.jl:40
 [28] gradient(f::Function, args::NamedTuple{(:layer_1, :layer_2, :layer_3, :layer_4, :layer_5), Tuple{NamedTuple{(, ), Tuple{CUDA.CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, CUDA.CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}}}, NamedTuple{(:weight, :bias), Tuple{CUDA.CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CUDA.CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}}, NamedTuple{(, ), Tuple{CUDA.CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, CUDA.CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}}}, NamedTuple{(:weight, :bias), Tuple{CUDA.CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CUDA.CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}}, NamedTuple{(:weight, :bias), Tuple{CUDA.CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CUDA.CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}}}})
    @ Zygote ~/.julia/packages/Zygote/Y6SC4/src/compiler/interface.jl:75
 [29] top-level scope
    @ REPL[10]:1
 [30] top-level scope
    @ ~/.julia/packages/CUDA/qAl31/src/initialization.jl:52

Package status output & version:

  [6e4b80f9] BenchmarkTools v1.3.1
  [587475ba] Flux v0.13.0
  [bdcacae8] LoopVectorization v0.12.108
  [b2108857] Lux v0.3.0 `git@github.com:avik-pal/Lux.jl.git#main`
  [356022a1] NamedDims v0.2.47
  [3bd65402] Optimisers v0.2.3
  [c46f51b8] ProfileView v1.5.1
  [94979ff8] RSPointMatching v0.1.0 `~/projects/RSPointMatching`
  [90137ffa] StaticArrays v1.4.4
  [e88e6eb3] Zygote v0.6.39
  [9a3f8284] Random

Julia Version 1.7.2
Commit bf53498635 (2022-02-06 15:21 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
  CPU: AMD Ryzen 7 1700X Eight-Core Processor
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-12.0.1 (ORCJIT, znver1)
Environment:
  JULIA_PKG_DEVDIR = projects/
@avik-pal
Copy link
Member

avik-pal commented May 9, 2022

Seems like something broke in NNlibCUDA for 2D batchnorm.

MWE:

using Lux, NNlibCUDA, Zygote

x = randn(Float32, 2, 2) |> gpu;
z = randn(Float32, 2) |> gpu

gradient(sum  NNlibCUDA.batchnorm, z, z, x, z, z, 0.1f0)

@ToucheSir are you aware of any recent change that might break this?

@ToucheSir
Copy link
Contributor

Not that I'm aware of. However, I don't think that MWE should work at all because NNlibCUDA does not define a rrule for its batchnorm function (this is awkwardly handled in Flux at present, but should really be in NNlib).

@avik-pal
Copy link
Member

avik-pal commented May 9, 2022

I see.

@maxfreu this should be fixed on main.

@avik-pal avik-pal closed this as completed May 9, 2022
@maxfreu
Copy link
Author

maxfreu commented May 9, 2022

Ah nice. But to me it looks like this should go into NNlib, indeed.

@ToucheSir
Copy link
Contributor

FluxML/NNlib.jl#19 is nearing on a half-decade, so perhaps we should get it done before then 😆. PRs very much welcome.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants