I'm trying to train a multi-label model on GPUs, but I consistently get the following error when using the built-in logitbinarycrossentropy function (see gpu_broken.jl). The code works fine on a CPU (comment out line 6). If I introduce my own function (below), the code works fine:
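A rough sketch of that replacement, assuming the same formula as Flux's built-in scalar definition (the name is illustrative):

```julia
# Sketch: assumed to use the same formula as the built-in loss, but broadcasting
# happens *inside* the function over whole arrays, so exp/log1p go through the
# CuArrays broadcast machinery instead of being compiled into a scalar kernel.
my_logitbinarycrossentropy(logŷ, y) = (1 .- y) .* logŷ .+ log1p.(exp.(-logŷ))

# Used without an outer broadcast, e.g.
#   loss(x, y) = sum(my_logitbinarycrossentropy(m(x), y))
```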
The only real difference is the location of the broadcast. If there is interest, I can submit a PR to address this problem; or, if I am doing something wrong, please let me know.
Note: I'm aware that my example is not multi-label, but I get the same error for my multi-label problem.
Error message:
┌ Warning: calls to Base intrinsics might be GPU incompatible
│ exception =
│ You called exp(x::T) where T<:Union{Float32, Float64} in Base.Math at special/exp.jl:75, maybe you intended to call exp(x::Float32) in CUDAnative at /home/jonathan_fine/.julia/packages/CUDAnative/AGfq2/src/device/libdevice.jl:90 instead?
│ Stacktrace:
│ [1] exp at special/exp.jl:75
│ [2] #25 at /home/jonathan_fine/.julia/packages/GPUArrays/hzyWn/src/broadcast.jl:58
└ @ CUDAnative ~/.julia/packages/CUDAnative/AGfq2/src/compiler/irgen.jl:111
┌ Warning: calls to Base intrinsics might be GPU incompatible
│ exception =
│ You called exp(x::T) where T<:Union{Float32, Float64} in Base.Math at special/exp.jl:75, maybe you intended to call exp(x::Float32) in CUDAnative at /home/jonathan_fine/.julia/packages/CUDAnative/AGfq2/src/device/libdevice.jl:90 instead?
│ Stacktrace:
│ [1] exp at special/exp.jl:75
│ [2] #25 at /home/jonathan_fine/.julia/packages/GPUArrays/hzyWn/src/broadcast.jl:58
└ @ CUDAnative ~/.julia/packages/CUDAnative/AGfq2/src/compiler/irgen.jl:111
┌ Warning: calls to Base intrinsics might be GPU incompatible
│ exception =
│ You called log(x::Float32) in Base.Math at special/log.jl:290, maybe you intended to call log(x::Float32) in CUDAnative at /home/jonathan_fine/.julia/packages/CUDAnative/AGfq2/src/device/libdevice.jl:66 instead?
│ Stacktrace:
│ [1] log at special/log.jl:290
│ [2] #25 at /home/jonathan_fine/.julia/packages/GPUArrays/hzyWn/src/broadcast.jl:58
└ @ CUDAnative ~/.julia/packages/CUDAnative/AGfq2/src/compiler/irgen.jl:111
ERROR: LoadError: GPU compilation of #25(CuArrays.CuKernelState, CUDAnative.CuDeviceArray{Float32,2,CUDAnative.AS.Global}, Base.Broadcast.Broadcasted{Nothing,Tuple{Base.OneTo{Int64},Base.OneTo{Int64}},typeof(logitbinarycrossentropy),Tuple{Base.Broadcast.Extruded{CUDAnative.CuDeviceArray{Float32,2,CUDAnative.AS.Global},Tuple{Bool,Bool},Tuple{Int64,Int64}},Base.Broadcast.Extruded{Flux.OneHotMatrix{CUDAnative.CuDeviceArray{Flux.OneHotVector,1,CUDAnative.AS.Global}},Tuple{Bool,Bool},Tuple{Int64,Int64}}}}) failed
KernelError: recursion is currently not supported
Try inspecting the generated code with any of the @device_code_... macros.
Stacktrace:
[1] #IOBuffer#300 at iobuffer.jl:112
[2] Type at none:0
[3] print_to_string at strings/io.jl:112
[4] #IOBuffer#299 at iobuffer.jl:91
[5] #IOBuffer#300 at iobuffer.jl:112
[6] print_to_string at strings/io.jl:112
[7] throw_complex_domainerror at math.jl:31
[8] log at special/log.jl:290
[9] #25 at /home/jonathan_fine/.julia/packages/GPUArrays/hzyWn/src/broadcast.jl:58
Stacktrace:
[1] (::getfield(CUDAnative, Symbol("#hook_emit_function#58")){CUDAnative.CompilerContext,Array{Core.MethodInstance,1}})(::Core.MethodInstance, ::Core.CodeInfo, ::UInt64) at /home/jonathan_fine/.julia/packages/CUDAnative/AGfq2/src/compiler/irgen.jl:97
[2] irgen(::CUDAnative.CompilerContext) at /home/jonathan_fine/.julia/packages/CUDAnative/AGfq2/src/compiler/irgen.jl:133
[3] #compile_function#78(::Bool, ::Function, ::CUDAnative.CompilerContext) at ./logging.jl:308
[4] compile_function at /home/jonathan_fine/.julia/packages/CUDAnative/AGfq2/src/compiler/driver.jl:56 [inlined]
[5] #cufunction#77(::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}, ::Function, ::CUDAdrv.CuDevice, ::Any, ::Any) at /home/jonathan_fine/.julia/packages/CUDAnative/AGfq2/src/compiler/driver.jl:22
[6] cufunction at /home/jonathan_fine/.julia/packages/CUDAnative/AGfq2/src/compiler/driver.jl:10 [inlined]
[7] macro expansion at /home/jonathan_fine/.julia/packages/CUDAnative/AGfq2/src/execution.jl:180 [inlined]
[8] _cuda(::getfield(GPUArrays, Symbol("##25#26")), ::Tuple{}, ::NamedTuple{(:blocks, :threads),Tuple{Tuple{Int64},Tuple{Int64}}}, ::CuArrays.CuKernelState, ::CUDAnative.CuDeviceArray{Float32,2,CUDAnative.AS.Global}, ::Base.Broadcast.Broadcasted{Nothing,Tuple{Base.OneTo{Int64},Base.OneTo{Int64}},typeof(logitbinarycrossentropy),Tuple{Base.Broadcast.Extruded{CUDAnative.CuDeviceArray{Float32,2,CUDAnative.AS.Global},Tuple{Bool,Bool},Tuple{Int64,Int64}},Base.Broadcast.Extruded{Flux.OneHotMatrix{CUDAnative.CuDeviceArray{Flux.OneHotVector,1,CUDAnative.AS.Global}},Tuple{Bool,Bool},Tuple{Int64,Int64}}}}) at /home/jonathan_fine/.julia/packages/CUDAnative/AGfq2/src/execution.jl:139
[9] _gpu_call(::CuArrays.CuArrayBackend, ::Function, ::CuArray{Float32,2}, ::Tuple{CuArray{Float32,2},Base.Broadcast.Broadcasted{Nothing,Tuple{Base.OneTo{Int64},Base.OneTo{Int64}},typeof(logitbinarycrossentropy),Tuple{Base.Broadcast.Extruded{CuArray{Float32,2},Tuple{Bool,Bool},Tuple{Int64,Int64}},Base.Broadcast.Extruded{Flux.OneHotMatrix{CuArray{Flux.OneHotVector,1}},Tuple{Bool,Bool},Tuple{Int64,Int64}}}}}, ::Tuple{Tuple{Int64},Tuple{Int64}}) at ./gcutils.jl:87
[10] gpu_call(::Function, ::CuArray{Float32,2}, ::Tuple{CuArray{Float32,2},Base.Broadcast.Broadcasted{Nothing,Tuple{Base.OneTo{Int64},Base.OneTo{Int64}},typeof(logitbinarycrossentropy),Tuple{Base.Broadcast.Extruded{CuArray{Float32,2},Tuple{Bool,Bool},Tuple{Int64,Int64}},Base.Broadcast.Extruded{Flux.OneHotMatrix{CuArray{Flux.OneHotVector,1}},Tuple{Bool,Bool},Tuple{Int64,Int64}}}}}, ::Int64) at /home/jonathan_fine/.julia/packages/GPUArrays/hzyWn/src/abstract_gpu_interface.jl:151
[11] gpu_call at /home/jonathan_fine/.julia/packages/GPUArrays/hzyWn/src/abstract_gpu_interface.jl:128 [inlined]
[12] copyto! at /home/jonathan_fine/.julia/packages/GPUArrays/hzyWn/src/broadcast.jl:57 [inlined]
[13] copyto! at ./broadcast.jl:792 [inlined]
[14] copy at ./broadcast.jl:768 [inlined]
[15] materialize at ./broadcast.jl:748 [inlined]
[16] broadcast(::typeof(logitbinarycrossentropy), ::CuArray{Float32,2}, ::Flux.OneHotMatrix{CuArray{Flux.OneHotVector,1}}) at ./broadcast.jl:702
[17] ∇broadcast at /home/jonathan_fine/.julia/packages/Flux/xMoJh/src/tracker/array.jl:390 [inlined]
[18] materialize(::Base.Broadcast.Broadcasted{Flux.Tracker.TrackedStyle,Nothing,typeof(logitbinarycrossentropy),Tuple{TrackedArray{…,CuArray{Float32,2}},Flux.OneHotMatrix{CuArray{Flux.OneHotVector,1}}}}) at /home/jonathan_fine/.julia/packages/Flux/xMoJh/src/tracker/array.jl:421
[19] loss(::CuArray{Float32,2}, ::Flux.OneHotMatrix{CuArray{Flux.OneHotVector,1}}) at /home/jonathan_fine/projects/0035_spectra_analysis/my_turn/gpu_broken.jl:24
[20] #train!#121(::getfield(Flux, Symbol("#throttled#18")){getfield(Flux, Symbol("##throttled#10#14")){Bool,Bool,getfield(Main, Symbol("##3#4")),Int64}}, ::Function, ::Function, ::Base.Iterators.Take{Base.Iterators.Repeated{Tuple{CuArray{Float32,2},Flux.OneHotMatrix{CuArray{Flux.OneHotVector,1}}}}}, ::getfield(Flux.Optimise, Symbol("##43#47"))) at /home/jonathan_fine/.julia/packages/Juno/46C8i/src/progress.jl:109
[21] (::getfield(Flux.Optimise, Symbol("#kw##train!")))(::NamedTuple{(:cb,),Tuple{getfield(Flux, Symbol("#throttled#18")){getfield(Flux, Symbol("##throttled#10#14")){Bool,Bool,getfield(Main, Symbol("##3#4")),Int64}}}}, ::typeof(Flux.Optimise.train!), ::Function, ::Base.Iterators.Take{Base.Iterators.Repeated{Tuple{CuArray{Float32,2},Flux.OneHotMatrix{CuArray{Flux.OneHotVector,1}}}}}, ::Function) at ./none:0
[22] top-level scope at none:0
[23] include at ./boot.jl:317 [inlined]
[24] include_relative(::Module, ::String) at ./loading.jl:1041
[25] include(::Module, ::String) at ./sysimg.jl:29
[26] exec_options(::Base.JLOptions) at ./client.jl:229
[27] _start() at ./client.jl:421
940: Fix logitbinarycrossentropy on CuArrays r=MikeInnes a=matsueushi
The issue with logitbinarycrossentropy on the GPU (#464) can also be fixed with @janEbert's approach from #926.
Co-authored-by: matsueushi <matsueushi@gmail.com>
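For reference, the approach from #926 registers a GPU-friendly method of the scalar loss via CuArrays.@cufunc, so that broadcasting it over CuArrays picks up CUDAnative's exp/log instead of the Base intrinsics. A minimal sketch, assuming Flux's scalar formula (the exact code merged in the PR may differ):

```julia
using CuArrays

# Sketch of the @cufunc-based fix (assumed formula): @cufunc generates a
# CuArrays-aware version of the function, so broadcasting it over CuArrays
# uses GPU-compatible exp/log1p and avoids the KernelError shown above.
CuArrays.@cufunc logitbinarycrossentropy(logŷ, y) =
    (1 - y) * logŷ + log1p(exp(-logŷ))
```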