Binary cross entropy does not work on GPUs #464

Closed

frodofine opened this issue Oct 26, 2018 · 2 comments
Comments


frodofine commented Oct 26, 2018

I'm trying to train a multi-label model on GPUs, but I consistently get the following error when using the built-in logitbinarycrossentropy function (see gpu_broken.jl below). The code runs fine on a CPU (comment out the using CuArrays line). If I define my own version of the function (below), it also works on the GPU:

mylogitbinarycrossentropy(logŷ, y) = (1 .- y).*logŷ .- logσ.(logŷ)
loss(x, y) = mean(mylogitbinarycrossentropy(m(x), y))

The only real difference is where the broadcast happens. If there is interest, I can submit a PR to address this; if I am doing something wrong instead, please let me know.

Note: I'm aware that my example is not multi-label, but I get the same error for my multi-label problem.
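
To make the comparison concrete, here is a minimal sketch of the two loss definitions side by side (same m, x, y, and logσ as in gpu_broken.jl below):

# Fails on the GPU with the error below: the scalar logitbinarycrossentropy
# is broadcast element-wise over the CuArrays.
loss(x, y) = mean(logitbinarycrossentropy.(m(x), y))

# Works on the GPU: the broadcasts are moved inside the function, so the only
# element-wise call is logσ rather than the full loss.
mylogitbinarycrossentropy(logŷ, y) = (1 .- y).*logŷ .- logσ.(logŷ)
loss(x, y) = mean(mylogitbinarycrossentropy(m(x), y))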

Error message:

┌ Warning: calls to Base intrinsics might be GPU incompatible
│   exception =
│    You called exp(x::T) where T<:Union{Float32, Float64} in Base.Math at special/exp.jl:75, maybe you intended to call exp(x::Float32) in CUDAnative at /home/jonathan_fine/.julia/packages/CUDAnative/AGfq2/src/device/libdevice.jl:90 instead?
│    Stacktrace:
│     [1] exp at special/exp.jl:75
│     [2] #25 at /home/jonathan_fine/.julia/packages/GPUArrays/hzyWn/src/broadcast.jl:58
└ @ CUDAnative ~/.julia/packages/CUDAnative/AGfq2/src/compiler/irgen.jl:111
┌ Warning: calls to Base intrinsics might be GPU incompatible
│   exception =
│    You called exp(x::T) where T<:Union{Float32, Float64} in Base.Math at special/exp.jl:75, maybe you intended to call exp(x::Float32) in CUDAnative at /home/jonathan_fine/.julia/packages/CUDAnative/AGfq2/src/device/libdevice.jl:90 instead?
│    Stacktrace:
│     [1] exp at special/exp.jl:75
│     [2] #25 at /home/jonathan_fine/.julia/packages/GPUArrays/hzyWn/src/broadcast.jl:58
└ @ CUDAnative ~/.julia/packages/CUDAnative/AGfq2/src/compiler/irgen.jl:111
┌ Warning: calls to Base intrinsics might be GPU incompatible
│   exception =
│    You called log(x::Float32) in Base.Math at special/log.jl:290, maybe you intended to call log(x::Float32) in CUDAnative at /home/jonathan_fine/.julia/packages/CUDAnative/AGfq2/src/device/libdevice.jl:66 instead?
│    Stacktrace:
│     [1] log at special/log.jl:290
│     [2] #25 at /home/jonathan_fine/.julia/packages/GPUArrays/hzyWn/src/broadcast.jl:58
└ @ CUDAnative ~/.julia/packages/CUDAnative/AGfq2/src/compiler/irgen.jl:111
ERROR: LoadError: GPU compilation of #25(CuArrays.CuKernelState, CUDAnative.CuDeviceArray{Float32,2,CUDAnative.AS.Global}, Base.Broadcast.Broadcasted{Nothing,Tuple{Base.OneTo{Int64},Base.OneTo{Int64}},typeof(logitbinarycrossentropy),Tuple{Base.Broadcast.Extruded{CUDAnative.CuDeviceArray{Float32,2,CUDAnative.AS.Global},Tuple{Bool,Bool},Tuple{Int64,Int64}},Base.Broadcast.Extruded{Flux.OneHotMatrix{CUDAnative.CuDeviceArray{Flux.OneHotVector,1,CUDAnative.AS.Global}},Tuple{Bool,Bool},Tuple{Int64,Int64}}}}) failed
KernelError: recursion is currently not supported

Try inspecting the generated code with any of the @device_code_... macros.

Stacktrace:
 [1] #IOBuffer#300 at iobuffer.jl:112
 [2] Type at none:0
 [3] print_to_string at strings/io.jl:112
 [4] #IOBuffer#299 at iobuffer.jl:91
 [5] #IOBuffer#300 at iobuffer.jl:112
 [6] print_to_string at strings/io.jl:112
 [7] throw_complex_domainerror at math.jl:31
 [8] log at special/log.jl:290
 [9] #25 at /home/jonathan_fine/.julia/packages/GPUArrays/hzyWn/src/broadcast.jl:58
Stacktrace:
 [1] (::getfield(CUDAnative, Symbol("#hook_emit_function#58")){CUDAnative.CompilerContext,Array{Core.MethodInstance,1}})(::Core.MethodInstance, ::Core.CodeInfo, ::UInt64) at /home/jonathan_fine/.julia/packages/CUDAnative/AGfq2/src/compiler/irgen.jl:97
 [2] irgen(::CUDAnative.CompilerContext) at /home/jonathan_fine/.julia/packages/CUDAnative/AGfq2/src/compiler/irgen.jl:133
 [3] #compile_function#78(::Bool, ::Function, ::CUDAnative.CompilerContext) at ./logging.jl:308
 [4] compile_function at /home/jonathan_fine/.julia/packages/CUDAnative/AGfq2/src/compiler/driver.jl:56 [inlined]
 [5] #cufunction#77(::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}, ::Function, ::CUDAdrv.CuDevice, ::Any, ::Any) at /home/jonathan_fine/.julia/packages/CUDAnative/AGfq2/src/compiler/driver.jl:22
 [6] cufunction at /home/jonathan_fine/.julia/packages/CUDAnative/AGfq2/src/compiler/driver.jl:10 [inlined]
 [7] macro expansion at /home/jonathan_fine/.julia/packages/CUDAnative/AGfq2/src/execution.jl:180 [inlined]
 [8] _cuda(::getfield(GPUArrays, Symbol("##25#26")), ::Tuple{}, ::NamedTuple{(:blocks, :threads),Tuple{Tuple{Int64},Tuple{Int64}}}, ::CuArrays.CuKernelState, ::CUDAnative.CuDeviceArray{Float32,2,CUDAnative.AS.Global}, ::Base.Broadcast.Broadcasted{Nothing,Tuple{Base.OneTo{Int64},Base.OneTo{Int64}},typeof(logitbinarycrossentropy),Tuple{Base.Broadcast.Extruded{CUDAnative.CuDeviceArray{Float32,2,CUDAnative.AS.Global},Tuple{Bool,Bool},Tuple{Int64,Int64}},Base.Broadcast.Extruded{Flux.OneHotMatrix{CUDAnative.CuDeviceArray{Flux.OneHotVector,1,CUDAnative.AS.Global}},Tuple{Bool,Bool},Tuple{Int64,Int64}}}}) at /home/jonathan_fine/.julia/packages/CUDAnative/AGfq2/src/execution.jl:139
 [9] _gpu_call(::CuArrays.CuArrayBackend, ::Function, ::CuArray{Float32,2}, ::Tuple{CuArray{Float32,2},Base.Broadcast.Broadcasted{Nothing,Tuple{Base.OneTo{Int64},Base.OneTo{Int64}},typeof(logitbinarycrossentropy),Tuple{Base.Broadcast.Extruded{CuArray{Float32,2},Tuple{Bool,Bool},Tuple{Int64,Int64}},Base.Broadcast.Extruded{Flux.OneHotMatrix{CuArray{Flux.OneHotVector,1}},Tuple{Bool,Bool},Tuple{Int64,Int64}}}}}, ::Tuple{Tuple{Int64},Tuple{Int64}}) at ./gcutils.jl:87
 [10] gpu_call(::Function, ::CuArray{Float32,2}, ::Tuple{CuArray{Float32,2},Base.Broadcast.Broadcasted{Nothing,Tuple{Base.OneTo{Int64},Base.OneTo{Int64}},typeof(logitbinarycrossentropy),Tuple{Base.Broadcast.Extruded{CuArray{Float32,2},Tuple{Bool,Bool},Tuple{Int64,Int64}},Base.Broadcast.Extruded{Flux.OneHotMatrix{CuArray{Flux.OneHotVector,1}},Tuple{Bool,Bool},Tuple{Int64,Int64}}}}}, ::Int64) at /home/jonathan_fine/.julia/packages/GPUArrays/hzyWn/src/abstract_gpu_interface.jl:151
 [11] gpu_call at /home/jonathan_fine/.julia/packages/GPUArrays/hzyWn/src/abstract_gpu_interface.jl:128 [inlined]
 [12] copyto! at /home/jonathan_fine/.julia/packages/GPUArrays/hzyWn/src/broadcast.jl:57 [inlined]
 [13] copyto! at ./broadcast.jl:792 [inlined]
 [14] copy at ./broadcast.jl:768 [inlined]
 [15] materialize at ./broadcast.jl:748 [inlined]
 [16] broadcast(::typeof(logitbinarycrossentropy), ::CuArray{Float32,2}, ::Flux.OneHotMatrix{CuArray{Flux.OneHotVector,1}}) at ./broadcast.jl:702
 [17] ∇broadcast at /home/jonathan_fine/.julia/packages/Flux/xMoJh/src/tracker/array.jl:390 [inlined]
 [18] materialize(::Base.Broadcast.Broadcasted{Flux.Tracker.TrackedStyle,Nothing,typeof(logitbinarycrossentropy),Tuple{TrackedArray{…,CuArray{Float32,2}},Flux.OneHotMatrix{CuArray{Flux.OneHotVector,1}}}}) at /home/jonathan_fine/.julia/packages/Flux/xMoJh/src/tracker/array.jl:421
 [19] loss(::CuArray{Float32,2}, ::Flux.OneHotMatrix{CuArray{Flux.OneHotVector,1}}) at /home/jonathan_fine/projects/0035_spectra_analysis/my_turn/gpu_broken.jl:24
 [20] #train!#121(::getfield(Flux, Symbol("#throttled#18")){getfield(Flux, Symbol("##throttled#10#14")){Bool,Bool,getfield(Main, Symbol("##3#4")),Int64}}, ::Function, ::Function, ::Base.Iterators.Take{Base.Iterators.Repeated{Tuple{CuArray{Float32,2},Flux.OneHotMatrix{CuArray{Flux.OneHotVector,1}}}}}, ::getfield(Flux.Optimise, Symbol("##43#47"))) at /home/jonathan_fine/.julia/packages/Juno/46C8i/src/progress.jl:109
 [21] (::getfield(Flux.Optimise, Symbol("#kw##train!")))(::NamedTuple{(:cb,),Tuple{getfield(Flux, Symbol("#throttled#18")){getfield(Flux, Symbol("##throttled#10#14")){Bool,Bool,getfield(Main, Symbol("##3#4")),Int64}}}}, ::typeof(Flux.Optimise.train!), ::Function, ::Base.Iterators.Take{Base.Iterators.Repeated{Tuple{CuArray{Float32,2},Flux.OneHotMatrix{CuArray{Flux.OneHotVector,1}}}}}, ::Function) at ./none:0
 [22] top-level scope at none:0
 [23] include at ./boot.jl:317 [inlined]
 [24] include_relative(::Module, ::String) at ./loading.jl:1041
 [25] include(::Module, ::String) at ./sysimg.jl:29
 [26] exec_options(::Base.JLOptions) at ./client.jl:229
 [27] _start() at ./client.jl:421

nvidia-smi:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 410.57                 Driver Version: 410.57                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 1080    Off  | 00000000:01:00.0  On |                  N/A |
| 27%   40C    P2    37W / 180W |   1970MiB /  8117MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

gpu_broken.jl:

using Flux, Flux.Data.MNIST, Statistics
using Flux: onehotbatch, onecold, logitbinarycrossentropy, throttle
using Base.Iterators: repeated
using NNlib: logσ
using CuArrays

# Classify MNIST digits with a simple multi-layer-perceptron

imgs = MNIST.images()
# Stack images into one large batch
X = hcat(float.(reshape.(imgs, :))...) |> gpu

labels = MNIST.labels()
# One-hot-encode the labels
Y = onehotbatch(labels, 0:9) |> gpu

m = Chain(
  Dense(28^2, 32, relu),
  Dense(32, 10)) |> gpu

accuracy(x, y) = mean(onecold(m(x)) .== onecold(y))

loss(x, y) = mean(logitbinarycrossentropy.(m(x), y))

dataset = repeated((X, Y), 200)
evalcb = () -> @show(loss(X, Y))
opt = ADAM(params(m))

Flux.train!(loss, dataset, opt, cb = throttle(evalcb, 10))

println(accuracy(X, Y))

# Test set accuracy
tX = hcat(float.(reshape.(MNIST.images(:test), :))...) |> gpu
tY = onehotbatch(MNIST.labels(:test), 0:9) |> gpu

println(accuracy(tX, tY))

@frodofine (Author)

I believe #145 would address this issue for binarycrossentropy, but my solution addresses logitbinarycrossentropy.

bors bot added a commit that referenced this issue Nov 26, 2019
940: Fix logitbinarycrossentropy on CuArrays r=MikeInnes a=matsueushi

The issue of logitbinarycrossentropy on GPU (#464) can also be fixed by @janEbert's approach in #926.

Co-authored-by: matsueushi <matsueushi@gmail.com>
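
For context, a rough sketch of what that kind of fix could look like, assuming CuArrays' @cufunc macro (which registers a GPU-friendly method that broadcast over CuArrays picks up); this is an illustration of the approach, not the literal code from #926/#940:

using Flux, CuArrays
using NNlib: logσ
import Flux: logitbinarycrossentropy

# Hypothetical sketch: register a CuArrays-specific version so that broadcasting
# logitbinarycrossentropy over CuArrays resolves to GPU-compatible log/exp.
CuArrays.@cufunc logitbinarycrossentropy(logŷ, y) = (1 - y)*logŷ - logσ(logŷ)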
@CarloLucibello (Member)

Fixed by #940.
