Binary cross entropy does not work on GPUs #464

Closed

frodofine opened this issue Oct 26, 2018 · 2 comments
Comments


frodofine commented Oct 26, 2018

I'm trying to train a multi-label model on GPUs, but I consistently get the following error when using the built-in logitbinarycrossentropy function (see gpu_broken.jl below). The code runs fine on a CPU (comment out the using CuArrays line). If I define my own version of the function (below), it also works on the GPU:

mylogitbinarycrossentropy(logŷ, y) = (1 .- y).*logŷ .- logσ.(logŷ)
loss(x, y) = mean(mylogitbinarycrossentropy(m(x), y))

The only real difference is where the broadcast happens. If there is interest, I can submit a PR to address this; if I am doing something wrong instead, please let me know.

Note: I'm aware that my example is not multi-label, but I get the same error for my multi-label problem.
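
To make the comparison concrete, here is a minimal sketch of the two loss definitions side by side (same m, x, y, and logσ as in gpu_broken.jl below):

# Fails on the GPU with the error below: the scalar logitbinarycrossentropy
# is broadcast element-wise over the CuArrays.
loss(x, y) = mean(logitbinarycrossentropy.(m(x), y))

# Works on the GPU: the broadcasts are moved inside the function, so the only
# element-wise call is logσ rather than the full loss.
mylogitbinarycrossentropy(logŷ, y) = (1 .- y).*logŷ .- logσ.(logŷ)
loss(x, y) = mean(mylogitbinarycrossentropy(m(x), y))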

Error message:

┌ Warning: calls to Base intrinsics might be GPU incompatible
│   exception =
│    You called exp(x::T) where T<:Union{Float32, Float64} in Base.Math at special/exp.jl:75, maybe you intended to call exp(x::Float32) in CUDAnative at /home/jonathan_fine/.julia/packages/CUDAnative/AGfq2/src/device/libdevice.jl:90 instead?
│    Stacktrace:
│     [1] exp at special/exp.jl:75
│     [2] #25 at /home/jonathan_fine/.julia/packages/GPUArrays/hzyWn/src/broadcast.jl:58
└ @ CUDAnative ~/.julia/packages/CUDAnative/AGfq2/src/compiler/irgen.jl:111
┌ Warning: calls to Base intrinsics might be GPU incompatible
│   exception =
│    You called exp(x::T) where T<:Union{Float32, Float64} in Base.Math at special/exp.jl:75, maybe you intended to call exp(x::Float32) in CUDAnative at /home/jonathan_fine/.julia/packages/CUDAnative/AGfq2/src/device/libdevice.jl:90 instead?
│    Stacktrace:
│     [1] exp at special/exp.jl:75
│     [2] #25 at /home/jonathan_fine/.julia/packages/GPUArrays/hzyWn/src/broadcast.jl:58
└ @ CUDAnative ~/.julia/packages/CUDAnative/AGfq2/src/compiler/irgen.jl:111
┌ Warning: calls to Base intrinsics might be GPU incompatible
│   exception =
│    You called log(x::Float32) in Base.Math at special/log.jl:290, maybe you intended to call log(x::Float32) in CUDAnative at /home/jonathan_fine/.julia/packages/CUDAnative/AGfq2/src/device/libdevice.jl:66 instead?
│    Stacktrace:
│     [1] log at special/log.jl:290
│     [2] #25 at /home/jonathan_fine/.julia/packages/GPUArrays/hzyWn/src/broadcast.jl:58
└ @ CUDAnative ~/.julia/packages/CUDAnative/AGfq2/src/compiler/irgen.jl:111
ERROR: LoadError: GPU compilation of #25(CuArrays.CuKernelState, CUDAnative.CuDeviceArray{Float32,2,CUDAnative.AS.Global}, Base.Broadcast.Broadcasted{Nothing,Tuple{Base.OneTo{Int64},Base.OneTo{Int64}},typeof(logitbinarycrossentropy),Tuple{Base.Broadcast.Extruded{CUDAnative.CuDeviceArray{Float32,2,CUDAnative.AS.Global},Tuple{Bool,Bool},Tuple{Int64,Int64}},Base.Broadcast.Extruded{Flux.OneHotMatrix{CUDAnative.CuDeviceArray{Flux.OneHotVector,1,CUDAnative.AS.Global}},Tuple{Bool,Bool},Tuple{Int64,Int64}}}}) failed
KernelError: recursion is currently not supported

Try inspecting the generated code with any of the @device_code_... macros.

Stacktrace:
 [1] #IOBuffer#300 at iobuffer.jl:112
 [2] Type at none:0
 [3] print_to_string at strings/io.jl:112
 [4] #IOBuffer#299 at iobuffer.jl:91
 [5] #IOBuffer#300 at iobuffer.jl:112
 [6] print_to_string at strings/io.jl:112
 [7] throw_complex_domainerror at math.jl:31
 [8] log at special/log.jl:290
 [9] #25 at /home/jonathan_fine/.julia/packages/GPUArrays/hzyWn/src/broadcast.jl:58
Stacktrace:
 [1] (::getfield(CUDAnative, Symbol("#hook_emit_function#58")){CUDAnative.CompilerContext,Array{Core.MethodInstance,1}})(::Core.MethodInstance, ::Core.CodeInfo, ::UInt64) at /home/jonathan_fine/.julia/packages/CUDAnative/AGfq2/src/compiler/irgen.jl:97
 [2] irgen(::CUDAnative.CompilerContext) at /home/jonathan_fine/.julia/packages/CUDAnative/AGfq2/src/compiler/irgen.jl:133
 [3] #compile_function#78(::Bool, ::Function, ::CUDAnative.CompilerContext) at ./logging.jl:308
 [4] compile_function at /home/jonathan_fine/.julia/packages/CUDAnative/AGfq2/src/compiler/driver.jl:56 [inlined]
 [5] #cufunction#77(::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}, ::Function, ::CUDAdrv.CuDevice, ::Any, ::Any) at /home/jonathan_fine/.julia/packages/CUDAnative/AGfq2/src/compiler/driver.jl:22
 [6] cufunction at /home/jonathan_fine/.julia/packages/CUDAnative/AGfq2/src/compiler/driver.jl:10 [inlined]
 [7] macro expansion at /home/jonathan_fine/.julia/packages/CUDAnative/AGfq2/src/execution.jl:180 [inlined]
 [8] _cuda(::getfield(GPUArrays, Symbol("##25#26")), ::Tuple{}, ::NamedTuple{(:blocks, :threads),Tuple{Tuple{Int64},Tuple{Int64}}}, ::CuArrays.CuKernelState, ::CUDAnative.CuDeviceArray{Float32,2,CUDAnative.AS.Global}, ::Base.Broadcast.Broadcasted{Nothing,Tuple{Base.OneTo{Int64},Base.OneTo{Int64}},typeof(logitbinarycrossentropy),Tuple{Base.Broadcast.Extruded{CUDAnative.CuDeviceArray{Float32,2,CUDAnative.AS.Global},Tuple{Bool,Bool},Tuple{Int64,Int64}},Base.Broadcast.Extruded{Flux.OneHotMatrix{CUDAnative.CuDeviceArray{Flux.OneHotVector,1,CUDAnative.AS.Global}},Tuple{Bool,Bool},Tuple{Int64,Int64}}}}) at /home/jonathan_fine/.julia/packages/CUDAnative/AGfq2/src/execution.jl:139
 [9] _gpu_call(::CuArrays.CuArrayBackend, ::Function, ::CuArray{Float32,2}, ::Tuple{CuArray{Float32,2},Base.Broadcast.Broadcasted{Nothing,Tuple{Base.OneTo{Int64},Base.OneTo{Int64}},typeof(logitbinarycrossentropy),Tuple{Base.Broadcast.Extruded{CuArray{Float32,2},Tuple{Bool,Bool},Tuple{Int64,Int64}},Base.Broadcast.Extruded{Flux.OneHotMatrix{CuArray{Flux.OneHotVector,1}},Tuple{Bool,Bool},Tuple{Int64,Int64}}}}}, ::Tuple{Tuple{Int64},Tuple{Int64}}) at ./gcutils.jl:87
 [10] gpu_call(::Function, ::CuArray{Float32,2}, ::Tuple{CuArray{Float32,2},Base.Broadcast.Broadcasted{Nothing,Tuple{Base.OneTo{Int64},Base.OneTo{Int64}},typeof(logitbinarycrossentropy),Tuple{Base.Broadcast.Extruded{CuArray{Float32,2},Tuple{Bool,Bool},Tuple{Int64,Int64}},Base.Broadcast.Extruded{Flux.OneHotMatrix{CuArray{Flux.OneHotVector,1}},Tuple{Bool,Bool},Tuple{Int64,Int64}}}}}, ::Int64) at /home/jonathan_fine/.julia/packages/GPUArrays/hzyWn/src/abstract_gpu_interface.jl:151
 [11] gpu_call at /home/jonathan_fine/.julia/packages/GPUArrays/hzyWn/src/abstract_gpu_interface.jl:128 [inlined]
 [12] copyto! at /home/jonathan_fine/.julia/packages/GPUArrays/hzyWn/src/broadcast.jl:57 [inlined]
 [13] copyto! at ./broadcast.jl:792 [inlined]
 [14] copy at ./broadcast.jl:768 [inlined]
 [15] materialize at ./broadcast.jl:748 [inlined]
 [16] broadcast(::typeof(logitbinarycrossentropy), ::CuArray{Float32,2}, ::Flux.OneHotMatrix{CuArray{Flux.OneHotVector,1}}) at ./broadcast.jl:702
 [17] ∇broadcast at /home/jonathan_fine/.julia/packages/Flux/xMoJh/src/tracker/array.jl:390 [inlined]
 [18] materialize(::Base.Broadcast.Broadcasted{Flux.Tracker.TrackedStyle,Nothing,typeof(logitbinarycrossentropy),Tuple{TrackedArray{…,CuArray{Float32,2}},Flux.OneHotMatrix{CuArray{Flux.OneHotVector,1}}}}) at /home/jonathan_fine/.julia/packages/Flux/xMoJh/src/tracker/array.jl:421
 [19] loss(::CuArray{Float32,2}, ::Flux.OneHotMatrix{CuArray{Flux.OneHotVector,1}}) at /home/jonathan_fine/projects/0035_spectra_analysis/my_turn/gpu_broken.jl:24
 [20] #train!#121(::getfield(Flux, Symbol("#throttled#18")){getfield(Flux, Symbol("##throttled#10#14")){Bool,Bool,getfield(Main, Symbol("##3#4")),Int64}}, ::Function, ::Function, ::Base.Iterators.Take{Base.Iterators.Repeated{Tuple{CuArray{Float32,2},Flux.OneHotMatrix{CuArray{Flux.OneHotVector,1}}}}}, ::getfield(Flux.Optimise, Symbol("##43#47"))) at /home/jonathan_fine/.julia/packages/Juno/46C8i/src/progress.jl:109
 [21] (::getfield(Flux.Optimise, Symbol("#kw##train!")))(::NamedTuple{(:cb,),Tuple{getfield(Flux, Symbol("#throttled#18")){getfield(Flux, Symbol("##throttled#10#14")){Bool,Bool,getfield(Main, Symbol("##3#4")),Int64}}}}, ::typeof(Flux.Optimise.train!), ::Function, ::Base.Iterators.Take{Base.Iterators.Repeated{Tuple{CuArray{Float32,2},Flux.OneHotMatrix{CuArray{Flux.OneHotVector,1}}}}}, ::Function) at ./none:0
 [22] top-level scope at none:0
 [23] include at ./boot.jl:317 [inlined]
 [24] include_relative(::Module, ::String) at ./loading.jl:1041
 [25] include(::Module, ::String) at ./sysimg.jl:29
 [26] exec_options(::Base.JLOptions) at ./client.jl:229
 [27] _start() at ./client.jl:421

nvidia-smi:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 410.57                 Driver Version: 410.57                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 1080    Off  | 00000000:01:00.0  On |                  N/A |
| 27%   40C    P2    37W / 180W |   1970MiB /  8117MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

gpu_broken.jl:

using Flux, Flux.Data.MNIST, Statistics
using Flux: onehotbatch, onecold, logitbinarycrossentropy, throttle
using Base.Iterators: repeated
using NNlib: logσ
using CuArrays

# Classify MNIST digits with a simple multi-layer-perceptron

imgs = MNIST.images()
# Stack images into one large batch
X = hcat(float.(reshape.(imgs, :))...) |> gpu

labels = MNIST.labels()
# One-hot-encode the labels
Y = onehotbatch(labels, 0:9) |> gpu

m = Chain(
  Dense(28^2, 32, relu),
  Dense(32, 10)) |> gpu

accuracy(x, y) = mean(onecold(m(x)) .== onecold(y))

loss(x, y) = mean(logitbinarycrossentropy.(m(x), y))

dataset = repeated((X, Y), 200)
evalcb = () -> @show(loss(X, Y))
opt = ADAM(params(m))

Flux.train!(loss, dataset, opt, cb = throttle(evalcb, 10))

println(accuracy(X, Y))

# Test set accuracy
tX = hcat(float.(reshape.(MNIST.images(:test), :))...) |> gpu
tY = onehotbatch(MNIST.labels(:test), 0:9) |> gpu

println(accuracy(tX, tY))

@frodofine (Author)

I believe #145 would address this issue for binarycrossentropy, but my solution addresses logitbinarycrossentropy.

bors bot added a commit that referenced this issue Nov 26, 2019
940: Fix logitbinarycrossentropy on CuArrays r=MikeInnes a=matsueushi

The issue of logitbinarycrossentropy on GPU (#464) can also be fixed by @janEbert's approach in #926.

Co-authored-by: matsueushi <matsueushi@gmail.com>
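
For context, a rough sketch of what that kind of fix could look like, assuming CuArrays' @cufunc macro (which registers a GPU-friendly method that broadcast over CuArrays picks up); this is an illustration of the approach, not the literal code from #926/#940:

using Flux, CuArrays
using NNlib: logσ
import Flux: logitbinarycrossentropy

# Hypothetical sketch: register a CuArrays-specific version so that broadcasting
# logitbinarycrossentropy over CuArrays resolves to GPU-compatible log/exp.
CuArrays.@cufunc logitbinarycrossentropy(logŷ, y) = (1 - y)*logŷ - logσ(logŷ)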
@CarloLucibello (Member)

Fixed by #940.
