Flux.softmax returns wrong result with CuArray #1425

Open · norci opened this issue Dec 14, 2020 · 11 comments

@norci (Contributor) commented Dec 14, 2020

Versions:

  [052768ef] CUDA v2.3.0
  [587475ba] Flux v0.11.2

On the GPU, something is wrong with dims:

julia> Flux.softmax(randn(CURAND.default_rng(), 1,2,1,3), dims = 1)
1×2×1×3 CuArray{Float64,4}:
[:, :, 1, 1] =
 0.718913  0.281087
[:, :, 1, 2] =
 0.617561  0.382439
[:, :, 1, 3] =
 0.628182  0.371818
julia> Flux.softmax(randn(CURAND.default_rng(), 1,2,1,3), dims = 2)
1×2×1×3 CuArray{Float64,4}:
[:, :, 1, 1] =
 1.0  1.0
[:, :, 1, 2] =
 1.0  1.0
[:, :, 1, 3] =
 1.0  1.0

Expected behavior (on CPU):

julia> Flux.softmax(randn(1,2,1,3), dims = 1)
1×2×1×3 Array{Float64,4}:
[:, :, 1, 1] =
 1.0  1.0
[:, :, 1, 2] =
 1.0  1.0
[:, :, 1, 3] =
 1.0  1.0
julia> Flux.softmax(randn(1,2,1,3), dims = 2)
1×2×1×3 Array{Float64,4}:
[:, :, 1, 1] =
 0.876089  0.123911
[:, :, 1, 2] =
 0.368608  0.631392
[:, :, 1, 3] =
 0.433415  0.566585
@DhairyaLGandhi (Member) commented:

Hmm, I would run it with a few older versions of Flux/Zygote/NNlib.

One thing that comes to mind is the move of rules from Zygote to ChainRulesCore, which we should catch before it's released.

@DhairyaLGandhi (Member) commented:

Could you test with CUDA@2.3? Or a newer Flux with CUDA@1.3?

@norci (Contributor, author) commented Dec 14, 2020

Yes, I think it's a bug in CUDA, not Flux:


  [052768ef] CUDA v1.3.3
  [872c559c] NNlib v0.7.7
julia> using NNlib: softmax
julia> using CUDA
julia> softmax(randn(CURAND.default_rng(), 1,2,1,3), dims = 1)
1×2×1×3 CuArray{Float64,4}:
[:, :, 1, 1] =
 1.0  1.0
[:, :, 1, 2] =
 1.0  1.0
[:, :, 1, 3] =
 1.0  1.0

  [052768ef] CUDA v2.3.0
  [872c559c] NNlib v0.7.7
julia> softmax(randn(CURAND.default_rng(), 1,2,1,3), dims = 1)
1×2×1×3 CuArray{Float64,4}:
[:, :, 1, 1] =
 0.170576  0.829424
[:, :, 1, 2] =
 0.58034  0.41966
[:, :, 1, 3] =
 0.613132  0.386868

Let me file a new bug in CUDA. Thanks for your help.

norci closed this as completed Dec 14, 2020
@norci (Contributor, author) commented Dec 14, 2020

@DhairyaLGandhi, shall we keep this bug open, marked as "blocked by CUDA bug #N"?

@DhairyaLGandhi (Member) commented:

Yes, that would be best.

@CarloLucibello (Member) commented:

Let's keep this open. We should also add a test here to make sure there won't be future regressions.
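
A minimal sketch of such a regression test (hypothetical, not taken from the actual test suite), comparing the GPU result against the CPU reference along every dimension:

using Test, CUDA, NNlib

x = rand(Float32, 1, 2, 1, 3)
for d in 1:ndims(x)
    # the GPU result must match the CPU reference for each dims value
    @test Array(softmax(cu(x), dims = d)) ≈ softmax(x, dims = d)
    @test Array(logsoftmax(cu(x), dims = d)) ≈ logsoftmax(x, dims = d)
end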

@CarloLucibello (Member) commented:

Could you also check logsoftmax correctness?

@norci (Contributor, author) commented Dec 14, 2020

logsoftmax is wrong too. I created a test case:

import NNlib
using CUDA, Debugger

xs = CUDA.rand(1,2,1,3)

NNlib.logsoftmax(copy(xs),dims=1)
NNlib.logsoftmax(copy(xs),dims=2)

NNlib.softmax(copy(xs),dims=1)
NNlib.softmax(copy(xs),dims=2)

# copied from https://github.com/FluxML/NNlib.jl/blob/master/src/softmax.jl
function softmax(xs::AbstractArray; dims=1)
    max_ = maximum(xs, dims=dims)
    exp_ = exp.(xs .- max_)
    exp_ ./ sum(exp_, dims=dims)
end
softmax(copy(xs),dims=1)
softmax(copy(xs),dims=2)

Debugger.@enter NNlib.softmax(copy(xs),dims=1)

The softmax function from FluxML/NNlib.jl is correct.

But NNlib.softmax dispatched to the cuDNN implementation:

1|debug> st
In #softmax!#573(dims, , y, x) at C:\Users\X\.julia\packages\CUDA\YeS8q\lib\cudnn\nnlib.jl:60
 60  function softmax!(y::DenseCuArray{T}, x::DenseCuArray{T}; dims=1) where T<:CUDNNFloat
>61    cudnnSoftmaxForward(reshape4D(x), reshape4D(y),
 62                        algo=CUDNN_SOFTMAX_FAST, mode=cudnnSoftmaxMode_t(dims-1))
 63    return y
 64  end
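
For context (a hedged reading of the cuDNN API, not part of the original report): cudnnSoftmaxMode_t has only two values, CUDNN_SOFTMAX_MODE_INSTANCE (0) and CUDNN_SOFTMAX_MODE_CHANNEL (1), so cudnnSoftmaxMode_t(dims - 1) cannot encode an arbitrary Julia dims. A dims-respecting path would need either a permutation into the channel slot or a generic broadcast fallback along these lines:

# sketch of a dims-respecting fallback: the generic NNlib algorithm,
# which only uses broadcasting and therefore also runs on CuArray
using CUDA

function softmax_fallback(x::AbstractArray; dims = 1)
    max_ = maximum(x, dims = dims)   # subtract the max for numerical stability
    exp_ = exp.(x .- max_)
    exp_ ./ sum(exp_, dims = dims)   # normalize along the requested dims
end

sum(softmax_fallback(CUDA.rand(1, 2, 1, 3), dims = 2), dims = 2)  # ≈ 1 everywhere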

@CarloLucibello (Member) commented:

It seems the CUDA implementation is not picking up the dims argument and just uses the default dims = 1. The issue should indeed be filed against CUDA:

julia> using CUDA,NNlib

julia> x = rand(Float32, 2,2)
2×2 Array{Float32,2}:
 0.431404  0.794794
 0.118349  0.288102

julia> Array(softmax(cu(x), dims=1)) ≈ softmax(x, dims=1)
true

julia> Array(softmax(cu(x), dims=2)) ≈ softmax(x, dims=2)
false

julia> Array(softmax(cu(x), dims=2)) ≈ softmax(x, dims=1)
true

julia> Array(logsoftmax(cu(x), dims=1)) ≈ logsoftmax(x, dims=1)
true

julia> Array(logsoftmax(cu(x), dims=2)) ≈ logsoftmax(x, dims=2)
false

julia> Array(logsoftmax(cu(x), dims=2)) ≈ logsoftmax(x, dims=1)
true

@ToucheSir (Member) commented:

All of the examples above pass for me on the latest NNlib/NNlibCUDA:

(jl_B7k1p0) pkg> st
      Status `/tmp/jl_B7k1p0/Project.toml`
  [052768ef] CUDA v3.3.0
  [872c559c] NNlib v0.7.22
  [a00861dc] NNlibCUDA v0.1.3

Can we consider this fixed?

@mcabbott (Member) commented:

Maybe there should be a few more tests in https://github.com/FluxML/NNlibCUDA.jl/blob/master/test/softmax.jl, or is this tested elsewhere?
