Flux.softmax returns wrong result with CuArray #1425

Open · norci opened this issue Dec 14, 2020 · 11 comments

@norci (Contributor) commented Dec 14, 2020

Versions:

  [052768ef] CUDA v2.3.0
  [587475ba] Flux v0.11.2

On the GPU, something is wrong with dims:

julia> Flux.softmax(randn(CURAND.default_rng(), 1,2,1,3), dims = 1)
1×2×1×3 CuArray{Float64,4}:
[:, :, 1, 1] =
 0.718913  0.281087
[:, :, 1, 2] =
 0.617561  0.382439
[:, :, 1, 3] =
 0.628182  0.371818
julia> Flux.softmax(randn(CURAND.default_rng(), 1,2,1,3), dims = 2)
1×2×1×3 CuArray{Float64,4}:
[:, :, 1, 1] =
 1.0  1.0
[:, :, 1, 2] =
 1.0  1.0
[:, :, 1, 3] =
 1.0  1.0

Expected behavior (on CPU):

julia> Flux.softmax(randn(1,2,1,3), dims = 1)
1×2×1×3 Array{Float64,4}:
[:, :, 1, 1] =
 1.0  1.0
[:, :, 1, 2] =
 1.0  1.0
[:, :, 1, 3] =
 1.0  1.0
julia> Flux.softmax(randn(1,2,1,3), dims = 2)
1×2×1×3 Array{Float64,4}:
[:, :, 1, 1] =
 0.876089  0.123911
[:, :, 1, 2] =
 0.368608  0.631392
[:, :, 1, 3] =
 0.433415  0.566585
@DhairyaLGandhi (Member) commented:

Hmm, I would run it with a few older versions of Flux/Zygote/NNlib.

One thing that comes to mind is the move of rules from Zygote to ChainRulesCore, which we should catch before it's released.

@DhairyaLGandhi (Member) commented:

Could you test with CUDA@2.3? Or a newer Flux with CUDA@1.3?

@norci (Contributor, author) commented Dec 14, 2020

Yes, I think it's a bug in CUDA, not Flux:


  [052768ef] CUDA v1.3.3
  [872c559c] NNlib v0.7.7
julia> using NNlib: softmax
julia> using CUDA
julia> softmax(randn(CURAND.default_rng(), 1,2,1,3), dims = 1)
1×2×1×3 CuArray{Float64,4}:
[:, :, 1, 1] =
 1.0  1.0
[:, :, 1, 2] =
 1.0  1.0
[:, :, 1, 3] =
 1.0  1.0

  [052768ef] CUDA v2.3.0
  [872c559c] NNlib v0.7.7
julia> softmax(randn(CURAND.default_rng(), 1,2,1,3), dims = 1)
1×2×1×3 CuArray{Float64,4}:
[:, :, 1, 1] =
 0.170576  0.829424
[:, :, 1, 2] =
 0.58034  0.41966
[:, :, 1, 3] =
 0.613132  0.386868

Let me file a new bug in CUDA. Thanks for your help.

norci closed this as completed Dec 14, 2020
@norci (Contributor, author) commented Dec 14, 2020

@DhairyaLGandhi, shall we keep this bug open, marked as "blocked by CUDA bug #N"?

@DhairyaLGandhi (Member) commented:

Yes, that would be best.

@CarloLucibello (Member) commented:

Let's keep this open. We should also add a test here to make sure there won't be future regressions.
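
A minimal sketch of such a regression test (hypothetical, not taken from the actual test suite), comparing the GPU result against the CPU reference along every dimension:

using Test, CUDA, NNlib

x = rand(Float32, 1, 2, 1, 3)
for d in 1:ndims(x)
    # the GPU result must match the CPU reference for each dims value
    @test Array(softmax(cu(x), dims = d)) ≈ softmax(x, dims = d)
    @test Array(logsoftmax(cu(x), dims = d)) ≈ logsoftmax(x, dims = d)
end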

@CarloLucibello (Member) commented:

Could you also check logsoftmax correctness?

@norci (Contributor, author) commented Dec 14, 2020

logsoftmax is wrong too. I created a test case:

import NNlib
using CUDA, Debugger

xs = CUDA.rand(1,2,1,3)

NNlib.logsoftmax(copy(xs),dims=1)
NNlib.logsoftmax(copy(xs),dims=2)

NNlib.softmax(copy(xs),dims=1)
NNlib.softmax(copy(xs),dims=2)

# copied from https://github.com/FluxML/NNlib.jl/blob/master/src/softmax.jl
function softmax(xs::AbstractArray; dims=1)
    max_ = maximum(xs, dims=dims)
    exp_ = exp.(xs .- max_)
    exp_ ./ sum(exp_, dims=dims)
end
softmax(copy(xs),dims=1)
softmax(copy(xs),dims=2)

Debugger.@enter NNlib.softmax(copy(xs),dims=1)

The softmax function from FluxML/NNlib.jl is correct.

But NNlib.softmax dispatched to the cuDNN implementation:

1|debug> st
In #softmax!#573(dims, , y, x) at C:\Users\X\.julia\packages\CUDA\YeS8q\lib\cudnn\nnlib.jl:60
 60  function softmax!(y::DenseCuArray{T}, x::DenseCuArray{T}; dims=1) where T<:CUDNNFloat
>61    cudnnSoftmaxForward(reshape4D(x), reshape4D(y),
 62                        algo=CUDNN_SOFTMAX_FAST, mode=cudnnSoftmaxMode_t(dims-1))
 63    return y
 64  end
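
For context (a hedged reading of the cuDNN API, not part of the original report): cudnnSoftmaxMode_t has only two values, CUDNN_SOFTMAX_MODE_INSTANCE (0) and CUDNN_SOFTMAX_MODE_CHANNEL (1), so cudnnSoftmaxMode_t(dims - 1) cannot encode an arbitrary Julia dims. A dims-respecting path would need either a permutation into the channel slot or a generic broadcast fallback along these lines:

# sketch of a dims-respecting fallback: the generic NNlib algorithm,
# which only uses broadcasting and therefore also runs on CuArray
using CUDA

function softmax_fallback(x::AbstractArray; dims = 1)
    max_ = maximum(x, dims = dims)   # subtract the max for numerical stability
    exp_ = exp.(x .- max_)
    exp_ ./ sum(exp_, dims = dims)   # normalize along the requested dims
end

sum(softmax_fallback(CUDA.rand(1, 2, 1, 3), dims = 2), dims = 2)  # ≈ 1 everywhere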

@CarloLucibello (Member) commented:

It seems the CUDA implementation is not picking up the dims argument and just uses the default dims = 1. The issue should indeed be filed against CUDA:

julia> using CUDA,NNlib

julia> x = rand(Float32, 2,2)
2×2 Array{Float32,2}:
 0.431404  0.794794
 0.118349  0.288102

julia> Array(softmax(cu(x), dims=1)) ≈ softmax(x, dims=1)
true

julia> Array(softmax(cu(x), dims=2)) ≈ softmax(x, dims=2)
false

julia> Array(softmax(cu(x), dims=2)) ≈ softmax(x, dims=1)
true

julia> Array(logsoftmax(cu(x), dims=1)) ≈ logsoftmax(x, dims=1)
true

julia> Array(logsoftmax(cu(x), dims=2)) ≈ logsoftmax(x, dims=2)
false

julia> Array(logsoftmax(cu(x), dims=2)) ≈ logsoftmax(x, dims=1)
true

@ToucheSir (Member) commented:

All of the examples above pass for me on the latest NNlib/NNlibCUDA:

(jl_B7k1p0) pkg> st
      Status `/tmp/jl_B7k1p0/Project.toml`
  [052768ef] CUDA v3.3.0
  [872c559c] NNlib v0.7.22
  [a00861dc] NNlibCUDA v0.1.3

Can we consider this fixed?

@mcabbott (Member) commented:

Maybe there should be a few more tests in https://github.com/FluxML/NNlibCUDA.jl/blob/master/test/softmax.jl, or is this tested elsewhere?
