Fixes OneHotMatrix/Vector GPU Performance #612

Merged · 9 commits · Apr 30, 2019 · Changes from 4 commits
21 changes: 20 additions & 1 deletion src/onehot.jl
@@ -9,6 +9,8 @@ Base.size(xs::OneHotVector) = (Int64(xs.of),)

Base.getindex(xs::OneHotVector, i::Integer) = i == xs.ix

Base.getindex(xs::OneHotVector, ::Colon) = xs

A::AbstractMatrix * b::OneHotVector = A[:, b.ix]

struct OneHotMatrix{A<:AbstractVector{OneHotVector}} <: AbstractMatrix{Bool}
@@ -22,6 +24,19 @@ Base.getindex(xs::OneHotMatrix, i::Integer, j::Integer) = xs.data[j][i]
Base.getindex(xs::OneHotMatrix, ::Colon, i::Integer) = xs.data[i]
Base.getindex(xs::OneHotMatrix, ::Colon, i::AbstractArray) = OneHotMatrix(xs.height, xs.data[i])

Base.getindex(xs::OneHotMatrix, i::Integer, ::Colon) = map(x -> x[i], xs.data)

# handle special case when we want the whole column
function Base.getindex(xs::Flux.OneHotMatrix{T}, ot::Union{Base.Slice, Base.OneTo}, i::Int) where {T<:AbstractArray}
  res = similar(xs, size(xs, 1), 1)
  if length(ot) == size(xs, 1)
    res = xs[:,i]
Member:

Why do we need this branch? Are there any cases where they aren't equivalent?

Member Author:

Without this branch - 136.165 ms (50001 allocations: 1.99 MiB):

julia> A = Flux.onehotbatch(1:300, 1:10000) |> gpu;

julia> d = 1
1

julia> a = Base.Slice(axes(A, d))
Base.Slice(Base.OneTo(10000))

julia> A[a, 5]
10000-element Array{Bool,1}:
 false
 false
 false
...

With this branch - 15.930 μs (7 allocations: 10.16 KiB):

julia> A = Flux.onehotbatch(1:300, 1:10000) |> gpu;

julia> a = Base.Slice(axes(A, d))
Base.Slice(Base.OneTo(10000))

julia> A[a, 5]
10000-element Flux.OneHotVector:
 false
 false
 false
...

Performance and avoiding the allocation of the vector, basically.
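
For reference, these timings can be reproduced with something like the following (a sketch, not from the PR itself; it assumes a CUDA-capable GPU, the CuArrays-era Flux this PR targets, and the BenchmarkTools package):

using Flux, CuArrays, BenchmarkTools

A = Flux.onehotbatch(1:300, 1:10000) |> gpu   # 10000×300 OneHotMatrix on the GPU
sl = Base.Slice(axes(A, 1))                   # same slice object as in the sessions above

# Without the specialised getindex, Base's generic fallback materialises the column
# element by element; with it, the stored OneHotVector is returned directly.
@btime $A[$sl, 5];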

Member:

Is that also true on CPU? Is the slowdown due to scalar indexing? It seems like this might need to be something that's fixed at the CuArrays level rather than being special cased here.

Member Author:

I can remove it, if that behaviour is expected and should be maintained.

Member Author:

Currently, scalar indexing happens when we try to get a column out of the .data field of a OneHotMatrix, which does affect performance.
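
Roughly, the slow path looks like this (a hypothetical illustration assuming the 2019 CuArrays setup, which typically warns when GPU arrays are indexed element by element):

using Flux, CuArrays

A = Flux.onehotbatch(1:300, 1:10000) |> gpu   # A.data is a CuArray of OneHotVectors

# Base's generic fallback builds the column as [A[i, 5] for i in 1:size(A, 1)], and each
# A[i, 5] reads A.data[5] out of GPU memory - thousands of scalar reads and allocations.
# The method added in this diff short-circuits to A[:, 5], i.e. A.data[5]: a single read.
A[Base.Slice(axes(A, 1)), 5]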

Member:

It's not so much about whether the behaviour is expected as where the bug should be filed. If it can be fixed in CuArrays instead then it should be. It's still not clear to me whether or not that's the case, but I'll take a closer look at the code.

  else
    res = xs[1:length(ot),i]
  end
  res
end

A::AbstractMatrix * B::OneHotMatrix = A[:, map(x->x.ix, B.data)]

Base.hcat(x::OneHotVector, xs::OneHotVector...) = OneHotMatrix(length(x), [x, xs...])
@@ -54,13 +69,17 @@ end
onehotbatch(ls, labels, unk...) =
  OneHotMatrix(length(labels), [onehot(l, labels, unk...) for l in ls])

Base.argmax(xs::OneHotVector) = xs.ix

onecold(y::AbstractVector, labels = 1:length(y)) = labels[Base.argmax(y)]

onecold(y::AbstractMatrix, labels...) =
  dropdims(mapslices(y -> onecold(y, labels...), y, dims=1), dims=1)

onecold(y::OneHotMatrix, labels...) = map(x -> onecold(x, labels...), y.data)

function argmax(xs...)
-  Base.depwarn("`argmax(...) is deprecated, use `onecold(...)` instead.", :argmax)
+  Base.depwarn("`argmax(...)` is deprecated, use `onecold(...)` instead.", :argmax)
  return onecold(xs...)
end

6 changes: 6 additions & 0 deletions test/cuda/cuda.jl
@@ -38,6 +38,12 @@ Flux.back!(sum(l))

end

@testset "onecold gpu" begin
y = Flux.onehotbatch(ones(3), 1:10) |> gpu;
@test Flux.onecold(y) isa CuArray
@test y[3,:] isa CuArray
end

if CuArrays.libcudnn != nothing
@info "Testing Flux/CUDNN"
include("cudnn.jl")
5 changes: 5 additions & 0 deletions test/onehot.jl
@@ -11,3 +11,8 @@ using Test
  @test onecold(a, labels) == 'C'
  @test onecold(A, labels) == ['C', 'A', 'D']
end

@testset "onehotbatch indexing" begin
y = Flux.onehotbatch(ones(3), 1:10)
@test y[:,1] isa Flux.OneHotVector
end
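
Beyond the GPU assertions above, the methods this diff adds can also be sanity-checked on the CPU. A hypothetical REPL sketch (not part of the PR's test suite; labels and values are chosen here purely for illustration):

using Flux
using Flux: OneHotVector

v = Flux.onehot('B', ['A', 'B', 'C'])          # 3-element OneHotVector with ix == 2
v[:] isa OneHotVector                          # the new Colon method returns the vector itself, not a Bool array

M = rand(2, 3)
M * v == M[:, 2]                               # multiplying by a one-hot vector is just column selection

Y = Flux.onehotbatch("CAB", ['A', 'B', 'C'])   # 3×3 OneHotMatrix
Y[:, 1] isa OneHotVector                       # column indexing keeps the one-hot type
Flux.onecold(Y[:, 1], ['A', 'B', 'C']) == 'C'  # argmax(::OneHotVector) now just reads .ix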