new onehot implementation #1447
Conversation
Base.convert(::Type{UInt32}, o::OneHot) = UInt32(o)
Base.convert(::Type{To}, o::OneHot) where {To} = convert(To, convert(UInt32, o))

Base.convert(ot::Type{OneHot{K}}, x::Core.BuiltinInts) where {K} = toOneHot(ot, x)
better not rely on julia internals
Do you mean the `Core.BuiltinInts`?
yes
`Core.BuiltinInts` is just an alias for `Union{Bool, Int32, Int64, UInt32, UInt64, UInt8, Int128, Int16, Int8, UInt128, UInt16}`. I can replace it with the `Union` if you think that would be better.
Sorry for jumping in, but it seems better to take Carlo's suggestion below and directly define `convert` for each member of the union instead of `toOneHot`.
`Core.BuiltinInts` is undocumented, so it could be changed or moved in future Julia releases (although that seems highly unlikely). Better to define the union ourselves.
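For reference, a minimal sketch of what defining the union ourselves could look like (the constant name `PrimitiveInts` is our own, not from the PR):

```julia
# A locally defined replacement for the undocumented Core.BuiltinInts,
# covering the same primitive integer types without touching Julia internals.
const PrimitiveInts = Union{Bool, Int8, Int16, Int32, Int64, Int128,
                            UInt8, UInt16, UInt32, UInt64, UInt128}

Base.convert(ot::Type{OneHot{K}}, x::PrimitiveInts) where {K} = toOneHot(ot, x)
```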
toOneHot(ot::Type{OneHot{K}}, x::UInt32) where {K} = check_onehot_encode(ot, x)
toOneHot(ot::Type{OneHot{K}}, x::UInt64) where {K} = checked_onehot_trunc_uint(ot, x)
toOneHot(ot::Type{OneHot{K}}, x::UInt128) where {K} = checked_onehot_trunc_uint(ot, x)
toOneHot(ot::Type{OneHot{K}}, x::Bool) where {K} = and_int(zext_int(ot, x), toOneHot(ot, 0x1))
why dispatch `convert` to `toOneHot` instead of replacing these with `convert` methods?
`toOneHot` converts the primitive types into `OneHot` with specific intrinsic bit operations. `convert` then uses `toOneHot` to do the conversion.
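For readers unfamiliar with the intrinsics involved, here is a rough reconstruction of what a checked truncation such as `checked_onehot_trunc_uint` might do. This is our own sketch of the pattern, not the PR's exact code:

```julia
using Core.Intrinsics: trunc_int, zext_int, eq_int

# Assumes the PR's primitive one-hot type:
abstract type AbstractOneHotArray{N} <: AbstractArray{Bool, N} end
primitive type OneHot{K} <: AbstractOneHotArray{1} 32 end

# Hypothetical reconstruction: truncate a wider unsigned integer to the
# 32-bit OneHot representation, erroring if any high bits would be lost.
function checked_trunc(ot::Type{OneHot{K}}, x::Union{UInt64, UInt128}) where {K}
    y = trunc_int(ot, x)            # keep the low 32 bits
    back = zext_int(typeof(x), y)   # widen back to the original type
    eq_int(back, x) || throw(InexactError(:convert, ot, x))
    return y
end
```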
yes, I was wondering why you define `toOneHot` at all, instead of

```julia
Base.convert(ot::Type{OneHot{K}}, x::UInt32) where {K} = check_onehot_encode(ot, x)
Base.convert(ot::Type{OneHot{K}}, x::UInt64) where {K} = checked_onehot_trunc_uint(ot, x)
....
```
I can't really recall why they look the same as `convert` but live in separate methods. Originally I tried to follow the code in `Base`. I'll check whether they can be changed without any issues; if so, the `Core.BuiltinInts` issue could also be fixed.
This looks great. I see that the new …
The only breaking change I could spot is in the aliases:

```julia
const OneHotVector = OneHot
const OneHotMatrix = OneHotArray{K, 1, 2} where K
```

The parametrization of OneHotArray is a bit of a problem. In any case, given the impact of the change and since the PR is almost ready, I'll mark this as needed for v0.12.
fix doc printing typo Co-authored-by: Carlo Lucibello <carlo.lucibello@gmail.com>
We do have …
@CarloLucibello Could you elaborate more on this?
never mind, you solved the issue
# remove workaround when https://github.com/JuliaGPU/CuArrays.jl/issues/676 is fixed
A::AbstractMatrix * B::OneHotMatrix = A[:, cpu(map(x->x.ix, B.data))]

const OneHotVector{K} = OneHot{K}
const OneHotMatrix{K} = OneHotArray{K, 1}
Suggested change:

```diff
- const OneHotMatrix{K} = OneHotArray{K, 1}
+ const OneHotMatrix{K, A} = OneHotArray{K, 1, 2, A}
```
Why `A`?
why should we exclude it?
Because that's not usually something people have to handle. The value of `A` is handled by the `OneHotArray`'s outer constructor; excluding `A` allows the constructor to be dispatched correctly (avoiding the default constructor).
I may be wrong, but I think that adding `A` is not going to change the constructor dispatch at all, and has the advantage of more convenient function dispatch:

```julia
f(x::OneHotMatrix{K, <:CuArray}) where K = ...
```
The type var `K` is handled as a `UInt32`, but currently Julia cannot restrict the type of a type var. Therefore, I use the constructor of a partially specialized type (like `OneHotMatrix{K, 1}`) to check/convert the value. Calling `OneHotArray{K, 1, 2, A}` will invoke the default constructor, which bypasses the conversion and results in a type error. If we are going to use this kind of expression, then I will try to replace the default constructor.
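To illustrate the concern, here is a minimal, self-contained sketch (the `Demo` type is hypothetical, not the PR's code): the partially specified alias routes through a validating outer constructor, while the fully parameterized call hits the default constructor and skips the check:

```julia
struct Demo{K, A<:AbstractVector}
    data::A
end

# Outer constructor on the partially specified type: validates entries against K.
function Demo{K}(data::A) where {K, A<:AbstractVector}
    all(x -> 1 <= x <= K, data) || throw(ArgumentError("entries must be in 1:$K"))
    return Demo{K, A}(data)
end

Demo{3}([1, 2, 5])              # throws ArgumentError: the check catches the bad index
Demo{3, Vector{Int}}([1, 2, 5]) # constructs silently: default constructor skips the check
```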
I'm curious why not generalise the current implementation. It seems the performance was gained by avoiding scalar indexing on the internal vector in OneHotMatrix; could we just rewrite that?
Base.getindex(xs::OneHotVector, ::Colon) = OneHotVector(xs.ix, xs.of)

# onehot encode
primitive type OneHot{K} <: AbstractOneHotArray{1} 32 end
Why do we need to define a primitive here instead of making OneHotArray general across dimensions?
I had the same question. My guess is that the primitive type avoids the pointer allocation of a wrapper type.
But I think the need for a primitive type/array of one-hots comes from the design choice of having a one-hot vector as a unique type that arbitrary-dimension arrays are built off of. Instead of designing a type for the 1D case first, having an N-D one-hot array type as the base seems like an alternate design choice. I am currently prototyping this.
Avoiding pointer allocation is part of the reason. See my comment on this PR for the thorough explanation.
ret = cat(xidss...; dims=sdims)
OneHotArray(ret)
end
end
not sure, but probably we can remove all the `Val`s and just use `Int`s; I think the compiler is now smart enough to perform the same optimizations.
Also, since this doesn't support multiple dims, we should restrict the signature to

```julia
Base.cat(xss::OneHotArray{K}...; dims::Int)
```
`Val` is also used in Julia's `cat`-related functions (see base/abstractarray.jl). Those definitions are there to stay compatible with the Julia abstract array interface.
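As a small illustration of the `Val`-versus-`Int` trade-off being discussed (our own toy example, not the PR's code): with a literal argument, constant propagation usually lets the plain-`Int` version specialize just as well, while `Val` encodes the dimension in the type and guarantees it:

```julia
# Plain-Int version: relies on the compiler constant-propagating `dims`.
pickdim(x::AbstractArray, dims::Int) = size(x, dims)

# Val version: the dimension is a type parameter, so specialization is guaranteed.
pickdim(x::AbstractArray, ::Val{D}) where {D} = size(x, D)

pickdim(rand(2, 3), 2)        # 3
pickdim(rand(2, 3), Val(2))   # 3
```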
import Base: *
using Base: @_noinline_meta, @_inline_meta
using Core: is_top_bit_set
using Core.Intrinsics: bitcast, trunc_int, sext_int, zext_int, sle_int, eq_int, and_int
we should avoid using internal functions
isequal(prod(dims), length(parent)) || throw(DimensionMismatch("new dimensions $(dims) must be consistent with array size $(length(parent))"))
return isequal(K, first(dims)) ?
    ohreshape(parent, Base.tail(dims)) :
    Base._reshape(parent, dims)
also here we shouldn't use julia's internals
`Base._reshape` is used to avoid redefining all the reshape dim patterns and to handle the reshape of `AbstractArray{Bool}`. It's quite common when defining custom arrays (like FillArrays).
it's enough to use

```diff
- Base._reshape(parent, dims)
+ reshape(parent, dims)
```

then Base will handle that:
https://github.com/JuliaLang/julia/blob/0f0bc294f7b18d685c4998ada9c2c33841a74f2d/base/reshapedarray.jl#L112
we are overloading it; that would result in infinite recursion.
That line is exactly the reason I use `Base._reshape`.
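A minimal sketch of the recursion point (the `MyArray` type here is hypothetical, not the PR's code): if the overloaded `reshape` falls back to `reshape` itself for the dim patterns it doesn't special-case, the linked Base line dispatches right back into the overload; falling back to `Base._reshape` breaks the cycle:

```julia
struct MyArray{T, N} <: AbstractArray{T, N}
    data::Array{T, N}
end
Base.size(x::MyArray) = size(x.data)
Base.getindex(x::MyArray, i::Int...) = x.data[i...]

# Bad: for dims this method doesn't special-case, it calls itself forever.
# Base.reshape(x::MyArray, dims::Tuple{Vararg{Int}}) = reshape(x, dims)

# What the PR does instead: hand off to the internal fallback that
# materializes a lazy ReshapedArray, so dispatch never loops.
Base.reshape(x::MyArray, dims::Tuple{Vararg{Int}}) = Base._reshape(x, dims)
```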
Here is a slightly different approach (which I think @DhairyaLGandhi was alluding to). Fundamentally, both this PR and the old Flux code created one-hot arrays by first designing a one-hot vector, then wrapping that type in a multi-dimensional array (or matrix only, for the old code). Instead, I design first for the notion of an N-dimensional one-hot array, then I make:

```julia
const OneHotIndex{T, N} = Union{T, AbstractArray{T, N}}

struct OneHotArray{T<:Integer, L, N, var"N+1", I<:OneHotIndex{T, N}} <: AbstractArray{Bool, var"N+1"}
  indices::I
end

const OneHotVector{T, L} = OneHotArray{T, L, 0, 1, T}
const OneHotMatrix{T, L, I} = OneHotArray{T, L, 1, 2, I}

Base.size(x::OneHotArray{<:Any, L}) where L = (L, size(x.indices)...)

_onehotindex(x, i) = (x == i)

Base.getindex(x::OneHotArray, i, I...) = _onehotindex.(x.indices[I...], i)
Base.getindex(x::OneHotVector, i) = _onehotindex(x.indices, i)
```

Like this PR, the approach above should enjoy all the same advantages:
But using this approach means we don't need to define a new primitive type or use any … A demo:

```julia
julia> is = [1, 3, 2, 1]
4-element Array{Int64,1}:
 1
 3
 2
 1

julia> x = OneHotMatrix{eltype(is), 3, typeof(is)}(is)
3×4 OneHotArray{Int64,3,1,2,Array{Int64,1}}:
 1  0  0  1
 0  0  1  0
 0  1  0  0

julia> x[2, 3]
true

julia> is = [1 2; 3 1; 2 1]
3×2 Array{Int64,2}:
 1  2
 3  1
 2  1

julia> x = OneHotArray{eltype(is), 3, ndims(is), ndims(is) + 1, typeof(is)}(is)
3×3×2 OneHotArray{Int64,3,2,3,Array{Int64,2}}:
[:, :, 1] =
 1  0  0
 0  0  1
 0  1  0

[:, :, 2] =
 0  1  1
 1  0  0
 0  0  0

julia> x[1, 3, 2]
true
```
There are a few thoughts about some decisions, so I'll make some explanation here. The core idea was to develop a general representation that's both semantically meaningful and GPU friendly. So why use a primitive type instead of extending the original …? However, the primitive type does introduce lots of Julia's internal functions into the scope. Currently we need to use …
Thanks for the explanation!

I don't particularly understand this point, so do you mind elaborating? Why would CUDA handle an array of …

I'm also a little confused here. The difference between … Also, what do you mean by a 0-dimensional array? For one-hot arrays, there is always at least one dimension (the first). There's nothing like a scalar one-hot vector. Unless you are referring to the dimension of the internal integer array? Do you mind suggesting some more performance tests that illustrate the issues you had to solve with CUDA? I'd like to add them to #1448 for comparison. It's entirely possible that the GPUCompiler does something better with the primitive type, but fundamentally there doesn't seem to be a reason for it.
I'm talking about the old …

In Julia you can define a 0-dimensional array with something like …

I was referring to the …
Ah, I understand your point. Yeah, the corner case of a single one-hot vector will incur a wrapper overhead, but I'm curious where that is a problem in practice. If you have a single one-hot vector, then there is a total of two integers of memory being used, so it doesn't seem like a problem. If you have many one-hot vectors stored in an array, then it makes more sense to batch them. Is there a use-case where you don't want to batch, but you do need many one-hot vectors?

The …
A 0-dim array is allocated on the heap, and a struct would depend on the compiler. A primitive type is special because it can always simply be passed by value after compilation, so there's no potential edge case. I agree that it makes more sense to batch them, but sometimes you can only iterate through that sequence of one-hots (for some really dynamic and un-batch-able models). They are rare but exist, so I tend to eliminate the potential performance issue if possible.
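To make the layout point concrete, a small sketch (illustrative names, not the PR's): both a 32-bit primitive type and an `isbits` wrapper struct store inline in an `Array`, but a 0-dimensional array box always lives on the heap, which is the edge case a primitive value sidesteps:

```julia
primitive type P32 32 end     # a flat 4-byte value, always passed by value

struct Wrapped                # isbits wrapper; layout is up to the compiler
    ix::UInt32
end

isbitstype(P32), isbitstype(Wrapped)   # (true, true): both store inline in arrays

# A 0-dimensional array, by contrast, is a heap-allocated box around one value.
x0 = fill(UInt32(1))   # 0-dimensional Array{UInt32,0}
x0[]                   # 0x00000001
```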
1448: Arbitrary dimension one-hot arrays r=DhairyaLGandhi a=darsnack

This supersedes #1447. It should address the same issues:
- fix #1445, #1229
- probably fix also #864, #556, #189

This PR introduces a new one-hot N-dimensional array type, `OneHotArray`. Like #1447, this approach avoids the pointer allocations associated with `OneHotMatrix` being an array of `OneHotVector`s. It also lifts the "height" into the type parameter to avoid unnecessary allocation. Unlike #1447, this approach does not introduce a new primitive type. Instead, a "one-hot vector" is represented with a single subtype of `Integer` that is configurable by the user. By default, the exposed API will use `UInt32`. Fundamentally, the primitive type is necessary because wrapping a `UInt32` as a `OneHotVector` will suffer memory penalties when you create an `Array{<:OneHotVector}`. But if we begin by designing for N-dimensions, then `OneHotVector` is just the specialized 1D case (similar to how `Vector{T} = Array{T, 1}`).

## Performance

I compared against the same tests mentioned in #1447. Please suggest more if you want to.

1. #189
```jl
#master
julia> x = Flux.onehotbatch(rand(1:100, 50), 1:100);

julia> W = rand(128, 100);

julia> @btime $W * $x;
  5.095 μs (13 allocations: 50.86 KiB)

julia> cW, cx = cu(W), cu(x);

julia> @btime $cW * $cx;
  24.948 μs (86 allocations: 3.11 KiB)

#1447
julia> x = Flux.onehotbatch(rand(1:100, 50), 1:100);

julia> W = rand(128, 100);

julia> @btime $W * $x;
  5.312 μs (3 allocations: 50.36 KiB)

julia> cW, cx = cu(W), cu(x);

julia> @btime $cW * $cx;
  8.466 μs (61 allocations: 1.69 KiB)

# this PR
julia> x = Flux.onehotbatch(rand(1:100, 50), 1:100);

julia> W = rand(128, 100);

julia> @btime $W * $x;
  4.708 μs (3 allocations: 50.56 KiB)

julia> cW, cx = cu(W), cu(x);

julia> @btime $cW * $cx;
  8.576 μs (63 allocations: 1.73 KiB)
```

2. #556
```jl
#master
julia> valY = randn(1000, 128);

julia> @btime Flux.onecold($valY);
  365.712 μs (1131 allocations: 38.16 KiB)

julia> @btime Flux.onecold($(gpu(valY)));
┌ Warning: Performing scalar operations on GPU arrays: This is very slow, consider disallowing these operations with `allowscalar(false)`
└ @ GPUArrays ~/.julia/packages/GPUArrays/jhRU7/src/host/indexing.jl:43
  1.330 s (781248 allocations: 31.59 MiB)

#1447
julia> valY = randn(1000, 128);

julia> @btime Flux.onecold($valY);
  524.767 μs (8 allocations: 4.00 KiB)

julia> @btime Flux.onecold($(gpu(valY)));
  27.563 μs (169 allocations: 5.56 KiB)

# this PR
julia> valY = randn(1000, 128);

julia> @btime Flux.onecold($valY);
  493.017 μs (8 allocations: 4.53 KiB)

julia> @btime Flux.onecold($(gpu(valY)));
  26.702 μs (171 allocations: 5.61 KiB)
```

## Summary

This should basically be #1447 but simpler to maintain w/ fewer changes. Tests are passing, though I think we should add more tests for one-hot data (currently our test set seems pretty sparse). Performance matches #1447 where I have tested, but please suggest more performance tests. In theory, any performance difference between #1447 and this PR should be recoverable.

### PR Checklist

- [ ] Tests are added
- [ ] Entry in NEWS.md
- [ ] Documentation, if applicable
- [ ] Final review from @DhairyaLGandhi (for API changes).

cc @CarloLucibello @chengchingwen

Co-authored-by: Kyle Daruwalla <daruwalla@wisc.edu>
Co-authored-by: Kyle Daruwalla <daruwalla.k.public@icloud.com>
fix #1445, #1229
probably also #864, #556, #189
The new implementation supports a multi-dimensional one-hot representation (#1229, #1445) with the new types `OneHotArray{K}` and `OneHot{K}`. Since the `height` of the previous `OneHotMatrix` is now stored in the type parameter, we can reduce the memory consumption to almost half of the original (#1445). Most of the operations are now GPU compatible (#864) without scalar operations, potentially solving the performance issues (#189, #556). The API should be the same, as I implemented them with the new type.
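A rough sketch of the memory claim (our own illustration, not the PR's code): when the height `K` is lifted into the type, each one-hot value needs only its 4-byte index, instead of storing both index and height per element as the old wrapper did:

```julia
# Old-style one-hot vector: index plus height per value (8 bytes).
struct OldOneHotVector <: AbstractVector{Bool}
    ix::UInt32
    of::UInt32
end
Base.size(v::OldOneHotVector) = (Int(v.of),)
Base.getindex(v::OldOneHotVector, i::Integer) = i == v.ix

# New-style: the height K lives in the type, so the value is just 4 bytes.
abstract type AbstractOneHotArray{N} <: AbstractArray{Bool, N} end
primitive type OneHot{K} <: AbstractOneHotArray{1} 32 end

sizeof(OldOneHotVector)  # 8
sizeof(OneHot{10})       # 4
```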
Remaining problem:
The backward pass of the "embedding lookup" on GPU still requires scalar operations. This can be fixed with the scatter operator.
some performance comparison:
PR Checklist

- [ ] Final review from @dhairyagandhi96 (for API changes).