Fast path onehotbatch(::Vector{Int}, ::UnitRange)
#27
Conversation
Codecov Report: Base 95.96% // Head 96.21% // Increases project coverage by +0.24%.
Additional details and impacted files:
@@            Coverage Diff             @@
##             main      #27      +/-   ##
==========================================
+ Coverage   95.96%   96.21%   +0.24%
==========================================
  Files           3        4       +1
  Lines         124      132       +8
==========================================
+ Hits          119      127       +8
  Misses          5        5
☔ View full report at Codecov.
function onehotbatch(data::AbstractArray{<:Integer}, labels::AbstractUnitRange{<:Integer})
    # lo, hi = extrema(data)  # fails on Julia 1.6
    lo, hi = minimum(data), maximum(data)
    lo < first(labels) && error("Value $lo not found in labels")
    hi > last(labels) && error("Value $hi not found in labels")
    offset = 1 - first(labels)
    indices = UInt32.(data .+ offset)
    return OneHotArray(indices, length(labels))
end
Unfortunately the bounds checking here is quite expensive, especially on GPU arrays where I think each of minimum & maximum forces synchronisation:
julia> let ci = cu(rand(1:99, 100))
@btime CUDA.@sync onehotbatch($ci, 1:99)
@btime CUDA.@sync OneHotMatrix($ci, 99)
end;
100.993 μs (86 allocations: 4.02 KiB)
2.803 μs (0 allocations: 0 bytes)
julia> let ci = cu(rand(1:99, 100))
@btime CUDA.@sync maximum($ci), minimum($ci)
@btime CUDA.@sync extrema($ci)
@btime CUDA.@sync map($ci) do i
0<i<100 || error("bad index")
UInt32(i+0)
end
end;
71.448 μs (58 allocations: 2.91 KiB)
38.094 μs (29 allocations: 1.47 KiB)
18.543 μs (30 allocations: 1.14 KiB)
julia> let ci = cu(rand(1:99, 100)) # without explicit CUDA.@sync
@btime extrema($ci)
@btime OneHotMatrix($ci, 99) # async, which is good
@btime OneHotMatrix(map($ci) do i # unfortunately not?
0<i<100 || error("bad index")
UInt32(i+0)
end, 99)
end;
35.544 μs (29 allocations: 1.47 KiB)
6.527 ns (0 allocations: 0 bytes)
10.619 μs (30 allocations: 1.14 KiB)
Moving the check inside the broadcast is faster, at the cost of more obscure errors. Maybe that's ok? Still not fully async.
julia> map(cu(rand(1:199, 100))) do i
0<i<100 || error("bad index")
UInt32(i+0)
end
ERROR: a exception was thrown during kernel execution.
Run Julia on debug level 2 for device stack traces.
ERROR: a exception was thrown during kernel execution.
Run Julia on debug level 2 for device stack traces.
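For concreteness, here is a sketch of what that fused version could look like, adapted from the map benchmarked above. It is not necessarily the code this PR ends up with; the constant error string is deliberate, since runtime string interpolation does not work inside GPU kernels.
function onehotbatch(data::AbstractArray{<:Integer}, labels::AbstractUnitRange{<:Integer})
    lo, hi = first(labels), last(labels)
    offset = 1 - lo
    # Fuse the range check into the same map that builds the indices, so the
    # GPU launches one kernel and never synchronises for minimum/maximum.
    indices = map(data) do i
        lo <= i <= hi || error("Value outside of labels")  # surfaces as an opaque device error on GPU
        UInt32(i + offset)
    end
    return OneHotArray(indices, length(labels))
end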
It is rather obscure indeed. What if we wrap the map inside a try-catch and raise a proper error?
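A rough sketch of that idea follows; the helper name _checked_indices is made up here. Note the caveat in the comments: for CUDA arrays the device-side error may only surface at the next synchronisation, so the catch is not guaranteed to fire without giving up some asynchrony.
# Hypothetical helper (name made up): convert to UInt32 indices with the check
# fused into the map, and turn the opaque device failure into a readable error.
function _checked_indices(data, labels)
    lo, hi = first(labels), last(labels)
    offset = 1 - lo
    try
        return map(data) do i
            lo <= i <= hi || error("bad index")
            UInt32(i + offset)
        end
    catch
        # NB: for CuArrays the kernel error may only appear later, at the next
        # synchronisation, in which case this catch never fires -- see the
        # follow-up comments about keeping things asynchronous.
        throw(ArgumentError("some value in data lies outside the labels $labels"))
    end
end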
I wonder if we can tell the GPU not to wait? This doesn't work but perhaps something similar does:
julia> let i = rand(1:99, 100)
@btime maximum($i)<100 || error("outside")
@btime @async maximum($i)<100 || error("outside")
end;
58.407 ns (0 allocations: 0 bytes)
759.747 ns (5 allocations: 496 bytes)
julia> let ci = cu(rand(1:99, 100))
@btime maximum($ci)<100 || error("outside")
@btime @async maximum($ci)<100 || error("outside")
end;
35.134 μs (29 allocations: 1.45 KiB)
# hangs?
I think the ideal solution would be something like JuliaGPU/CUDA.jl#1140. If we had a way to write kernels, another idea would be to create an ad-hoc "in" kernel which flips a one-element bool array to true if it finds a matching element.
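A minimal sketch of that one-element-flag idea, adapted to the range check needed here rather than a literal "in": the kernel name, the all_in_range wrapper, and the launch configuration are made up for illustration and are not code from this PR.
using CUDA

# Each thread inspects one element and flips flag[1] when it sees a value
# outside lo:hi; the host reads the flag afterwards.
function _out_of_range_kernel!(flag, data, lo, hi)
    i = (blockIdx().x - 1) * blockDim().x + threadIdx().x
    if i <= length(data)
        @inbounds if !(lo <= data[i] <= hi)
            flag[1] = true   # benign data race: every writer stores `true`
        end
    end
    return nothing
end

function all_in_range(data::CuArray{<:Integer}, lo::Integer, hi::Integer)
    flag = CUDA.zeros(Bool, 1)
    threads = 256
    @cuda threads=threads blocks=cld(length(data), threads) _out_of_range_kernel!(flag, data, lo, hi)
    return !Array(flag)[1]   # copying the flag back still synchronises, but the kernel itself is cheap
end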
That sounds like the right thing. Perhaps rather than owning a kernel, this package could call checkbounds(out, inds, 1)
or whatever -- that's essentially the same operation.
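For reference, Base's checkbounds already expresses both flavours of that check on CPU arrays; out below is just a hypothetical stand-in for the one-hot output:
julia> out = zeros(Float32, 99, 100);   # stand-in for the one-hot output

julia> checkbounds(Bool, out, [1, 50, 99], 1)    # every index lands inside axes(out, 1)
true

julia> checkbounds(Bool, out, [1, 50, 120], 1)   # 120 does not
false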
I wondered what gather
did, and it turns out there is no check:
julia> NNlib.gather([1,20,300,4000] |> cu, [2,4,2,99] |> cu)
4-element CuArray{Int64, 1, CUDA.Mem.DeviceBuffer}:
20
4000
20
0
julia> NNlib.gather([1,20,300,4000], [2,4,2,99])
ERROR: BoundsError: attempt to access 4-element Vector{Int64} at index [99]
The PR to add one, FluxML/NNlibCUDA.jl#51, has many benchmarks... perhaps also tens of μs.
This adds the obvious shortcut when the data is already indices. It's a bit quicker, but also a partial solution to #16, as this will work with GPU arrays too.
Needs tests, and probably an error check.
Done.
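A quick usage sketch of the shortcut: integer data with a UnitRange of labels dispatches to the new method and the stored indices are reused directly (exact printing may vary by version).
julia> using OneHotArrays

julia> onehotbatch([3, 1, 2], 1:3)   # hits the fast path for integer data
3×3 OneHotMatrix(::Vector{UInt32}) with eltype Bool:
 ⋅  1  ⋅
 ⋅  ⋅  1
 1  ⋅  ⋅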