findfirst not dispatching correctly for HashIndex & UniqueHashIndex? #15

kcajf · 2020-06-10T14:20:02Z

Performance was much worse than expected for the UniqueHashIndex that I was using, so I started digging into it. I could be doing something wrong, but from the below it doesn't look like findfirst is dispatching to the AcceleratedArrays methods, and instead falling back to the base one?

julia> using AcceleratedArrays

julia> a1 = accelerate(["string$i" for i in 1:1000], UniqueHashIndex);

julia> a2 = accelerate(["string$i" for i in 1:1000], HashIndex);

julia> @which findfirst(==("string940"), a2)
findfirst(testf::Function, A::Union{AbstractString, AbstractArray}) in Base at array.jl:1832

julia> @which findfirst(==("string940"), a1)
findfirst(testf::Function, A::Union{AbstractString, AbstractArray}) in Base at array.jl:1832

julia> @btime findfirst(==("string940"), a1)
  4.276 μs (3 allocations: 64 bytes)
940

julia> @btime findfirst(==("string940"), a2)
  4.195 μs (3 allocations: 64 bytes)
940

julia> @btime a2.index.dict["string940"]
  128.458 ns (3 allocations: 64 bytes)
1-element Array{Int64,1}:
 940

julia> @btime a1.index.dict["string940"]
  131.360 ns (4 allocations: 80 bytes)
1-element SingleVector{Int64}:
 940

julia> versioninfo()
Julia Version 1.5.0-beta1.0
Commit 6443f6c95a* (2020-05-28 17:42 UTC)
Platform Info:
  OS: Linux (x86_64-redhat-linux)
  CPU: Intel(R) Xeon(R) Gold 6254 CPU @ 3.10GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-9.0.1 (ORCJIT, skylake)
Environment:
  JULIA_NUM_THREADS = 10

The timings are consistent with a naive for-loop:

julia> function findfurst(k, xs)
           @inbounds for (i, x) in enumerate(xs)
               if x == k return i end
           end
           0
       end

julia> @btime findfurst("string940", a1)
  3.908 μs (1 allocation: 16 bytes)
940

For HashIndex I can find a method here that I think should match:
https://github.com/andyferris/AcceleratedArrays.jl/blob/master/src/HashIndex.jl#L46
although can't find the equivalent for UniqueHashIndex.

The text was updated successfully, but these errors were encountered:

kcajf · 2020-06-10T14:30:37Z

Ok, worked it out. I was using the == function instead of isequal. New timings:

julia> @btime findfirst(isequal("string940"), a2)
  68.477 ns (2 allocations: 32 bytes)
940

julia> @btime findfirst(isequal("string940"), a1)
  4.027 μs (3 allocations: 64 bytes)
940

Better for HashIndex, but UniqueHashIndex is bad because it is missing findfirst. Is that just an oversight or is there a sensible reason for it not making sense?

That aside, I'd also like to dig into why both of them are allocating - I can't think why they should. Looks like the dictionary lookup alone in HashIndex is allocation-free:

julia> @btime (x -> x.index.dict["string940"])(a2)
  55.604 ns (0 allocations: 0 bytes)
1-element Array{Int64,1}:
 940

But not so for UniqueHashIndex. Must be something to do with using SingleVector?

julia> @btime (x -> x.index.dict["string940"])(a1)
  60.164 ns (1 allocation: 16 bytes)
1-element SingleVector{Int64}:
 940

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

findfirst not dispatching correctly for HashIndex & UniqueHashIndex? #15

findfirst not dispatching correctly for HashIndex & UniqueHashIndex? #15

kcajf commented Jun 10, 2020

kcajf commented Jun 10, 2020

findfirst not dispatching correctly for HashIndex & UniqueHashIndex? #15

findfirst not dispatching correctly for HashIndex & UniqueHashIndex? #15

Comments

kcajf commented Jun 10, 2020

kcajf commented Jun 10, 2020