Skip to content

faster hash for specific types #43355

@goretkin

Description

@goretkin
julia> isequal(BitSet(1:100000), Set(1:100000))
true

so

julia> hash(BitSet(1:100000))
0x0bbb2afc9d6e69a0

julia> hash(Set(1:100000))
0x0bbb2afc9d6e69a0

hash needs to be defined so that isequal(x,y) implies hash(x)==hash(y), and it is good that two sets with the same members are isequal, even if their internal representation differs.

But the performance...

julia> @time hash(BitSet(1:100000))
  0.000281 seconds (4 allocations: 12.469 KiB)
0x0bbb2afc9d6e69a0

julia> @time hash(BitSet(1:100000).bits)
  0.000009 seconds (4 allocations: 12.469 KiB)
0x8f1d49c388957d15

It would be nice if we could have the best of both worlds. What about a hash function like hash2(domain, x, h) with a fallback
hash2(::Type{<:Any}, x, h) = hash(x, h), but then e.g. hash2(::Type{BitSet}, x::BitSet, h) = hash(x.bits, h)

Then code like

julia/base/dict.jl

Lines 280 to 300 in 30fe8cc

function ht_keyindex(h::Dict{K,V}, key) where V where K
sz = length(h.keys)
iter = 0
maxprobe = h.maxprobe
index = hashindex(key, sz)
keys = h.keys
@inbounds while true
if isslotempty(h,index)
break
end
if !isslotmissing(h,index) && (key === keys[index] || isequal(key,keys[index]))
return index
end
index = (index & (sz-1)) + 1
iter += 1
iter > maxprobe && break
end
return -1
end

would make sure that the type parameter K gets to the 3-argument hash function.

The original property that isequal(x,y) implies hash(x)==hash(y) would still hold.
A new property emerges:

isequal(x,y) implies hash2(T, x) == hash2(T, y) where T = promote_foo(typeof(x), typeof(y)) and promote_foo could be something simple like promote_foo(S, T) = ifelse(S == T, S, Any)

Metadata

Metadata

Assignees

No one assigned

    Labels

    designDesign of APIs or of the language itselfhashingperformanceMust go faster

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions