faster BigInt hashing #33790

Merged (2 commits) on Feb 25, 2020
Changes from 1 commit
75 changes: 66 additions & 9 deletions base/hashing2.jl
@@ -13,16 +13,19 @@ function hash_integer(n::Integer, h::UInt)
     return h
 end

-function hash_integer(n::BigInt, h::UInt)
-    s = n.size
-    s == 0 && return hash_integer(0, h)
-    p = convert(Ptr{UInt}, n.d)
-    b = unsafe_load(p)
-    h ⊻= hash_uint(ifelse(s < 0, -b, b) ⊻ h)
-    for k = 2:abs(s)
-        h ⊻= hash_uint(unsafe_load(p, k) ⊻ h)
+if GMP.Limb === UInt
Member

Seems like it would be worth adding a comment noting what happens when this isn't true. Am I reading it correctly that when GMP.Limb !== UInt this falls back to the above hash_integer(n::Integer, h::UInt) method?

Member Author

Yes, it then falls back to the slower hash_integer(::Integer, ::UInt). OK, I will add a comment. I'm not even certain this happens in practice, but I couldn't prove that it can't, so it seemed safer to state the assumptions clearly.
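A minimal sketch of the fallback described in this exchange. The names `describe` and `fast_path` are hypothetical and chosen only for illustration; they are not part of the PR. A method defined inside a top-level `if` exists only when the condition holds, so when it does not hold, dispatch for a `BigInt` argument selects the more general `Integer` method.

```julia
# Hypothetical illustration, not the PR's code.
# `fast_path` mirrors the condition used in the diff.
fast_path = Base.GMP.Limb === UInt

# Generic method: always defined, plays the role of the slower fallback.
describe(n::Integer) = "generic"

if fast_path
    # Specialized method: only defined when the limb type is the native word type.
    describe(n::BigInt) = "specialized"
end

describe(1)       # always uses the generic method
describe(big(1))  # uses the specialized method only if `fast_path` is true
```

Because `BigInt <: Integer`, no explicit `else` branch is needed for correctness; the branch only decides whether the faster method is available.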

Member Author

Before pushing again and running CI for a comment-only change, let's first check whether the comment is fine:

# this condition is true most (all?) of the time, and in this case we can define
# an optimized version of the above hash_integer(::Integer, ::UInt) method for BigInt
if GMP.Limb === UInt
...

Member

Sure, looks good. You could also just push a separate [ci skip] commit and then we can squash on merge.

Member Author

OK, will do. For some reason I was still under the impression that [ci skip] had stopped working about two years ago!

+    # used e.g. for Rational{BigInt}
+    function hash_integer(n::BigInt, h::UInt)
+        s = n.size
+        s == 0 && return hash_integer(0, h)
+        p = convert(Ptr{UInt}, n.d)
+        b = unsafe_load(p)
+        h ⊻= hash_uint(ifelse(s < 0, -b, b) ⊻ h)
+        for k = 2:abs(s)
+            h ⊻= hash_uint(unsafe_load(p, k) ⊻ h)
+        end
+        return h
     end
-    return h
 end

## generic hashing for rational values ##
@@ -72,6 +75,60 @@ function hash(x::Real, h::UInt)
     return h
 end

+## streamlined hashing for BigInt, by avoiding allocation from shifts ##
+
+if GMP.Limb === UInt
+    _divLimb(n) = UInt === UInt64 ? n >>> 6 : n >>> 5
+    _modLimb(n) = UInt === UInt64 ? n & 63 : n & 31
+
+    function hash(x::BigInt, h::UInt)
+        sz = x.size
+        sz == 0 && return hash(0, h)
+        ptr = Ptr{UInt}(x.d)
+        if sz == 1
+            return hash(unsafe_load(ptr), h)
+        elseif sz == -1
+            limb = unsafe_load(ptr)
+            limb <= typemin(Int) % UInt && return hash(-(limb % Int), h)
+        end
+        pow = trailing_zeros(x)
+        nd = ndigits0z(x, 2)
+        idx = _divLimb(pow) + 1
+        shift = _modLimb(pow) % UInt
+        upshift = GMP.BITS_PER_LIMB - shift
+        asz = abs(sz)
+        if shift == 0
+            limb = unsafe_load(ptr, idx)
+        else
+            limb1 = unsafe_load(ptr, idx)
+            limb2 = idx < asz ? unsafe_load(ptr, idx+1) : UInt(0)
+            limb = limb2 << upshift | limb1 >> shift
+        end
+        if nd <= 1024 && nd - pow <= 53
+            return hash(ldexp(flipsign(Float64(limb), sz), pow), h)
+        end
+        h = hash_integer(1, h)
+        h = hash_integer(pow, h)
+        h ⊻= hash_uint(flipsign(limb, sz) ⊻ h)
+        for idx = idx+1:asz
+            if shift == 0
+                limb = unsafe_load(ptr, idx)
+            else
+                limb1 = limb2
+                if idx == asz
+                    limb = limb1 >> shift
+                    limb == 0 && break # don't hash leading zeros
+                else
+                    limb2 = unsafe_load(ptr, idx+1)
+                    limb = limb2 << upshift | limb1 >> shift
+                end
+            end
+            h ⊻= hash_uint(limb ⊻ h)
+        end
+        return h
+    end
+end

#=
`decompose(x)`: non-canonical decomposition of rational values as `num*2^pow/den`.

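The new `hash(x::BigInt, h::UInt)` method reads the limbs of `x >> trailing_zeros(x)` without materializing the shifted number. The sketch below illustrates that limb arithmetic on an ordinary `Vector{UInt64}` rather than on raw pointers. `to_limbs` and `shifted_limbs` are hypothetical helpers, 64-bit limbs are assumed, and unlike the PR this version allocates, so it only shows the indexing logic, not the performance benefit.

```julia
# Hypothetical helpers for illustration; not functions from Base or the PR.
# Assumption: limbs are 64 bits wide and stored least significant first.

# Split a positive BigInt into 64-bit limbs.
function to_limbs(x::BigInt)
    n = cld(ndigits(x, base=2), 64)
    mask = big(typemax(UInt64))
    return [UInt64((x >> (64 * (i - 1))) & mask) for i in 1:n]
end

# Return the limbs of `x >> pow` by combining neighbouring limbs.
function shifted_limbs(limbs::Vector{UInt64}, pow::Integer)
    idx = (pow >>> 6) + 1    # index of the first limb that contains a set bit
    shift = pow & 63         # bit offset of the lowest set bit inside that limb
    out = UInt64[]
    for i in idx:length(limbs)
        lo = limbs[i] >> shift
        # Borrow the low bits of the next limb, if there is one.
        hi = (shift != 0 && i < length(limbs)) ? limbs[i + 1] << (64 - shift) : UInt64(0)
        push!(out, hi | lo)
    end
    # Drop high limbs that became zero after shifting.
    while length(out) > 1 && out[end] == 0
        pop!(out)
    end
    return out
end

x = big(3) << 70
pow = trailing_zeros(x)
odd_part = shifted_limbs(to_limbs(x), pow)
```

Here `pow >>> 6` and `pow & 63` correspond to `_divLimb` and `_modLimb` in the diff for the 64-bit case, and the final `while` loop plays the role of the `limb == 0 && break` check that avoids hashing a leading zero limb.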