faster BigInt hashing #33790

Merged
merged 2 commits into from
Feb 25, 2020

rfourquet
Member

The improvements mainly concern:

  1. ints which fit in one Limb (by special casing these more directly)
  2. ints which are even: they had to be right shifted in the generic algorithm, leading to allocations.

Here is a small table with the time for one hash (on my laptop) for a few different sizes and for even/odd numbers. In parentheses is the approximate ratio compared to the PR time. "16 (sparse)" means something like 2^1000, while "16 (dense)" is something like 2^1000+1 or 2^1000+2.

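| num limbs   | master, even  | master, odd    | PR     |
|-------------|---------------|----------------|--------|
| 1           | 128 ns (18x)  | 39 ns (5.5x)   | 7 ns   |
| 2           | 143 ns (4x)   | 46 ns (1.3x)   | 36 ns  |
| 16 (sparse) | 149 ns (3.5x) | n.a.           | 41 ns  |
| 16 (dense)  | 232 ns (1.8x) | 134 ns (1.08x) | 124 ns |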
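
To make the "num limbs" and sparse/dense rows concrete, here is a minimal sketch that counts limbs and trailing zero bits for a few values. It is an illustration only: `nlimbs` is a hypothetical helper, and it reads the internal `size` field of `BigInt`, which is not public API. The limb counts in the table correspond to 64-bit limbs.

```julia
# Hypothetical helper: number of limbs GMP uses to store n.
# `size` is an internal BigInt field; its sign carries the sign of n.
nlimbs(n::BigInt) = abs(n.size)

# Limb width in bits on this machine (64 on typical 64-bit builds).
bits_per_limb = 8 * sizeof(Base.GMP.Limb)

# One-limb, two-limb, "sparse" and "dense" examples.
for n in (big(1), big(2)^64, big(2)^1000, big(2)^1000 + 1)
    println(nlimbs(n), " limb(s), ", trailing_zeros(n), " trailing zero bit(s)")
end
```

Even values have at least one trailing zero bit, which is the case where the generic algorithm had to right-shift and therefore allocate.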

@rfourquet added the labels performance (Must go faster), bignums (BigInt and BigFloat), and hashing on Nov 8, 2019
@JeffreySarnoff
Contributor

+1

@StefanKarpinski
Member

Very clever! I'm happy to merge this whenever.

@rfourquet
Member Author

So I reviewed again in detail, still looks good to me :) I had tested with many thousands of values that the old and new versions match.
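
The old-versus-new comparison itself cannot be reproduced on a single build, but a related sanity check can: equal values of different integer types must hash equally. The sketch below is not the test that was actually run; `check_bigint_hashes`, the seed, and the trial count are arbitrary choices for illustration.

```julia
using Random

# Hypothetical check: a BigInt must hash like the equal value of another type.
function check_bigint_hashes(rng, trials)
    for _ in 1:trials
        x = rand(rng, Int64)                   # fits in one 64-bit limb
        hash(big(x)) == hash(x) || return false
        y = Int128(rand(rng, Int32)) << 70     # multi-limb value with many trailing zeros
        hash(big(y)) == hash(y) || return false
    end
    # A "sparse" value that is also exactly representable as a Float64.
    return hash(big(2)^1000) == hash(2.0^1000)
end

println(check_bigint_hashes(MersenneTwister(33790), 10_000))
```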

But I wonder now whether I should revert deleting the hash_integer(::BigInt, ::UInt) method: it would not be used anymore in Base, but after all this optimized version might be used by custom number types which go through the hash(x::Real, h::UInt) method (which in turn calls hash_integer), and e.g. have a decompose which returns some BigInts.

@JeffreySarnoff
Contributor

That is a good thought. Let's do that and then get this off the runway!

@JeffreySarnoff
Contributor

bump

@StefanKarpinski
Member

Anything left to do here besides merge?

@JeffreySarnoff
Contributor

JeffreySarnoff commented Feb 23, 2020

Faster BigInt hashing is ready to be merged.
@rfourquet do you feel differently? [I'm good either way]

Working on a custom type, I have run into another (though related) decompose error. The proper fix is that the custom numeric type implement its own type specific hash and decompose. It turns out that backstopping hash_integer(n::BigInt, h::UInt) is a porous fix (because it was present in the version I used and did not prevent erroring on/through decompose). I do not have that code.

The removed function is shown below for reference.

function hash_integer(n::BigInt, h::UInt)
    s = n.size
    s == 0 && return hash_integer(0, h)
    p = convert(Ptr{UInt}, n.d)
    b = unsafe_load(p)
    h ⊻= hash_uint(ifelse(s < 0, -b, b) ⊻ h)
    for k = 2:abs(s)
        h ⊻= hash_uint(unsafe_load(p, k) ⊻ h)
    end
    return h
end

@JeffreySarnoff
Contributor

I assumed that not restoring the function indicated a desire to leave it out. If that is incorrect, so much the better, please put it back [covering paths accessed by reasonable use is reasonable].

@rfourquet
Member Author

Sure, I restored hash_integer(n::BigInt, h::UInt). Otherwise, when CI turns green again, this seems good to go for me, no docs or NEWS.md to update, or tests to add.

@JeffreySarnoff
Contributor

@StefanKarpinski skies are clear, it is a beautiful day for merging.

@KristofferC
Member

KristofferC commented Feb 24, 2020

I didn't understand the reason for keeping hash_integer(n::BigInt, h::UInt). In what cases is it useful and how is a user supposed to know it exists? Is it still tested? And if GMP.Limb !== UInt that function definition will just disappear?

@StefanKarpinski
Member

IIUC, it's because external code may be relying on it.

@JeffreySarnoff
Contributor

I encountered this a few years ago, and the fine grained specifics are a little hazy.

struct ExtendedReal{T} <: Real
     value::T
end

Unless hash and decompose are overloaded to handle this type, using ExtendedReal will lean on hash(x::Real, h::UInt). That calls hash_integer, and for ExtendedReal the call is to hash_integer(n::BigInt, h::UInt), since decompose(x::ExtendedReal) will provide a decomposition that uses BigInts.

Writing special purpose decompose methods can be difficult to do; the operational semantics of decompose does not always mesh with the nature of an extended real type (e.g. precise yet inexact values, fuzzy numbers).
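
A minimal sketch of that situation, under stated assumptions: the type below mirrors the ExtendedReal struct above, and the decompose method simply forwards to the wrapped value. `Base.decompose` is internal and unexported, so extending it this way is an assumption about Base internals, not a documented interface.

```julia
struct ExtendedReal{T} <: Real
    value::T
end

# Assumption: forward to the internal Base.decompose of the wrapped value.
# For a Rational{BigInt} this returns a (num, pow, den) triple holding BigInts.
Base.decompose(x::ExtendedReal) = Base.decompose(x.value)

x = ExtendedReal(big(2)//3)

# No hash method is defined for ExtendedReal, so the generic
# hash(x::Real, h::UInt) is used and hashes the BigInt components.
println(hash(x) == hash(big(2)//3))
println(hash(ExtendedReal(big(1)//2)) == hash(0.5))
```

Only decompose is overloaded here, so any speed-up or slow-down in hashing the BigInt components is inherited by the custom type without it knowing about hash_integer.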

@rfourquet
Member Author

rfourquet commented Feb 25, 2020

In what cases is it useful

To phrase it differently than @JeffreySarnoff did: it's for user-defined number types which might use the generic Base-provided hash(x::Real, h::UInt), but overload decompose such that one of the components (num, denom, pow) is a BigInt. Without this specific hash_integer(n::BigInt, h::UInt) method, such a type will fall back to hash_integer(n::Integer, h::UInt), which is considerably slower.

and how is a user supposed to know it exists?

She doesn't have to know. But if we remove it here, while not breaking, she might notice a significant slow-down in hashing her type.

Is it still tested?

I believe not. We could add a few tests I guess.

And if GMP.Limb !== UInt that function definition will just disappear?

Yes, but then the generic slow fall-back will take over.

@KristofferC
Member

KristofferC commented Feb 25, 2020

Alright, it just seems non-ideal to have untested "dead" code in Base "because it can be useful". Anyway, I don't want to hold up this PR but at some point, having a test to make it covered and some comment saying when it is being hit and that it exists as a performance optimization (like in the comment above) would be good imo.

@rfourquet
Member Author

it just seems non-ideal to have untested "dead" code in Base "because it can be useful".

Yeah I agree, and that's why I deleted it initially. But actually, I was stupid, it's not dead code and is used by Base's own Rational{BigInt}:

julia> @btime hash($(big(2)//3), UInt(0))
  127.409 ns (3 allocations: 48 bytes)               
0x36fc23415080622e

julia> m = collect(methods(Base.hash_integer))[1]    
hash_integer(n::BigInt, h::UInt64) in Base at hashing2.jl:17                                              

julia> Base.delete_method(m)

julia> @btime hash($(big(2)//3), UInt(0))            
  257.855 ns (7 allocations: 128 bytes)              
0x36fc23415080622e                                   
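
For readers without BenchmarkTools, a rough version of this check can be sketched with Base alone. The helper name `allocated_by_hash` is hypothetical, `Base.hash_integer` is internal (hence the guard), and the reported byte count will vary by Julia version, so no expected output is given.

```julia
x = big(2)//3

# Base.hash_integer is internal, so guard the lookup in case it is absent.
if isdefined(Base, :hash_integer)
    # Shows which method would handle a BigInt component.
    println(which(Base.hash_integer, Tuple{BigInt, UInt}))
end

# Wrapping the call in a function avoids counting global-scope overhead.
allocated_by_hash(r) = @allocated hash(r, UInt(0))
allocated_by_hash(x)    # first call compiles
println(allocated_by_hash(x), " bytes allocated")
```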

@JeffreySarnoff
Contributor

that's a wrap .. time to merge

h ⊻= hash_uint(ifelse(s < 0, -b, b) ⊻ h)
for k = 2:abs(s)
h ⊻= hash_uint(unsafe_load(p, k) ⊻ h)
if GMP.Limb === UInt
Member

Seems like it would be worth adding a comment noting what happens when this isn't true. Am I reading it correctly that when GMP.Limb !== UInt this falls back to the above hash_integer(n::Integer, h::UInt) method?

Member Author

Yes, this then falls back to the slower hash_integer(::Integer, ::UInt). OK, will add a comment. I'm not even certain this happens in practice, but I couldn't prove that it can't, so it seemed safer to state the assumptions clearly.

Member Author

Before pushing again and running CI for only comments change, let's check first if the comment is fine:

# this condition is true most (all?) of the time, and in this case we can define
# an optimized version of the above hash_integer(::Integer, ::UInt) method for BigInt
if GMP.Limb === UInt
...
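
The pattern in question, a method that only exists when a compile-time condition holds, can be sketched in isolation. `hash_path` is a hypothetical stand-in for hash_integer and only reports which method was selected.

```julia
# Generic method, always defined (stands in for hash_integer(::Integer, ::UInt)).
hash_path(n::Integer) = "generic"

# The specialized method is only defined when limbs are machine words;
# otherwise BigInt arguments dispatch to the generic method above.
if Base.GMP.Limb === UInt
    hash_path(n::BigInt) = "specialized"
end

println(hash_path(1))
println(hash_path(big(1)))
```

When the condition is false nothing breaks: dispatch simply never sees the specialized method, which matches the fall-back behaviour described above.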

Member

Sure, looks good. You could also just push a separate [ci skip] commit and then we can squash on merge.

Member Author

Ok will do, for some reason I stayed with the idea that [ci skip] didn't work anymore, since like 2 years ago or so!

@KristofferC KristofferC merged commit 1bd0593 into master Feb 25, 2020
@KristofferC KristofferC deleted the rf/bigint-fast-hash branch February 25, 2020 14:39
ravibitsgoa pushed a commit to ravibitsgoa/julia that referenced this pull request Apr 9, 2020
KristofferC pushed a commit that referenced this pull request Apr 11, 2020
* faster BigInt hashing
rfourquet added a commit that referenced this pull request May 19, 2021
When the code was moved out of hashing2.jl, some imports were missed,
cancelling the benefits of #33790.