-
Notifications
You must be signed in to change notification settings - Fork 12.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
SipHasher takes up lots of time in incremental builds #51054
Comments
A small example: the impls of I guess I could retry the trick from #37427 of leb128-encoding integers. I did some measurements while compiling the compiler and 66% of the values hashed were |
Yeah, that's the reason. |
You could give The most effective optimizations we've done here is caching of hashes, like in 45bd091. Using a different hasher is an option if it supports 128 bit output and we can be sure that it has a similarly low collision probability as SipHash 2-4. However, I'd only like to change it if the results are compelling since with iterated on this quite a bit already (see #41215). |
The highwayhash authors claim to have a faster implementation of SipHash which still produces the same outputs: https://github.com/google/highwayhash. |
One thing that's not clear to me is what needs to be hashed and what doesn't. For example, Here is a lightly edited transcript of an IRC discussion about this, between me and @Zoxc.
@michaelwoerister: what do you think? |
Copying my comments from IRC:
It might be possible optimize specific cases like this:
That is, if you have a query C that depends A and B and B is derived from A. Then hashing B is indeed redundant. However, if there was one more query D that depends only on B, then not hashing B might lead to redundantly executing D if A changed but B was not affected by the change.
Also note that we don't re-hash query result where we already know the hash from a previous compilation session. |
FWIW, I tried replacing I also tried experimenting with different buffer sizes for SipHash. It seems that a larger buffer size ( |
I tried It may at least partly be explained by the fact that I managed to speed up SipHasher128 (the one used in incremental compilation) quite a lot last year in #68914. In fact, based on that, this issue doesn't need to be open any more. |
Here's Cachegrind data for
clap-rs
from rustc-benchmarks on a "base incremental" "check" build (i.e. the first incremental build):That's over 20% of the instructions under
short_write
.Here are the annotations for the hot pieces of code in librustc_data_structures/sip128.rs.
and
And from libcore/num/mod.rs:
I stared at this for a while but wasn't able to come up with any notable improvements.
Hashing is the hottest part of most incremental check builds. If we can't speed up this code, perhaps we could use a different hasher, or find a way to hash less data.
CC @michaelwoerister
The text was updated successfully, but these errors were encountered: