Initial value of `FxHasher` leads to avoidable collisions #17
Comments
Of course, resolving #15 would make this moot. |
@Nilstrieb Should this be closed? |
I would say that this should not be closed. While there are now APIs for setting initial seeds manually or randomly, it would still be a good idea to change the default seed to avoid collisions in cases where the default FxHasher is used (for example via the FxHashMap type alias). |
Ah. Very well. PRs welcome! |
I wanted to check some different values on rustc, to know if some values miraculously make it faster, but hadn't gotten to it yet |
I previously tried changing the initial value away from zero, along with lots of other things, see here for details. The strength of
If you are hashing a single integer, it's just three operations: rol, xor, mul. Except the rol is on the initial hash value, which is zero, so the result of that rol is zero. And then xor'ing with zero is a no-op, so only the mul remains. If you change the initial value away from zero you can probably constant fold the rol but not the xor, so it becomes two instructions instead of one. |
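To make the instruction-count argument concrete, here is a minimal sketch of the single-integer case. It assumes the 64-bit mixing step `rotate_left(5)`, xor, `wrapping_mul(K)` quoted in the issue body; the value of `K` is the multiplier as I recall it from rustc-hash at that time, and `hash_one` is a hypothetical helper, not part of the crate's API.

```rust
// Assumption: 64-bit multiplier used by FxHasher at the time of this issue.
const K: u64 = 0x517c_c1b7_2722_0a95;

/// Hypothetical helper: hash one integer `x` starting from state `seed`.
fn hash_one(seed: u64, x: u64) -> u64 {
    (seed.rotate_left(5) ^ x).wrapping_mul(K)
}

fn main() {
    let x: u64 = 12_345;

    // Zero seed: the rotate yields 0 and the xor is a no-op,
    // so the whole hash is a single multiply.
    assert_eq!(hash_one(0, x), x.wrapping_mul(K));

    // Seed 1: the rotate folds to the constant 32, but the xor
    // with that constant still has to be performed before the multiply.
    assert_eq!(hash_one(1, x), (32 ^ x).wrapping_mul(K));

    println!("{:#018x}", hash_one(0, x));
}
```

Whether the extra xor is measurable in practice is exactly what the perf experiments mentioned in this thread were meant to find out; the sketch only shows where the extra operation comes from.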
@nnethercote have you tried a starting constant other than 1? The fact that |
I wonder if using |
I don't remember if I tried anything other than 1. I do remember trying many different things to improve
Non-tiny integers might increase binary size. |
Mmmh. I don't think so in this case, as |
…ze-max, r=<try> [WIP] Perf experiment for rustc-hash with ones-idiom init CPUs for a while now know (at least) PCMPEQ as a dependency-breaking "ones idiom", and it's not a huge encoding next to constant loads. Let's try it out and see how the hashing goes. Spurred by rust-lang/rustc-hash#17
Out of curiosity, do you know why it's xor and not wrapping add? According to instruction tables they have mostly the same latency and throughput (varies slightly between processors) and I'd naively assume that the carries in the add would help a little with the quality. |
My best guess is that xor is something of a traditional choice?
|
A wrapping add instead of xor was done in #18 |
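For readers comparing the two combining operations, the following is a rough sketch rather than the code from #18. `step_xor` and `step_add` are hypothetical names, and `K` is the 64-bit multiplier as I recall it. The sketch shows two things: swapping xor for a wrapping add does not by itself remove the zero fixed point, and the two operations differ when the rotated state and the input share set bits.

```rust
// Assumption: 64-bit multiplier used by FxHasher at the time of this issue.
const K: u64 = 0x517c_c1b7_2722_0a95;

/// Hypothetical xor-based mixing step (the form quoted in the issue body).
fn step_xor(hash: u64, word: u64) -> u64 {
    (hash.rotate_left(5) ^ word).wrapping_mul(K)
}

/// Hypothetical add-based variant: same shape, wrapping add instead of xor.
fn step_add(hash: u64, word: u64) -> u64 {
    hash.rotate_left(5).wrapping_add(word).wrapping_mul(K)
}

fn main() {
    // With a zero state and a zero word, both variants stay at zero.
    assert_eq!(step_xor(0, 0), 0);
    assert_eq!(step_add(0, 0), 0);

    // State 1 rotates to 32. Writing the word 32 cancels it under xor,
    // while the add carries into bit 6 and the state stays non-zero.
    assert_eq!(step_xor(1, 32), 0);
    assert_ne!(step_add(1, 32), 0);
}
```

The add-based step can still be driven to zero by writing the wrapping negation of the rotated state, so this is a difference in which inputs cancel, not a guarantee that none do.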
Experimented in rust-lang/rust#122316 and I was technically right about one thing re: |
It's easy to see that `FxHasher { hash: 0 }.add_to_hash(0)` is a fixed point (i.e. does not result in any change to the internal state `self.hash`):
rustc-hash/src/lib.rs, lines 76 to 81 in 5e09ea0
In particular, `0.rotate_left(5).bitxor(0).wrapping_mul(K) == 0`.
Therefore initialising `FxHasher` with a `hash` value of `0` is a bit undesirable, as sequences of `0`s (written to the hash before any non-zero value) cannot be distinguished by their hashes.
rustc-hash/src/lib.rs, lines 69 to 74 in 5e09ea0
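The collision described above can be reproduced with a small standalone sketch. It does not use the crate itself: `add_to_hash` and `hash_words` are free functions written for illustration, the state is fixed at `u64`, and the value of `K` is the 64-bit multiplier as I recall it from that revision, so treat all three as assumptions.

```rust
// Assumption: 64-bit multiplier used by FxHasher at the time of this issue.
const K: u64 = 0x517c_c1b7_2722_0a95;

/// One mixing step: rotate the state, xor in the word, multiply by K.
fn add_to_hash(hash: u64, word: u64) -> u64 {
    (hash.rotate_left(5) ^ word).wrapping_mul(K)
}

/// Hypothetical helper: feed `words` in order, starting from `seed`.
fn hash_words(seed: u64, words: &[u64]) -> u64 {
    words.iter().fold(seed, |h, &w| add_to_hash(h, w))
}

fn main() {
    // Zero state plus a zero word is a fixed point.
    assert_eq!(add_to_hash(0, 0), 0);

    // Any run of zeros written first leaves the state at zero ...
    assert_eq!(hash_words(0, &[]), 0);
    assert_eq!(hash_words(0, &[0]), 0);
    assert_eq!(hash_words(0, &[0, 0, 0]), 0);

    // ... so inputs that differ only in leading zeros collide.
    assert_eq!(hash_words(0, &[0, 0, 7]), hash_words(0, &[7]));

    println!("{:#018x}", hash_words(0, &[0, 0, 7]));
}
```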
Of course, there's always the possibility that a value of `self.hash.rotate_left(5)` could be written to the hasher, which will result in the internal state reaching `0`, but in practice writing values of `0` can arise quite often, and choosing some other initial value would be quite beneficial? Perhaps even `K`?
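As a rough check of the suggested fix, the sketch below seeds the state with `K` instead of `0` and writes runs of zeros. The names `add_to_hash` and `hash_zeros` are hypothetical, and the value of `K` is the 64-bit multiplier as I recall it. Because `K` is odd, multiplying by it is invertible modulo 2^64, and rotation is invertible too, so writing a zero word is a bijection on the state whose only path to `0` is from `0`. A non-zero seed therefore never collapses to zero on all-zero input.

```rust
// Assumption: 64-bit multiplier used by FxHasher at the time of this issue.
const K: u64 = 0x517c_c1b7_2722_0a95;

/// One mixing step: rotate the state, xor in the word, multiply by K.
fn add_to_hash(hash: u64, word: u64) -> u64 {
    (hash.rotate_left(5) ^ word).wrapping_mul(K)
}

/// Hypothetical helper: write `n` zero words starting from `seed`.
fn hash_zeros(seed: u64, n: usize) -> u64 {
    (0..n).fold(seed, |h, _| add_to_hash(h, 0))
}

fn main() {
    // Zero seed: every run length produces the same hash.
    for n in 0..4 {
        assert_eq!(hash_zeros(0, n), 0);
    }

    // Seed K: the state never reaches zero, and short runs differ.
    for n in 0..1000 {
        assert_ne!(hash_zeros(K, n), 0);
    }
    assert_ne!(hash_zeros(K, 0), hash_zeros(K, 1));
    assert_ne!(hash_zeros(K, 1), hash_zeros(K, 2));
}
```

This only shows that zero runs stop being invisible; it says nothing about the performance cost discussed in the comments, and the state can still be forced to zero by writing `self.hash.rotate_left(5)`, as noted above.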