Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Initial value of FxHasher leads to avoidable collisions #17

Open
eggyal opened this issue Apr 22, 2021 · 14 comments
Open

Initial value of FxHasher leads to avoidable collisions #17

eggyal opened this issue Apr 22, 2021 · 14 comments

Comments

@eggyal
Copy link
Contributor

eggyal commented Apr 22, 2021

It's easy to see that FxHasher { hash: 0 }.add_to_hash(0) is a fixed point (i.e. does not result in any change to the internal state self.hash):

rustc-hash/src/lib.rs

Lines 76 to 81 in 5e09ea0

impl FxHasher {
#[inline]
fn add_to_hash(&mut self, i: usize) {
self.hash = self.hash.rotate_left(5).bitxor(i).wrapping_mul(K);
}
}

In particular 0.rotate_left(5).bitxor(0).wrapping_mul(K) == 0.

Therefore initialising FxHasher with a hash value of 0 is a bit undesirable, as sequences of 0s (written to the hash before any non-zero value) cannot be distinguished by their hashes.

rustc-hash/src/lib.rs

Lines 69 to 74 in 5e09ea0

impl Default for FxHasher {
#[inline]
fn default() -> FxHasher {
FxHasher { hash: 0 }
}
}

Of course, there's always the possibility that a value of self.hash.rotate_left(5) could be written to the hasher, which will result in the internal state reaching 0—but in practice writing values of 0 can arise quite often, and choosing some other initial value would be quite beneficial? Perhaps even K?

@eggyal
Copy link
Contributor Author

eggyal commented Apr 29, 2021

Of course, resolving #15 would make this moot.

@workingjubilee
Copy link
Member

@Nilstrieb Should this be closed?

@FeldrinH
Copy link

FeldrinH commented Mar 9, 2024

I would say that this should not be closed. While there are now API-s for setting initial seeds manually or randomly it would still be a good idea to change the default seed to avoid collisions in cases where the default FxHasher is used (for example the FxHashMap type alias).

@workingjubilee
Copy link
Member

Ah. Very well. PRs welcome!

@WaffleLapkin
Copy link
Member

I wanted to check some different values on rustc, to know if some values miraculously make it faster, but hadn't gotten to it yet

@nnethercote
Copy link
Contributor

I previously tried changing the initial value away from zero, along with lots of other things, see here for details.

The strength of FxHasher lies entirely in its incredible speed at hashing small inputs, and the value zero contributes to that. Consider the main part of the code:

#[inline]
fn default() -> FxHasher {
    FxHasher { hash: 0 }
}

const K: usize = 0x517cc1b727220a95;

#[inline]
fn add_to_hash(&mut self, i: usize) {
    self.hash = self.hash.rotate_left(5).bitxor(i).wrapping_mul(K);
}

#[inline]
fn finish(&self) -> u64 {
    self.hash as u64
}

If you are hashing a single integer, it's just three operations: rol, xor, mul. Except the rol is on the initial hash value, which is zero, so the result of that rol is zero. And then xor'ing i onto zero just gives i. And then you multiply. All that can be constant folded away by the compiler and the actual hashing operation is just to multiply i by K, which is about the fastest hash function imaginable.

If you change the initial value away from zero you can probably constant fold the rol but not the xor, so it becomes two instructions instead of one.

@WaffleLapkin
Copy link
Member

@nnethercote have you tried the starting constant other than 1?

The fact that 0 allows to constant fold more is interesting, although I'm surprised it actually makes a difference 🤔

@workingjubilee
Copy link
Member

I wonder if using usize::MAX bears any fruit.

@nnethercote
Copy link
Contributor

I don't remember if I tried anything other than 1. I do remember trying many different things to improve FxHasher performance and none of them working, enough that I have sworn off trying to improve it further. (That's kind of the point of the blog post.)

I wonder if using usize::MAX bears any fruit.

Non-tiny integers might increase binary size.

@workingjubilee
Copy link
Member

Mmmh. I don't think so in this case, as usize::MAX is a bit special: it's the all-1s bitpattern.

bors added a commit to rust-lang-ci/rust that referenced this issue Mar 10, 2024
…ze-max, r=<try>

[WIP] Perf experiment for rustc-hash with ones-idiom init

CPUs for a while now know (at least) PCMPEQ as a dependency-breaking "ones idiom", and it's not a huge encoding next to constant loads. Let's try it out and see how the hashing goes.

Spurred by rust-lang/rustc-hash#17
@the8472
Copy link
Member

the8472 commented Mar 11, 2024

I previously tried changing the initial value away from zero, along with lots of other things, see here for details.

Out of curiosity, do you know why it's xor and not wrapping add? According to instruction tables they have mostly the same latency and throughput (varies slightly between processors) and I'd naively assume that the carries in the add would help a little with the quality.

@nnethercote
Copy link
Contributor

Out of curiosity, do you know why it's xor and not wrapping add?

My best guess is that xor is a something of a traditional choice?

FxHasher was copied from Firefox's source code, see here for some design discussion, though it doesn't touch upon the use of xor.

@lqd
Copy link
Member

lqd commented Mar 11, 2024

A wrapping add instead of xor was done in #18

@workingjubilee
Copy link
Member

Experimented in rust-lang/rust#122316 and I was technically right about one thing re: usize::MAX, but it sure wasn't fruitful.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

7 participants