Initial value of `FxHasher` leads to avoidable collisions #17

eggyal · 2021-04-22T20:23:02Z

It's easy to see that FxHasher { hash: 0 }.add_to_hash(0) is a fixed point (i.e. does not result in any change to the internal state self.hash):

rustc-hash/src/lib.rs

Lines 76 to 81 in 5e09ea0

    
           impl FxHasher { 
        
               #[inline] 
        
               fn add_to_hash(&mut self, i: usize) { 
        
                   self.hash = self.hash.rotate_left(5).bitxor(i).wrapping_mul(K); 
        
               } 
        
           }

In particular 0.rotate_left(5).bitxor(0).wrapping_mul(K) == 0.

Therefore initialising FxHasher with a hash value of 0 is a bit undesirable, as sequences of 0s (written to the hash before any non-zero value) cannot be distinguished by their hashes.

rustc-hash/src/lib.rs

Lines 69 to 74 in 5e09ea0

    
           impl Default for FxHasher { 
        
               #[inline] 
        
               fn default() -> FxHasher { 
        
                   FxHasher { hash: 0 } 
        
               } 
        
           }

Of course, there's always the possibility that a value of self.hash.rotate_left(5) could be written to the hasher, which will result in the internal state reaching 0—but in practice writing values of 0 can arise quite often, and choosing some other initial value would be quite beneficial? Perhaps even K?

The text was updated successfully, but these errors were encountered:

eggyal · 2021-04-29T17:28:24Z

Of course, resolving #15 would make this moot.

workingjubilee · 2024-03-09T03:24:30Z

@Nilstrieb Should this be closed?

FeldrinH · 2024-03-09T04:51:27Z

I would say that this should not be closed. While there are now API-s for setting initial seeds manually or randomly it would still be a good idea to change the default seed to avoid collisions in cases where the default FxHasher is used (for example the FxHashMap type alias).

workingjubilee · 2024-03-09T05:17:42Z

Ah. Very well. PRs welcome!

WaffleLapkin · 2024-03-09T09:28:21Z

I wanted to check some different values on rustc, to know if some values miraculously make it faster, but hadn't gotten to it yet

nnethercote · 2024-03-10T21:49:37Z

I previously tried changing the initial value away from zero, along with lots of other things, see here for details.

The strength of FxHasher lies entirely in its incredible speed at hashing small inputs, and the value zero contributes to that. Consider the main part of the code:

#[inline]
fn default() -> FxHasher {
    FxHasher { hash: 0 }
}

const K: usize = 0x517cc1b727220a95;

#[inline]
fn add_to_hash(&mut self, i: usize) {
    self.hash = self.hash.rotate_left(5).bitxor(i).wrapping_mul(K);
}

#[inline]
fn finish(&self) -> u64 {
    self.hash as u64
}

If you are hashing a single integer, it's just three operations: rol, xor, mul. Except the rol is on the initial hash value, which is zero, so the result of that rol is zero. And then xor'ing i onto zero just gives i. And then you multiply. All that can be constant folded away by the compiler and the actual hashing operation is just to multiply i by K, which is about the fastest hash function imaginable.

If you change the initial value away from zero you can probably constant fold the rol but not the xor, so it becomes two instructions instead of one.

WaffleLapkin · 2024-03-10T21:57:26Z

@nnethercote have you tried the starting constant other than 1?

The fact that 0 allows to constant fold more is interesting, although I'm surprised it actually makes a difference 🤔

workingjubilee · 2024-03-10T23:01:00Z

I wonder if using usize::MAX bears any fruit.

nnethercote · 2024-03-10T23:10:32Z

I don't remember if I tried anything other than 1. I do remember trying many different things to improve FxHasher performance and none of them working, enough that I have sworn off trying to improve it further. (That's kind of the point of the blog post.)

I wonder if using usize::MAX bears any fruit.

Non-tiny integers might increase binary size.

workingjubilee · 2024-03-10T23:20:44Z

Mmmh. I don't think so in this case, as usize::MAX is a bit special: it's the all-1s bitpattern.

…ze-max, r=<try> [WIP] Perf experiment for rustc-hash with ones-idiom init CPUs for a while now know (at least) PCMPEQ as a dependency-breaking "ones idiom", and it's not a huge encoding next to constant loads. Let's try it out and see how the hashing goes. Spurred by rust-lang/rustc-hash#17

the8472 · 2024-03-11T00:01:38Z

I previously tried changing the initial value away from zero, along with lots of other things, see here for details.

Out of curiosity, do you know why it's xor and not wrapping add? According to instruction tables they have mostly the same latency and throughput (varies slightly between processors) and I'd naively assume that the carries in the add would help a little with the quality.

nnethercote · 2024-03-11T00:45:06Z

Out of curiosity, do you know why it's xor and not wrapping add?

My best guess is that xor is a something of a traditional choice?

FxHasher was copied from Firefox's source code, see here for some design discussion, though it doesn't touch upon the use of xor.

lqd · 2024-03-11T00:59:17Z

A wrapping add instead of xor was done in #18

workingjubilee · 2024-03-11T03:39:45Z

Experimented in rust-lang/rust#122316 and I was technically right about one thing re: usize::MAX, but it sure wasn't fruitful.

WaffleLapkin mentioned this issue Feb 15, 2022

Add (random) seed api #22

Merged

workingjubilee mentioned this issue Mar 10, 2024

[WIP] Perf experiment for rustc-hash with ones-idiom init rust-lang/rust#122316

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Initial value of `FxHasher` leads to avoidable collisions #17

Initial value of `FxHasher` leads to avoidable collisions #17

eggyal commented Apr 22, 2021 •

edited

Loading

eggyal commented Apr 29, 2021

workingjubilee commented Mar 9, 2024

FeldrinH commented Mar 9, 2024

workingjubilee commented Mar 9, 2024

WaffleLapkin commented Mar 9, 2024

nnethercote commented Mar 10, 2024

WaffleLapkin commented Mar 10, 2024

workingjubilee commented Mar 10, 2024

nnethercote commented Mar 10, 2024

workingjubilee commented Mar 10, 2024

the8472 commented Mar 11, 2024

nnethercote commented Mar 11, 2024

lqd commented Mar 11, 2024

workingjubilee commented Mar 11, 2024

Initial value of FxHasher leads to avoidable collisions #17

Initial value of FxHasher leads to avoidable collisions #17

Comments

eggyal commented Apr 22, 2021 • edited Loading

eggyal commented Apr 29, 2021

workingjubilee commented Mar 9, 2024

FeldrinH commented Mar 9, 2024

workingjubilee commented Mar 9, 2024

WaffleLapkin commented Mar 9, 2024

nnethercote commented Mar 10, 2024

WaffleLapkin commented Mar 10, 2024

workingjubilee commented Mar 10, 2024

nnethercote commented Mar 10, 2024

workingjubilee commented Mar 10, 2024

the8472 commented Mar 11, 2024

nnethercote commented Mar 11, 2024

lqd commented Mar 11, 2024

workingjubilee commented Mar 11, 2024

Initial value of `FxHasher` leads to avoidable collisions #17

Initial value of `FxHasher` leads to avoidable collisions #17

eggyal commented Apr 22, 2021 •

edited

Loading