Commit 2eebeb8
auto merge of #12081 : cgaebel/rust/robinhood-hashing, r=alexcrichton
Partially addresses #11783.
Previously, Rust's hashtable was totally unoptimized. It used an Option
per key-value pair, and used very naive open addressing.
The old hashtable had very high variance in lookup time. For an example,
see the 'find_notexisting' benchmark below. The new hashtable fixes this
with Robin Hood hashing: keys in 'lucky' spots with a short probe sequence
have those spots stolen by keys with long probe sequences. This reduces
the variance of the probe length while keeping the mean the same.
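
As an illustration of the stealing rule described above, here is a minimal sketch of Robin Hood insertion with linear probing. It is not the code from this commit: the `RobinHoodTable` type, the identity hash in `ideal_slot`, the fixed capacity, and the lack of resizing or removal are all assumptions made to keep the example short.

```rust
/// Toy open-addressing table using Robin Hood insertion with linear probing.
/// Hypothetical illustration: fixed capacity, no resizing, no removal.
struct RobinHoodTable {
    /// Each occupied slot stores (key, value, probe distance from ideal slot).
    slots: Vec<Option<(u64, u64, usize)>>,
    len: usize,
}

impl RobinHoodTable {
    fn new(capacity: usize) -> Self {
        RobinHoodTable { slots: vec![None; capacity], len: 0 }
    }

    /// Assumed hash function: the key itself, reduced modulo the capacity.
    fn ideal_slot(&self, key: u64) -> usize {
        (key % self.slots.len() as u64) as usize
    }

    /// Inserts or updates a key. Returns false if the table is already full.
    fn insert(&mut self, key: u64, value: u64) -> bool {
        let cap = self.slots.len();
        if self.len == cap {
            return false; // a real table would grow instead
        }
        let mut idx = self.ideal_slot(key);
        let mut entry = (key, value, 0usize);
        loop {
            let current = self.slots[idx];
            match current {
                None => {
                    self.slots[idx] = Some(entry);
                    self.len += 1;
                    return true;
                }
                // Same key: overwrite the value, keep the resident's distance.
                Some(resident) if resident.0 == entry.0 => {
                    self.slots[idx] = Some((entry.0, entry.1, resident.2));
                    return true;
                }
                // The resident is "luckier" (closer to its ideal slot) than
                // the entry being carried: steal the slot, carry the resident.
                Some(resident) if resident.2 < entry.2 => {
                    self.slots[idx] = Some(entry);
                    entry = resident;
                }
                Some(_) => {}
            }
            idx = (idx + 1) % cap;
            entry.2 += 1;
        }
    }

    /// Looks up a key, stopping early once a luckier resident is seen.
    fn get(&self, key: u64) -> Option<u64> {
        let cap = self.slots.len();
        if cap == 0 {
            return None;
        }
        let mut idx = self.ideal_slot(key);
        for dist in 0..cap {
            match self.slots[idx] {
                None => return None,
                Some((k, v, _)) if k == key => return Some(v),
                // Had the key been inserted, it would have stolen this slot.
                Some((_, _, d)) if d < dist => return None,
                Some(_) => {}
            }
            idx = (idx + 1) % cap;
        }
        None
    }

    /// Probe distances of all occupied slots, in slot order.
    fn probe_distances(&self) -> Vec<usize> {
        self.slots.iter().flatten().map(|&(_, _, d)| d).collect()
    }
}
```

With a capacity of 8, inserting keys 0, 1 and 8 in that order leaves probe distances of 0, 1 and 1. Plain linear probing would leave 0, 0 and 2: the same total, but a longer worst case.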
Also, other optimization liberties were taken. Everything is as cache
aware as possible, and this hashtable should perform extremely well for
both large and small keys and values.
Benchmarks:
```
comprehensive_old_hashmap         378 ns/iter (+/- 8)
comprehensive_new_hashmap         206 ns/iter (+/- 4)
1.8x faster

old_hashmap_as_queue              238 ns/iter (+/- 8)
new_hashmap_as_queue              119 ns/iter (+/- 2)
2x faster

old_hashmap_insert                172 ns/iter (+/- 8)
new_hashmap_insert                146 ns/iter (+/- 11)
1.17x faster

old_hashmap_find_existing          50 ns/iter (+/- 12)
new_hashmap_find_existing          35 ns/iter (+/- 6)
1.43x faster

old_hashmap_find_notexisting       49 ns/iter (+/- 49)
new_hashmap_find_notexisting       34 ns/iter (+/- 4)
1.44x faster

Memory usage of old hashtable (64-bit assumed):
aligned(8+sizeof(Option)+sizeof(K)+sizeof(V))/0.75 + 48ish bytes

Memory usage of new hashtable:
(aligned(sizeof(K)) + aligned(sizeof(V)) + 8)/0.9 + 112ish bytes

Timing of building librustc:
compile_and_link: x86_64-unknown-linux-gnu/stage0/lib/rustlib/x86_64-unknown-linux-gnu/lib/librustc
time: 0.457 s parsing
time: 0.028 s gated feature checking
time: 0.000 s crate injection
time: 0.108 s configuration 1
time: 1.049 s expansion
time: 0.219 s configuration 2
time: 0.222 s maybe building test harness
time: 0.223 s prelude injection
time: 0.268 s assinging node ids and indexing ast
time: 0.075 s external crate/lib resolution
time: 0.026 s language item collection
time: 1.016 s resolution
time: 0.038 s lifetime resolution
time: 0.000 s looking for entry point
time: 0.030 s looking for macro registrar
time: 0.061 s freevar finding
time: 0.138 s region resolution
time: 0.110 s type collecting
time: 0.072 s variance inference
time: 0.126 s coherence checking
time: 9.110 s type checking
time: 0.186 s const marking
time: 0.049 s const checking
time: 0.418 s privacy checking
time: 0.057 s effect checking
time: 0.033 s loop checking
time: 1.293 s compute moves
time: 0.182 s match checking
time: 0.242 s liveness checking
time: 0.866 s borrow checking
time: 0.150 s kind checking
time: 0.013 s reachability checking
time: 0.175 s death checking
time: 0.461 s lint checking
time: 13.112 s translation
time: 4.352 s llvm function passes
time: 96.702 s llvm module passes
time: 50.574 s codegen passes
time: 154.611 s LLVM passes
time: 2.821 s running linker
time: 15.750 s linking
compile_and_link: x86_64-unknown-linux-gnu/stage1/lib/rustlib/x86_64-unknown-linux-gnu/lib/librustc
time: 0.422 s parsing
time: 0.031 s gated feature checking
time: 0.000 s crate injection
time: 0.126 s configuration 1
time: 1.014 s expansion
time: 0.251 s configuration 2
time: 0.249 s maybe building test harness
time: 0.273 s prelude injection
time: 0.279 s assinging node ids and indexing ast
time: 0.076 s external crate/lib resolution
time: 0.033 s language item collection
time: 1.028 s resolution
time: 0.036 s lifetime resolution
time: 0.000 s looking for entry point
time: 0.029 s looking for macro registrar
time: 0.063 s freevar finding
time: 0.133 s region resolution
time: 0.111 s type collecting
time: 0.077 s variance inference
time: 0.565 s coherence checking
time: 8.953 s type checking
time: 0.176 s const marking
time: 0.050 s const checking
time: 0.401 s privacy checking
time: 0.063 s effect checking
time: 0.032 s loop checking
time: 1.291 s compute moves
time: 0.172 s match checking
time: 0.249 s liveness checking
time: 0.831 s borrow checking
time: 0.121 s kind checking
time: 0.013 s reachability checking
time: 0.179 s death checking
time: 0.503 s lint checking
time: 14.385 s translation
time: 4.495 s llvm function passes
time: 92.234 s llvm module passes
time: 51.172 s codegen passes
time: 150.809 s LLVM passes
time: 2.542 s running linker
time: 15.109 s linking
```
The fixed memory overhead is higher (112ish versus 48ish bytes), but accesses are much more cache friendly. In fact, if the probe
sequence length is below 8, only two cache lines worth of hashes will be
pulled into cache. This is unlike the old version which would have to
stride over the stored keys and values, and would be more cache
unfriendly the bigger the stored values got.
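
The cache-line claim can be checked with a little arithmetic. The sketch below assumes 64-byte cache lines, 8-byte hashes stored contiguously in their own array that starts on a cache-line boundary, and a probe that does not wrap around the end of the table. The function names and the interleaved model of the old layout (one bucket of `bucket_bytes` per entry, counting only the line each bucket starts on) are hypothetical simplifications, not taken from the commit.

```rust
/// Assumed cache-line size in bytes.
const CACHE_LINE_BYTES: usize = 64;
/// Assumed size of one stored hash (64-bit).
const HASH_BYTES: usize = 8;

/// Cache lines touched when scanning `probe_len` consecutive hashes,
/// starting at bucket `start`, in a dense array of hashes.
fn hash_lines_touched(start: usize, probe_len: usize) -> usize {
    if probe_len == 0 {
        return 0;
    }
    let first_byte = start * HASH_BYTES;
    let last_byte = (start + probe_len) * HASH_BYTES - 1;
    last_byte / CACHE_LINE_BYTES - first_byte / CACHE_LINE_BYTES + 1
}

/// Cache lines touched when each probe instead reads the start of a bucket
/// that is `bucket_bytes` wide (hash, key and value stored together).
fn interleaved_lines_touched(start: usize, probe_len: usize, bucket_bytes: usize) -> usize {
    let mut lines = 0;
    let mut last_line = None;
    for i in 0..probe_len {
        let line = (start + i) * bucket_bytes / CACHE_LINE_BYTES;
        if last_line != Some(line) {
            lines += 1;
            last_line = Some(line);
        }
    }
    lines
}
```

Under these assumptions, a probe of up to 8 hashes covers at most 64 bytes and so spans at most two cache lines wherever it starts, while an interleaved layout with 72-byte buckets touches a new line on every one of 8 probes.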
And did you notice the higher load factor? We can now reasonably get a
load factor of 0.9 with very good performance.
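
To see what the higher load factor does to the memory formulas quoted earlier, here is a rough sketch that evaluates their per-entry terms. It assumes `aligned` means rounding up to 8 bytes and that `sizeof(Option)` is a 1-byte tag before alignment; both are guesses, as are the function names, and the "48ish" and "112ish" fixed overheads are left out.

```rust
/// Assumed size of the Option tag in the old layout, before alignment.
const OPTION_TAG_BYTES: usize = 1;

/// Rounds `n` up to the next multiple of `align`.
fn align_up(n: usize, align: usize) -> usize {
    (n + align - 1) / align * align
}

/// Old table: aligned(8 + sizeof(Option) + sizeof(K) + sizeof(V)) / 0.75
fn old_bytes_per_entry(key_bytes: usize, val_bytes: usize) -> f64 {
    align_up(8 + OPTION_TAG_BYTES + key_bytes + val_bytes, 8) as f64 / 0.75
}

/// New table: (aligned(sizeof(K)) + aligned(sizeof(V)) + 8) / 0.9
fn new_bytes_per_entry(key_bytes: usize, val_bytes: usize) -> f64 {
    (align_up(key_bytes, 8) + align_up(val_bytes, 8) + 8) as f64 / 0.9
}
```

For 8-byte keys and values this gives about 42.7 bytes per entry for the old formula and about 26.7 for the new one, so under these assumptions the larger fixed overhead is recovered after a handful of entries.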
Please review this very closely. This is my first major contribution to Rust. Sorry for the ugly diff!
1 file changed, +1412 -594 lines changed