Optimized SipHash implementation #13114

gereeter · 2014-03-24T12:27:01Z

This makes hashing a fair bit faster for long strings and non-string objects:

bench_compound_1: 70 -> 56
bench_long_str: 795 -> 525
bench_str: 32 -> 32

This helps with #11783.

huonw · 2014-03-24T12:36:07Z

src/libstd/hash/sip.rs

+    }
+
+    #[inline]
+    fn write_le_u16(&mut self, n: u16) -> IoResult<()> {


Are these still correct on a big-endian platform?

Also, I feel like these could be abstracted slightly (e.g. by a macro or a function taking n: u64 and length: uint), since the only (significant) difference between all these new writes is the 1/2/4/8.

It would also be nice to have a comment explaining why they're overridden.
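To illustrate the function-based abstraction suggested above, here is a minimal sketch in present-day Rust, not the 2014 dialect this PR targets. The name `le_bytes` and its signature are hypothetical and do not come from the PR. It takes `n: u64` and a length, and extracts bytes by shifting, so the result does not depend on the host's byte order.

```rust
/// Hypothetical helper (not from the PR): returns the low `len` bytes of `n`
/// in little-endian order, regardless of the host's byte order.
fn le_bytes(n: u64, len: usize) -> Vec<u8> {
    assert!(len <= 8, "a u64 holds at most 8 bytes");
    // Shifting selects bytes by numeric significance rather than by memory
    // layout, so little- and big-endian targets produce the same output.
    (0..len).map(|i| (n >> (8 * i)) as u8).collect()
}
```

With this shape, the 1/2/4/8-byte writers would differ only in the `len` they pass, for example `le_bytes(n as u64, 2)` for a `u16`.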

gereeter · 2014-03-30T19:06:28Z

I moved all the bytes-to-u64 conversions over to a new u64_from_le_bytes, combined the various writer functions into a macro, and added a comment explaining why I am overriding them in the first place. I think that covers all the concerns brought up. If this looks good, I'll squash and rebase.

Incidentally, why are u64_from_le_bytes and friends in io::extensions instead of mem? They are largely unrelated to input and output.
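For readers unfamiliar with the macro approach mentioned above, the following is a rough sketch of how several near-identical `write_le_*` methods can be generated from one body. It is written in present-day Rust and is not the macro from this PR: `ByteSink` is a hypothetical stand-in for the hasher state, and the methods return nothing rather than `IoResult<()>`.

```rust
/// Hypothetical stand-in for the hasher: it only records the bytes it is fed.
struct ByteSink {
    bytes: Vec<u8>,
}

// Each `name: type` pair expands to one method. The body is shared, so only
// the method name and the integer width differ between generated methods.
macro_rules! write_le {
    ($($name:ident: $ty:ty),*) => {
        impl ByteSink {
            $(
                #[inline]
                fn $name(&mut self, n: $ty) {
                    // to_le_bytes always yields the least significant byte
                    // first, whatever the host's byte order is.
                    self.bytes.extend_from_slice(&n.to_le_bytes());
                }
            )*
        }
    };
}

write_le!(write_le_u8: u8, write_le_u16: u16, write_le_u32: u32, write_le_u64: u64);
```

A macro keeps each generated method monomorphic and inlinable, which is presumably why it suits a hot path like hashing; a single function taking a length would also work.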

alexcrichton · 2014-03-31T04:39:03Z

src/libstd/io/extensions.rs

+        copy_nonoverlapping_memory(out, ptr, size);
+        from_le64(*(out as *i64)) as u64
+    }
+}


This and the above function look quite similar; perhaps they could be refactored? Could the byte-swapping intrinsics be used in combination with reading using the big endian function?

gereeter · 2014-03-31

I'd rather define the big endian function in terms of the little endian function, as the little endian function is slightly simpler, but merging seems reasonable, even if it breaks symmetry.

alexcrichton · 2014-03-31

Either way is fine by me; I'd just shoot for less duplication.
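The refactoring discussed in this thread, defining the big-endian read in terms of the little-endian one plus a byte swap, could look roughly like the sketch below. It uses present-day safe Rust instead of the `copy_nonoverlapping_memory` code in the diff, and the names `u64_from_le` and `u64_from_be` are hypothetical, not the functions in `io::extensions`.

```rust
/// Hypothetical analogue of a little-endian read: interprets `data`
/// (at most 8 bytes) as a little-endian integer.
fn u64_from_le(data: &[u8]) -> u64 {
    assert!(data.len() <= 8, "a u64 holds at most 8 bytes");
    let mut buf = [0u8; 8];
    // Bytes beyond data.len() stay zero, i.e. the high-order bytes are zero.
    buf[..data.len()].copy_from_slice(data);
    u64::from_le_bytes(buf)
}

/// Big-endian read defined via the little-endian one.
fn u64_from_be(data: &[u8]) -> u64 {
    if data.is_empty() {
        return 0;
    }
    // swap_bytes reverses all 8 bytes, which leaves the value in the top
    // data.len() bytes; the shift moves it back down to the bottom.
    u64_from_le(data).swap_bytes() >> (8 * (8 - data.len()))
}
```

The extra shift is only needed for inputs shorter than 8 bytes; for a full 8-byte input the big-endian read is exactly the byte-swapped little-endian read.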

alexcrichton · 2014-04-10T15:44:32Z

Closing due to inactivity, but feel free to reopen with a rebase!

work started from @gereeter's PR: #13114 but adjusted bits

```
before

test hash::sip::tests::bench_u64 ... bench: 34 ns/iter (+/- 0)
test hash::sip::tests::bench_str_under_8_bytes ... bench: 37 ns/iter (+/- 1)
test hash::sip::tests::bench_str_of_8_bytes ... bench: 43 ns/iter (+/- 1)
test hash::sip::tests::bench_str_over_8_bytes ... bench: 50 ns/iter (+/- 1)
test hash::sip::tests::bench_long_str ... bench: 613 ns/iter (+/- 14)
test hash::sip::tests::bench_compound_1 ... bench: 114 ns/iter (+/- 11)

after

test hash::sip::tests::bench_u64 ... bench: 25 ns/iter (+/- 0)
test hash::sip::tests::bench_str_under_8_bytes ... bench: 31 ns/iter (+/- 0)
test hash::sip::tests::bench_str_of_8_bytes ... bench: 36 ns/iter (+/- 0)
test hash::sip::tests::bench_str_over_8_bytes ... bench: 40 ns/iter (+/- 0)
test hash::sip::tests::bench_long_str ... bench: 600 ns/iter (+/- 14)
test hash::sip::tests::bench_compound_1 ... bench: 64 ns/iter (+/- 6)
```

Notably, it seems smaller keys will hash faster. A long string doesn't see much gain, but compound is cut nearly in half (once compound used an `int` and a `u64`).

huonw reviewed Mar 24, 2014
Jonathan S added 5 commits March 28, 2014 17:20

Optimized SipHash implementation (3167e86)

bench_compound_1: 70 -> 56
bench_long_str: 795 -> 525
bench_str: 32 -> 32

Reverted change to u8to64_le due to it being invalid on unaligned architectures (aa61854)

Switched to a macro for the write_le functions (5bfe692)

Added u64_from_le_bytes to io::extensions and used it in the SipHash implementation to get the same benefits as the previous try using unsafe code. (7a99a1b)

Switched wholly over to u64_from_le_bytes (5dba2cc)

alexcrichton reviewed Mar 31, 2014
alexcrichton closed this Apr 10, 2014

seanmonstar mentioned this pull request Apr 14, 2014

optimzed SipHash implementation #13522

Merged

seanmonstar added a commit to seanmonstar/rust that referenced this pull request Apr 15, 2014

optimized SipHash implementation

9c1cd69

work started from @gereeter's PR: rust-lang#13114 but adjusted bits

gereeter deleted the faster-sip branch December 17, 2015 01:29
