Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimized SipHash implementation #13114

Closed
wants to merge 5 commits into from
Closed

Conversation

gereeter
Copy link
Contributor

This makes hashing a fair bit faster for long strings and non-string objects:

bench_compound_1: 70 -> 56
bench_long_str: 795 -> 525
bench_str: 32 -> 32

This helps with #11783.

}

#[inline]
fn write_le_u16(&mut self, n: u16) -> IoResult<()> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are these still correct on a big-endian platform?

Also, I feel like these could be abstracted slightly (e.g. by a macro or a function taking n: u64 and length: uint), since the only (significant) difference between all these new writes is the 1/2/4/8.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would also be nice to have a comment explaining why they're overwritten.

Jonathan S added 5 commits March 28, 2014 17:20
@gereeter
Copy link
Contributor Author

I moved all the bytes to u64 conversion to using a new u64_from_le_bytes, combined the various writer functions into a macro and added a comment explaining why I am overriding them in the first place. I think that covers all the concerns brought up. If this looks good, I'll squash and rebase.

Incidentally, why are u64_from_le_bytes and friends in io::extensions instead of mem? They are largely unrelated to input and output.

copy_nonoverlapping_memory(out, ptr, size);
from_le64(*(out as *i64)) as u64
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This and the above function look quite similar, perhaps they could be refactored? Could the byte-swapping intrinsics be used in combination with reading using the big endian function?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd rather define the big endian function in terms of the little endian function, as the little endian function is slightly simpler, but merging seems reasonable, even if it breaks symmetry.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Either way is fine by me, I'd just shoot for less duplication.

@alexcrichton
Copy link
Member

Closing due to inactivity, but feel free to reopen with a rebase!

seanmonstar added a commit to seanmonstar/rust that referenced this pull request Apr 15, 2014
work started from @gereeter's PR: rust-lang#13114
but adjusted bits
bors added a commit that referenced this pull request Apr 16, 2014
work started from @gereeter's PR: #13114
but adjusted bits

```
before
test hash::sip::tests::bench_u64                            ... bench:        34 ns/iter (+/- 0)
test hash::sip::tests::bench_str_under_8_bytes              ... bench:        37 ns/iter (+/- 1)
test hash::sip::tests::bench_str_of_8_bytes                 ... bench:        43 ns/iter (+/- 1)
test hash::sip::tests::bench_str_over_8_bytes               ... bench:        50 ns/iter (+/- 1)
test hash::sip::tests::bench_long_str                       ... bench:       613 ns/iter (+/- 14)
test hash::sip::tests::bench_compound_1                     ... bench:       114 ns/iter (+/- 11)

after
test hash::sip::tests::bench_u64                            ... bench:        25 ns/iter (+/- 0)
test hash::sip::tests::bench_str_under_8_bytes              ... bench:        31 ns/iter (+/- 0)
test hash::sip::tests::bench_str_of_8_bytes                 ... bench:        36 ns/iter (+/- 0)
test hash::sip::tests::bench_str_over_8_bytes               ... bench:        40 ns/iter (+/- 0)
test hash::sip::tests::bench_long_str                       ... bench:       600 ns/iter (+/- 14)
test hash::sip::tests::bench_compound_1                     ... bench:        64 ns/iter (+/- 6)
```

Notably it seems smaller keys will hash faster. A long string doesn't see much gains, but compound cuts in half (once compound used a `int` and `u64`).
@gereeter gereeter deleted the faster-sip branch December 17, 2015 01:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants