perf: buffer SipHasher128 #77476
r? @varkor (rust_highfive has picked a reviewer for you, use r? to override)
When someone has a chance, can I get a perf run on this? Thank you :)
@bors try @rust-timer queue
Awaiting bors try build completion
⌛ Trying commit 634f9be5f25f97b58d444472dbd7c15fe940d9ce with merge cd008531939631a26b5c0b5b63f06ffb0810fe7b...
☀️ Try build successful - checks-actions, checks-azure
Queued cd008531939631a26b5c0b5b63f06ffb0810fe7b with parent 8c54cf6, future comparison URL.
Finished benchmarking try commit (cd008531939631a26b5c0b5b63f06ffb0810fe7b): comparison url. Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. Please note that if the perf results are neutral, you should likely undo the rollup=never given below by specifying Importantly, though, if the results of this run are non-neutral do not roll this PR up -- it will mask other regressions or improvements in the roll up. @bors rollup=never
Oh wow! Those are some nice numbers :) Max of -6% instruction count, about -1% overall. Wall times are all over the place as usual, but seem to have mostly decreased apart from an increase in hello-world.
Okay, with Joshua's motivating comment :), I'm feeling a little better about the possibility of this being integrated. I'll remove the "[DO NOT MERGE]" and see where it goes. @nnethercote and @michaelwoerister, you may have feedback on this, as I know it's an area you've touched.
Some style comments, will add more later.
Yeah, SipHash doesn't respond well to SSE: it's fast enough that the time it takes to load an SSE register makes it slower than the non-SSE version, at least for siphash64. You can see my attempts at it for a different project here: bitcoin/bitcoin#17774
I don't think I'm the right person to review this. I'll reassign to r? @nnethercote for now, but find another reviewer if they don't have time to review.
This mostly looks good, though I have a few minor suggestions. Thanks for writing lots of good comments.
Prior to this PR, the SipHash in libstd was much the same as this one, but now the two differ. Would it be worth changing the libstd one as well?
On the performance front, instruction counts have the least variance and wall time measures the right thing. Cycles can often be a good intermediate, being a more real-world measure while also being less variable than wall times. And the cycle results look even better than the instruction counts. So that's good.
Thanks for the review, btw. :)
I guess the libs team would have to weigh in on that. The buffering improves the rustc workload, but there are workloads for which it would cause regressions. For example, creating lots of hashers that each hash only a couple of bytes is a bad workload.
@nnethercote Friendly ping--anything else you'd like to see improved?
Sorry for the slow response. @bors r+
📌 Commit a602d15 has been approved by
☀️ Test successful - checks-actions
Not a problem at all, thanks for reviewing! :)
Performance results on the merge commit were consistent with the try run in this PR, but wall times look noisy and largely unchanged. Still, fewer instructions is presumably better for e.g. valgrind or similar tooling, so good work :)
This is an attempt to improve SipHasher128 performance by buffering input. Although it reduces instruction count, I'm not confident the effect on wall times, or lack thereof, is worth the change.
Additional notes not reflected in source comments:
I tried a couple of different struct layouts that might be more cache friendly, with no obvious effect.
Update: a particular struct layout was chosen, but it's not critical to performance. See comments in source and discussion below.
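For readers unfamiliar with the idea, here is a minimal sketch of what "buffering input" to a hasher can look like in general. It is not the code from this PR: the `BufferedHasher` type, the 64-byte `BUF_BYTES` size, and the `mix` function are all hypothetical, and `mix` is a toy stand-in rather than the SipHash rounds. The point is only that small writes become cheap copies into a buffer, and the expensive mixing runs in bulk when the buffer fills.

```rust
/// Toy stand-in for a hash's compression rounds. This is NOT SipHash;
/// it exists only so the sketch has something to feed words into.
fn mix(state: u64, word: u64) -> u64 {
    (state ^ word)
        .wrapping_mul(0x0000_0100_0000_01b3)
        .rotate_left(23)
}

/// Hypothetical buffer size; must be a multiple of 8.
const BUF_BYTES: usize = 64;

pub struct BufferedHasher {
    state: u64,
    buf: [u8; BUF_BYTES],
    nbuf: usize,    // bytes currently held in `buf`
    total: u64,     // total bytes written so far
    flushes: usize, // number of full-buffer flushes, for illustration only
}

impl BufferedHasher {
    pub fn new() -> Self {
        BufferedHasher {
            state: 0xcbf2_9ce4_8422_2325,
            buf: [0; BUF_BYTES],
            nbuf: 0,
            total: 0,
            flushes: 0,
        }
    }

    /// Copy input into the buffer; only mix when the buffer is full.
    pub fn write(&mut self, mut bytes: &[u8]) {
        self.total += bytes.len() as u64;
        while !bytes.is_empty() {
            let n = (BUF_BYTES - self.nbuf).min(bytes.len());
            self.buf[self.nbuf..self.nbuf + n].copy_from_slice(&bytes[..n]);
            self.nbuf += n;
            bytes = &bytes[n..];
            if self.nbuf == BUF_BYTES {
                self.process(BUF_BYTES);
                self.nbuf = 0;
                self.flushes += 1;
            }
        }
    }

    /// Mix the first `len` buffered bytes, 8 at a time (`len` is a multiple of 8).
    fn process(&mut self, len: usize) {
        for chunk in self.buf[..len].chunks_exact(8) {
            let mut word = [0u8; 8];
            word.copy_from_slice(chunk);
            self.state = mix(self.state, u64::from_le_bytes(word));
        }
    }

    pub fn flushes(&self) -> usize {
        self.flushes
    }

    /// Zero-pad the tail to a word boundary, mix it, then fold in the length.
    pub fn finish(mut self) -> u64 {
        let nbuf = self.nbuf;
        let padded = (nbuf + 7) / 8 * 8;
        for b in &mut self.buf[nbuf..padded] {
            *b = 0;
        }
        self.process(padded);
        mix(self.state, self.total)
    }
}
```

In this sketch, writing 200 bytes one at a time triggers the mixing loop only at each 64-byte boundary, and the result matches a single 200-byte write. The fixed cost of the buffer and of `finish` is also why a workload that creates many hashers for a few bytes each would not benefit, as noted in the discussion.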