Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

perf: buffer SipHasher128 #77476

Merged
merged 4 commits into from
Oct 25, 2020
Merged

Conversation

tgnottingham
Copy link
Contributor

@tgnottingham tgnottingham commented Oct 3, 2020

This is an attempt to improve Siphasher128 performance by buffering input. Although it reduces instruction count, I'm not confident the effect on wall times, or lack-thereof, is worth the change.


Additional notes not reflected in source comments:

  • Implementation choices were guided by a combination of results from rustc-perf and micro-benchmarks, mostly the former.
  • I tried a couple of different struct layouts that might be more cache friendly with no obvious effect. Update: a particular struct layout was chosen, but it's not critical to performance. See comments in source and discussion below.
  • I suspect that buffering would be important to a SIMD-accelerated algorithm, but from what I've read and my own tests, SipHash does not seem very amenable to SIMD acceleration, at least by SSE.

@rust-highfive
Copy link
Contributor

r? @varkor

(rust_highfive has picked a reviewer for you, use r? to override)

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Oct 3, 2020
@tgnottingham tgnottingham force-pushed the buffered_siphasher128 branch from 63a9296 to 634f9be Compare October 3, 2020 04:46
@tgnottingham
Copy link
Contributor Author

When someone has a chance, can I get a perf run on this? Thank you :)

@jyn514
Copy link
Member

jyn514 commented Oct 3, 2020

@bors try @rust-timer queue

@rust-timer
Copy link
Collaborator

Awaiting bors try build completion

@bors
Copy link
Collaborator

bors commented Oct 3, 2020

⌛ Trying commit 634f9be5f25f97b58d444472dbd7c15fe940d9ce with merge cd008531939631a26b5c0b5b63f06ffb0810fe7b...

@jyn514 jyn514 added I-compiletime Issue: Problems and improvements with respect to compile times. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Oct 3, 2020
@bors
Copy link
Collaborator

bors commented Oct 3, 2020

☀️ Try build successful - checks-actions, checks-azure
Build commit: cd008531939631a26b5c0b5b63f06ffb0810fe7b (cd008531939631a26b5c0b5b63f06ffb0810fe7b)

@rust-timer
Copy link
Collaborator

Queued cd008531939631a26b5c0b5b63f06ffb0810fe7b with parent 8c54cf6, future comparison URL.

@rust-timer
Copy link
Collaborator

Finished benchmarking try commit (cd008531939631a26b5c0b5b63f06ffb0810fe7b): comparison url.

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. Please note that if the perf results are neutral, you should likely undo the rollup=never given below by specifying rollup- to bors.

Importantly, though, if the results of this run are non-neutral do not roll this PR up -- it will mask other regressions or improvements in the roll up.

@bors rollup=never

@jyn514
Copy link
Member

jyn514 commented Oct 3, 2020

Oh wow! those are some nice numbers :) Max of -6% instruction count, about -1% overall. Wall-times are all over the place as usual but seem to have mostly decreased other than an increase in hello-world.

@tgnottingham tgnottingham force-pushed the buffered_siphasher128 branch from 634f9be to f6f96e2 Compare October 3, 2020 17:06
@tgnottingham
Copy link
Contributor Author

tgnottingham commented Oct 4, 2020

Okay, with Joshua's motivating comment :), I'm feeling a little better about the possibility of this being integrated. I'll remove the "[DO NOT MERGE]" and see where it goes.

@nnethercote and @michaelwoerister, you may have feedback on this, as I know it's an area you've touched.

@tgnottingham tgnottingham changed the title [DO NOT MERGE] perf: buffer SipHasher128 perf: buffer SipHasher128 Oct 4, 2020
Copy link
Contributor

@tesuji tesuji left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some style comments, will add more later.

@elichai
Copy link
Contributor

elichai commented Oct 4, 2020

  • I suspect that buffering would be important to a SIMD-accelerated algorithm, but from what I've read and my own tests, SipHash does not seem very amenable to SIMD acceleration, at least by SSE.

yeah siphash doesn't respond well to SSE, it's fast enough that the time it takes to load an SSE register makes it slower than non SSE, at least for siphash64, you can see here my attempts on it for a different project: bitcoin/bitcoin#17774

@tgnottingham tgnottingham force-pushed the buffered_siphasher128 branch from c90d72d to de58d51 Compare October 5, 2020 02:42
@varkor
Copy link
Member

varkor commented Oct 7, 2020

I don't think I'm the right person to review this. I'll reassign to r? @nnethercote for now, but find another reviewer if they don't have time to review.

@rust-highfive rust-highfive assigned nnethercote and unassigned varkor Oct 7, 2020
Copy link
Contributor

@nnethercote nnethercote left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This mostly looks good, though I have a few minor suggestions. Thanks for writing lots of good comments.

Prior to this PR, the SipHash in libstd was much the same as this one, but now they're not. Would it be worth changing the libstd one as well?

On the performance front, instruction counts have the least variance and wall time measures the right thing. Cycles can often be a good intermediate, being a more real-world measure while also being less variable than wall times. And the cycle results look even better than the instruction counts. So that's good.

@tgnottingham tgnottingham force-pushed the buffered_siphasher128 branch from 664e78e to a602d15 Compare October 12, 2020 06:48
@tgnottingham
Copy link
Contributor Author

Thanks for the review, btw. :)

Prior to this PR, the SipHash in libstd was much the same as this one, but now they're not. Would it be worth changing the libstd one as well?

I guess the libs team would have to weigh in on that. The buffering improves the rustc workload, but there are workloads that it would cause regressions for. Creating lots of hashers that only hash a couple bytes is a bad workload for example.

@tgnottingham
Copy link
Contributor Author

@nnethercote Friendly ping--anything else you'd like to see improved?

@nnethercote
Copy link
Contributor

Sorry for the slow response.

@bors r+

@bors
Copy link
Collaborator

bors commented Oct 25, 2020

📌 Commit a602d15 has been approved by nnethercote

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Oct 25, 2020
@bors
Copy link
Collaborator

bors commented Oct 25, 2020

⌛ Testing commit a602d15 with merge 5171cc7...

@bors
Copy link
Collaborator

bors commented Oct 25, 2020

☀️ Test successful - checks-actions
Approved by: nnethercote
Pushing 5171cc7 to master...

@bors bors added the merged-by-bors This PR was explicitly merged by bors. label Oct 25, 2020
@bors bors merged commit 5171cc7 into rust-lang:master Oct 25, 2020
@rustbot rustbot added this to the 1.49.0 milestone Oct 25, 2020
@tgnottingham
Copy link
Contributor Author

Sorry for the slow response.

Not a problem at all, thanks for reviewing! :)

@Mark-Simulacrum
Copy link
Member

Performance results on the merge commit were consistent with the try run in this PR, but wall times look noisy and largely unchanged. Still, less instructions is presumably better for e.g. valgrind or similar tooling, so good work :)

@tgnottingham tgnottingham deleted the buffered_siphasher128 branch January 20, 2021 18:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
I-compiletime Issue: Problems and improvements with respect to compile times. merged-by-bors This PR was explicitly merged by bors. S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.