Rewrite shootout-nbody for better autovectorization #28891
Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @alexcrichton (or someone else) soon. If any changes to this PR are deemed necessary, please add them as extra commits. This ensures that the reviewer can see what has changed since they last reviewed the code. The way Github handles out-of-date commits, this should also make it reasonably obvious what issues have or haven't been addressed. Large or tricky changes may require several passes of review and changes. Please see the contribution instructions for more information.
Oh wow. Faster and Rusty-er. Awesome! 💯
I'm pretty sure the folks at https://github.com/TeXitoi/benchmarksgame-rs would be interested in this as well.
Damn! This mostly looks like the implementation I was working on – you beat me to it. 😄 Kudos. And yes, please push a PR to benchmarksgame-rs.
@llogiq Ah, didn't realize there was a dedicated repo for this, thanks for the tip. It was your blog post that got me working on this! I already applied to submit my solution to alioth, though; hope that's all right. Will create a PR to the benchmarks repo tomorrow, it's currently midnight on a work night over here!
⌛ Testing commit 49d2441 with merge 5e06068...
@cristicbz Of course, it's all good. Keep your alioth username in the submitter line, then TeXitoi will know it's already submitted (you may look at Veedrac's recent PRs for examples).
💔 Test failed - auto-mac-64-nopt-t
Looks like a spurious failure to me:
@bors retry
This new version takes inspiration from the C implementation of the benchmark, but instead of explicitly using SIMD operations, which can't be done on stable, it arranges everything the same way and leaves the actual vectorization up to LLVM.

In addition to the ~20% speed gains (see below), this PR also adds some general niceties which showcase the language a little bit: a `Vec3` type to cut down on `(x, y, z)` repetition, `while let` instead of `loop-if-break`, iterator adapters instead of `for` loops, etc.

Here are the times in seconds of 10 runs each on my workstation:

```
before: 6.254, 6.260, 6.263, 6.264, 6.265, 6.267, 6.334, 6.341, 6.441, 6.509
before-min: 6.254
before-median: 6.266
before-max: 6.509

after: 4.823, 4.824, 4.826, 4.827, 4.837, 4.839, 4.881, 4.959, 4.990, 5.377
after-min: 4.823
after-median: 4.838
after-max: 5.377

gcc: 4.674, 4.676, 4.680, 4.682, 4.695, 4.696, 4.701, 4.708, 4.794, 5.297
gcc-min: 4.674
gcc-median: 4.696
gcc-max: 5.297
```

On my i7 laptop the speed up is less impressive, from ~5.4s to ~4.7s, but still significant. On my Vultr VPS the numbers look closer to the workstation results. Surprisingly my laptop beats both office workstation and VPS...
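For readers who want a feel for the approach, here is a minimal sketch of the idea described above: a small `Vec3` type with operator overloads, plus a velocity update split into flat passes over fixed-size arrays so that LLVM at least has the opportunity to vectorize the arithmetic. This is not the code from the PR. The tuple-struct layout of `Vec3`, the function name `advance_velocities`, and the three-body size `N` are all assumptions made for illustration, and whether any given pass is actually vectorized depends on the compiler version and target.

```rust
use std::ops::{Add, Mul, Sub};

/// Hypothetical 3-component vector; the `Vec3` in the PR may be defined differently.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Vec3(f64, f64, f64);

impl Vec3 {
    fn zero() -> Vec3 {
        Vec3(0.0, 0.0, 0.0)
    }

    fn squared_norm(self) -> f64 {
        self.0 * self.0 + self.1 * self.1 + self.2 * self.2
    }
}

impl Add for Vec3 {
    type Output = Vec3;
    fn add(self, rhs: Vec3) -> Vec3 {
        Vec3(self.0 + rhs.0, self.1 + rhs.1, self.2 + rhs.2)
    }
}

impl Sub for Vec3 {
    type Output = Vec3;
    fn sub(self, rhs: Vec3) -> Vec3 {
        Vec3(self.0 - rhs.0, self.1 - rhs.1, self.2 - rhs.2)
    }
}

impl Mul<f64> for Vec3 {
    type Output = Vec3;
    fn mul(self, rhs: f64) -> Vec3 {
        Vec3(self.0 * rhs, self.1 * rhs, self.2 * rhs)
    }
}

// Illustrative sizes only (assumption): three bodies, hence three pairs.
const N: usize = 3;
const PAIRS: usize = N * (N - 1) / 2;

/// Updates velocities for one step of size `dt`, with the gravitational
/// constant folded into the masses.
fn advance_velocities(pos: &[Vec3; N], vel: &mut [Vec3; N], mass: &[f64; N], dt: f64) {
    // Pass 1: position difference for every pair (i, j) with i < j.
    let mut delta = [Vec3::zero(); PAIRS];
    let mut k = 0;
    for i in 0..N {
        for j in (i + 1)..N {
            delta[k] = pos[i] - pos[j];
            k += 1;
        }
    }

    // Pass 2: dt / |delta|^3 per pair, a branch-free loop over flat arrays.
    let mut mag = [0.0; PAIRS];
    for (m, d) in mag.iter_mut().zip(delta.iter()) {
        let d2 = d.squared_norm();
        *m = dt / (d2 * d2.sqrt());
    }

    // Pass 3: apply equal and opposite velocity changes to each pair.
    let mut k = 0;
    for i in 0..N {
        for j in (i + 1)..N {
            vel[i] = vel[i] - delta[k] * (mass[j] * mag[k]);
            vel[j] = vel[j] + delta[k] * (mass[i] * mag[k]);
            k += 1;
        }
    }
}
```

Calling `advance_velocities(&pos, &mut vel, &mass, dt)` once per time step and then adding `vel[i] * dt` to each position would complete a simple integrator. Keeping the square root and division in their own pass over a plain array is the part most likely to benefit from autovectorization.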
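To illustrate the `while let` versus `loop-if-break` point from the description, here is a small sketch that walks over all unordered pairs of a slice in both styles. The function names and the choice of summing pairwise mass products are assumptions made for the example and are not taken from the PR; `split_first` is the standard slice method.

```rust
/// Older style: an explicit `loop` with a `match` that breaks on `None`.
fn pair_products_loop(masses: &[f64]) -> f64 {
    let mut rest = masses;
    let mut total = 0.0;
    loop {
        let (first, tail) = match rest.split_first() {
            Some(parts) => parts,
            None => break,
        };
        for m in tail {
            total += first * m;
        }
        rest = tail;
    }
    total
}

/// Same traversal with `while let` and an iterator adapter chain.
fn pair_products_while_let(masses: &[f64]) -> f64 {
    let mut rest = masses;
    let mut total = 0.0;
    // Peel off one element at a time and pair it with everything after it.
    while let Some((first, tail)) = rest.split_first() {
        total += tail.iter().map(|m| first * m).sum::<f64>();
        rest = tail;
    }
    total
}
```

Both functions visit each pair `(i, j)` with `i < j` exactly once; the second simply states the termination condition in the loop header instead of in a `match` arm.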