Rewrite shootout-nbody for better autovectorization #28891

Merged: 1 commit, Oct 8, 2015

cristicbz
Contributor

This new version takes inspiration from the C implementation of the benchmark. Rather than using explicit SIMD operations, which are not available on stable Rust, it arranges the computation the same way and leaves the actual vectorization to LLVM.

In addition to the ~20% speed gain (see below), this PR adds some general niceties that showcase the language a little: a `Vec3` type to cut down on `(x, y, z)` repetition, `while let` instead of loop-if-break, iterator adapters instead of `for` loops, and so on.
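For readers without the diff at hand, here is a minimal sketch of what those niceties could look like. It is not the code from this PR: the `Vec3` name comes from the description, but its definition and the helpers `pairwise_differences` and `total_momentum` are illustrative assumptions. The idea it tries to show is that filling a flat buffer of pairwise differences in a simple loop gives LLVM a shape it may be able to vectorize on its own, with no explicit SIMD.

```rust
use std::ops::{Add, Mul, Sub};

/// Hypothetical 3-component vector; replaces separate x, y, z fields.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Vec3(f64, f64, f64);

impl Vec3 {
    fn zero() -> Vec3 {
        Vec3(0.0, 0.0, 0.0)
    }
}

impl Add for Vec3 {
    type Output = Vec3;
    fn add(self, rhs: Vec3) -> Vec3 {
        Vec3(self.0 + rhs.0, self.1 + rhs.1, self.2 + rhs.2)
    }
}

impl Sub for Vec3 {
    type Output = Vec3;
    fn sub(self, rhs: Vec3) -> Vec3 {
        Vec3(self.0 - rhs.0, self.1 - rhs.1, self.2 - rhs.2)
    }
}

impl Mul<f64> for Vec3 {
    type Output = Vec3;
    fn mul(self, rhs: f64) -> Vec3 {
        Vec3(self.0 * rhs, self.1 * rhs, self.2 * rhs)
    }
}

/// Position differences for every unordered pair of bodies, in a flat buffer.
/// `while let` with `split_first` stands in for a loop-if-break construct.
fn pairwise_differences(positions: &[Vec3]) -> Vec<Vec3> {
    let mut diffs = Vec::new();
    let mut rest = positions;
    while let Some((&first, tail)) = rest.split_first() {
        // Iterator adapter instead of an indexed inner `for` loop.
        diffs.extend(tail.iter().map(|&other| first - other));
        rest = tail;
    }
    diffs
}

/// Total momentum, written with `zip` and `fold` rather than a manual loop.
fn total_momentum(velocities: &[Vec3], masses: &[f64]) -> Vec3 {
    velocities
        .iter()
        .zip(masses)
        .fold(Vec3::zero(), |acc, (&v, &m)| acc + v * m)
}
```

With five bodies, `pairwise_differences` yields ten entries, one per pair. Whether LLVM actually vectorizes loops over such a buffer depends on the target and optimization level, so the generated assembly is worth checking.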

Here are the times in seconds of 10 runs each on my workstation:

before: 6.254, 6.260, 6.263, 6.264, 6.265, 6.267, 6.334, 6.341, 6.441, 6.509
before-min: 6.254
before-median: 6.266
before-max: 6.509

after: 4.823, 4.824, 4.826, 4.827, 4.837, 4.839, 4.881, 4.959, 4.990, 5.377
after-min: 4.823
after-median: 4.838
after-max: 5.377

gcc: 4.674, 4.676, 4.680, 4.682, 4.695, 4.696, 4.701, 4.708, 4.794, 5.297
gcc-min: 4.674
gcc-median: 4.696
gcc-max: 5.297

On my i7 laptop the speedup is less impressive, from ~5.4s to ~4.7s, but still significant. On my Vultr VPS the numbers look closer to the workstation results. Surprisingly, my laptop beats both the office workstation and the VPS.
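As a check on the summary figures, the workstation medians and the relative gain can be recomputed from the raw timings listed above. This is only a sketch: the helper names `median` and `relative_reduction` are made up for illustration, and `median` assumes a non-empty slice with no NaN values.

```rust
/// Workstation timings in seconds, copied from the lists above.
const BEFORE: [f64; 10] = [
    6.254, 6.260, 6.263, 6.264, 6.265, 6.267, 6.334, 6.341, 6.441, 6.509,
];
const AFTER: [f64; 10] = [
    4.823, 4.824, 4.826, 4.827, 4.837, 4.839, 4.881, 4.959, 4.990, 5.377,
];

/// Median of a non-empty slice; averages the two middle values
/// when the number of samples is even.
fn median(samples: &[f64]) -> f64 {
    let mut sorted = samples.to_vec();
    sorted.sort_by(|a, b| a.partial_cmp(b).expect("timings must not be NaN"));
    let mid = sorted.len() / 2;
    if sorted.len() % 2 == 0 {
        (sorted[mid - 1] + sorted[mid]) / 2.0
    } else {
        sorted[mid]
    }
}

/// Fraction of run time saved going from `before` to `after`.
fn relative_reduction(before: f64, after: f64) -> f64 {
    (before - after) / before
}
```

Calling `relative_reduction(median(&BEFORE), median(&AFTER))` gives roughly 0.228, that is, about 23% less run time at the median (6.266s down to 4.838s), which is in line with the "~20%" quoted above.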

@rust-highfive
Contributor

Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @alexcrichton (or someone else) soon.

If any changes to this PR are deemed necessary, please add them as extra commits. This ensures that the reviewer can see what has changed since they last reviewed the code. The way GitHub handles out-of-date commits, this should also make it reasonably obvious what issues have or haven't been addressed. Large or tricky changes may require several passes of review and changes.

Please see the contribution instructions for more information.

@steveklabnik
Member

Oh wow. Faster and Rusty-er. Awesome! 💯

@killercup
Member

I'm pretty sure the folks at https://github.com/TeXitoi/benchmarksgame-rs would be interested in this as well.

cc @TeXitoi, @Veedrac, @llogiq

@alexcrichton
Member

@bors: r+ 49d2441

Nice wins!

@llogiq
Contributor

llogiq commented Oct 7, 2015

Damn! This mostly looks like the implementation I was working on – you beat me to it. 😄 Kudos.

And yes, please push a PR to benchmarksgame-rs.

@cristicbz
Contributor Author

@llogiq Ah, I didn't realize there was a dedicated repo for this, thanks for the tip. It was your blog post that got me working on this! I have already applied to submit my solution to alioth, though; I hope that's all right.

I'll create a PR to the benchmarks repo tomorrow; it's currently midnight on a work night over here!

@bors
Collaborator

bors commented Oct 8, 2015

⌛ Testing commit 49d2441 with merge 5e06068...

@llogiq
Contributor

llogiq commented Oct 8, 2015

@cristicbz Of course, it's all good. Keep your alioth username in the submitter line; then TeXitoi will know it's already submitted (you may look at Veedrac's recent PRs for examples).

@bors
Collaborator

bors commented Oct 8, 2015

💔 Test failed - auto-mac-64-nopt-t

@cristicbz
Contributor Author

Looks like a spurious failure to me:

command timed out: 1200 seconds without output, attempting to kill
process killed by signal 15
program finished with exit code -1
elapsedTime=1843.612040

@dotdash
Contributor

dotdash commented Oct 8, 2015

@bors retry

@bors
Collaborator

bors commented Oct 8, 2015

⌛ Testing commit 49d2441 with merge bcd27eb...

bors added a commit that referenced this pull request Oct 8, 2015
@bors bors merged commit 49d2441 into rust-lang:master Oct 8, 2015
@cristicbz cristicbz deleted the new-nbody branch October 8, 2015 16:26