Rewrite shootout-nbody for better autovectorization #28891

cristicbz · 2015-10-07T20:07:10Z

This new version takes inspiration from the C implementation of the benchmark, but rather than using explicit SIMD operations, which aren't available on stable, it arranges the computation the same way and leaves the actual vectorization to LLVM.

In addition to the ~20% speed gains (see below), this PR also adds some general niceties that showcase the language a little: a `Vec3` type to cut down on `(x, y, z)` repetition, `while let` instead of loop-if-break, iterator adapters instead of `for` loops, etc.
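To make the description above concrete, here is a minimal sketch of the kind of structure it refers to. This is not the code from the PR: the names `pairwise_deltas` and `magnitudes`, the tuple-struct layout of `Vec3`, and the split into two stages are assumptions chosen for illustration. The idea is that filling a flat buffer first and then running a simple arithmetic loop over it gives LLVM a loop shape it has a chance of vectorizing, with no explicit SIMD.

```rust
use std::ops::{Add, Mul, Sub};

// Hypothetical Vec3 type; the PR's actual definition may differ.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Vec3(f64, f64, f64);

impl Vec3 {
    fn squared_norm(self) -> f64 {
        self.0 * self.0 + self.1 * self.1 + self.2 * self.2
    }
}

impl Add for Vec3 {
    type Output = Vec3;
    fn add(self, rhs: Vec3) -> Vec3 {
        Vec3(self.0 + rhs.0, self.1 + rhs.1, self.2 + rhs.2)
    }
}

impl Sub for Vec3 {
    type Output = Vec3;
    fn sub(self, rhs: Vec3) -> Vec3 {
        Vec3(self.0 - rhs.0, self.1 - rhs.1, self.2 - rhs.2)
    }
}

impl Mul<f64> for Vec3 {
    type Output = Vec3;
    fn mul(self, rhs: f64) -> Vec3 {
        Vec3(self.0 * rhs, self.1 * rhs, self.2 * rhs)
    }
}

// Stage 1: collect the position difference of every unordered pair into a
// flat buffer. `while let` with `split_first` replaces a loop-if-break.
fn pairwise_deltas(positions: &[Vec3]) -> Vec<Vec3> {
    let mut deltas = Vec::new();
    let mut rest = positions;
    while let Some((first, tail)) = rest.split_first() {
        deltas.extend(tail.iter().map(|&other| *first - other));
        rest = tail;
    }
    deltas
}

// Stage 2: compute dt / |d|^3 for each pair in one pass over the buffer.
// Whether this loop is actually vectorized depends on the compiler and target.
fn magnitudes(deltas: &[Vec3], dt: f64) -> Vec<f64> {
    deltas
        .iter()
        .map(|d| {
            let d2 = d.squared_norm();
            dt / (d2 * d2.sqrt())
        })
        .collect()
}

fn main() {
    let positions = [
        Vec3(0.0, 0.0, 0.0),
        Vec3(1.0, 0.0, 0.0),
        Vec3(0.0, 2.0, 0.0),
    ];
    let deltas = pairwise_deltas(&positions);
    let mags = magnitudes(&deltas, 0.01);
    for (d, m) in deltas.iter().zip(mags.iter()) {
        // Scaled delta, as would be applied to the velocities of the pair.
        println!("{:?} -> {:?}", d, *d * *m);
    }
}
```

Three bodies give three pairs, so `deltas` and `mags` each hold three entries. A real benchmark would use fixed-size arrays rather than `Vec` to avoid allocating on every step; `Vec` is used here only to keep the sketch short.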

Here are the times in seconds of 10 runs each on my workstation:

before: 6.254, 6.260, 6.263, 6.264, 6.265, 6.267, 6.334, 6.341, 6.441, 6.509
before-min: 6.254
before-median: 6.266
before-max: 6.509

after: 4.823, 4.824, 4.826, 4.827, 4.837, 4.839, 4.881, 4.959, 4.990, 5.377
after-min: 4.823
after-median: 4.838
after-max: 5.377

gcc: 4.674, 4.676, 4.680, 4.682, 4.695, 4.696, 4.701, 4.708, 4.794, 5.297
gcc-min: 4.674
gcc-median: 4.696
gcc-max: 5.297
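
For anyone who wants to check the summary figures, the sketch below recomputes the medians from the runs listed above and derives the relative runtime reduction. The helper names `median` and `runtime_reduction` are made up for this example, and `median` assumes a non-empty slice with no NaN values.

```rust
// Median of a non-empty slice of timings (assumption: no NaN values).
fn median(samples: &[f64]) -> f64 {
    let mut sorted = samples.to_vec();
    sorted.sort_by(|a, b| a.partial_cmp(b).expect("timings must not be NaN"));
    let mid = sorted.len() / 2;
    if sorted.len() % 2 == 0 {
        // Even count: average the two middle values.
        (sorted[mid - 1] + sorted[mid]) / 2.0
    } else {
        sorted[mid]
    }
}

// Fraction of the original runtime that was saved.
fn runtime_reduction(before: f64, after: f64) -> f64 {
    1.0 - after / before
}

fn main() {
    let before = [
        6.254, 6.260, 6.263, 6.264, 6.265, 6.267, 6.334, 6.341, 6.441, 6.509,
    ];
    let after = [
        4.823, 4.824, 4.826, 4.827, 4.837, 4.839, 4.881, 4.959, 4.990, 5.377,
    ];
    let (b, a) = (median(&before), median(&after));
    println!("before-median: {:.3}", b);
    println!("after-median:  {:.3}", a);
    println!("reduction:     {:.1}%", 100.0 * runtime_reduction(b, a));
}
```

On the medians this works out to a runtime reduction of roughly 22.8%, consistent with the "~20%" quoted above.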

On my i7 laptop the speedup is less impressive, from ~5.4s to ~4.7s, but still significant. On my Vultr VPS the numbers look closer to the workstation results. Surprisingly, my laptop beats both the office workstation and the VPS...

rust-highfive · 2015-10-07T20:07:25Z

Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @alexcrichton (or someone else) soon.

If any changes to this PR are deemed necessary, please add them as extra commits. This ensures that the reviewer can see what has changed since they last reviewed the code. The way GitHub handles out-of-date commits, this should also make it reasonably obvious what issues have or haven't been addressed. Large or tricky changes may require several passes of review and changes.

Please see the contribution instructions for more information.

steveklabnik · 2015-10-07T20:09:01Z

Oh wow. Faster and Rusty-er. Awesome! 💯

killercup · 2015-10-07T20:21:41Z

I'm pretty sure the folks at https://github.com/TeXitoi/benchmarksgame-rs would be interested in this as well.

cc @TeXitoi, @Veedrac, @llogiq

alexcrichton · 2015-10-07T21:33:00Z

@bors: r+ 49d2441

Nice wins!

llogiq · 2015-10-07T21:33:50Z

Damn! This mostly looks like the implementation I was working on – you beat me to it. 😄 kudos.

And yes, please push a PR to benchmarksgame-rs.

cristicbz · 2015-10-07T22:58:17Z

@llogiq Ah, didn't realize there was a dedicated repo for this, thanks for the tip---it was your blog post that got me working on this! I already applied to submit my solution to alioth, though, hope that's all right.

Will create a PR to the benchmarks repo tomorrow, it's currently midnight on a work night over here!

bors · 2015-10-08T06:28:56Z

⌛ Testing commit 49d2441 with merge 5e06068...

llogiq · 2015-10-08T06:44:01Z

@cristicbz Of course, it's all good. Keep your alioth username in the submitter line, then TeXitoi will know it's already submitted (you may look at Veedrac's recent PRs for examples).

bors · 2015-10-08T07:39:05Z

💔 Test failed - auto-mac-64-nopt-t

cristicbz · 2015-10-08T07:55:11Z

Looks like a spurious failure to me:

command timed out: 1200 seconds without output, attempting to kill
process killed by signal 15
program finished with exit code -1
elapsedTime=1843.612040

dotdash · 2015-10-08T08:00:50Z

@bors retry

bors · 2015-10-08T10:04:13Z

⌛ Testing commit 49d2441 with merge bcd27eb...


bors · 2015-10-08T11:50:31Z

☀️ Test successful - auto-linux-32-nopt-t, auto-linux-32-opt, auto-linux-64-nopt-t, auto-linux-64-opt, auto-linux-64-x-android-t, auto-mac-32-opt, auto-mac-64-nopt-t, auto-mac-64-opt, auto-win-gnu-32-nopt-t, auto-win-gnu-32-opt, auto-win-gnu-64-nopt-t, auto-win-gnu-64-opt, auto-win-msvc-32-opt, auto-win-msvc-64-opt

bench: rewrite nbody for better vectorization

49d2441

rust-highfive assigned alexcrichton Oct 7, 2015

bors merged commit 49d2441 into rust-lang:master Oct 8, 2015

cristicbz deleted the new-nbody branch October 8, 2015 16:26
