Newtype impedes vectorization #24963

bluss · 2015-04-29T21:46:10Z

In the following example, the newtype is not zero-cost in practice, since it seems to impede optimizations in LLVM. The plain u32 sum vectorizes while the Foo(u32) sum does not. The newtype fold takes about 3 times as long as the plain u32 fold.
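In terms of memory layout, the newtype really is free. A single-field tuple struct occupies the same bytes as its field, which suggests the extra runtime comes from how the fold is optimized and not from extra data. The sketch below checks this. It adds `#[repr(transparent)]`, which the issue's code does not use (and which was only stabilized in Rust 1.28, after this report), because only that attribute formally guarantees identical layout. The default repr merely produces it in practice. `main` and its asserts are illustrative additions.

```rust
use std::mem::{align_of, size_of};

// Same shape as the issue's newtype. #[repr(transparent)] is an addition here
// so that "same layout as T" is guaranteed, not just likely.
#[repr(transparent)]
#[derive(Copy, Clone)]
pub struct Foo<T>(T);

fn main() {
    // The wrapper adds no bytes and no extra alignment requirement.
    assert_eq!(size_of::<Foo<u32>>(), size_of::<u32>());
    assert_eq!(align_of::<Foo<u32>>(), align_of::<u32>());
    // So 1024 newtypes take exactly as much memory as 1024 plain u32s.
    assert_eq!(size_of::<[Foo<u32>; 1024]>(), size_of::<[u32; 1024]>());

    // Reading the field is just a plain u32 access.
    let f = Foo(7u32);
    assert_eq!(f.0, 7);
    println!("Foo<u32> is {} bytes", size_of::<Foo<u32>>());
}
```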

rustc version: rustc 1.1.0-nightly (97d4e76 2015-04-27) (built 2015-04-28)

code (playpen link)

(The code has been updated)

#![crate_type="lib"]
#![feature(test)]
extern crate test;

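// #[inline(never)] keeps each fold as a separate function, so its codegen can be inspected and benchmarked on its own.
#[inline(never)]
pub fn folds(x: &[u32]) -> u32 { x.iter().fold(0, |a, &b| a + b) }

// Single-field newtype wrapper around T.
#[derive(Copy, Clone)]
pub struct Foo<T>(T);

// The same sum, but over the newtype; this is the version that optimizes worse.
#[inline(never)]
pub fn folds_foo(x: &[Foo<u32>]) -> Foo<u32> { x.iter().fold(Foo(0), |a, &b| Foo(a.0 + b.0)) }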

#[bench]
fn folds1(b: &mut test::Bencher)
{
    let xs = test::black_box(vec![1; 1024]);
    b.iter(|| {
        folds(&xs)
    })
}

#[bench]
fn folds2(b: &mut test::Bencher)
{
    let xs = test::black_box(vec![Foo(1); 1024]);
    b.iter(|| {
        folds_foo(&xs)
    })
}
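The `test` crate and `#[bench]` used above are nightly-only. As a rough stable-Rust alternative, the sketch below times both folds with `std::time::Instant` and `std::hint::black_box` (stable since Rust 1.66, long after this report). The helper `time_per_iter` and the iteration count are hypothetical choices. A single wall-clock loop is noisier than `#[bench]`, so treat its numbers as indicative only.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

#[inline(never)]
pub fn folds(x: &[u32]) -> u32 {
    x.iter().fold(0, |a, &b| a + b)
}

#[derive(Copy, Clone)]
pub struct Foo<T>(T);

#[inline(never)]
pub fn folds_foo(x: &[Foo<u32>]) -> Foo<u32> {
    x.iter().fold(Foo(0), |a, &b| Foo(a.0 + b.0))
}

// Hypothetical helper: average wall-clock time per call of `f` over `iters` calls.
fn time_per_iter<R>(iters: u32, mut f: impl FnMut() -> R) -> Duration {
    let start = Instant::now();
    for _ in 0..iters {
        // black_box keeps the optimizer from discarding the unused result.
        black_box(f());
    }
    start.elapsed() / iters
}

fn main() {
    // black_box on the inputs hides their contents from the optimizer,
    // mirroring test::black_box in the original benchmarks.
    let xs: Vec<u32> = black_box(vec![1; 1024]);
    let ys: Vec<Foo<u32>> = black_box(vec![Foo(1); 1024]);

    let plain = time_per_iter(100_000, || folds(black_box(&xs[..])));
    let newtype = time_per_iter(100_000, || folds_foo(black_box(&ys[..])));
    println!("folds:     {:?}/iter", plain);
    println!("folds_foo: {:?}/iter", newtype);
}
```

Build it with `rustc -C opt-level=3` (optionally adding a `-C target-cpu=...` flag) to mirror the settings compared in this issue.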

Benchmark results vary with compilation settings:

// rustc -C opt-level=3 --test

running 2 tests
test folds1 ... bench:       206 ns/iter (+/- 5)
test folds2 ... bench:       609 ns/iter (+/- 6)

// rustc -C opt-level=3 -C target-cpu=corei7-avx --test

running 2 tests
test folds1 ... bench:       131 ns/iter (+/- 1)
test folds2 ... bench:       192 ns/iter (+/- 3)


Stebalien · 2015-04-29T22:56:18Z

bluss · 2015-04-30T10:38:48Z

I agree, that's what I was intending, @Stebalien, but it shouldn't affect codegen either. When I tried it, it didn't affect the benchmarks. Vectorization is cool, by the way: compiling with corei7-avx decreases the plain u32 fold's runtime even more, increasing the benchmark difference. :-)

bluss · 2015-04-30T13:21:11Z

Updated code & bench.

Aatch · 2015-05-07T04:43:04Z

Ah goddammit, I actually wrote a patch that unwrapped newtypes, at least for simple cases (which include this one), but abandoned it because I couldn't see a noticeable difference in optimisation/performance.
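Aatch's patch itself isn't shown in the thread. As a user-side illustration of the same idea, the newtype can be unwrapped by hand: fold over the inner `u32` values and wrap the result only once at the end, so the loop body sees plain integers. `folds_foo_unwrapped` is a hypothetical name, not from the thread. Whether this actually closed the gap on the 2015-era compilers discussed here is not established.

```rust
#[derive(Copy, Clone, Debug, PartialEq)]
pub struct Foo<T>(T);

// Hypothetical workaround: project out the inner u32, fold over plain integers,
// and re-wrap only the final result.
#[inline(never)]
pub fn folds_foo_unwrapped(x: &[Foo<u32>]) -> Foo<u32> {
    Foo(x.iter().map(|f| f.0).fold(0, |a, b| a + b))
}

fn main() {
    let xs = vec![Foo(1u32); 1024];
    // Same result as the issue's folds_foo: 1024 ones sum to 1024.
    assert_eq!(folds_foo_unwrapped(&xs), Foo(1024));
}
```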

bluss · 2015-05-07T09:50:10Z

Saying it doesn't vectorize isn't actually correct: the newtype fold does vectorize, just less efficiently. That's a puzzle to try to understand.

bluss · 2015-05-25T15:19:52Z

cc @dotdash if you have time & interest

bluss · 2016-03-25T17:04:51Z

Triage: Still an issue with rustc 1.9.0-nightly (98f0a9128 2016-03-23)

Mark-Simulacrum · 2017-05-02T12:14:17Z

I feel like this is fixed judging by the bench results below. Please reopen if that's not the case, preferably with a summary of what we should be looking for to close this issue.

Without target-cpu:

$ ./test --bench

running 2 tests
test folds1 ... bench:          44 ns/iter (+/- 8)
test folds2 ... bench:          45 ns/iter (+/- 2)

test result: ok. 0 passed; 0 failed; 0 ignored; 2 measured

and with:

$ rustc -Copt-level=3 -C target-cpu=corei7-avx --test test.rs
$ ./test --bench

running 2 tests
test folds1 ... bench:          40 ns/iter (+/- 0)
test folds2 ... bench:          40 ns/iter (+/- 0)

test result: ok. 0 passed; 0 failed; 0 ignored; 2 measured

bluss · 2017-05-02T17:58:04Z

Nice! I can confirm that too.

According to playpen right now, it is not fixed in stable, nor in rustc 1.18.0-beta.1 (4dce67253 2017-04-25).

But it is fixed in rustc 1.19.0-nightly (777ee2079 2017-05-01)

sanxiyn added the A-codegen Area: Code generation label Apr 30, 2015

bluss mentioned this issue Mar 4, 2016

Sub-optimal codegen for float newtypes #32031

Closed

Mark-Simulacrum closed this as completed May 2, 2017

bluss mentioned this issue Jun 8, 2017

Imprecise floating point operations (fast-math) #21690

Closed
