2x benchmark loss in rayon-hash from multiple codegen-units

I'm seeing a huge slowdown in [rayon-hash](https://github.com/rayon-rs/rayon-hash) benchmarks, resolved with `-Ccodegen-units=1`.

```
$ rustc -Vv
rustc 1.25.0-nightly (97520ccb1 2018-01-21)
binary: rustc
commit-hash: 97520ccb101609af63f29919bb0a39115269c89e
commit-date: 2018-01-21
host: x86_64-unknown-linux-gnu
release: 1.25.0-nightly
LLVM version: 4.0

$ cargo bench --bench set_sum
   Compiling [...]
    Finished release [optimized] target(s) in 5.51 secs
     Running target/release/deps/set_sum-833cf161cf760670

running 4 tests
test rayon_set_sum_parallel ... bench:   2,295,348 ns/iter (+/- 152,025)
test rayon_set_sum_serial   ... bench:   7,730,830 ns/iter (+/- 171,552)
test std_set_sum_parallel   ... bench:  10,038,209 ns/iter (+/- 188,189)
test std_set_sum_serial     ... bench:   7,733,258 ns/iter (+/- 134,850)

test result: ok. 0 passed; 0 failed; 0 ignored; 4 measured; 0 filtered out

$ RUSTFLAGS=-Ccodegen-units=1 cargo bench --bench set_sum
   Compiling [...]
    Finished release [optimized] target(s) in 6.48 secs
     Running target/release/deps/set_sum-833cf161cf760670

running 4 tests
test rayon_set_sum_parallel ... bench:   1,092,732 ns/iter (+/- 105,979)
test rayon_set_sum_serial   ... bench:   6,152,751 ns/iter (+/- 83,103)
test std_set_sum_parallel   ... bench:   8,957,618 ns/iter (+/- 132,791)
test std_set_sum_serial     ... bench:   6,144,775 ns/iter (+/- 75,377)

test result: ok. 0 passed; 0 failed; 0 ignored; 4 measured; 0 filtered out
```

`rayon_set_sum_parallel` is the showcase for this crate, and it suffers the most from CGU.

From profiling and disassembly, this seems to mostly be a lost inlining opportunity.  In the slower build, the profile is split 35% `bridge_unindexed_producer_consumer`, 34% `Iterator::fold`, 28% `Sum::sum`, and the hot loop in the first looks like:

```asm
 16.72 │126d0:   cmpq   $0x0,(%r12,%rbx,8)
  6.73 │126d5: ↓ jne    126e1 <rayon::iter::plumbing::bridge_unindexed_producer_consumer+0x201>
 16.65 │126d7:   inc    %rbx
  0.00 │126da:   cmp    %rbp,%rbx
  7.20 │126dd: ↑ jb     126d0 <rayon::iter::plumbing::bridge_unindexed_producer_consumer+0x1f0>
  0.05 │126df: ↓ jmp    1272f <rayon::iter::plumbing::bridge_unindexed_producer_consumer+0x24f>
 26.93 │126e1:   mov    0x0(%r13,%rbx,4),%eax
  4.26 │126e6:   movq   $0x1,0x38(%rsp)
  2.27 │126ef:   mov    %rax,0x40(%rsp)
  1.88 │126f4:   mov    %r14,%rdi
  4.58 │126f7: → callq  15b90 <<u64 as core::iter::traits::Sum>::sum>
  0.68 │126fc:   movq   $0x1,0x38(%rsp)
  2.58 │12705:   mov    %r15,0x40(%rsp)
  0.62 │1270a:   movq   $0x1,0x48(%rsp)
  0.31 │12713:   mov    %rax,0x50(%rsp)
  0.49 │12718:   movb   $0x0,0x58(%rsp)
  2.50 │1271d:   xor    %esi,%esi
  0.41 │1271f:   mov    %r14,%rdi
  0.85 │12722: → callq  153f0 <<core::iter::Chain<A, B> as core::iter::iterator::Iterator>::fold>
  1.30 │12727:   mov    %rax,%r15
  2.16 │1272a: ↑ jmp    126d7 <rayon::iter::plumbing::bridge_unindexed_producer_consumer+0x1f7>
```

With CGU=1, 96% of the profile is in `bridge_unindexed_producer_consumer`, with this hot loop:

```asm
  2.28 │1426d:   test   %rbx,%rbx
       │14270: ↓ je     14296 <rayon::iter::plumbing::bridge_unindexed_producer_consumer+0x146>
  5.40 │14272:   mov    (%rbx),%ebx
  2.75 │14274:   add    %rbx,%rax
  1.47 │14277:   lea    (%rdx,%rsi,4),%rbx
  0.21 │1427b:   nopl   0x0(%rax,%rax,1)
 29.56 │14280:   cmp    %rdi,%rsi
  0.04 │14283: ↓ jae    14296 <rayon::iter::plumbing::bridge_unindexed_producer_consumer+0x146>
  2.87 │14285:   add    $0x4,%rbx
 20.22 │14289:   cmpq   $0x0,(%rcx,%rsi,8)
  1.48 │1428e:   lea    0x1(%rsi),%rsi
  8.00 │14292: ↑ je     14280 <rayon::iter::plumbing::bridge_unindexed_producer_consumer+0x130>
 25.25 │14294: ↑ jmp    1426d <rayon::iter::plumbing::bridge_unindexed_producer_consumer+0x11d>
```

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

2x benchmark loss in rayon-hash from multiple codegen-units #47665

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

2x benchmark loss in rayon-hash from multiple codegen-units #47665

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions