compiler generate slower code in my benchmarks, after upgrade from 0.13.0 to master #24014

kostya · 2015-04-03T17:35:20Z

https://github.com/kostya/benchmarks

update is here: kostya/benchmarks@042ee51...78ef5f2

brainfuck example is 1.7 times slower
matmul example is 1.8 times slower
json example uses 2 times more memory


dotdash · 2015-04-03T22:54:23Z

At least for matmul, this looks like it might be caused by the new Range iterator. I didn't look at the others yet.

Old asm for the inner loop:

  0,05 │4e0:   cmp    %rdx,%rax
       │     ↓ jae    50e
  0,03 │4e5:   mov    (%rbx),%rbp
  0,20 │       cmp    %rbp,%rax
  0,05 │     ↓ jae    527
 28,57 │4ed:   mov    (%r8),%rbp
  0,03 │       mov    (%rdi),%r9
  0,04 │       movsd  (%r9,%rax,8),%xmm1
  0,30 │       mulsd  0x0(%rbp,%rax,8),%xmm1
 40,50 │       lea    0x1(%rax),%rax
  0,06 │       addsd  %xmm1,%xmm0
 29,15 │       cmp    %r12,%rax
       │     ↑ jb     4e0

New asm:

       │1 600:   mov    %rsi,%rcx
 21,23 │         add    $0x1,%rcx
  0,03 │         cmovb  %r12,%rcx
  0,02 │         cmp    %rsi,%rdx
  0,03 │       ↓ jbe    638
 20,87 │1 610:   mov    0x0(%rbp),%rbx
  0,10 │         cmp    %rsi,%rbx
       │       ↓ jbe    646
  0,03 │1 619:   mov    (%r9),%rbx
  0,01 │         mov    (%rdi),%r15
 21,45 │         movsd  (%r15,%rsi,8),%xmm1
  0,21 │         mulsd  (%rbx,%rsi,8),%xmm1
  7,03 │         addsd  %xmm1,%xmm0
 28,06 │         cmp    %r12,%rcx
  0,02 │         mov    %rcx,%rsi
       │       ↑ jb     600

That extra cmov in there is probably the result of the checked add +
mem::swap that the new implementation uses.

cc @aturon

pnkfelix · 2015-04-03T23:08:31Z

@dotdash when you say "the checked add +", do you mean arithmetic overflow detection? Are you thinking that even when one passes -O without enabling debug-assertions, we are still taking a hit here (though hopefully a much smaller one)?

aturon · 2015-04-03T23:15:54Z

@pnkfelix I suspect https://github.com/rust-lang/rust/blob/master/src/libcore/iter.rs#L2468-L2470 is the culprit -- the step implementation is using checked_add regardless of debug assertions.

dotdash · 2015-04-03T23:16:43Z

@pnkfelix Yes, step() explicitly calls checked_add() so it can handle the overflow. The new Range thing combines the old Range and RangeStep. The old Range didn't have to check for overflow because the step size was fixed at 1.
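To make the difference concrete, here is a minimal sketch of the two strategies. `UnitRange` and `SteppedRange` are hypothetical names invented for illustration; this is not the libcore code, only an approximation of the shape described above (a fixed step of 1 versus a `checked_add` on every advance).

```rust
use std::mem;

// Hypothetical stand-ins for the two strategies; not the actual libcore types.

/// Steps by a fixed 1, so advancing never needs an overflow check.
pub struct UnitRange {
    pub start: usize,
    pub end: usize,
}

impl Iterator for UnitRange {
    type Item = usize;

    fn next(&mut self) -> Option<usize> {
        if self.start < self.end {
            let current = self.start;
            // Cannot overflow: start < end <= usize::MAX.
            self.start += 1;
            Some(current)
        } else {
            None
        }
    }
}

/// Steps by an arbitrary amount, so every advance has to handle overflow.
/// (A step of 0 would never terminate; the sketch does not guard against it.)
pub struct SteppedRange {
    pub start: usize,
    pub end: usize,
    pub step: usize,
}

impl Iterator for SteppedRange {
    type Item = usize;

    fn next(&mut self) -> Option<usize> {
        if self.start < self.end {
            // On overflow, jump straight to `end` so that iteration stops.
            let next = self.start.checked_add(self.step).unwrap_or(self.end);
            // Store the new position and hand back the old one.
            Some(mem::replace(&mut self.start, next))
        } else {
            None
        }
    }
}
```

With a step of 1 both iterators yield the same values, but the second one still carries the overflow fallback on every iteration, which is consistent with the extra `cmovb` in the new asm.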

aturon · 2015-04-03T23:18:23Z

Note that the traits here are unstable, and it may be best to break them up for these distinct cases (or perhaps to instead switch to some other design).

Do either of you want to take this on? If not, I can look into it next week.

dotdash · 2015-04-03T23:28:20Z

I'm trying to get some other things done and only checked this to see if it's related, so I'll have to pass.

dotdash · 2015-04-04T12:03:02Z

The brainfuck benchmark seems to be (at least in part) slowed down by integer overflow checks in the SipHasher. I guess nightlies are built with debug-assertions enabled.
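As a rough illustration of why overflow checks matter in hashing code, the sketch below contrasts an explicitly wrapping add with a checked one. The function names and the mixing step are made up for this example and are not taken from SipHasher; the assumption is only that hash rounds rely on additions that are expected to wrap.

```rust
/// One add-rotate-xor mixing step (illustrative only, not SipHash).
/// `wrapping_add` wraps on overflow in every build configuration,
/// so no overflow branch is needed.
pub fn mix_wrapping(a: u64, b: u64) -> u64 {
    let sum = a.wrapping_add(b);
    sum ^ b.rotate_left(13)
}

/// The same step with an explicit overflow check. This models the extra
/// compare-and-branch that a plain `+` pays when overflow checks are on.
pub fn mix_checked(a: u64, b: u64) -> Option<u64> {
    a.checked_add(b).map(|sum| sum ^ b.rotate_left(13))
}
```

For inputs that do not overflow, both functions agree; only the checked variant has to test for the overflow case on each call.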

alexcrichton · 2015-04-06T16:11:06Z

Thanks @dotdash for tracking down the iteration problem! I believe #24095 is about that as well.

A recent change to the implementation of range iterators meant that, even when stepping by 1, the iterators *always* involved checked arithmetic. This commit reverts to the earlier behavior (while retaining the refactoring into traits). Fixes #24095 Closes #24119 cc #24014 r? @alexcrichton

kostya · 2015-04-18T15:01:34Z

I can confirm that matmul is fixed: kostya/benchmarks@5e8409f
But brainfuck is still slower than before.

Since the hashmap and its hasher are implemented in different crates, we currently can't benefit from inlining, which means that especially for small, fixed size keys, there is a huge overhead in hash calculations, because the compiler can't apply optimizations that only apply for these keys. Fixes the brainfuck benchmark in rust-lang#24014.
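A minimal sketch of the idea, assuming the usual approach of marking small hot functions `#[inline]` so that callers in other crates can inline them. `TinyHasher` is a hypothetical type using FNV-1a as a stand-in, not the actual SipHasher or the code changed in the PR.

```rust
// Hypothetical hasher, imagined as living in a different crate from the map
// that calls it. FNV-1a is used here only because it is short.
pub struct TinyHasher {
    state: u64,
}

impl TinyHasher {
    /// Starts from the 64-bit FNV offset basis.
    #[inline]
    pub fn new() -> TinyHasher {
        TinyHasher { state: 0xcbf29ce484222325 }
    }

    /// `#[inline]` makes the body available for inlining in downstream
    /// crates, so the optimizer can specialize it for small, fixed-size keys.
    #[inline]
    pub fn write_u8(&mut self, byte: u8) {
        self.state ^= byte as u64;
        // Multiply by the 64-bit FNV prime, wrapping on overflow.
        self.state = self.state.wrapping_mul(0x100000001b3);
    }

    /// Returns the current hash value.
    #[inline]
    pub fn finish(&self) -> u64 {
        self.state
    }
}
```

Without the attribute (and without LTO), a call from another crate to a non-generic function like `write_u8` would typically stay an opaque call, which is the overhead described above.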

dotdash · 2015-05-17T17:46:44Z

@kostya can you confirm that the brainfuck performance is good again now?

kostya · 2015-05-18T13:38:28Z

Yes, brainfuck in 1.2-nightly is much faster: kostya/benchmarks@34ff509...378760b (btw, still slower than JavaScript :) )

kostya · 2015-05-18T13:42:58Z

As for the json example using 2 times more memory, I think it is an issue with rustc-serialize.

pnkfelix added the I-slow label (Issue: Problems and improvements with respect to performance of generated code) Apr 3, 2015

alexcrichton mentioned this issue Apr 6, 2015

Perf regression in Vec<u8> Write impl? #24095

Closed

aturon mentioned this issue Apr 6, 2015

Fix range performance regression #24120

Merged

dotdash mentioned this issue May 3, 2015

Restore HashMap performance by allowing some functions to be inlined #25070

Merged

kostya closed this as completed May 18, 2015
