Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

compiler generate slower code in my benchmarks, after upgrade from 0.13.0 to master #24014

Closed
2 of 3 tasks
kostya opened this issue Apr 3, 2015 · 12 comments
Closed
2 of 3 tasks
Labels
I-slow Issue: Problems and improvements with respect to performance of generated code.

Comments

@kostya
Copy link

kostya commented Apr 3, 2015

https://github.com/kostya/benchmarks

update is here: kostya/benchmarks@042ee51...78ef5f2

  • brainfuck example 1.7 times slower
  • matmul example 1.8 times slower
  • json example used 2 times more memory
@pnkfelix pnkfelix added the I-slow Issue: Problems and improvements with respect to performance of generated code. label Apr 3, 2015
@dotdash
Copy link
Contributor

dotdash commented Apr 3, 2015

At least for matmul, this looks like it might be caused by the new Range iterator. I didn't look at the others yet.

Old asm for the inner loop:

  0,05 │4e0:   cmp    %rdx,%rax
       │     ↓ jae    50e
  0,03 │4e5:   mov    (%rbx),%rbp
  0,20cmp    %rbp,%rax
  0,05 │     ↓ jae    527
 28,57 │4ed:   mov    (%r8),%rbp
  0,03mov    (%rdi),%r9
  0,04movsd  (%r9,%rax,8),%xmm1
  0,30mulsd  0x0(%rbp,%rax,8),%xmm1
 40,50lea    0x1(%rax),%rax
  0,06addsd  %xmm1,%xmm0
 29,15cmp    %r12,%rax
       │     ↑ jb     4e0

New asm:

1 600:   mov    %rsi,%rcx
 21,23add    $0x1,%rcx
  0,03 │         cmovb  %r12,%rcx
  0,02cmp    %rsi,%rdx
  0,03 │       ↓ jbe    638
 20,871 610:   mov    0x0(%rbp),%rbx
  0,10cmp    %rsi,%rbx
       │       ↓ jbe    646
  0,031 619:   mov    (%r9),%rbx
  0,01mov    (%rdi),%r15
 21,45movsd  (%r15,%rsi,8),%xmm1
  0,21mulsd  (%rbx,%rsi,8),%xmm1
  7,03addsd  %xmm1,%xmm0
 28,06cmp    %r12,%rcx
  0,02mov    %rcx,%rsi
       │       ↑ jb     600

That extra cmov in there is probably the result of the checked add +
mem::swap that the new implementation uses.

cc @aturon

@pnkfelix
Copy link
Member

pnkfelix commented Apr 3, 2015

@dotdash when you say "the checked add +", do you mean artihmetic overflow detection? Are you thinking that even when one passes -O without enabling debug-assertions that we are still taking a hit here (though hopefully a much smaller one)?

@aturon
Copy link
Member

aturon commented Apr 3, 2015

@pnkfelix I suspect https://github.com/rust-lang/rust/blob/master/src/libcore/iter.rs#L2468-L2470 is the culprit -- the step implementation is using checked_add regardless of debug assertions.

@dotdash
Copy link
Contributor

dotdash commented Apr 3, 2015

@pnkfelix Yes, step() explicitly calls checked_add() so it can handle the overflow. The new Range thing combines the old Range and RangeStep. The old Range didn't have to check for overflow because the step size was fixed at 1.

@aturon
Copy link
Member

aturon commented Apr 3, 2015

Note that the traits here are unstable, and it may be best to break them up for these distinct cases (or perhaps to instead switch to some other design).

Do either of you want to take this on? If not, I can look into it next week.

@dotdash
Copy link
Contributor

dotdash commented Apr 3, 2015

I'm trying to get some other things done and only checked this to see if it's related, so I'll have to pass.

@dotdash
Copy link
Contributor

dotdash commented Apr 4, 2015

The brainfuck benchmark seems to be (at least in part) slowed down by integer overflow checks in the SipHasher. I guess nightlies are built with debug-assertions enabled.

@alexcrichton
Copy link
Member

Thanks @dotdash for tracking down the iteration problem! #24095 I believe is also about that as well.

aturon added a commit to aturon/rust that referenced this issue Apr 6, 2015
A recent change to the implementation of range iterators meant that,
even when stepping by 1, the iterators *always* involved checked
arithmetic.

This commit reverts to the earlier behavior (while retaining the
refactoring into traits).

Fixes rust-lang#24095
cc rust-lang#24014
aturon added a commit to aturon/rust that referenced this issue Apr 7, 2015
A recent change to the implementation of range iterators meant that,
even when stepping by 1, the iterators *always* involved checked
arithmetic.

This commit reverts to the earlier behavior (while retaining the
refactoring into traits).

Fixes rust-lang#24095
cc rust-lang#24014
bors added a commit that referenced this issue Apr 8, 2015
A recent change to the implementation of range iterators meant that,
even when stepping by 1, the iterators *always* involved checked
arithmetic.

This commit reverts to the earlier behavior (while retaining the
refactoring into traits).

Fixes #24095
Closes #24119
cc #24014 

r? @alexcrichton
@kostya
Copy link
Author

kostya commented Apr 18, 2015

confirm that matmul fixed kostya/benchmarks@5e8409f
but brainfuck, still slower, than before

dotdash added a commit to dotdash/rust that referenced this issue May 3, 2015
Since the hashmap and its hasher are implemented in different crates, we
currently can't benefit from inlining, which means that especially for
small, fixed size keys, there is a huge overhead in hash calculations,
because the compiler can't apply optimizations that only apply for these
keys.

Fixes the brainfuck benchmark in rust-lang#24014.
bors added a commit that referenced this issue May 3, 2015
Since the hashmap and its hasher are implemented in different crates, we
currently can't benefit from inlining, which means that especially for
small, fixed size keys, there is a huge overhead in hash calculations,
because the compiler can't apply optimizations that only apply for these
keys.

Fixes the brainfuck benchmark in #24014.
@dotdash
Copy link
Contributor

dotdash commented May 17, 2015

@kostya can you confirm that the brainfuck performance is good again now?

@kostya
Copy link
Author

kostya commented May 18, 2015

yes brainfuck in 1.2-nighly much faster kostya/benchmarks@34ff509...378760b (btw still slower than javascript :) )

@kostya
Copy link
Author

kostya commented May 18, 2015

about json example used 2 times more memory i think it issue of rustc-serialize

@kostya kostya closed this as completed May 18, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
I-slow Issue: Problems and improvements with respect to performance of generated code.
Projects
None yet
Development

No branches or pull requests

5 participants