
Generic code produces lots of no-ops compared to the monomorphic version. #8334


Closed
sebcrozet opened this issue Aug 6, 2013 · 8 comments

@sebcrozet
Contributor

Generic code should compile to the same code as its monomorphic counterpart. However, a lot of additional nops are produced for generic code. For example:

#[inline(never)]
fn doit_not_generic(a: f32) -> f32 {
    let mut a = a;
    do 1000000000.times {
        a = a * a;
    }

    a
}

#[inline(never)]
fn doit<N: Mul<N, N>>(a: N) -> N {
    let mut a = a;
    do 1000000000.times {
        a = a * a;
    }

    a
}

When doit is called with an f32, the asm produced for it has a lot of nops before the multiplication:

00000000004025e0 <_ZN9doit_437216_38a5b7ada5228707_0$x2e0E>:
  4025e0:   64 48 3b 24 25 70 00    cmp    %fs:0x70,%rsp
  4025e7:   00 00
  4025e9:   77 1a                   ja     402605 <_ZN9doit_437216_38a5b7ada5228707_0$x2e0E+0x25>
  4025eb:   49 ba 08 00 00 00 00    movabs $0x8,%r10
  4025f2:   00 00 00
  4025f5:   49 bb 00 00 00 00 00    movabs $0x0,%r11
  4025fc:   00 00 00
  4025ff:   e8 28 00 00 00          callq  40262c <__morestack>
  402604:   c3                      retq
  402605:   55                      push   %rbp
  402606:   48 89 e5                mov    %rsp,%rbp
  402609:   f3 0f 10 05 23 01 00    movss  0x123(%rip),%xmm0        # 402734 <_IO_stdin_used+0x14>
  402610:   00
  402611:   48 c7 c0 00 36 65 c4    mov    $0xffffffffc4653600,%rax
  402618:   90                      nop
  402619:   90                      nop
  40261a:   90                      nop
  40261b:   90                      nop
  40261c:   90                      nop
  40261d:   90                      nop
  40261e:   90                      nop
  40261f:   90                      nop
  402620:   f3 0f 59 c0             mulss  %xmm0,%xmm0
  402624:   48 ff c0                inc    %rax
  402627:   75 f7                   jne    402620 <_ZN9doit_437216_38a5b7ada5228707_0$x2e0E+0x40>
  402629:   5d                      pop    %rbp
  40262a:   c3                      retq
  40262b:   90                      nop 

The asm produced for doit_not_generic has no nops before the multiplication:

0000000000401380 <_ZN16doit_not_generic16_38a5b7ada5228707_0$x2e0E>:
  401380:   64 48 3b 24 25 70 00    cmp    %fs:0x70,%rsp
  401387:   00 00
  401389:   77 1a                   ja     4013a5 <_ZN16doit_not_generic16_38a5b7ada5228707_0$x2e0E+0x25>
  40138b:   49 ba 08 00 00 00 00    movabs $0x8,%r10
  401392:   00 00 00
  401395:   49 bb 00 00 00 00 00    movabs $0x0,%r11
  40139c:   00 00 00
  40139f:   e8 88 12 00 00          callq  40262c <__morestack>
  4013a4:   c3                      retq
  4013a5:   55                      push   %rbp
  4013a6:   48 89 e5                mov    %rsp,%rbp
  4013a9:   48 c7 c0 00 36 65 c4    mov    $0xffffffffc4653600,%rax
  4013b0:   f3 0f 59 c0             mulss  %xmm0,%xmm0
  4013b4:   48 ff c0                inc    %rax
  4013b7:   75 f7                   jne    4013b0 <_ZN16doit_not_generic16_38a5b7ada5228707_0$x2e0E+0x30>
  4013b9:   5d                      pop    %rbp
  4013ba:   c3                      retq
  4013bb:   90                      nop
  4013bc:   90                      nop
  4013bd:   90                      nop
  4013be:   90                      nop
  4013bf:   90                      nop
@bstrie
Contributor

bstrie commented Aug 6, 2013

I also spy a movss that's present in the generic version, but not in the normal version.

@Florob
Contributor

Florob commented Aug 9, 2013

I was curious about the number of NOPs appearing in some code today, so I looked around to determine where they originate. It turns out that the preferred alignment for loop bodies on x86_64 is 16 bytes, so padding is inserted before loop bodies to ensure it. That is also what is happening here: the generic version needs the padding because the additional movss pushes the start of the loop off a 16-byte boundary. With current master I don't actually see that additional movss any more, so this is likely "fixed".
I do wonder why, unlike clang, we get a run of 1-byte NOPs instead of a single multi-byte NOP, though...
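The alignment explanation can be checked against the two listings with a little arithmetic. The sketch below uses a hypothetical helper, `padding_to_align`, which is not part of rustc or LLVM: it computes how many bytes are needed to bring an address up to the next 16-byte boundary. In the generic listing the loop would otherwise start at 0x402618, right after the 7-byte `mov` at 0x402611; in the monomorphic listing it starts at 0x4013b0. The third case assumes the 8-byte `movss` at 0x402609 were simply absent.

```rust
/// Bytes of padding needed to move `addr` up to the next multiple of
/// `align`, which must be a power of two (16 for x86_64 loop bodies here).
fn padding_to_align(addr: u64, align: u64) -> u64 {
    assert!(align.is_power_of_two());
    (align - addr % align) % align
}

fn main() {
    // Each loop would start right after the 7-byte `mov $0xffffffffc4653600,%rax`.
    let generic_loop = 0x402611u64 + 7; // 0x402618, generic listing
    let mono_loop = 0x4013a9u64 + 7; // 0x4013b0, monomorphic listing
    // Hypothetical: the generic function without the 8-byte `movss` at 0x402609,
    // so the `mov` would begin at 0x402609 instead.
    let generic_without_movss = 0x402609u64 + 7; // 0x402610

    for (name, addr) in [
        ("generic", generic_loop),
        ("monomorphic", mono_loop),
        ("generic without movss", generic_without_movss),
    ] {
        println!(
            "{name}: loop at {addr:#x}, {} padding byte(s)",
            padding_to_align(addr, 16)
        );
    }
}
```

This reports 8, 0 and 0 padding bytes respectively: the 8 matches the eight 1-byte nops at 0x402618 to 0x40261f, and the last case shows that removing the `movss` alone would leave the generic loop aligned without padding.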

@thestinger
Contributor

@Florob: I think it's because we're doing target info wrong. @alexcrichton has some work in progress that may fix that.

I'm going to close this issue since I can't reproduce it on master.

@alexcrichton
Member

@thestinger, are you sure you can't reproduce? If so, perhaps this is an OSX-specific problem because I was able to reproduce the extra nops on master.

Additionally, #8700 doesn't fix this :(

@alexcrichton alexcrichton reopened this Aug 23, 2013
@thestinger
Contributor

What about -Z no-monomorphic-collapse?

@alexcrichton
Member

I still see the nops :(

@Florob
Contributor

Florob commented Aug 23, 2013

@alexcrichton What code are you using exactly (i.e. how and how often do you call doit())?
It seems to me that rustc is "clever" here and instantiates doit() not for f32 in general, but for the specific argument you call it with. This adds the additional movss to load that argument into xmm0. Once that movss is there, the nops are expected for alignment.
If I call doit() twice with different arguments, or compile using --lib, the movss (and the nops) vanish for me.
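The "call it twice with different arguments" workaround can be sketched in modern Rust. This is a rough translation under stated assumptions, not code from the thread: the 2013 bound `Mul<N, N>` becomes `Mul<Output = N> + Copy`, `do n.times { ... }` becomes a `for` loop, and the iteration count is a hypothetical parameter (kept small in `main` so the sketch runs quickly) instead of the hard-coded 1000000000. `std::hint::black_box` is used as a best-effort hint that the two arguments are opaque, so the optimizer has no single constant to specialize `doit` for. Whether current rustc would specialize it in the first place is not established here.

```rust
use std::hint::black_box;
use std::ops::Mul;

// Modern spelling of `fn doit<N: Mul<N, N>>(a: N) -> N`: the product type is
// now an associated type, and `Copy` lets `a * a` use `a` twice.
#[inline(never)]
fn doit<N: Mul<Output = N> + Copy>(a: N, iters: u32) -> N {
    let mut a = a;
    // Replaces the 2013 `do 1000000000.times { ... }` loop.
    for _ in 0..iters {
        a = a * a;
    }
    a
}

fn main() {
    // Hypothetical iteration count, far smaller than the original 1000000000.
    let iters = 10;
    // Two different arguments, each passed through `black_box` so their values
    // are (on a best-effort basis) hidden from the optimizer.
    let x = doit(black_box(2.0f32), iters);
    let y = doit(black_box(3.0f32), iters);
    println!("{x} {y}");
}
```

To compare against the original listings, build in release mode and disassemble `doit` (for example with `objdump -d`); the sketch only shows the shape of the workaround.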

@alexcrichton
Member

Oh interesting, I was using this code:

#[inline(never)]
fn doit_not_generic(a: f32) -> f32 {
    let mut a = a;
    do 1000000000.times {
        a = a * a;
    }

    a
}

#[inline(never)]
fn doit<N: Mul<N, N>>(a: N) -> N {
    let mut a = a;
    do 1000000000.times {
        a = a * a;
    }

    a
}


fn main() {
    assert!(doit_not_generic(2.0f32) == doit(2.0f32));
}

You are correct, though, that if I later call it with a different argument, the two codegens are the same. I'm a little surprised these aren't merged by the mergefunc pass, but that's for a later day!
