Unnecessary memcpy during struct initialization #45663

Closed
jrmuizel opened this issue Oct 31, 2017 · 5 comments
Labels
A-LLVM: Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues.
A-mir-opt: Area: MIR optimizations
A-mir-opt-nrvo: Fixed by the Named Return Value Opt. (NRVO)
C-enhancement: Category: An issue proposing an enhancement or a PR with one.
I-slow: Issue: Problems and improvements with respect to performance of generated code.
T-compiler: Relevant to the compiler team, which will review and decide on the PR/issue.

Comments

@jrmuizel
Contributor

pub struct Bar {
    l: u8,
    f: [u8; 200],
}

pub fn roo(f: &mut Bar) {
    f.f = noo();
}

#[inline(never)]
pub fn noo() -> [u8; 200] {
    [0; 200]
}

compiles to

example::roo:
        push    rbp
        mov     rbp, rsp
        push    r14
        push    rbx
        sub     rsp, 208
        mov     rbx, rdi
        lea     r14, [rbp - 216]
        mov     rdi, r14
        call    example::noo@PLT
        inc     rbx
        mov     edx, 200
        mov     rdi, rbx
        mov     rsi, r14
        call    memcpy@PLT
        add     rsp, 208
        pop     rbx
        pop     r14
        pop     rbp
        ret

example::noo:
        push    rbp
        mov     rbp, rsp
        xorps   xmm0, xmm0
        movups  xmmword ptr [rdi + 176], xmm0
        movups  xmmword ptr [rdi + 160], xmm0
        movups  xmmword ptr [rdi + 144], xmm0
        movups  xmmword ptr [rdi + 128], xmm0
        movups  xmmword ptr [rdi + 112], xmm0
        movups  xmmword ptr [rdi + 96], xmm0
        movups  xmmword ptr [rdi + 80], xmm0
        movups  xmmword ptr [rdi + 64], xmm0
        movups  xmmword ptr [rdi + 48], xmm0
        movups  xmmword ptr [rdi + 32], xmm0
        movups  xmmword ptr [rdi + 16], xmm0
        movups  xmmword ptr [rdi], xmm0
        mov     qword ptr [rdi + 192], 0
        mov     rax, rdi
        pop     rbp
        ret

If I drop the `l` field, the copy goes away and I get:

example::roo:
        push    rbp
        mov     rbp, rsp
        call    example::noo@PLT
        pop     rbp
        ret

example::noo:
        push    rbp
        mov     rbp, rsp
        xorps   xmm0, xmm0
        movups  xmmword ptr [rdi + 176], xmm0
        movups  xmmword ptr [rdi + 160], xmm0
        movups  xmmword ptr [rdi + 144], xmm0
        movups  xmmword ptr [rdi + 128], xmm0
        movups  xmmword ptr [rdi + 112], xmm0
        movups  xmmword ptr [rdi + 96], xmm0
        movups  xmmword ptr [rdi + 80], xmm0
        movups  xmmword ptr [rdi + 64], xmm0
        movups  xmmword ptr [rdi + 48], xmm0
        movups  xmmword ptr [rdi + 32], xmm0
        movups  xmmword ptr [rdi + 16], xmm0
        movups  xmmword ptr [rdi], xmm0
        mov     qword ptr [rdi + 192], 0
        mov     rax, rdi
        pop     rbp
        ret
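For reference, the variant without the `l` field is presumably the original source with that one field deleted. The sketch below is a reconstruction under that assumption, not code taken from the report. With `f` as the only field, the destination of the assignment starts at the same address as the struct itself, which appears to be what lets the optimizer pass `roo`'s argument straight through to `noo`.

```rust
// Reconstructed variant (assumption): `Bar` with the `l` field removed,
// so the array is the only field of the struct.
pub struct Bar {
    f: [u8; 200],
}

pub fn roo(f: &mut Bar) {
    // The destination `f.f` begins at the struct's own address here.
    f.f = noo();
}

// Kept out of line so the call and any copy stay visible in the assembly.
#[inline(never)]
pub fn noo() -> [u8; 200] {
    [0; 200]
}
```

Compiling both versions with optimizations enabled and comparing the output for `roo` should show whether the `memcpy` call is present on a given toolchain.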

See also https://bugs.llvm.org/show_bug.cgi?id=35134

@kennytm added the A-LLVM, C-enhancement, and I-slow labels on Nov 9, 2017
@dotdash
Contributor

dotdash commented Nov 29, 2017

I have a rough patch for the example in the LLVM bug report, but that example is not equivalent to the Rust code above. Even with the patch, the Rust example above would still be blocked on #31681.

@dotdash
Contributor

dotdash commented Dec 1, 2017

@dotdash
Contributor

dotdash commented Dec 1, 2017

That patch handles the original test case if -Zmutable-noalias is used. Turning roo into

fn roo() -> Bar {
    Bar {
        l: 4,
        f: noo(),
    }
}

is also handled without -Zmutable-noalias and gets rid of an extra memcpy that is currently still in there.
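Put together with the definitions from the original report, the by-value variant might look like the following. This is only an assembled sketch for experimenting with the case described here. The struct and `noo` are copied from the report, and marking `roo` as `pub` is an addition so the function is not discarded as unused.

```rust
pub struct Bar {
    l: u8,
    f: [u8; 200],
}

// Kept out of line so the call and any copy stay visible in the assembly.
#[inline(never)]
pub fn noo() -> [u8; 200] {
    [0; 200]
}

// By-value variant: the struct is built directly as the return value,
// with the array field filled from the call to `noo`.
pub fn roo() -> Bar {
    Bar {
        l: 4,
        f: noo(),
    }
}
```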

@nikic
Contributor

nikic commented Mar 13, 2021

This optimizes well on nightly with -Zmutable-noalias, as a result of the LLVM 12 upgrade (call slot optimization with GEP dest now supported).
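As a rough source-level analogy for what call slot optimization achieves, the callee can be handed the final destination instead of a stack temporary that is copied afterwards. The sketch below does this by hand with an out-parameter. The name `noo_into` is hypothetical and not part of the report, and this illustrates the effect only, not how LLVM implements the transform.

```rust
pub struct Bar {
    l: u8,
    f: [u8; 200],
}

// Hypothetical out-parameter form of `noo`: the caller supplies the
// destination, so no separate temporary is needed at the call site.
#[inline(never)]
pub fn noo_into(out: &mut [u8; 200]) {
    *out = [0; 200];
}

pub fn roo(f: &mut Bar) {
    // Pass a pointer to the field itself (struct base plus the field offset).
    noo_into(&mut f.f);
}
```

The trade-off is a less natural signature, which is why having the optimizer do this automatically for the by-value form is preferable.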

@jrmuizel
Contributor Author

-Zmutable-noalias is now enabled by default, so closing.
