Unnecessary local variable and memcpy not elided during struct initialization #77955

tgnottingham · 2020-10-14T23:11:32Z

Here is a case where a local variable isn't optimized away, which introduces an unnecessary copy:

struct S {
    item: [u64; 17],
}

pub fn f() {
    let item = [1u64; 17];  
    unsafe { ext(&S{item}) };
}

extern {
    fn ext(s: &S);
}

item is created on the stack, and then copied into the S temporary. But item is otherwise unused, so the compiler could just populate the temporary directly.

Assembly at -C opt-level=3:

.LCPI0_0:
        .quad   1
        .quad   1
example::f:
        pushq   %rbx
        subq    $272, %rsp
        movaps  .LCPI0_0(%rip), %xmm0
        movaps  %xmm0, (%rsp)
        movaps  %xmm0, 16(%rsp)
        movaps  %xmm0, 32(%rsp)
        movaps  %xmm0, 48(%rsp)
        movaps  %xmm0, 64(%rsp)
        movaps  %xmm0, 80(%rsp)
        movaps  %xmm0, 96(%rsp)
        movaps  %xmm0, 112(%rsp)
        movq    $1, 128(%rsp)
        leaq    136(%rsp), %rbx
        movq    %rsp, %rsi
        movl    $136, %edx
        movq    %rbx, %rdi
        callq   *memcpy@GOTPCREL(%rip)
        movq    %rbx, %rdi
        callq   *ext@GOTPCREL(%rip)
        addq    $272, %rsp
        popq    %rbx
        retq

The problem occurs if the array initializer is 1u64, but not if it's 0u64. With 1u64, the generated IR uses a loop to populate the array, whereas with 0u64, it doesn't. The issue is likely that LLVM is unable to see through the relative complexity of the loop to notice that item can be eliminated.

Here is the LLVM-IR before any optimization passes:

define void @_ZN3lib1f17ha3b7b123ae9e0a2fE() unnamed_addr #0 {
start:
  %item = alloca [17 x i64], align 8
  %_3 = alloca %S, align 8
  %0 = getelementptr inbounds [17 x i64], [17 x i64]* %item, i64 0, i64 0
  %1 = getelementptr inbounds [17 x i64], [17 x i64]* %item, i64 0, i64 0
  %2 = getelementptr inbounds [17 x i64], [17 x i64]* %item, i64 0, i64 17
  br label %repeat_loop_header

bb1:                                              ; preds = %repeat_loop_next
  %3 = bitcast %S* %_3 to i8*
  call void @llvm.lifetime.end.p0i8(i64 136, i8* %3)
  ret void

repeat_loop_header:                               ; preds = %repeat_loop_body, %start
  %4 = phi i64* [ %1, %start ], [ %6, %repeat_loop_body ]
  %5 = icmp ne i64* %4, %2
  br i1 %5, label %repeat_loop_body, label %repeat_loop_next

repeat_loop_body:                                 ; preds = %repeat_loop_header
  store i64 1, i64* %4, align 8
  %6 = getelementptr inbounds i64, i64* %4, i64 1
  br label %repeat_loop_header

repeat_loop_next:                                 ; preds = %repeat_loop_header
  %7 = bitcast %S* %_3 to i8*
  call void @llvm.lifetime.start.p0i8(i64 136, i8* %7)
  %8 = bitcast %S* %_3 to [17 x i64]*
  %9 = bitcast [17 x i64]* %8 to i8*
  %10 = bitcast [17 x i64]* %item to i8*
  call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 8 %9, i8* align 8 %10, i64 136, i1 false)
  call void @ext(%S* noalias readonly align 8 dereferenceable(136) %_3)
  br label %bb1
}

If you replace the array with a struct with 17 u64 members, the problem also goes away. Again, this causes the IR to not use a loop to initialize item.

This was found while investigating #77613. I think the cause is slightly different, though I would guess it's the same general idea--LLVM can't see how to make some optimization through multiple basic blocks.

#45663 is also related, but may or may not have the same root cause.

Reproduces from 1.0.

The text was updated successfully, but these errors were encountered:

nikic · 2020-10-17T10:41:51Z

https://rust.godbolt.org/z/haxG5T Typical stores to alloca followed by memcpy pattern.

nikic · 2021-03-12T20:00:41Z

This no longer generates memcpy on nightly, due to the LLVM 12 update. The extra SROA after simple unrolling breaks it down.

tgnottingham added the C-bug Category: This is a bug. label Oct 14, 2020

nikic added the A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. label Oct 17, 2020

nikic closed this as completed Mar 12, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Unnecessary local variable and memcpy not elided during struct initialization #77955

Unnecessary local variable and memcpy not elided during struct initialization #77955

tgnottingham commented Oct 14, 2020

nikic commented Oct 17, 2020

nikic commented Mar 12, 2021

Unnecessary local variable and memcpy not elided during struct initialization #77955

Unnecessary local variable and memcpy not elided during struct initialization #77955

Comments

tgnottingham commented Oct 14, 2020

nikic commented Oct 17, 2020

nikic commented Mar 12, 2021