Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Missed optimization opportunity when trivially moving tuple of slices #107436

Closed
recatek opened this issue Jan 29, 2023 · 8 comments · Fixed by #115050
Closed

Missed optimization opportunity when trivially moving tuple of slices #107436

recatek opened this issue Jan 29, 2023 · 8 comments · Fixed by #115050
Labels
A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. C-bug Category: This is a bug. E-needs-test Call for participation: An issue has been fixed and does not reproduce, but no test has been added. I-slow Issue: Problems and improvements with respect to performance of generated code. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

Comments

@recatek
Copy link

recatek commented Jan 29, 2023

Example code:

#[repr(C)]
pub struct ThreeSlices<'a>(&'a [u32], &'a [u32], &'a [u32]);

#[inline(always)]
pub fn should_be_no_op(val: ThreeSlices) -> ThreeSlices {
    val
}

pub fn sum_slices_1(val: ThreeSlices) -> u32 {
    sum(&val)
}

pub fn sum_slices_2(val: ThreeSlices) -> u32 {
    let val = should_be_no_op(val);
    sum(&val)
}

#[inline(never)]
pub fn sum(val: &ThreeSlices) -> u32 {
    val.0.iter().sum::<u32>() + val.1.iter().sum::<u32>() + val.2.iter().sum::<u32>()
}

In rustc 1.67 stable this generates a number of moves (I suppose for calling convention?) that I don't think need to be there, especially when inlining:

example::sum_slices_1:
        jmp     qword ptr [rip + example::sum@GOTPCREL]

example::sum_slices_2:
        sub     rsp, 56
        movups  xmm0, xmmword ptr [rdi]
        movups  xmm1, xmmword ptr [rdi + 16]
        movups  xmm2, xmmword ptr [rdi + 32]
        movaps  xmmword ptr [rsp + 32], xmm2
        movaps  xmmword ptr [rsp + 16], xmm1
        movaps  xmmword ptr [rsp], xmm0
        mov     rdi, rsp
        call    qword ptr [rip + example::sum@GOTPCREL]
        add     rsp, 56
        ret

See https://rust.godbolt.org/z/azs11edK8

While this is a pretty pointless example, this comes up in situations where you might want to convert a tuple of slices into a struct of slices in order to assign names to the tuple members.

@recatek recatek added the C-bug Category: This is a bug. label Jan 29, 2023
@clubby789
Copy link
Contributor

clubby789 commented Jan 29, 2023

LLVM should probably be able to elide the below memcpy and pass the original pointer.

  %val1 = alloca %"ThreeSlices<'_>", align 8
  call void @llvm.lifetime.start.p0(i64 48, ptr nonnull %val1)
  call void @llvm.memcpy.p0.p0.i64(ptr noundef nonnull align 8 dereferenceable(48) %val1, ptr noundef nonnull align 8 dereferenceable(48) %val, i64 48, i1 false)
  %0 = call noundef i32 @example::sum(ptr noalias noundef nonnull readonly align 8 dereferenceable(48) %val1)
  call void @llvm.lifetime.end.p0(i64 48, ptr nonnull %val1)
  ret i32 %0

@rustbot label +A-llvm +I-slow

@rustbot rustbot added A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. I-slow Issue: Problems and improvements with respect to performance of generated code. labels Jan 29, 2023
@Noratrieb Noratrieb added the T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. label Apr 5, 2023
@khei4
Copy link
Contributor

khei4 commented Jul 22, 2023

This seems to be resolved on current nightly by #112157 & https://reviews.llvm.org/D150970 Thanks @erikdesjardins @nikic 🎉 (I'm not so sure about an incoming inherent problem on alignment attribute)

@nikic nikic added the E-needs-test Call for participation: An issue has been fixed and does not reproduce, but no test has been added. label Aug 21, 2023
@nikic
Copy link
Contributor

nikic commented Aug 21, 2023

Fixed as part of the LLVM 17 update (#114048), needs codegen test.

@khei4
Copy link
Contributor

khei4 commented Aug 21, 2023

Making the sum a little more vague than the above code prevents the removal of memcpy.

#[repr(C)]
pub struct ThreeSlices<'a>(&'a [u32], &'a [u32], &'a [u32]);

#[no_mangle]
pub fn sum_slices_2(val: ThreeSlices, f: fn(_: &ThreeSlices))  {
    let val = val;
    f(&val)
}

https://rust.godbolt.org/z/1hK4f1zGs
And I now think that preventing memcpy removal is the right, because the function may capture the argument. But rustc seems to annotate the original argument with nocapture.

define void @sum_slices_2(ptr noalias nocapture noundef readonly align 8 dereferenceable(48) %val, ptr nocapture noundef nonnull readonly %f) unnamed_addr #0 {
  %val1 = alloca %"ThreeSlices<'_>", align 8
  call void @llvm.lifetime.start.p0(i64 48, ptr nonnull %val1)
  call void @llvm.memcpy.p0.p0.i64(ptr noundef nonnull align 8 dereferenceable(48) %val1, ptr noundef nonnull align 8 dereferenceable(48) %val, i64 48, i1 false)
  call void %f(ptr noalias noundef nonnull readonly align 8 dereferenceable(48) %val1)
  call void @llvm.lifetime.end.p0(i64 48, ptr nonnull %val1)
  ret void
}

Is this parameter nocapture attribution broken? Shared reference could be captured by closure generally, and nocapture attributes with argument guarantee that no captures happen during the function execution, right?

Edit: It seems like not so, precisely, the above argument is a function pointer, not a closure, and I now think function pointers never capture any variables (because of no-capture closure only coerced to function pointers?, fn doc). So is this another missing opportunity?

@nikic
Copy link
Contributor

nikic commented Aug 21, 2023

And I now think that preventing memcpy removal is the right, because the function may capture the argument.

Yes, the removal is only allowed in the original example, because the called function is known. It does not work with an unknown function due to missing nocapture.

But rustc seems to annotate the original argument with nocapture

Note that the nocapture is on the by-value argument of sum_slices_2, not the by-reference argument of the function pointer f.

@khei4
Copy link
Contributor

khei4 commented Aug 21, 2023

Note that the nocapture is on the by-value argument of sum_slices_2, not the by-reference argument of the function pointer f.

That sounds reasonable. Thanks! So, in this case, if the assumption that the function pointer never captures the argument was correct, missing attribution opportunities for the callbase?

@nikic
Copy link
Contributor

nikic commented Aug 21, 2023

I don't think there is any missing attribution opportunity here. In your example, if %val is nocapture, then that does not imply anything about %val1 being nocapture (it's a completely different object).

Even if you directly passed through the argument to the call (without intermediate memcpy to alloca), inferring nocapture is still not straightforward, because nocapture allows interior captures (i.e. it's possible to capture the pointer within the function, as long as it does not stay captured after the function returns).

@khei4
Copy link
Contributor

khei4 commented Aug 21, 2023

Even if you directly passed through the argument to the call (without intermediate memcpy to alloca), inferring nocapture is still not straightforward, because nocapture allows interior captures (i.e. it's possible to capture the pointer within the function, as long as it does not stay captured after the function returns).

Oh, I missed that fact. I think for general pointers, it sounds correct, but I'm still wondering whether rust's safe function pointer fn semantics can deduce the nocapture for all callee arguments.

(Edit: Now I think that on this case, capturing in LLVM might happen. Thanks!

khei4 added a commit to khei4/rust that referenced this issue Aug 26, 2023
remove trailing whitespace, add trailing newline

fix llvm version and function name
@bors bors closed this as completed in 9f48a85 Aug 28, 2023
github-actions bot pushed a commit to rust-lang/miri that referenced this issue Aug 29, 2023
…nikic

add codegen test for the move before passing to nocapture, by shared-ref arg

This PR adds codegen test for rust-lang/rust#107436 (comment) (It seems like this works from llvm-16?)

Fixes #107436
lnicola pushed a commit to lnicola/rust-analyzer that referenced this issue Apr 7, 2024
…nikic

add codegen test for the move before passing to nocapture, by shared-ref arg

This PR adds codegen test for rust-lang/rust#107436 (comment) (It seems like this works from llvm-16?)

Fixes #107436
RalfJung pushed a commit to RalfJung/rust-analyzer that referenced this issue Apr 27, 2024
…nikic

add codegen test for the move before passing to nocapture, by shared-ref arg

This PR adds codegen test for rust-lang/rust#107436 (comment) (It seems like this works from llvm-16?)

Fixes #107436
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. C-bug Category: This is a bug. E-needs-test Call for participation: An issue has been fixed and does not reproduce, but no test has been added. I-slow Issue: Problems and improvements with respect to performance of generated code. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants