Missed optimization opportunity when trivially moving tuple of slices #107436
Comments
LLVM should probably be able to elide the memcpy below:
%val1 = alloca %"ThreeSlices<'_>", align 8
call void @llvm.lifetime.start.p0(i64 48, ptr nonnull %val1)
call void @llvm.memcpy.p0.p0.i64(ptr noundef nonnull align 8 dereferenceable(48) %val1, ptr noundef nonnull align 8 dereferenceable(48) %val, i64 48, i1 false)
%0 = call noundef i32 @example::sum(ptr noalias noundef nonnull readonly align 8 dereferenceable(48) %val1)
call void @llvm.lifetime.end.p0(i64 48, ptr nonnull %val1)
ret i32 %0
@rustbot label +A-llvm +I-slow
This seems to be resolved on current nightly by #112157 and https://reviews.llvm.org/D150970. Thanks @erikdesjardins @nikic 🎉 (I'm not so sure about an incoming inherent problem with the alignment attribute, though.)
Fixed as part of the LLVM 17 update (#114048); needs a codegen test.
Making the callee a little more opaque than in the code above (an unknown function pointer instead of a known sum function) prevents the removal of the memcpy.
#[repr(C)]
pub struct ThreeSlices<'a>(&'a [u32], &'a [u32], &'a [u32]);
#[no_mangle]
pub fn sum_slices_2(val: ThreeSlices, f: fn(_: &ThreeSlices)) {
    let val = val;
    f(&val)
}
https://rust.godbolt.org/z/1hK4f1zGs
define void @sum_slices_2(ptr noalias nocapture noundef readonly align 8 dereferenceable(48) %val, ptr nocapture noundef nonnull readonly %f) unnamed_addr #0 {
%val1 = alloca %"ThreeSlices<'_>", align 8
call void @llvm.lifetime.start.p0(i64 48, ptr nonnull %val1)
call void @llvm.memcpy.p0.p0.i64(ptr noundef nonnull align 8 dereferenceable(48) %val1, ptr noundef nonnull align 8 dereferenceable(48) %val, i64 48, i1 false)
call void %f(ptr noalias noundef nonnull readonly align 8 dereferenceable(48) %val1)
call void @llvm.lifetime.end.p0(i64 48, ptr nonnull %val1)
ret void
}
Edit: That does not seem quite right. Precisely speaking, the argument above is a function pointer, not a closure, and I now think function pointers never capture any variables (because only non-capturing closures can be coerced to function pointers? See the fn doc). So is this another missed opportunity?
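To make the closure-versus-function-pointer point concrete, here is a minimal sketch (not code from the issue). The names `apply` and `total_len` are hypothetical, and `apply` mirrors `sum_slices_2` except that it returns a value so the effect is observable. It only illustrates the Rust-level rule that a closure coerces to a `fn` pointer when it captures nothing from its environment; it says nothing about what LLVM may infer.

```rust
#[repr(C)]
pub struct ThreeSlices<'a>(pub &'a [u32], pub &'a [u32], pub &'a [u32]);

// Hypothetical helper with the same shape as `sum_slices_2`:
// move the argument, then pass a shared reference to an unknown callee.
pub fn apply(val: ThreeSlices, f: fn(&ThreeSlices) -> usize) -> usize {
    let val = val;
    f(&val)
}

pub fn total_len() -> usize {
    // This closure uses only its parameter, so it coerces to a function pointer.
    let f: fn(&ThreeSlices) -> usize = |v| v.0.len() + v.1.len() + v.2.len();

    // A closure that reads a local would not coerce; the compiler rejects this:
    // let extra = 1usize;
    // let g: fn(&ThreeSlices) -> usize = |v| v.0.len() + extra;

    apply(ThreeSlices(&[1, 2], &[3], &[4, 5, 6]), f)
}
```

Note that "captures nothing" here refers to the closure's environment, which is a different notion from LLVM's pointer capture on the argument.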
Yes, the removal is only allowed in the original example, because the called function is known. It does not work with an unknown function due to missing
Note that the |
That sounds reasonable. Thanks! So, in this case, if the assumption that the function pointer never captures the argument is correct, is there a missed attribution opportunity for the callbase?
I don't think there is any missing attribution opportunity here. In your example, if Even if you directly passed through the argument to the call (without intermediate memcpy to alloca), inferring nocapture is still not straightforward, because |
Edit: I now think that in this case, capturing in LLVM might happen. Thanks!
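One way to see why capture in the LLVM sense can still happen through a plain function pointer is the sketch below. It is an illustration under stated assumptions, not code from the thread: `LAST_ADDR`, `remember_addr`, and `observed_addr` are hypothetical names. The callee has no environment at all, yet it lets the address of its argument escape by storing it as an integer in a global, which is presumably the kind of behavior that stops the argument from being treated as nocapture for an unknown callee.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

#[repr(C)]
pub struct ThreeSlices<'a>(pub &'a [u32], pub &'a [u32], pub &'a [u32]);

// Hypothetical global, used only for this illustration.
static LAST_ADDR: AtomicUsize = AtomicUsize::new(0);

// A plain function with no captured environment. It still records the
// address of its argument, so the pointer value outlives the call.
pub fn remember_addr(v: &ThreeSlices) {
    let p: *const ThreeSlices<'_> = v;
    LAST_ADDR.store(p as usize, Ordering::Relaxed);
}

// Same shape as the function in the comment above.
pub fn sum_slices_2(val: ThreeSlices, f: fn(&ThreeSlices)) {
    let val = val;
    f(&val)
}

// Returns the address that the callee observed for the moved value.
pub fn observed_addr() -> usize {
    sum_slices_2(ThreeSlices(&[1], &[2], &[3]), remember_addr);
    LAST_ADDR.load(Ordering::Relaxed)
}
```

Because the recorded address is observable afterwards, whether the callee saw the original argument or a copy is not something the caller can freely change without knowing the callee.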
remove trailing whitespace, add trailing newline
fix llvm version and function name
…nikic add codegen test for the move before passing to nocapture, by shared-ref arg This PR adds codegen test for rust-lang/rust#107436 (comment) (It seems like this works from llvm-16?) Fixes #107436
Example code:
In rustc 1.67 stable this generates a number of moves (I suppose for calling convention?) that I don't think need to be there, especially when inlining:
See https://rust.godbolt.org/z/azs11edK8
While this is a pretty pointless example, this comes up in situations where you might want to convert a tuple of slices into a struct of slices in order to assign names to the tuple members.
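The original example code is not reproduced here. As a rough sketch of the situation described (not the author's code), the snippet below converts a tuple of slices into a struct with named fields and then passes it on by reference. The names `NamedSlices`, `head`, `body`, `tail`, `sum`, and `sum_tuple` are all hypothetical.

```rust
// Hypothetical struct that gives names to the three tuple members.
pub struct NamedSlices<'a> {
    pub head: &'a [u32],
    pub body: &'a [u32],
    pub tail: &'a [u32],
}

impl<'a> From<(&'a [u32], &'a [u32], &'a [u32])> for NamedSlices<'a> {
    // Field-by-field move from the tuple into the named struct.
    fn from((head, body, tail): (&'a [u32], &'a [u32], &'a [u32])) -> Self {
        NamedSlices { head, body, tail }
    }
}

// Sums all elements of the three slices.
pub fn sum(s: &NamedSlices) -> u32 {
    s.head.iter().chain(s.body).chain(s.tail).sum()
}

// Converts the tuple, then calls `sum` on the result.
pub fn sum_tuple(t: (&[u32], &[u32], &[u32])) -> u32 {
    sum(&NamedSlices::from(t))
}
```

The conversion is a trivial move of three fat pointers, so ideally it would compile down to nothing once `sum` is inlined.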