-
Notifications
You must be signed in to change notification settings - Fork 13k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Experiment: Remove #[rustc_box] usage in the vec![] macro #135068
Conversation
16a3688
to
54e8f83
Compare
54e8f83
to
0c82a13
Compare
@bors try @rust-timer queue |
This comment has been minimized.
This comment has been minimized.
Experiment: Remove #[rustc_box] usage in the vec![] macro [Discussion](https://rust-lang.zulipchat.com/#narrow/channel/131828-t-compiler/topic/.23.5Brustc_box.5D.20attribute) around rust-lang#135046 suggested that this annotation may not be needed anymore. Note that the comment claims compile-time improvements, not run-time improvements, so I thought I'd do a perf run. The PR that had removed all other uses and added this comment is rust-lang#108476. r? `@ghost`
The job Click to see the possible cause of the failure (guessed by this bot)
|
/// Constructs a `Box<T>` by calling the `exchange_malloc` lang item and moving the argument into | ||
/// the newly allocated memory. This is an intrinsic to avoid unnecessary copies. | ||
/// | ||
/// This is the surface syntax for `box <expr>` expressions. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment is now wildly outdated. We haven't had exchange_malloc
nor box <expr>
for years.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's from #135046 :D
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It does call exchange_malloc
though:
let exchange_malloc = Operand::function_handle( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This intrinsic is how we write box <expr>
, so in that sense it still exists.
And yeah, nobody ever bothered to rename exchange_malloc
😂
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
box <expr>
doesn't exist as syntax in the language anymore: #108471
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Box expressions still exist in the compiler, and this intrinsic are their syntax.
☀️ Try build successful - checks-actions |
This comment has been minimized.
This comment has been minimized.
Finished benchmarking commit (32ad697): comparison URL. Overall result: ❌✅ regressions and improvements - please read the text belowBenchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf. Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @bors rollup=never Instruction countThis is the most reliable metric that we have; it was used to determine the overall result at the top of this comment. However, even this metric can sometimes exhibit noise.
Max RSS (memory usage)Results (primary -0.1%, secondary 22.6%)This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
CyclesResults (secondary 92.7%)This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
Binary sizeResults (primary 0.0%, secondary 0.9%)This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
Bootstrap: 764.353s -> 763.147s (-0.16%) |
|
Probably worth a codegen test if we do want to remove this. Given that the difference for deep-vector is in the backend portion, my hope is that it optimizes to the same thing but just takes more time. |
(There may already be a codegen test.) I personally am a big ambivalent on whether it makes sense to remove it though. On one hand, it does reduce the specialness, but it's not enough to remove the intrinsic altogether, so it seems like an obvious compile-time win to just use it. |
My hope was that this was indeed enough to remove the intrinsic (as well as |
If that is the goal then the PR should also remove that other use and we can see if the intrinsic can truly be avoided. That said, the perf run clearly shows that the comment in the code is still accurate. So I'm not convinced we actually want to remove the intrinsic in this state. Do we have any idea where all that extra compilation time is spent and whether we can avoid it? |
A codegen test of what though? The compile time regression that's reported above is to unoptimized builds. As far as I can tell, LLVM is indeed fixing up our mess. Just slowly (as is tradition).
Diffing the LLVM IR on nightly and on this PR, I see these changes to start:
+ %_4 = alloca [543864 x i8], align 4
%_x = alloca [24 x i8], align 8
-; call alloc::alloc::exchange_malloc
- %_5 = call ptr @_ZN5alloc5alloc15exchange_malloc17h373738c909339ef5E(i64 543864, i64 4)
- %_9 = ptrtoint ptr %_5 to i64
- %_12 = and i64 %_9, 3
- %_13 = icmp eq i64 %_12, 0
- br i1 %_13, label %bb4, label %panic
-
-bb4: ; preds = %start +; call alloc::boxed::Box<T>::new
+ %_3 = call align 4 ptr @"_ZN5alloc5boxed12Box$LT$T$GT$3new17h0eae96c7f92e5aa7E"(ptr align 4 %_4)
; call alloc::slice::<impl [T]>::into_vec
- call void @"_ZN5alloc5slice29_$LT$impl$u20$$u5b$T$u5d$$GT$8into_vec17hf307cc899eb61d2eE"(ptr sret([24 x i8]) align 8 %_x, ptr
align 4 %_5, i64 135966)
+ call void @"_ZN5alloc5slice29_$LT$impl$u20$$u5b$T$u5d$$GT$8into_vec17hac0528ae86e71898E"(ptr sret([24 x i8]) align 8 %_x, ptr
align 4 %_3, i64 135966)
; call core::ptr::drop_in_place<alloc::vec::Vec<i32>>
- call void @"_ZN4core3ptr47drop_in_place$LT$alloc..vec..Vec$LT$i32$GT$$GT$17h6ba5100bfd6008bdE"(ptr align 8 %_x)
+ call void @"_ZN4core3ptr47drop_in_place$LT$alloc..vec..Vec$LT$i32$GT$$GT$17hc0b4d783589713aaE"(ptr align 8 %_x)
ret void
-
-panic: ; preds = %start
-; call core::panicking::panic_misaligned_pointer_dereference
- call void @_ZN4core9panicking36panic_misaligned_pointer_dereference17h956de8dcbda95434E(i64 4, i64 %_9, ptr align 8 @alloc_4e
2288cd26dda91e67d51c42946506bb) #14
- unreachable I think opt builds are unaffected partly because GVN turns this MIR:
Into this:
which we hand off to LLVM as a Just glancing at So somehow I think there's a double-whammy of both having the massive But simply having the big alloca with this PR is a fair point against this PR; as it stands this change will increase the stack usage of some programs and probably make them stack overflow in dev builds where they didn't before. Though this doesn't extend to much more reasonable code like |
More discussion happened on Zulip, overall this is not the easy win that I hoped it would be, closing. |
Interestingly, #135046 alone already caused an RSS regression (but less than this PR). I don't understand why. |
…rrors Add an InstSimplify for repetitive array expressions I noticed in rust-lang#135068 (comment) that GVN's implementation of this same transform was quite profitable on the deep-vector benchmark. But of course GVN doesn't run in unoptimized builds, so this is my attempt to write a version of this transform that benefits the deep-vector case and is fast enough to run in InstSimplify. The benchmark suite indicates that this is effective.
Discussion around #135046 suggested that this annotation may not be needed anymore. Note that the comment claims compile-time improvements, not run-time improvements, so I thought I'd do a perf run. The PR that had removed all other uses and added this comment is #108476.
r? @ghost