Experiment: Remove #[rustc_box] usage in the vec![] macro #135068

Nadrieril · 2025-01-03T12:23:49Z

Discussion around #135046 suggested that this annotation may not be needed anymore. Note that the comment claims compile-time improvements, not run-time improvements, so I thought I'd do a perf run. The PR that had removed all other uses and added this comment is #108476.

r? @ghost

Nadrieril · 2025-01-03T13:34:28Z

@bors try @rust-timer queue

Experiment: Remove #[rustc_box] usage in the vec![] macro [Discussion](https://rust-lang.zulipchat.com/#narrow/channel/131828-t-compiler/topic/.23.5Brustc_box.5D.20attribute) around rust-lang#135046 suggested that this annotation may not be needed anymore. Note that the comment claims compile-time improvements, not run-time improvements, so I thought I'd do a perf run. The PR that had removed all other uses and added this comment is rust-lang#108476. r? `@ghost`

bors · 2025-01-03T13:35:40Z

⌛ Trying commit 0c82a13 with merge 32ad697...

rust-log-analyzer · 2025-01-03T13:49:46Z

The job x86_64-gnu-llvm-18 failed! Check out the build log: (web) (plain)

Click to see the possible cause of the failure (guessed by this bot)

#22 exporting to docker image format
#22 sending tarball 27.1s done
#22 DONE 32.8s
##[endgroup]
Setting extra environment values for docker:  --env ENABLE_GCC_CODEGEN=1 --env GCC_EXEC_PREFIX=/usr/lib/gcc/
[CI_JOB_NAME=x86_64-gnu-llvm-18]
debug: `DISABLE_CI_RUSTC_IF_INCOMPATIBLE` configured.
---
sccache: Starting the server...
##[group]Configure the build
configure: processing command line
configure: 
configure: build.configure-args := ['--build=x86_64-unknown-linux-gnu', '--llvm-root=/usr/lib/llvm-18', '--enable-llvm-link-shared', '--set', 'rust.randomize-layout=true', '--set', 'rust.thin-lto-import-instr-limit=10', '--enable-verbose-configure', '--enable-sccache', '--disable-manage-submodules', '--enable-locked-deps', '--enable-cargo-native-static', '--set', 'rust.codegen-units-std=1', '--set', 'dist.compression-profile=balanced', '--dist-compression-formats=xz', '--set', 'rust.lld=false', '--disable-dist-src', '--release-channel=nightly', '--enable-debug-assertions', '--enable-overflow-checks', '--enable-llvm-assertions', '--set', 'rust.verify-llvm-ir', '--set', 'rust.codegen-backends=llvm,cranelift,gcc', '--set', 'llvm.static-libstdcpp', '--enable-new-symbol-mangling']
configure: target.x86_64-unknown-linux-gnu.llvm-config := /usr/lib/llvm-18/bin/llvm-config
configure: llvm.link-shared     := True
configure: rust.randomize-layout := True
configure: rust.thin-lto-import-instr-limit := 10
---
  Downloaded boml v0.3.1
   Compiling boml v0.3.1
   Compiling y v0.1.0 (/checkout/compiler/rustc_codegen_gcc/build_system)
    Finished `release` profile [optimized] target(s) in 3.81s
     Running `/checkout/obj/build/x86_64-unknown-linux-gnu/stage1-codegen/x86_64-unknown-linux-gnu/release/y test --use-system-gcc --use-backend gcc --out-dir /checkout/obj/build/x86_64-unknown-linux-gnu/stage1-tools/cg_gcc --release --mini-tests --std-tests`
Using system GCC
[BUILD] example
[AOT] mini_core_hello_world
/checkout/obj/build/x86_64-unknown-linux-gnu/stage1-tools/cg_gcc/mini_core_hello_world
abc
---
---- /checkout/obj/build/x86_64-unknown-linux-gnu/test/error-index.md - Rust_Compiler_Error_Index::E0010 (line 224) stdout ----
error[E0015]: cannot call non-const associated function `Box::<[i32; 3]>::new` in constants
##[error] --> /checkout/obj/build/x86_64-unknown-linux-gnu/test/error-index.md:225:24
  |
3 | const CON : Vec<i32> = vec![1, 2, 3];
  |
  = note: calls in constants are limited to constant functions, tuple structs and tuple variants
  = note: this error originates in the macro `vec` (in Nightly builds, run with -Z macro-backtrace for more info)


error[E0015]: cannot call non-const method `slice::<impl [i32]>::into_vec::<std::alloc::Global>` in constants
##[error] --> /checkout/obj/build/x86_64-unknown-linux-gnu/test/error-index.md:225:24
  |
3 | const CON : Vec<i32> = vec![1, 2, 3];
  |
  = note: calls in constants are limited to constant functions, tuple structs and tuple variants
  = note: this error originates in the macro `vec` (in Nightly builds, run with -Z macro-backtrace for more info)


error: aborting due to 2 previous errors

For more information about this error, try `rustc --explain E0015`.
Some expected error codes were not found: ["E0010"]
failures:
    /checkout/obj/build/x86_64-unknown-linux-gnu/test/error-index.md - Rust_Compiler_Error_Index::E0010 (line 224)

test result: FAILED. 1045 passed; 1 failed; 61 ignored; 0 measured; 0 filtered out; finished in 9.42s



Command CFG_RELEASE_CHANNEL="nightly" RUSTC_BOOTSTRAP="1" RUSTC_STAGE="2" RUSTC_SYSROOT="/checkout/obj/build/x86_64-unknown-linux-gnu/stage2" RUSTDOC_LIBDIR="/checkout/obj/build/x86_64-unknown-linux-gnu/stage2/lib" RUSTDOC_REAL="/checkout/obj/build/x86_64-unknown-linux-gnu/stage2/bin/rustdoc" RUST_TEST_THREADS="16" "/checkout/obj/build/bootstrap/debug/rustdoc" "-Wrustdoc::invalid_codeblock_attributes" "-Dwarnings" "-Znormalize-docs" "-Z" "unstable-options" "--test" "/checkout/obj/build/x86_64-unknown-linux-gnu/test/error-index.md" "--test-args" "" (failure_mode=DelayFail) has failed. Rerun with -v to see more details.
  local time: Fri Jan  3 13:49:38 UTC 2025
  network time: Fri, 03 Jan 2025 13:49:38 GMT
##[error]Process completed with exit code 1.
Post job cleanup.
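The log above suggests why the error-index doctest changed: without the attribute, `vec![1, 2, 3]` in a constant expands to ordinary calls (`Box::new`, `into_vec`), and those are rejected as non-const calls (E0015) instead of producing the expected E0010. As a minimal sketch of what is accepted in constant context, assuming only stable std APIs (the names `EMPTY`, `FIRST_THREE` and `first_three` are made up for illustration):

```rust
// `Vec::new` is a `const fn`, so an empty vector is allowed in a constant.
const EMPTY: Vec<i32> = Vec::new();

// Non-empty data can live in a plain array, which needs no heap allocation
// at compile time.
static FIRST_THREE: [i32; 3] = [1, 2, 3];

// The heap-allocated `Vec` is then built at run time from the array.
fn first_three() -> Vec<i32> {
    FIRST_THREE.to_vec()
}
```

Calling `first_three()` at run time yields the same contents that `vec![1, 2, 3]` would, while the constant itself never needs an allocation.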

bjorn3 · 2025-01-03T13:59:04Z

library/alloc/src/boxed.rs

+/// Constructs a `Box<T>` by calling the `exchange_malloc` lang item and moving the argument into
+/// the newly allocated memory. This is an intrinsic to avoid unnecessary copies.
+///
+/// This is the surface syntax for `box <expr>` expressions.


This comment is now wildly outdated. We haven't had exchange_malloc nor box <expr> for years.

That's from #135046 :D

It does call exchange_malloc though:

rust/compiler/rustc_mir_build/src/builder/expr/as_rvalue.rs

Line 146 in 319f529

let exchange_malloc = Operand::function_handle(

This intrinsic is how we write box <expr>, so in that sense it still exists.

And yeah, nobody ever bothered to rename exchange_malloc 😂

box <expr> doesn't exist as syntax in the language anymore: #108471

Box expressions still exist in the compiler, and this intrinsic is their syntax.

bors · 2025-01-03T15:24:35Z

☀️ Try build successful - checks-actions
Build commit: 32ad697 (32ad697ebeb90c17712d2e4d4d15065cc68b17c8)

rust-timer · 2025-01-03T17:16:12Z

Finished benchmarking commit (32ad697): comparison URL.

Overall result: ❌✅ regressions and improvements - please read the text below

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @rustbot label: +perf-regression-triaged along with sufficient written justification. If you cannot justify the regressions please fix the regressions and do another perf run. If the next run shows neutral or positive results, the label will be automatically removed.

@bors rollup=never
@rustbot label: -S-waiting-on-perf +perf-regression

Instruction count

This is the most reliable metric that we have; it was used to determine the overall result at the top of this comment. However, even this metric can sometimes exhibit noise.

	mean	range	count
Regressions ❌ (primary)	0.5%	[0.4%, 0.5%]	2
Regressions ❌ (secondary)	34.7%	[0.2%, 92.4%]	10
Improvements ✅ (primary)	-0.1%	[-0.1%, -0.1%]	1
Improvements ✅ (secondary)	-0.4%	[-0.6%, -0.3%]	2
All ❌✅ (primary)	0.3%	[-0.1%, 0.5%]	3

Max RSS (memory usage)

Results (primary -0.1%, secondary 22.6%)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	3.2%	[1.1%, 6.6%]	7
Regressions ❌ (secondary)	28.2%	[3.4%, 80.7%]	14
Improvements ✅ (primary)	-4.8%	[-14.0%, -1.1%]	5
Improvements ✅ (secondary)	-3.8%	[-6.3%, -2.5%]	3
All ❌✅ (primary)	-0.1%	[-14.0%, 6.6%]	12

Cycles

Results (secondary 92.7%)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	-	-	0
Regressions ❌ (secondary)	111.6%	[2.9%, 144.4%]	5
Improvements ✅ (primary)	-	-	0
Improvements ✅ (secondary)	-1.9%	[-1.9%, -1.9%]	1
All ❌✅ (primary)	-	-	0

Binary size

Results (primary 0.0%, secondary 0.9%)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	0.2%	[0.1%, 0.2%]	15
Regressions ❌ (secondary)	1.3%	[0.1%, 2.5%]	11
Improvements ✅ (primary)	-0.1%	[-0.4%, -0.0%]	22
Improvements ✅ (secondary)	-0.1%	[-0.1%, -0.1%]	4
All ❌✅ (primary)	0.0%	[-0.4%, 0.2%]	37

Bootstrap: 764.353s -> 763.147s (-0.16%)
Artifact size: 325.61 MiB -> 325.57 MiB (-0.01%)

Nadrieril · 2025-01-03T17:49:49Z

deep-vector is a single vec![] invocation with several thousand elements. I would quite like to ignore it, given that nothing else is affected.

jackh726 · 2025-01-03T17:54:03Z

Probably worth a codegen test if we do want to remove this. Given that the difference for deep-vector is in the backend portion, my hope is that it optimizes to the same thing but just takes more time.

jackh726 · 2025-01-03T17:58:14Z

(There may already be a codegen test.)

I personally am a bit ambivalent on whether it makes sense to remove it though. On one hand, it does reduce the specialness, but it's not enough to remove the intrinsic altogether, so it seems like an obvious compile-time win to just use it.

Nadrieril · 2025-01-03T18:13:15Z

My hope was that this was indeed enough to remove the intrinsic (as well as thir::Expr::Box and Rvalue::ShallowInitBox). The only other use is Box::new and my understanding is that the intrinsic is only useful to avoid allocating a temporary, but in Box::new the value is already in a local so there's no need.
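To make the distinction concrete, here is a rough sketch of the two situations, assuming only stable std APIs. The function names are hypothetical, and this is plain `Box::new`, not the intrinsic itself. In the first function the array already lives in a local before it is boxed; in the second the array literal is a temporary handed straight to `Box::new`, which is the shape `vec![...]` takes without the attribute.

```rust
// The array is first stored in a local, then moved into the box.
fn vec_from_local() -> Vec<i32> {
    let arr = [1, 2, 3, 4];
    let boxed: Box<[i32]> = Box::new(arr);
    boxed.into_vec()
}

// The array literal is a temporary passed directly to `Box::new`.
// In an unoptimized build the temporary may still be materialized on the
// stack before being copied to the heap.
fn vec_from_temporary() -> Vec<i32> {
    let boxed: Box<[i32]> = Box::new([1, 2, 3, 4]);
    boxed.into_vec()
}
```

Both functions return the same vector; any difference is in how much stack space and copying the unoptimized code needs.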

RalfJung · 2025-01-03T19:23:08Z

> My hope was that this was indeed enough to remove the intrinsic (as well as thir::Expr::Box and Rvalue::ShallowInitBox). The only other use is Box::new and my understanding is that the intrinsic is only useful to avoid allocating a temporary, but in Box::new the value is already in a local so there's no need.

If that is the goal then the PR should also remove that other use and we can see if the intrinsic can truly be avoided.

That said, the perf run clearly shows that the comment in the code is still accurate. So I'm not convinced we actually want to remove the intrinsic in this state. Do we have any idea where all that extra compilation time is spent and whether we can avoid it?

saethlin · 2025-01-04T00:15:42Z

> Probably worth a codegen test if we do want to remove this. Given that the difference for deep-vector is in the backend portion, my hope is that it optimizes to the same thing but just takes more time.

A codegen test of what though? The compile-time regression that's reported above is in unoptimized builds. As far as I can tell, LLVM is indeed fixing up our mess. Just slowly (as is tradition).

> Do we have any idea where all that extra compilation time is spent and whether we can avoid it?

Diffing the LLVM IR on nightly and on this PR, I see these changes to main, and no significant changes elsewhere:

 start:
+  %_4 = alloca [543864 x i8], align 4
   %_x = alloca [24 x i8], align 8
-; call alloc::alloc::exchange_malloc
-  %_5 = call ptr @_ZN5alloc5alloc15exchange_malloc17h373738c909339ef5E(i64 543864, i64 4)
-  %_9 = ptrtoint ptr %_5 to i64
-  %_12 = and i64 %_9, 3
-  %_13 = icmp eq i64 %_12, 0
-  br i1 %_13, label %bb4, label %panic
-
-bb4:                                              ; preds = %start

+; call alloc::boxed::Box<T>::new
+  %_3 = call align 4 ptr @"_ZN5alloc5boxed12Box$LT$T$GT$3new17h0eae96c7f92e5aa7E"(ptr align 4 %_4)
 ; call alloc::slice::<impl [T]>::into_vec
-  call void @"_ZN5alloc5slice29_$LT$impl$u20$$u5b$T$u5d$$GT$8into_vec17hf307cc899eb61d2eE"(ptr sret([24 x i8]) align 8 %_x, ptr align 4 %_5, i64 135966)
+  call void @"_ZN5alloc5slice29_$LT$impl$u20$$u5b$T$u5d$$GT$8into_vec17hac0528ae86e71898E"(ptr sret([24 x i8]) align 8 %_x, ptr align 4 %_3, i64 135966)
 ; call core::ptr::drop_in_place<alloc::vec::Vec<i32>>
-  call void @"_ZN4core3ptr47drop_in_place$LT$alloc..vec..Vec$LT$i32$GT$$GT$17h6ba5100bfd6008bdE"(ptr align 8 %_x)
+  call void @"_ZN4core3ptr47drop_in_place$LT$alloc..vec..Vec$LT$i32$GT$$GT$17hc0b4d783589713aaE"(ptr align 8 %_x)
   ret void
-
-panic:                                            ; preds = %start
-; call core::panicking::panic_misaligned_pointer_dereference
-  call void @_ZN4core9panicking36panic_misaligned_pointer_dereference17h956de8dcbda95434E(i64 4, i64 %_9, ptr align 8 @alloc_4e2288cd26dda91e67d51c42946506bb) #14
-  unreachable

I think opt builds are unaffected partly because GVN turns this MIR:

        (*_8) = [const 0_i32, const 0_i32, const 0_i32, ...

Into this:

         (*_8) = [const 0_i32; 135966];

which we hand off to LLVM as a memset call instead of a hundred thousand gepi+store.
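The idea behind that rewrite can be sketched outside the compiler. The helper below is hypothetical and is not rustc's GVN code: it checks whether every element of a list equals the first one and, if so, reports the list as a (value, count) pair, which is the information a repeat expression like `[0; 135966]` carries.

```rust
/// Hypothetical illustration only: returns `Some((value, len))` when every
/// element of `elems` is equal, and `None` for empty or mixed input.
fn as_repeat<T: Copy + PartialEq>(elems: &[T]) -> Option<(T, usize)> {
    // An empty list has no element to repeat.
    let (&first, rest) = elems.split_first()?;
    if rest.iter().all(|e| *e == first) {
        Some((first, elems.len()))
    } else {
        None
    }
}
```

A single linear scan like this is cheap compared with emitting one store per element, which is presumably why the rewrite pays off on a list with over a hundred thousand identical entries.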

Just glancing at perf stat, enabling GVN with this branch brings build times from 1900 ms to 720 ms, but on nightly from 680 ms to 480 ms.

So somehow I think there's a double-whammy of both having the massive alloca that this PR introduces and also having the element-by-element initialization code that's causing the slowdown.

But simply having the big alloca with this PR is a fair point against this PR; as it stands this change will increase the stack usage of some programs and probably make them stack overflow in dev builds where they didn't before. Though this doesn't extend to much more reasonable code like vec![0; 100_000] because that calls vec::from_elem and never has a big local array.
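A small sketch of the two macro forms being contrasted here, with hypothetical function names. Per the comment above, the repeat form goes through `vec::from_elem` and never builds a large local array, while the list form is the one that would go through a boxed array literal.

```rust
// Repeat form: the length can be a run-time value, and no array literal
// of that size is ever constructed.
fn zeros_repeat(n: usize) -> Vec<i32> {
    vec![0; n]
}

// List form: every element is written out, so the macro first builds an
// array of that many elements and then converts it into a `Vec`.
fn zeros_listed() -> Vec<i32> {
    vec![0, 0, 0, 0]
}
```

Both produce identical vectors for the same length; the stack-usage concern applies only to very long instances of the list form.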

Nadrieril · 2025-01-05T14:39:58Z

More discussion happened on Zulip. Overall, this is not the easy win that I hoped it would be, so I'm closing this.

RalfJung · 2025-01-07T11:48:00Z

Interestingly, #135046 alone already caused an RSS regression (but less than this PR). I don't understand why.

Add an InstSimplify for repetitive array expressions I noticed in rust-lang/rust#135068 (comment) that GVN's implementation of this same transform was quite profitable on the deep-vector benchmark. But of course GVN doesn't run in unoptimized builds, so this is my attempt to write a version of this transform that benefits the deep-vector case and is fast enough to run in InstSimplify. The benchmark suite indicates that this is effective.

safinaskar · 2025-02-06T00:15:39Z

@rustbot label A-box

rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-libs Relevant to the library team, which will review and decide on the PR/issue. labels Jan 3, 2025

Nadrieril force-pushed the remove-rustc-box branch 2 times, most recently from 16a3688 to 54e8f83 Compare January 3, 2025 12:57

RalfJung and others added 2 commits January 3, 2025 14:03

turn rustc_box into an intrinsic

ab78384

Remove #[rustc_box] usage in the vec![] macro

0c82a13

Nadrieril force-pushed the remove-rustc-box branch from 54e8f83 to 0c82a13 Compare January 3, 2025 13:10

rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Jan 3, 2025

bjorn3 reviewed Jan 3, 2025

rustbot added perf-regression Performance regression. and removed S-waiting-on-perf Status: Waiting on a perf run to be completed. labels Jan 3, 2025

Nadrieril closed this Jan 5, 2025

Kobzol mentioned this pull request Jan 7, 2025

turn rustc_box into an intrinsic #135046

Merged

saethlin mentioned this pull request Jan 9, 2025

Add an InstSimplify for repetitive array expressions #135274

Merged

rustbot added the A-box Area: Our favorite opsem complication label Feb 6, 2025

Nadrieril deleted the remove-rustc-box branch July 5, 2025 15:48
