specialize `slice::fill` to use memset #147294

the8472 · 2025-10-03T11:48:50Z (description edited by RalfJung)

This was attempted previously in #83245 but had to be reverted in #87891 because the intrinsic didn't support uninitialized values.

This is my first attempt at modifying intrinsics and const-eval, so I have no clue if I'm doing this correctly.

It should also help const-eval and Miri performance: rust-lang/miri#4616

rustbot · 2025-10-03T11:48:53Z

Some changes occurred to the CTFE machinery

cc @RalfJung, @oli-obk, @lcnr

Some changes occurred to the intrinsics. Make sure the CTFE / Miri interpreter
gets adapted for the changes, if necessary.

cc @rust-lang/miri, @RalfJung, @oli-obk, @lcnr

Some changes occurred to the CTFE / Miri interpreter

cc @rust-lang/miri

rustbot · 2025-10-03T11:48:56Z

r? @joboet

rustbot has assigned @joboet.
They will have a look at your PR within the next two weeks and either review your PR or reassign to another reviewer.

Use r? to explicitly pick a reviewer

RalfJung · 2025-10-03T12:26:31Z

library/core/src/intrinsics/mod.rs

 #[rustc_nounwind]
 #[rustc_intrinsic]
-pub const unsafe fn write_bytes<T>(dst: *mut T, val: u8, count: usize);
+pub const unsafe fn write_bytes<T, B>(dst: *mut T, val: B, count: usize);


This does have a small chance of breaking stable code, as the intrinsic is (accidentally) exposed on stable. Such code would have seen deprecation warnings since Rust 1.86.
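For context on what the intrinsic computes: the stable `std::ptr::write_bytes` takes `val: u8` and measures `count` in elements of `T`, not bytes, so it writes `count * size_of::<T>()` bytes. The sketch below shows that behavior on a `u16` buffer; `set_all_bytes_u16` is a hypothetical helper name, not something from the PR.

```rust
use std::ptr;

/// Sets every byte of `buf` to `byte` using the stable `ptr::write_bytes`.
/// `count` is given in elements of `u16`, so `buf.len() * 2` bytes are written.
fn set_all_bytes_u16(buf: &mut [u16], byte: u8) {
    // SAFETY: the pointer comes from a `&mut [u16]`, so it is valid and aligned
    // for `buf.len()` elements, and every bit pattern is a valid `u16`.
    unsafe { ptr::write_bytes(buf.as_mut_ptr(), byte, buf.len()) }
}
```

For example, setting a `[u16; 3]` to byte `0xAB` yields `[0xABAB; 3]`, because each element receives two copies of the byte.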

RalfJung · 2025-10-03T12:36:52Z

library/core/src/slice/specialize.rs

+macro spec_fill_int {
+    ($($type:ty)*) => {$(
+        impl SpecFill<$type> for [$type] {
+            #[inline]
+            fn spec_fill(&mut self, value: $type) {
+                if crate::intrinsics::is_val_statically_known(value) {
+                    let bytes = value.to_ne_bytes();
+                    if value == <$type>::from_ne_bytes([bytes[0]; size_of::<$type>()]) {
+                        // SAFETY: The pointer is derived from a reference, so it's writable.
+                        unsafe {
+                            crate::intrinsics::write_bytes(self.as_mut_ptr(), bytes[0], self.len());
+                        }
+                        return;
+                    }
+                }
+                for item in self.iter_mut() {
+                    *item = value;
+                }
+            }
+        }
+    )*}
+}
+
+spec_fill_int! { u16 i16 u32 i32 u64 i64 u128 i128 usize isize }
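The core trick in the macro (a value whose bytes are all identical can be written with a byte-wise memset) can be tried on stable Rust. This sketch assumes a plain runtime check in place of the unstable `is_val_statically_known` guard, which in the diff limits the byte check to values the optimizer sees as constants, and it uses `std::ptr::write_bytes` in place of the intrinsic. `fill_via_bytes!`, `fill_u32` and `fill_i64` are hypothetical names; the returned `bool` only reports which path was taken.

```rust
/// Generates a fill function for an integer type that takes a memset-style
/// path when every byte of `value` is the same (e.g. 0, -1, 0x0101_0101),
/// and otherwise falls back to `slice::fill`. Returns `true` for the fast path.
macro_rules! fill_via_bytes {
    ($name:ident, $t:ty) => {
        fn $name(slice: &mut [$t], value: $t) -> bool {
            let bytes = value.to_ne_bytes();
            // Rebuild the value from its first byte repeated; equal means uniform.
            if value == <$t>::from_ne_bytes([bytes[0]; std::mem::size_of::<$t>()]) {
                // SAFETY: the pointer is derived from `&mut [$t]`, so it is valid
                // and aligned for `slice.len()` elements; any bit pattern is a
                // valid integer.
                unsafe { std::ptr::write_bytes(slice.as_mut_ptr(), bytes[0], slice.len()) };
                return true;
            }
            slice.fill(value);
            false
        }
    };
}

fill_via_bytes!(fill_u32, u32);
fill_via_bytes!(fill_i64, i64);
```

Without the static-knowledge guard every call pays for the comparison, which is presumably why the PR only takes this path when the value is known at compile time.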


Couldn't this approach (for i8/u8) suffice to fix the Miri performance issue, without any changes to the intrinsic?

Sure, that'd work for the particular reported case, but not for newtypes. `slice::fill` is generic, so it's nice if it can handle anything that has a scalar ABI, but we can't detect "scalar but always-initialized" in the library, so the intrinsic has to be more general.

That sounds like the motivation you have in mind goes beyond what is spelled out in the PR description.

(You link to #87891 but not the original PR that landed the change in the first place. So without digging through the history of this code it's not clear what you want this to do and why, apart from the Miri perf issue that you linked. Would be nice to make all that context easily accessible from the PR description.)

Found the original PR at #83245, but that also doesn't say anything, so I guess it's mostly "because we can".

🤷 The extended intrinsic isn't pretty but it's not terrible either. So if t-libs says this is worth a special case in the compiler that's fine.

Previously it only covered u8, i8 and bool. That leaves out other cases where people may want to initialize large chunks of AtomicU8, MaybeUninit<u8> or custom newtypes.
It's some amount of "because we can" and some "avoid performance cliffs where approach A gets optimized and B doesn't".
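To illustrate cases beyond `u8`, `i8` and `bool`, here is a sketch with a hypothetical `repr(transparent)` newtype and with `MaybeUninit<u8>`. For the newtype, a single byte-wise write is equivalent to `fill`, which is what a memset path would exploit. For `MaybeUninit<u8>`, `fill` happily accepts an uninitialized value, which cannot be turned into the initialized `u8` that the old intrinsic signature required. `Gray`, `fill_gray` and `clear_scratch` are illustrative names only.

```rust
use std::mem::MaybeUninit;
use std::ptr;

/// A hypothetical byte-sized newtype, e.g. a grayscale pixel.
#[derive(Clone, Copy, Debug, PartialEq)]
#[repr(transparent)]
struct Gray(u8);

/// Fills `pixels` with one byte-wise write, as a memset-based `fill` would.
fn fill_gray(pixels: &mut [Gray], value: Gray) {
    // SAFETY: `Gray` is `repr(transparent)` over `u8`, so every byte value is a
    // valid `Gray`, and the pointer comes from a `&mut` slice.
    unsafe { ptr::write_bytes(pixels.as_mut_ptr(), value.0, pixels.len()) }
}

/// `MaybeUninit<u8>` is `Clone` (its payload is `Copy`), so `slice::fill`
/// accepts an uninitialized value here; a `val: u8` parameter could not.
fn clear_scratch(buf: &mut [MaybeUninit<u8>]) {
    buf.fill(MaybeUninit::uninit());
}
```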

compiler/rustc_const_eval/src/interpret/intrinsics.rs

RalfJung · 2025-10-03T12:39:01Z

library/core/src/intrinsics/mod.rs

 #[rustc_nounwind]
 #[rustc_intrinsic]
-pub const unsafe fn write_bytes<T>(dst: *mut T, val: u8, count: usize);
+pub const unsafe fn write_bytes<T, B>(dst: *mut T, val: B, count: usize);


Whether we can allow this for arbitrary 1-sized types depends on whether LLVM intends to allow memset on arbitrary bytes. @nikic in a future where LLVM has a byte type or something else that has a size of 8 bits but can hold non-integral things such as provenance or poison/undef, do you expect memset will work on such values? Its argument type might have to change for that...

I would not expect the memset argument type to change, but there is a separate memset.pattern intrinsic which accepts an arbitrary argument type and could be used for that purpose. It's not ready for general usage yet though. And there was some discussion about generalizing memset to effectively become memset.pattern in the future.

Okay, so by landing this we'd be making the bet that

- memset currently works fine with arbitrary bytes in an i8 (the same bet we're already making when compiling MaybeUninit<u8> to LLVM's i8), and
- if that ever becomes a problem, memset.pattern will be a viable alternative.

Why does it matter what LLVM does in the future? We can just adapt our toolchain, right?

LLVM currently has no explicitly documented way to memset with a byte that is uninit or contains provenance. So there's a risk that this might just not be something LLVM can even express in the future.

If LLVM were to require that the byte for memset must be initialized, couldn't we just change the lowering of this intrinsic to not call memset, and do something else, like a loop?

Sure, we could provide our own implementation. That doesn't sound great for performance.

We can also revert the PR / go with the libs-only approach then, as long as we keep the intrinsic private and don't publicly expose its new capabilities.

Yes, it wouldn't be great for performance, but that's a problem we could also fix, and in any case I don't think it's useful to think about what internal APIs we should write on the basis of "maybe in the future LLVM decides to be bad at optimizing this".
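The loop fallback raised above could look roughly like the following. It is a sketch under assumptions, not rustc's actual codegen: `write_bytes_loop` is a hypothetical function that takes any 1-byte `Copy` type `B` and writes `count * size_of::<T>()` copies of it. Because it only copies `val` and never inspects it, it works the same for initialized bytes and for an uninitialized `MaybeUninit<u8>`.

```rust
use std::mem::size_of;

/// Hypothetical loop-based lowering of a `write_bytes<T, B>` whose value is
/// any 1-byte type `B` (for example `u8`, `bool` or `MaybeUninit<u8>`).
///
/// # Safety
/// `dst` must be valid for writes of `count` values of `T`, and the resulting
/// bytes must form valid `T` values.
unsafe fn write_bytes_loop<T, B: Copy>(dst: *mut T, val: B, count: usize) {
    // A size of 1 also implies an alignment of 1.
    assert_eq!(size_of::<B>(), 1, "B must be exactly one byte wide");
    let bytes = dst.cast::<B>();
    for i in 0..count * size_of::<T>() {
        // SAFETY: `i` stays within the `count * size_of::<T>()` bytes the caller
        // vouched for, and `B` needs no alignment beyond 1.
        unsafe { bytes.add(i).write(val) };
    }
}
```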

If this discussion is being driven by the fact that this intrinsic was accidentally exposed, I wonder what libs-api would have to say about that.

This subthread is just about the LLVM concerns, since we're reaching slightly deeper into an area that is poorly defined there, which we should only do deliberately IMO. Now that we have deliberated I'm okay with proceeding.

The discussion for the signature change should happen here.

rustbot · 2025-10-03T13:42:59Z

Some changes occurred in compiler/rustc_codegen_ssa

cc @WaffleLapkin

rustbot · 2025-10-03T15:55:58Z

Some changes occurred in compiler/rustc_codegen_gcc

cc @antoyo, @GuillaumeGomez

Some changes occurred in compiler/rustc_codegen_cranelift

cc @bjorn3

the8472 · 2025-10-03T17:17:34Z

@bors try @rust-timer queue

specialize `slice::fill` to use memset

rust-bors · 2025-10-03T19:33:27Z

☀️ Try build successful (CI)
Build commit: 51de2e9 (51de2e90f91fdec6cf64ff542a4e15b9c177a797, parent: 8b6b15b877fbceb1ee5d9a5a4746e7515901574a)

rust-timer · 2025-10-03T20:43:16Z

Finished benchmarking commit (51de2e9): comparison URL.

Overall result: ✅ improvements - no action needed

Benchmarking this pull request means it may be perf-sensitive – we'll automatically label it not fit for rolling up. You can override this, but we strongly advise not to, due to possible changes in compiler perf.

@bors rollup=never
@rustbot label: -S-waiting-on-perf -perf-regression

Instruction count

Our most reliable metric. Used to determine the overall result above. However, even this metric can be noisy.

|  | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | - | - | 0 |
| Regressions ❌ (secondary) | - | - | 0 |
| Improvements ✅ (primary) | - | - | 0 |
| Improvements ✅ (secondary) | -0.7% | [-1.0%, -0.1%] | 8 |
| All ❌✅ (primary) | - | - | 0 |

Max RSS (memory usage)

Results (primary 0.9%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

|  | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | 7.6% | [7.6%, 7.6%] | 1 |
| Regressions ❌ (secondary) | - | - | 0 |
| Improvements ✅ (primary) | -5.8% | [-5.8%, -5.8%] | 1 |
| Improvements ✅ (secondary) | - | - | 0 |
| All ❌✅ (primary) | 0.9% | [-5.8%, 7.6%] | 2 |

Cycles

Results (secondary 1.7%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

|  | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | - | - | 0 |
| Regressions ❌ (secondary) | 3.6% | [3.3%, 4.0%] | 2 |
| Improvements ✅ (primary) | - | - | 0 |
| Improvements ✅ (secondary) | -2.2% | [-2.2%, -2.2%] | 1 |
| All ❌✅ (primary) | - | - | 0 |

Binary size

Results (primary -0.0%, secondary -0.0%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

|  | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | 0.2% | [0.2%, 0.2%] | 1 |
| Regressions ❌ (secondary) | - | - | 0 |
| Improvements ✅ (primary) | -0.1% | [-0.1%, -0.0%] | 7 |
| Improvements ✅ (secondary) | -0.0% | [-0.0%, -0.0%] | 12 |
| All ❌✅ (primary) | -0.0% | [-0.1%, 0.2%] | 8 |

Bootstrap: 471.778s -> 472.471s (0.15%)
Artifact size: 388.26 MiB -> 388.31 MiB (0.01%)

RalfJung · 2025-10-05T16:07:55Z

@craterbot check
to see if anything is broken by the intrinsic signature change

(We could work around that by making intrinsics::write_bytes not be the actual intrinsic but I'd rather avoid that if we can get away with changing the signature.)
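The workaround mentioned here would keep the old public signature as a thin wrapper around a generic internal function. A rough sketch, assuming a hypothetical `write_bytes_any` that stands in for the generalized intrinsic (the real intrinsic is unstable and cannot be called from stable code), so the loop body is only a placeholder for whatever the compiler would emit:

```rust
use std::mem::size_of;

/// Hypothetical stand-in for the generalized intrinsic: writes
/// `count * size_of::<T>()` copies of the 1-byte value `val` starting at `dst`.
unsafe fn write_bytes_any<T, B: Copy>(dst: *mut T, val: B, count: usize) {
    assert_eq!(size_of::<B>(), 1, "B must be exactly one byte wide");
    let bytes = dst.cast::<B>();
    for i in 0..count * size_of::<T>() {
        // SAFETY: the caller guarantees `dst` is valid for `count` values of `T`.
        unsafe { bytes.add(i).write(val) };
    }
}

/// Wrapper that keeps the old `val: u8` signature, so existing callers keep
/// compiling while the generic version stays internal.
///
/// # Safety
/// Same contract as `std::ptr::write_bytes`.
pub unsafe fn write_bytes<T>(dst: *mut T, val: u8, count: usize) {
    // SAFETY: forwarded unchanged from this function's caller.
    unsafe { write_bytes_any::<T, u8>(dst, val, count) }
}
```

With this shape, callers passing a `u8` see no signature change, at the cost of an extra non-intrinsic layer, which is the indirection the comment would rather avoid.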

craterbot · 2025-10-05T16:08:09Z

👌 Experiment pr-147294 created and queued.
🤖 Automatically detected try build 51de2e9
🔍 You can check out the queue and this experiment's details.

ℹ️ Crater is a tool to run experiments across parts of the Rust ecosystem. Learn more

craterbot · 2025-10-05T16:09:23Z

🚧 Experiment pr-147294 is now running

ℹ️ Crater is a tool to run experiments across parts of the Rust ecosystem. Learn more

rustbot added S-waiting-on-review, T-compiler and T-libs labels Oct 3, 2025

rustbot assigned joboet Oct 3, 2025

the8472 force-pushed the slice_fill_memset branch from 21525aa to e299ff9 October 3, 2025 12:01

RalfJung reviewed Oct 3, 2025

the8472 force-pushed the slice_fill_memset branch from e299ff9 to 54f0913 October 3, 2025 13:42

the8472 force-pushed the slice_fill_memset branch 2 times, most recently from f4791af to 19c5b08 October 3, 2025 14:14

RalfJung reviewed Oct 3, 2025

compiler/rustc_const_eval/src/interpret/intrinsics.rs (outdated, resolved)

the8472 added 2 commits October 3, 2025 16:35

generalize intrinsics::write_bytes to take any 1-byte wide type as value

b7cae19

This lets us use it for any T since T may contain uninitialized bytes.

specialize slice::fill to use memset when possible

115aeee

LLVM generally can do this on its own, but it helps miri and other backends.

the8472 force-pushed the slice_fill_memset branch from 19c5b08 to 115aeee October 3, 2025 14:35

the8472 added 2 commits October 3, 2025 17:55

update cranelift to match new write_bytes signature

ebc94fb

update gcc codegen to match new write_bytes signature

cea7f2a

rust-bors bot added a commit that referenced this pull request Oct 3, 2025

Auto merge of #147294 - the8472:slice_fill_memset, r=<try>

51de2e9

specialize `slice::fill` to use memset

rustbot added the S-waiting-on-perf label Oct 3, 2025

rustbot removed the S-waiting-on-perf label Oct 3, 2025

craterbot added S-waiting-on-crater and removed S-waiting-on-review labels Oct 5, 2025