Skip to content

Conversation

the8472
Copy link
Member

@the8472 the8472 commented Oct 3, 2025

This was attempted previously in #83245 but had to be reverted #87891 because the intrinsic didn't support uninitialized values.

This is my first attempt at modifying intrinsics and const-eval, so I have no clue if I'm doing this correctly.

It should also help const eval performance: rust-lang/miri#4616

@rustbot
Copy link
Collaborator

rustbot commented Oct 3, 2025

Some changes occurred to the CTFE machinery

cc @RalfJung, @oli-obk, @lcnr

Some changes occurred to the intrinsics. Make sure the CTFE / Miri interpreter
gets adapted for the changes, if necessary.

cc @rust-lang/miri, @RalfJung, @oli-obk, @lcnr

Some changes occurred to the CTFE / Miri interpreter

cc @rust-lang/miri

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. T-libs Relevant to the library team, which will review and decide on the PR/issue. labels Oct 3, 2025
@rustbot
Copy link
Collaborator

rustbot commented Oct 3, 2025

r? @joboet

rustbot has assigned @joboet.
They will have a look at your PR within the next two weeks and either review your PR or reassign to another reviewer.

Use r? to explicitly pick a reviewer

@rust-log-analyzer

This comment has been minimized.

@the8472 the8472 force-pushed the slice_fill_memset branch from 21525aa to e299ff9 Compare October 3, 2025 12:01
@rust-log-analyzer

This comment has been minimized.

#[rustc_nounwind]
#[rustc_intrinsic]
pub const unsafe fn write_bytes<T>(dst: *mut T, val: u8, count: usize);
pub const unsafe fn write_bytes<T, B>(dst: *mut T, val: B, count: usize);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This does have a small chance of breaking stable code as the intrinsic is (accidentally) exposed on stable. Such code would have seen deprecation warnings since Rust 1.86.

Comment on lines +34 to +57
macro spec_fill_int {
($($type:ty)*) => {$(
impl SpecFill<$type> for [$type] {
#[inline]
fn spec_fill(&mut self, value: $type) {
if crate::intrinsics::is_val_statically_known(value) {
let bytes = value.to_ne_bytes();
if value == <$type>::from_ne_bytes([bytes[0]; size_of::<$type>()]) {
// SAFETY: The pointer is derived from a reference, so it's writable.
unsafe {
crate::intrinsics::write_bytes(self.as_mut_ptr(), bytes[0], self.len());
}
return;
}
}
for item in self.iter_mut() {
*item = value;
}
}
}
)*}
}

spec_fill_int! { u16 i16 u32 i32 u64 i64 u128 i128 usize isize }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couldn't this approach (for i8/u8) suffice to fix the Miri performance issue, without any changes to the intrinsic?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure that'd work for the particular reported case but not for newtypes. slice::fill is generic, so it's nice if it can handle anything that's has a scalar abi, but we can't detect "scalar but always-initialized" in the library, so it has to be more general.

Copy link
Member

@RalfJung RalfJung Oct 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That sounds like the motivation you have in mind goes beyond what is spelled out in the PR description.

(You link to #87891 but not the original PR that landed the change in the first place. So without digging through the history of this code it's not clear what you want this to do and why, apart from the Miri perf issue that you linked. Would be nice to make all that context easily accessible from the PR description.)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Found the original PR at #83245, but that also doesn't say anything, so I guess it's mostly "because we can".

🤷 The extended intrinsic isn't pretty but it's not terrible either. So if t-libs says this is worth a special case in the compiler that's fine.

Copy link
Member Author

@the8472 the8472 Oct 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Previously it only covered u8, i8 and bool. That leaves out other cases where people may want to initialize large chunks of AtomicU8, MaybeUninit<u8> or custom newtypes.
It's some amount of "because we can" and some "avoid performance cliffs where approach A gets optimized and B doesn't".

#[rustc_nounwind]
#[rustc_intrinsic]
pub const unsafe fn write_bytes<T>(dst: *mut T, val: u8, count: usize);
pub const unsafe fn write_bytes<T, B>(dst: *mut T, val: B, count: usize);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whether we can allow this for arbitrary 1-sized types depends on whether LLVM intends to allow memset on arbitrary bytes. @nikic in a future where LLVM has a byte type or something else that has a size of 8 bits but can hold non-integral things such as provenance or poison/undef, do you expect memset will work on such values? It's argument type might have to change for that...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would not expect the memset argument type to change, but there is a separate memset.pattern intrinsic which accepts an arbitrary argument type and could be used for that purpose. It's not ready for general usage yet though. And there was some discussion about generalizing memset to effectively become memset.pattern in the future.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, so by landing this we'd be making the bet that

  • memset currently works fine with arbitrary bytes in an i8 (the same bet we're already making when compiling MaybeUninit<u8> to LLVM's i8)
  • if that ever becomes a problem, memset.pattern will be a viable alternative

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why does it matter what LLVM does in the future? We can just adapt our toolchain, right?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LLVM currently has no explicitly documented way to memset with a byte that is uninit or contains provenance. So there's a risk that this might just not be something LLVM can even express in the future.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If LLVM were to require that the byte for memset must be initialized, couldn't we just change the lowering of this intrinsic to not call memset, and do something else, like a loop?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, we could provide our own implementation. That doesn't sound great for performance.

We can also revert the PR / go with the libs-only approach then, as long as we keep the intrinsic private and don't publicly expose its new capabilities.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes it wouldn't be great for performance but that's a problem we could also fix, and in any case I don't think it's useful to think about what internal APIs we should write on the basis of "maybe in the future LLVM decides to be bad at optimizing this".

If this discussion is being driven by the fact that this intrinsic was accidentally exposed, I wonder what libs-api would have to say about that.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This subthread is just about the LLVM concerns, since we're reaching slightly deeper into an area that is poorly defined there, which we should only do deliberately IMO. Now that we have deliberated I'm okay with proceeding.

The discussion for the signature change should happen here.

@the8472 the8472 force-pushed the slice_fill_memset branch from e299ff9 to 54f0913 Compare October 3, 2025 13:42
@rustbot
Copy link
Collaborator

rustbot commented Oct 3, 2025

Some changes occurred in compiler/rustc_codegen_ssa

cc @WaffleLapkin

@rust-log-analyzer

This comment has been minimized.

@the8472 the8472 force-pushed the slice_fill_memset branch 2 times, most recently from f4791af to 19c5b08 Compare October 3, 2025 14:14
This lets us use it for any T since T may contain uninitialized bytes.
LLVM generally can do this on its own, but it helps miri and other backends.
@the8472 the8472 force-pushed the slice_fill_memset branch from 19c5b08 to 115aeee Compare October 3, 2025 14:35
@rust-log-analyzer

This comment has been minimized.

@rustbot
Copy link
Collaborator

rustbot commented Oct 3, 2025

Some changes occurred in compiler/rustc_codegen_gcc

cc @antoyo, @GuillaumeGomez

Some changes occurred in compiler/rustc_codegen_cranelift

cc @bjorn3

@the8472
Copy link
Member Author

the8472 commented Oct 3, 2025

@bors try @rust-timer queue

@rust-timer

This comment has been minimized.

rust-bors bot added a commit that referenced this pull request Oct 3, 2025
specialize `slice::fill` to use memset
@rust-bors

This comment has been minimized.

@rustbot rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Oct 3, 2025
@rust-bors
Copy link

rust-bors bot commented Oct 3, 2025

☀️ Try build successful (CI)
Build commit: 51de2e9 (51de2e90f91fdec6cf64ff542a4e15b9c177a797, parent: 8b6b15b877fbceb1ee5d9a5a4746e7515901574a)

@rust-timer

This comment has been minimized.

@rust-timer
Copy link
Collaborator

Finished benchmarking commit (51de2e9): comparison URL.

Overall result: ✅ improvements - no action needed

Benchmarking this pull request means it may be perf-sensitive – we'll automatically label it not fit for rolling up. You can override this, but we strongly advise not to, due to possible changes in compiler perf.

@bors rollup=never
@rustbot label: -S-waiting-on-perf -perf-regression

Instruction count

Our most reliable metric. Used to determine the overall result above. However, even this metric can be noisy.

mean range count
Regressions ❌
(primary)
- - 0
Regressions ❌
(secondary)
- - 0
Improvements ✅
(primary)
- - 0
Improvements ✅
(secondary)
-0.7% [-1.0%, -0.1%] 8
All ❌✅ (primary) - - 0

Max RSS (memory usage)

Results (primary 0.9%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

mean range count
Regressions ❌
(primary)
7.6% [7.6%, 7.6%] 1
Regressions ❌
(secondary)
- - 0
Improvements ✅
(primary)
-5.8% [-5.8%, -5.8%] 1
Improvements ✅
(secondary)
- - 0
All ❌✅ (primary) 0.9% [-5.8%, 7.6%] 2

Cycles

Results (secondary 1.7%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

mean range count
Regressions ❌
(primary)
- - 0
Regressions ❌
(secondary)
3.6% [3.3%, 4.0%] 2
Improvements ✅
(primary)
- - 0
Improvements ✅
(secondary)
-2.2% [-2.2%, -2.2%] 1
All ❌✅ (primary) - - 0

Binary size

Results (primary -0.0%, secondary -0.0%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

mean range count
Regressions ❌
(primary)
0.2% [0.2%, 0.2%] 1
Regressions ❌
(secondary)
- - 0
Improvements ✅
(primary)
-0.1% [-0.1%, -0.0%] 7
Improvements ✅
(secondary)
-0.0% [-0.0%, -0.0%] 12
All ❌✅ (primary) -0.0% [-0.1%, 0.2%] 8

Bootstrap: 471.778s -> 472.471s (0.15%)
Artifact size: 388.26 MiB -> 388.31 MiB (0.01%)

@rustbot rustbot removed the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Oct 3, 2025
@RalfJung
Copy link
Member

RalfJung commented Oct 5, 2025

@craterbot check
to see if anything is broken by the intrinsic signature change

(We could work around that by making intrinsics::write_bytes not be the actual intrinsic but I'd rather avoid that if we can get away with changing the signature.)

@craterbot
Copy link
Collaborator

👌 Experiment pr-147294 created and queued.
🤖 Automatically detected try build 51de2e9
🔍 You can check out the queue and this experiment's details.

ℹ️ Crater is a tool to run experiments across parts of the Rust ecosystem. Learn more

@craterbot craterbot added S-waiting-on-crater Status: Waiting on a crater run to be completed. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Oct 5, 2025
@craterbot
Copy link
Collaborator

🚧 Experiment pr-147294 is now running

ℹ️ Crater is a tool to run experiments across parts of the Rust ecosystem. Learn more

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-waiting-on-crater Status: Waiting on a crater run to be completed. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. T-libs Relevant to the library team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants