-
Notifications
You must be signed in to change notification settings - Fork 13.8k
specialize slice::fill
to use memset
#147294
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
This comment has been minimized.
This comment has been minimized.
21525aa
to
e299ff9
Compare
This comment has been minimized.
This comment has been minimized.
#[rustc_nounwind] | ||
#[rustc_intrinsic] | ||
pub const unsafe fn write_bytes<T>(dst: *mut T, val: u8, count: usize); | ||
pub const unsafe fn write_bytes<T, B>(dst: *mut T, val: B, count: usize); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This does have a small chance of breaking stable code as the intrinsic is (accidentally) exposed on stable. Such code would have seen deprecation warnings since Rust 1.86.
macro spec_fill_int { | ||
($($type:ty)*) => {$( | ||
impl SpecFill<$type> for [$type] { | ||
#[inline] | ||
fn spec_fill(&mut self, value: $type) { | ||
if crate::intrinsics::is_val_statically_known(value) { | ||
let bytes = value.to_ne_bytes(); | ||
if value == <$type>::from_ne_bytes([bytes[0]; size_of::<$type>()]) { | ||
// SAFETY: The pointer is derived from a reference, so it's writable. | ||
unsafe { | ||
crate::intrinsics::write_bytes(self.as_mut_ptr(), bytes[0], self.len()); | ||
} | ||
return; | ||
} | ||
} | ||
for item in self.iter_mut() { | ||
*item = value; | ||
} | ||
} | ||
} | ||
)*} | ||
} | ||
|
||
spec_fill_int! { u16 i16 u32 i32 u64 i64 u128 i128 usize isize } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Couldn't this approach (for i8
/u8
) suffice to fix the Miri performance issue, without any changes to the intrinsic?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure that'd work for the particular reported case but not for newtypes. slice::fill is generic, so it's nice if it can handle anything that's has a scalar abi, but we can't detect "scalar but always-initialized" in the library, so it has to be more general.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That sounds like the motivation you have in mind goes beyond what is spelled out in the PR description.
(You link to #87891 but not the original PR that landed the change in the first place. So without digging through the history of this code it's not clear what you want this to do and why, apart from the Miri perf issue that you linked. Would be nice to make all that context easily accessible from the PR description.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Found the original PR at #83245, but that also doesn't say anything, so I guess it's mostly "because we can".
🤷 The extended intrinsic isn't pretty but it's not terrible either. So if t-libs says this is worth a special case in the compiler that's fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Previously it only covered u8, i8 and bool. That leaves out other cases where people may want to initialize large chunks of AtomicU8, MaybeUninit<u8>
or custom newtypes.
It's some amount of "because we can" and some "avoid performance cliffs where approach A gets optimized and B doesn't".
#[rustc_nounwind] | ||
#[rustc_intrinsic] | ||
pub const unsafe fn write_bytes<T>(dst: *mut T, val: u8, count: usize); | ||
pub const unsafe fn write_bytes<T, B>(dst: *mut T, val: B, count: usize); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Whether we can allow this for arbitrary 1-sized types depends on whether LLVM intends to allow memset on arbitrary bytes. @nikic in a future where LLVM has a byte type or something else that has a size of 8 bits but can hold non-integral things such as provenance or poison/undef, do you expect memset will work on such values? It's argument type might have to change for that...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would not expect the memset argument type to change, but there is a separate memset.pattern intrinsic which accepts an arbitrary argument type and could be used for that purpose. It's not ready for general usage yet though. And there was some discussion about generalizing memset to effectively become memset.pattern in the future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, so by landing this we'd be making the bet that
- memset currently works fine with arbitrary bytes in an i8 (the same bet we're already making when compiling
MaybeUninit<u8>
to LLVM'si8
) - if that ever becomes a problem,
memset.pattern
will be a viable alternative
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why does it matter what LLVM does in the future? We can just adapt our toolchain, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LLVM currently has no explicitly documented way to memset
with a byte that is uninit or contains provenance. So there's a risk that this might just not be something LLVM can even express in the future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If LLVM were to require that the byte for memset must be initialized, couldn't we just change the lowering of this intrinsic to not call memset, and do something else, like a loop?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, we could provide our own implementation. That doesn't sound great for performance.
We can also revert the PR / go with the libs-only approach then, as long as we keep the intrinsic private and don't publicly expose its new capabilities.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes it wouldn't be great for performance but that's a problem we could also fix, and in any case I don't think it's useful to think about what internal APIs we should write on the basis of "maybe in the future LLVM decides to be bad at optimizing this".
If this discussion is being driven by the fact that this intrinsic was accidentally exposed, I wonder what libs-api would have to say about that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This subthread is just about the LLVM concerns, since we're reaching slightly deeper into an area that is poorly defined there, which we should only do deliberately IMO. Now that we have deliberated I'm okay with proceeding.
The discussion for the signature change should happen here.
e299ff9
to
54f0913
Compare
Some changes occurred in compiler/rustc_codegen_ssa |
This comment has been minimized.
This comment has been minimized.
f4791af
to
19c5b08
Compare
This lets us use it for any T since T may contain uninitialized bytes.
LLVM generally can do this on its own, but it helps miri and other backends.
19c5b08
to
115aeee
Compare
This comment has been minimized.
This comment has been minimized.
Some changes occurred in compiler/rustc_codegen_gcc Some changes occurred in compiler/rustc_codegen_cranelift cc @bjorn3 |
@bors try @rust-timer queue |
This comment has been minimized.
This comment has been minimized.
specialize `slice::fill` to use memset
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
Finished benchmarking commit (51de2e9): comparison URL. Overall result: ✅ improvements - no action neededBenchmarking this pull request means it may be perf-sensitive – we'll automatically label it not fit for rolling up. You can override this, but we strongly advise not to, due to possible changes in compiler perf. @bors rollup=never Instruction countOur most reliable metric. Used to determine the overall result above. However, even this metric can be noisy.
Max RSS (memory usage)Results (primary 0.9%)A less reliable metric. May be of interest, but not used to determine the overall result above.
CyclesResults (secondary 1.7%)A less reliable metric. May be of interest, but not used to determine the overall result above.
Binary sizeResults (primary -0.0%, secondary -0.0%)A less reliable metric. May be of interest, but not used to determine the overall result above.
Bootstrap: 471.778s -> 472.471s (0.15%) |
@craterbot check (We could work around that by making |
👌 Experiment ℹ️ Crater is a tool to run experiments across parts of the Rust ecosystem. Learn more |
🚧 Experiment ℹ️ Crater is a tool to run experiments across parts of the Rust ecosystem. Learn more |
This was attempted previously in #83245 but had to be reverted #87891 because the intrinsic didn't support uninitialized values.
This is my first attempt at modifying intrinsics and const-eval, so I have no clue if I'm doing this correctly.
It should also help const eval performance: rust-lang/miri#4616