Write on &mut [u8] and Cursor<&mut [u8]> doesn't optimize very well. #44099
Comments
I thought I remembered a recent PR to encourage more memcpy use for optimization, but seeing #37573 it seems this has been an issue for a while. If we do special-case small copies in trans, perhaps that optimization will no longer be needed.
I managed to reduce this a bit further:
use std::cmp;

#[inline]
fn write(buf: &mut [u8], data: &[u8]) {
    // With this condition it optimizes to a store, without it we get memcpy calls.
    if buf.len() < 1 {
        return;
    }
    // This also gets rid of the memcpy
    // let amt = data.len();
    // But this doesn't.
    // let amt = buf.len();
    let amt = cmp::min(data.len(), buf.len());
    buf.copy_from_slice(&data[..amt]);
}

pub fn write_byte(buf: &mut [u8], byte: u8) {
    write(buf, &[byte]);
}
It looks like the optimizer has some issues fully optimizing copy_from_slice if there is a possibility that
Wrapping the innards of (a local copy of) copy_from_slice in
EDIT: Actually that would change the behaviour slightly, and checking after the assert call doesn't work...
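Even further simplified:
use std::{cmp, ptr};

pub fn test(buf: &mut [u8], src: &[u8]) {
    let amt = cmp::min(buf.len(), src.len());
    // Copy 0 or 1 bytes.
    let amt = cmp::min(amt, 1);
    // SAFETY: amt never exceeds either slice's length, and a shared and a
    // mutable slice cannot overlap.
    unsafe {
        ptr::copy_nonoverlapping(src.as_ptr(), buf.as_mut_ptr(), amt);
    }
}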
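For comparison, the same "copy 0 or 1 bytes" behaviour can be written as an explicit branch in safe code. This is only a sketch for experimenting with the generated code; the name `copy_at_most_one` is made up for illustration, and nothing here claims what any particular compiler version emits for it.

```rust
/// Hypothetical helper: copies at most one byte from `src` into `buf`
/// and returns how many bytes were copied (0 or 1).
pub fn copy_at_most_one(buf: &mut [u8], src: &[u8]) -> usize {
    match (buf.first_mut(), src.first()) {
        // Both slices are non-empty: store the single byte directly.
        (Some(dst), Some(&byte)) => {
            *dst = byte;
            1
        }
        // At least one slice is empty: nothing to copy.
        _ => 0,
    }
}

fn main() {
    let mut buf = [0u8; 4];
    let copied = copy_at_most_one(&mut buf, &[7, 8, 9]);
    println!("copied {} byte(s), buf = {:?}", copied, buf);
}
```

Comparing the output of this function with that of `test` (for example with `--emit=llvm-ir`) is one way to see whether the variable-length copy is what triggers the memcpy call.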
Looks like https://llvm.org/bugs/show_bug.cgi?id=31001 aka #37573
EDIT: LLVM bug is now llvm/llvm-project#30349
Indeed. It doesn't look like there have been any updates on that LLVM bug in the meantime. Based on the discussion in the PR it seems adding a check to
That said, maybe it would be an idea to add methods to Read/Write that read or write a single byte. Maybe we could also add a note to the copy_from_slice documentation stating that it may be sub-optimal for small slices where the size isn't a compile-time constant.
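One way to picture the kind of check being discussed is a write helper that special-cases a one-byte copy and falls back to copy_from_slice otherwise. This is a minimal sketch under that assumption; `write_small` is a hypothetical name, not an existing std method, and this is not presented as the change made in any PR.

```rust
use std::cmp;

/// Hypothetical helper: copies as many bytes of `data` as fit into `buf`
/// and returns the number of bytes copied.
pub fn write_small(buf: &mut [u8], data: &[u8]) -> usize {
    let amt = cmp::min(buf.len(), data.len());
    if amt == 1 {
        // Single byte: a plain indexed store, no slice copy involved.
        buf[0] = data[0];
    } else {
        // General case (including amt == 0): slice-to-slice copy.
        buf[..amt].copy_from_slice(&data[..amt]);
    }
    amt
}

fn main() {
    let mut buf = [0u8; 4];
    let n = write_small(&mut buf, &[42]);
    println!("wrote {} byte(s), buf = {:?}", n, buf);
}
```

The extra branch costs a comparison on every call, so whether it pays off for larger writes would need measuring.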
Calling write on a mutable slice (or one wrapped in a Cursor) with one byte, or a small number of bytes, results in a call to memcpy after optimization (opt-level=3), rather than a simple store as one would expect:
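The snippet from the original report is not preserved in this copy. As a stand-in, here is a minimal sketch of the kind of call being described, using the std `Write` impls for `&mut [u8]` and `Cursor<&mut [u8]>`; the function names `write_byte_slice` and `write_byte_cursor` are made up for illustration.

```rust
use std::io::{self, Cursor, Write};

/// Writes one byte through the `Write` impl for `&mut [u8]`.
pub fn write_byte_slice(mut buf: &mut [u8], byte: u8) -> io::Result<usize> {
    buf.write(&[byte])
}

/// Writes one byte through the `Write` impl for `Cursor<&mut [u8]>`.
pub fn write_byte_cursor(cursor: &mut Cursor<&mut [u8]>, byte: u8) -> io::Result<usize> {
    cursor.write(&[byte])
}

fn main() -> io::Result<()> {
    let mut storage = [0u8; 2];
    write_byte_slice(&mut storage, 0xAB)?;

    let mut cursor = Cursor::new(&mut storage[..]);
    cursor.set_position(1);
    write_byte_cursor(&mut cursor, 0xCD)?;

    println!("{:?}", storage);
    Ok(())
}
```

Building something like this with `-C opt-level=3 --emit=llvm-ir` is one way to check whether a given toolchain produces a memcpy call or a store for the one-byte write.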
Results in:
copy_from_slice seems to be part of the issue here. If I change the write implementation on mutable slices to use this instead of copy_from_slice:
the LLVM IR looks much nicer:
The for loop will result in vector operations on longer slices, but I'm still unsure whether this change could cause a slowdown on very long slices, as the memcpy implementation may be better optimized for the specific system, and it doesn't really solve the underlying issue. There seems to be some problem with optimizing copy_from_slice calls that follow split_at_mut, and probably some other calls that involve slice operations. (I tried altering the write function to use unsafe and create a temporary slice from pointers instead, but that didn't help.)
Happens on both nightly (rustc 1.21.0-nightly (2aeb5930f 2017-08-25)) and stable (1.19) on x86_64-unknown-linux-gnu. (Not sure if memcpy behaviour could be different on other platforms.)
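The loop-based replacement mentioned above is not preserved in this copy of the issue. A minimal sketch of what such a variant could look like is given below, assuming a free function rather than the actual `Write` impl (which also advances the slice); the name `write_with_loop` is hypothetical.

```rust
use std::cmp;

/// Hypothetical loop-based copy: writes as many bytes of `data` as fit
/// into `buf`, one element at a time, and returns the count.
pub fn write_with_loop(buf: &mut [u8], data: &[u8]) -> usize {
    let amt = cmp::min(buf.len(), data.len());
    // Element-wise copy instead of copy_from_slice; the optimizer is free
    // to vectorize this loop for longer slices.
    for (dst, src) in buf[..amt].iter_mut().zip(&data[..amt]) {
        *dst = *src;
    }
    amt
}

fn main() {
    let mut buf = [0u8; 3];
    let n = write_with_loop(&mut buf, &[1, 2, 3, 4, 5]);
    println!("wrote {} byte(s), buf = {:?}", n, buf);
}
```

As noted in the report, a loop like this sidesteps the memcpy call for tiny writes but may lose out to a platform-tuned memcpy on very long slices, so it is a workaround to benchmark rather than a fix for the underlying optimization issue.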