-
Notifications
You must be signed in to change notification settings - Fork 13k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Eliminate 280-byte memset from ReadDir iterator #103137
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -4,6 +4,15 @@ use crate::ffi::{CStr, OsStr, OsString}; | |
use crate::fmt; | ||
use crate::io::{self, BorrowedCursor, Error, IoSlice, IoSliceMut, SeekFrom}; | ||
use crate::mem; | ||
#[cfg(any( | ||
target_os = "android", | ||
target_os = "linux", | ||
target_os = "solaris", | ||
target_os = "fuchsia", | ||
target_os = "redox", | ||
target_os = "illumos" | ||
))] | ||
use crate::mem::MaybeUninit; | ||
use crate::os::unix::io::{AsFd, AsRawFd, BorrowedFd, FromRawFd, IntoRawFd}; | ||
use crate::path::{Path, PathBuf}; | ||
use crate::ptr; | ||
|
@@ -584,33 +593,69 @@ impl Iterator for ReadDir { | |
}; | ||
} | ||
|
||
// Only d_reclen bytes of *entry_ptr are valid, so we can't just copy the | ||
// whole thing (#93384). Instead, copy everything except the name. | ||
let mut copy: dirent64 = mem::zeroed(); | ||
// Can't dereference entry_ptr, so use the local entry to get | ||
// offsetof(struct dirent, d_name) | ||
let copy_bytes = &mut copy as *mut _ as *mut u8; | ||
let copy_name = &mut copy.d_name as *mut _ as *mut u8; | ||
let name_offset = copy_name.offset_from(copy_bytes) as usize; | ||
let entry_bytes = entry_ptr as *const u8; | ||
let entry_name = entry_bytes.add(name_offset); | ||
ptr::copy_nonoverlapping(entry_bytes, copy_bytes, name_offset); | ||
// The dirent64 struct is a weird imaginary thing that isn't ever supposed | ||
// to be worked with by value. Its trailing d_name field is declared | ||
// variously as [c_char; 256] or [c_char; 1] on different systems but | ||
// either way that size is meaningless; only the offset of d_name is | ||
// meaningful. The dirent64 pointers that libc returns from readdir64 are | ||
// allowed to point to allocations smaller _or_ LARGER than implied by the | ||
// definition of the struct. | ||
// | ||
// As such, we need to be even more careful with dirent64 than if its | ||
// contents were "simply" partially initialized data. | ||
// | ||
// Like for uninitialized contents, converting entry_ptr to `&dirent64` | ||
// would not be legal. However, unique to dirent64 is that we don't even | ||
// get to use `addr_of!((*entry_ptr).d_name)` because that operation | ||
// requires the full extent of *entry_ptr to be in bounds of the same | ||
// allocation, which is not necessarily the case here. | ||
// | ||
// Absent any other way to obtain a pointer to `(*entry_ptr).d_name` | ||
// legally in Rust analogously to how it would be done in C, we instead | ||
// need to make our own non-libc allocation that conforms to the weird | ||
// imaginary definition of dirent64, and use that for a field offset | ||
// computation. | ||
macro_rules! offset_ptr { | ||
($entry_ptr:expr, $field:ident) => {{ | ||
const OFFSET: isize = { | ||
let delusion = MaybeUninit::<dirent64>::uninit(); | ||
let entry_ptr = delusion.as_ptr(); | ||
unsafe { | ||
ptr::addr_of!((*entry_ptr).$field) | ||
.cast::<u8>() | ||
.offset_from(entry_ptr.cast::<u8>()) | ||
} | ||
}; | ||
if true { | ||
// Cast to the same type determined by the else branch. | ||
$entry_ptr.byte_offset(OFFSET).cast::<_>() | ||
} else { | ||
#[allow(deref_nullptr)] | ||
{ | ||
ptr::addr_of!((*ptr::null::<dirent64>()).$field) | ||
} | ||
} | ||
}}; | ||
} | ||
|
||
// d_name is guaranteed to be null-terminated. | ||
let name = CStr::from_ptr(offset_ptr!(entry_ptr, d_name).cast()); | ||
let name_bytes = name.to_bytes(); | ||
if name_bytes == b"." || name_bytes == b".." { | ||
continue; | ||
} | ||
|
||
let entry = dirent64_min { | ||
d_ino: copy.d_ino as u64, | ||
d_ino: *offset_ptr!(entry_ptr, d_ino) as u64, | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It seems like we could avoid the awkward else branch in offset_ptr! if this used ptr::addr_of!, which I think is fine for these fields that are properly typed (and in-bounds)? Or am I missing something? (Then offset_ptr can be char_offset_ptr and assume that we're casting to There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I wrote it this way because the way you suggest is (shockingly) not allowed. use std::ptr;
#[repr(C)]
struct Small { x: u16, y: u16 }
#[repr(C)]
struct Big { x: u16, y: u16, z: u16 }
fn main() {
unsafe {
let small = Box::into_raw(Box::new(Small { x: 0, y: 0 }));
let big = small as *const Big;
let y = *ptr::addr_of!((*big).y);
let _ = Box::from_raw(small);
println!("{}", y);
}
} error: Undefined Behavior: dereferencing pointer failed: alloc1550 has size 4, so pointer to 6 bytes starting at offset 0 is out-of-bounds
--> src/main.rs:13:18
|
13 | let y = *ptr::addr_of!((*big).y);
| ^^^^^^^^^^^^^^^^^^^^^^^ dereferencing pointer failed: alloc1550 has size 4, so pointer to 6 bytes starting at offset 0 is out-of-bounds
|
= help: this indicates a bug in the program: it performed an invalid operation, and caused Undefined Behavior
= help: see https://doc.rust-lang.org/nightly/reference/behavior-considered-undefined.html for further information There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. In that (a) DGAF, it just adds 2 bytes to the value of Currently in Rust, (d) is the only one that is definitively not UB. Personally I think (b) should be the requirement (that's the case in a C There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Ah, I see. Yeah, that makes sense - I believe this is an area where the exact rules aren't yet nailed down and Miri is just being conservative here. I'll go ahead and r+ this implementation, but I agree that we need a better solution here, since in practice the current requirements are overly strict. (It may also be worth having a macro or function accessors like this in libc since presumably any user of readdir ends up needing this macro). |
||
#[cfg(not(any(target_os = "solaris", target_os = "illumos")))] | ||
d_type: copy.d_type as u8, | ||
d_type: *offset_ptr!(entry_ptr, d_type) as u8, | ||
}; | ||
|
||
let ret = DirEntry { | ||
return Some(Ok(DirEntry { | ||
entry, | ||
// d_name is guaranteed to be null-terminated. | ||
name: CStr::from_ptr(entry_name as *const _).to_owned(), | ||
name: name.to_owned(), | ||
dir: Arc::clone(&self.inner), | ||
}; | ||
if ret.name_bytes() != b"." && ret.name_bytes() != b".." { | ||
return Some(Ok(ret)); | ||
} | ||
})); | ||
} | ||
} | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is less scary-looking than the ub-in-dead-code version IMO.Edit: actually nevermind, the map's closure is also UB due to the special situation we are in. When given the choice between obvious on first sight UB-in-dead-code, and UB-in-dead-code where you don't see the UB-iness immediately, I think it's better to prefer former.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW I also considered:
Not sure if that seems better to you. It avoids the
allow(deref_nullptr)
.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's the UB causing version that used to be there earlier in the history, right? Yeah it might actually be good to have that as it makes it clear that this version was specifically not chosen, and gives the opportunity to put a 1-2 line comment about the UB-iness of the code. Even if it repeats the comment above, it helps bring the point across.
Also FTR, I reconsidered after my edit.
None::<* const dirent64>.map(|p| ptr::addr_of!((*p).$field)).unwrap()
isn't actually UB, unless my understanding of UB is wrong. All that you need is the type anyways.