set_len on a Vec<u8> of uninit is UB #79
Comments
Hmm.. Having a look at this, it looks like functions like
    /// # Safety
    /// You may not deinitialise any T here.
    unsafe fn mucast<T>(x: &mut [T]) -> &mut [MaybeUninit<T>] {
        &mut *(x as *mut [T] as *mut [MaybeUninit<T>])
    }
    pub fn convert_latin1_to_utf8_partial(src: &[u8], dst: &mut [u8]) -> (usize, usize) {
        // SAFETY: the `_uninit` variant must never deinitialise bytes of `dst`.
        convert_latin1_to_utf8_partial_uninit(src, unsafe { mucast(dst) })
    }
    pub fn convert_latin1_to_utf8_partial_uninit(src: &[u8], dst: &mut [MaybeUninit<u8>]) -> (usize, usize);
It would need to promise that no previously init bytes are made uninit (in order to make the cast from This would be a useful change to make to consumers, as it would let them use the
edit: These functions wouldn't have to be made public, of course, but they would need to exist in some form, and I think it might be useful for consumers.
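To make the shape of that proposal concrete, here is a minimal sketch of the wrapper pattern. The names `copy_bytes` and `copy_bytes_uninit` are hypothetical stand-ins and are not part of encoding_rs; the real converters do far more than copy bytes. The only point is that the `MaybeUninit` variant only ever writes initialized values, which is what would make the cast in `mucast` sound.

```rust
use std::mem::MaybeUninit;

/// Reinterprets an initialized slice as a slice of `MaybeUninit<T>`.
///
/// # Safety
/// The caller must never write an uninitialized value through the result.
unsafe fn mucast<T>(x: &mut [T]) -> &mut [MaybeUninit<T>] {
    unsafe { &mut *(x as *mut [T] as *mut [MaybeUninit<T>]) }
}

/// Hypothetical stand-in for a converter: copies as many bytes as fit
/// into `dst` and returns how many were written.
fn copy_bytes_uninit(src: &[u8], dst: &mut [MaybeUninit<u8>]) -> usize {
    let n = src.len().min(dst.len());
    for (slot, byte) in dst[..n].iter_mut().zip(&src[..n]) {
        // `write` stores an initialized value; nothing is ever deinitialized.
        slot.write(*byte);
    }
    n
}

/// Safe wrapper over the `MaybeUninit` variant for already-initialized buffers.
fn copy_bytes(src: &[u8], dst: &mut [u8]) -> usize {
    // SAFETY: `copy_bytes_uninit` only writes initialized bytes into `dst`.
    copy_bytes_uninit(src, unsafe { mucast(dst) })
}
```

With this split, callers holding a plain `&mut [u8]` keep the existing signature, while callers holding uninitialized storage can call the `_uninit` variant directly.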
I think there's 2 questions here.
The answer to 1 seems to be "probably not, but it's not definitely not UB?" (see: rust-lang/unsafe-code-guidelines#77)
2 seems to be yes, going off the safety requirement on Thought that requirement could be closely tied to the answer to 1, so Vec could weaken that safety requirement. It would still be UB to observe an uninit u8 by value, but we don't seem to be doing that here (because miri passes).
If 1. is not UB, then I don't think the public API would need to change at all, you'd just take a
This is currently deep in no-man's-land. The reference clearly makes it UB, as does the documentation of So what Gankra writes is correct, in the sense that I think we should change the rules to allow this. But the current rules clearly don't allow it and I don't decide these rules by myself.
...
AFAICT, the UB concern related to 2 is precisely about 1, and, therefore, if 1 isn't UB when T is a primitive integer, it seems to me that encoding_rs's use of
Well, I guess the example given is distinct, since it has C code doing the writing to the uninitialized
The safety comment of
That is a library API comment. I do not know when it was added (it has not always been there), but what it means seems fairly clear. In terms of language UB, it would be possible to allow
spare_capacity_mut looks perfect for this use case. While I agree nothing really bad should happen, I don't see why users shouldn't assume it's UB: if the code changes, one day you may do it for an item that implements Drop and forget to do the set_len properly. I don't see any situation where you can't do the init before the set_len.
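For reference, a minimal sketch of the init-before-set_len ordering using `Vec::spare_capacity_mut`. The helper name `append_via_spare_capacity` is hypothetical and only illustrates the pattern; it is not code from encoding_rs.

```rust
use std::mem::MaybeUninit;

/// Hypothetical helper: appends `src` to `dst` by writing into the
/// vector's spare capacity first and only then extending its length.
fn append_via_spare_capacity(dst: &mut Vec<u8>, src: &[u8]) {
    // Make sure there are at least `src.len()` uninitialized slots.
    dst.reserve(src.len());

    let spare: &mut [MaybeUninit<u8>] = dst.spare_capacity_mut();
    for (slot, byte) in spare.iter_mut().zip(src) {
        slot.write(*byte);
    }

    // SAFETY: `reserve` guaranteed at least `src.len()` spare slots, and
    // the loop above initialized exactly the first `src.len()` of them.
    unsafe { dst.set_len(dst.len() + src.len()) };
}
```

Because the length only grows after the bytes are written, the vector never claims to contain uninitialized elements, whichever way the open UB question is eventually settled.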
encoding_rs currently has UB in the form of creating uninitialized u8's via set_len
Here are 2 examples where the UB is crystal clear:
encoding_rs/src/mem.rs
Lines 2007 to 2010 in dd9d99b
encoding_rs/src/mem.rs
Lines 2044 to 2047 in dd9d99b
set_len is also used in 7 functions in lib.rs, but I haven't looked at them very closely.
The docs for set_len explicitly say https://doc.rust-lang.org/std/vec/struct.Vec.html#method.set_len :
Some relevant discussion can be found here rust-lang/unsafe-code-guidelines#71
rustc itself has a lint specifically for this kind of thing: rust-lang/rust#75968
My understanding is that this is currently considered UB, but the rule may be relaxed in the future so that types for which all bit patterns are valid can hold uninitialized memory as long as it is not read.
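Until the rules change, one conservative way to avoid the question entirely is to zero-initialize the output buffer and truncate it afterwards. The sketch below assumes a hypothetical writer `write_upper` standing in for a converter that fills a `&mut [u8]` and returns the number of bytes written; it is not the encoding_rs API, and the zeroing has a cost that a `MaybeUninit`-based approach would avoid.

```rust
/// Hypothetical writer standing in for a converter: fills `dst` with the
/// ASCII-uppercased bytes of `src` and returns the number of bytes written.
fn write_upper(src: &[u8], dst: &mut [u8]) -> usize {
    let n = src.len().min(dst.len());
    for (d, s) in dst[..n].iter_mut().zip(&src[..n]) {
        *d = s.to_ascii_uppercase();
    }
    n
}

/// Builds the output without ever exposing uninitialized bytes.
fn to_upper_vec(src: &[u8]) -> Vec<u8> {
    // The pattern flagged in this issue would be `Vec::with_capacity(n)`
    // followed by `set_len(n)` before any byte is written. Here the
    // buffer is fully initialized up front instead.
    let mut out = vec![0u8; src.len()];
    let written = write_upper(src, &mut out);
    // Shrinking is always safe; no `unsafe` is needed anywhere.
    out.truncate(written);
    out
}
```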