Update unintiialized RFC #2

Merged
86 changes: 60 additions & 26 deletions text/0000-uninitialized-uninhabited.md
Original file line number Diff line number Diff line change
@@ -6,15 +6,17 @@
# Summary
[summary]: #summary

Deprecate `mem::uninitialized::<T>` and replace it with a `MaybeUninit<T>` type
for safer and more principled handling of uninitialized data.
Deprecate `mem::uninitialized::<T>` and `mem::zeroed::<T>` and replace them with
a `MaybeUninit<T>` type for safer and more principled handling of uninitialized
data.

# Motivation
[motivation]: #motivation

The problems with `uninitialized` centre around its usage with uninhabited
types. The concept of "uninitialized data" is extremely problematic when it
comes into contact with types like `!` or `Void`.
types, and its interaction with Rust's type layout invariants. The concept of
"uninitialized data" is extremely problematic when it comes into contact with
types like `!` or `Void`.

For any given type, there may be valid and invalid bit-representations. For
example, the type `u8` consists of a single byte and all possible bytes can be
@@ -53,6 +55,18 @@ fn mem::uninitialized::<!>() -> !
Yet calling this function does not diverge! It just breaks everything then eats
your laundry instead.

This problem is most prominent with `!` but also applies to other types that
have restrictions on the values they can carry. For example,
`Some(mem::uninitialized::<bool>()).is_none()` could actually return `true`
because uninitialized memory could violate the invariant that a `bool` is always
`[00000000]` or `[00000001]` -- and Rust relies on this invariant when doing
enum layout. So, `mem::uninitialized::<bool>()` is instantaneous undefined
behavior just like `mem::uninitialized::<!>()`. This also affects `mem::zeroed`
when considering types where the all-`0` bit pattern is not valid, like
references: `mem::zeroed::<&'static i32>()` is instantaneous undefined behavior.
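
The claim that Rust relies on these validity invariants for enum layout can be observed through type sizes. The following is a minimal sketch, not part of the RFC text; the helper names are hypothetical. It checks that `Option<bool>` and `Option<&i32>` need no extra tag byte, which is only possible because some bit patterns of `bool` and `&i32` are never valid. The `Option<&T>` case is a documented guarantee; the `Option<bool>` case is what current `rustc` does in practice rather than a language guarantee.

```rust
use std::mem::size_of;

/// Hypothetical helper: true if `Option<bool>` fits in the same space as `bool`,
/// i.e. `None` is stored in a bit pattern that is invalid for `bool`.
/// This reflects current rustc behavior, not a documented guarantee.
fn option_bool_has_no_extra_tag() -> bool {
    size_of::<Option<bool>>() == size_of::<bool>()
}

/// Hypothetical helper: true if `Option<&i32>` fits in the same space as `&i32`,
/// i.e. `None` is stored as the null pointer, which is invalid for a reference.
fn option_ref_has_no_extra_tag() -> bool {
    size_of::<Option<&'static i32>>() == size_of::<&'static i32>()
}
```

Because `None` occupies a bit pattern that the payload type can never hold, a `bool` or reference that does hold such a pattern can be misread as `None`, which is the failure mode described above.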

## Tracking uninitializedness in the type

An alternative way of representing uninitialized data is through a union type:

```rust
@@ -63,14 +77,16 @@ union MaybeUninit<T> {
```

Instead of creating an "uninitialized value", we can create a `MaybeUninit`
initialized with `uninit = ()`. Then, once we know that the value in the union
initialized with `uninit: ()`. Then, once we know that the value in the union
is valid, we can extract it with `my_uninit.value`. This is a better way of
handling uninitialized data because it doesn't involve lying to the type system
and pretending that we have a value when we don't. It also better represents
what's actually going on: we never *really* have a value of type `T` when we're
using `uninitialized::<T>`, what we have is some memory that contains either a
value (`value: T`) or nothing (`uninit: ()`), with it being the programmer's
responsibility to keep track of which state we're in.
responsibility to keep track of which state we're in. Notice that creating a
`MaybeUninit<T>` is safe for any `T`! Only when accessing `my_uninit.value` do
we have to ensure that it has been properly initialized.
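
The union-based approach can be sketched in a few lines. This is an illustration under assumptions, not the RFC's reference code: the `value` field is wrapped in `ManuallyDrop<T>` because stable Rust requires that for union fields that may need dropping, and the function names `build_greeting` and `hold_uninhabited` are hypothetical.

```rust
use std::convert::Infallible;
use std::mem::ManuallyDrop;

// A local sketch of the union; this is not `std::mem::MaybeUninit`.
union MaybeUninit<T> {
    uninit: (),
    value: ManuallyDrop<T>,
}

/// Hypothetical example: create the slot, initialize it, then extract the value.
fn build_greeting() -> String {
    // Safe: starting in the `uninit` state claims nothing about a `String`.
    let mut slot: MaybeUninit<String> = MaybeUninit { uninit: () };
    // Writing a `ManuallyDrop` union field is safe, since no old value is dropped.
    slot.value = ManuallyDrop::new(String::from("hi"));
    // Reading the field is unsafe: we assert that `value` was initialized above.
    unsafe { ManuallyDrop::into_inner(slot.value) }
}

/// Hypothetical example: even an uninhabited `T` is fine as long as
/// the `value` field is never read.
fn hold_uninhabited() {
    let _slot: MaybeUninit<Infallible> = MaybeUninit { uninit: () };
}
```

Only the read in `build_greeting` needs `unsafe`, which matches the idea that the programmer's obligation begins when the value is accessed, not when the container is created.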

To see how this can replace `uninitialized` and fix bugs in the process,
consider the following code:
@@ -143,72 +159,90 @@ library as a replacement.
Add the aforementioned `MaybeUninit` type to the standard library:

```rust
#[repr(transparent)]
union MaybeUninit<T> {
pub union MaybeUninit<T> {
uninit: (),
value: T,
value: ManuallyDrop<T>,
}
```

The type should have at least the following interface
([Playground link](https://play.rust-lang.org/?gist=81f5ab9a7e7107c9583de21382ef4333&version=nightly&mode=debug&edition=2015)):

```rust
impl<T> MaybeUninit<T> {
/// Create a new `MaybeUninit` in an uninitialized state.
///
/// Note that dropping a `MaybeUninit` will never call `T`'s drop code.
/// It is your responsibility to make sure `T` gets dropped if it got initialized.
pub fn uninitialized() -> MaybeUninit<T> {
MaybeUninit {
uninit: (),
}
}

/// Create a new `MaybeUninit` in an uninitialized state, with the memory being
/// filled with `0` bytes. It depends on `T` whether that already makes for
/// proper initialization. For example, `MaybeUninit<usize>::zeroed()` is initialized,
/// but `MaybeUninit<&'static i32>::zeroed()` is not because references must not
/// be null.
///
/// Note that dropping a `MaybeUninit` will never call `T`'s drop code.
/// It is your responsibility to make sure `T` gets dropped if it got initialized.
pub fn zeroed() -> MaybeUninit<T> {
let mut u = MaybeUninit::<T>::uninitialized();
unsafe { u.as_mut_ptr().write_bytes(0u8, 1); }
u
}

/// Set the value of the `MaybeUninit`. This overwrites any previous value without dropping it.
pub fn set(&mut self, val: T) -> &mut T {
pub fn set(&mut self, val: T) {
unsafe {
self.value = val;
&mut self.value
self.value = ManuallyDrop::new(val);
}
}

/// Take the value of the `MaybeUninit`, putting it into an uninitialized state.
/// Extract the value from the `MaybeUninit` container. This is a great way
/// to ensure that the data will get dropped, because the resulting `T` is
/// subject to the usual drop handling.
///
/// # Unsafety
///
/// It is up to the caller to guarantee that the `MaybeUninit` really is in an initialized
/// state, otherwise undefined behaviour will result.
pub unsafe fn get(&self) -> T {
std::ptr::read(&self.value)
/// state, otherwise this will immediately cause undefined behavior.
pub unsafe fn into_inner(self) -> T {
std::ptr::read(&*self.value)
}

/// Get a reference to the contained value.
///
/// # Unsafety
///
/// It is up to the caller to guarantee that the `MaybeUninit` really is in an initialized
/// state, otherwise undefined behaviour will result.
/// state, otherwise this will immediately cause undefined behavior.
pub unsafe fn get_ref(&self) -> &T {
&self.value
&*self.value
}

/// Get a mutable reference to the contained value.
///
/// # Unsafety
///
/// It is up to the caller to guarantee that the `MaybeUninit` really is in an initialized
/// state, otherwise undefined behaviour will result.
/// state, otherwise this will immediately cause undefined behavior.
pub unsafe fn get_mut(&mut self) -> &mut T {
&mut self.value
&mut *self.value
}

/// Get a pointer to the contained value. This pointer will only be valid if the `MaybeUninit`
/// is in an initialized state.
/// Get a pointer to the contained value. Reading from this pointer will be undefined
/// behavior unless the `MaybeUninit` is initialized.
pub fn as_ptr(&self) -> *const T {
self as *const MaybeUninit<T> as *const T
unsafe { &*self.value as *const T }
}

/// Get a mutable pointer to the contained value. This pointer will only be valid if the
/// `MaybeUninit` is in an initialized state.
/// Get a mutable pointer to the contained value. Reading from this pointer will be undefined
/// behavior unless the `MaybeUninit` is initialized.
pub fn as_mut_ptr(&mut self) -> *mut T {
self as *mut MaybeUninit<T> as *mut T
unsafe { &mut *self.value as *mut T }
}
}
```
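
As a usage illustration, the sketch below exercises the same workflow against the `std::mem::MaybeUninit` type that ships in today's standard library. Note the assumption: the stabilized method names differ from the ones proposed in this diff (`uninit` rather than `uninitialized`, `write` rather than `set`, `assume_init` rather than `into_inner`). The functions `fill`, `init_via_pointer`, `init_via_write`, and `zeroed_usize` are hypothetical names chosen for the example.

```rust
use std::mem::MaybeUninit;

/// Hypothetical C-style initializer that fills memory through an out-pointer.
///
/// # Safety
/// `out` must be valid for writes of a `[u8; 4]`.
unsafe fn fill(out: *mut [u8; 4]) {
    // `write` stores the value without reading or dropping the old contents.
    unsafe { out.write([1, 2, 3, 4]) }
}

/// Initialize through a raw pointer, then extract the value.
fn init_via_pointer() -> [u8; 4] {
    let mut slot = MaybeUninit::<[u8; 4]>::uninit();
    // Obtaining the pointer is safe; only reading uninitialized data would be UB.
    unsafe { fill(slot.as_mut_ptr()) };
    // Sound because `fill` initialized every byte of the array.
    unsafe { slot.assume_init() }
}

/// Initialize by value, then extract so that normal drop handling applies.
fn init_via_write() -> String {
    let mut slot = MaybeUninit::<String>::uninit();
    slot.write(String::from("hello"));
    unsafe { slot.assume_init() }
}

/// All-zero bytes are a valid `usize`, so extracting it is sound.
/// The same call would be undefined behavior for `&'static i32`.
fn zeroed_usize() -> usize {
    unsafe { MaybeUninit::<usize>::zeroed().assume_init() }
}
```

In each function the `unsafe` block sits exactly where the caller vouches for initialization, which is the division of responsibility the interface above is designed around.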