Commit 21f887f

Merge pull request #1892 from canndrew/uninitialized-uninhabited
Deprecate uninitialized in favor of a new MaybeUninit type
2 parents 16ea7f6 + 1b0ef45 commit 21f887f

1 file changed

+294
-0
lines changed

@@ -0,0 +1,294 @@
- Feature Name: `uninitialized_uninhabited`
- Start Date: 2017-02-09
- RFC PR: [rust-lang/rfcs#1892](https://github.com/rust-lang/rfcs/pull/1892)
- Rust Issue: [rust-lang/rust#53491](https://github.com/rust-lang/rust/issues/53491)

# Summary
[summary]: #summary

Deprecate `mem::uninitialized::<T>` and `mem::zeroed::<T>` and replace them with
a `MaybeUninit<T>` type for safer and more principled handling of uninitialized
data.

# Motivation
[motivation]: #motivation

The problems with `uninitialized` centre around its usage with uninhabited
types, and its interaction with Rust's type layout invariants. The concept of
"uninitialized data" is extremely problematic when it comes into contact with
types like `!` or `Void`.

For any given type, there may be valid and invalid bit-representations. For
example, the type `u8` consists of a single byte and all possible bytes can be
sensibly interpreted as a value of type `u8`. By contrast, a `bool` also
consists of a single byte but not all bytes represent a `bool`: the
bit vectors `[00000000]` (`false`) and `[00000001]` (`true`) are valid `bool`s
whereas `[00101010]` is not. By further contrast, the type `!` has no valid
bit-representations at all. Even though it's treated as a zero-sized type, the
empty bit vector `[]` is not a valid representation and has no interpretation
as a `!`.
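
The three cases above can be checked with a small sketch. The empty enum `Void` is a hypothetical stand-in for an uninhabited type, since `!` cannot be named as a type argument on stable Rust; the sizes asserted here are what current `rustc` reports.

```rust
use std::mem::size_of;

// Hypothetical empty enum standing in for an uninhabited type such as `!`.
#[allow(dead_code)]
enum Void {}

fn main() {
    // `u8` and `bool` both occupy one byte, but differ in which bytes are valid.
    assert_eq!(size_of::<u8>(), 1);
    assert_eq!(size_of::<bool>(), 1);

    // Every byte is a valid `u8`, including the one that is not a valid `bool`.
    let any_byte: u8 = 0b0010_1010;
    assert_eq!(any_byte, 42);

    // Only two byte values are valid `bool`s.
    assert_eq!(false as u8, 0b0000_0000);
    assert_eq!(true as u8, 0b0000_0001);

    // An uninhabited type is zero-sized, yet no value of it can ever exist.
    assert_eq!(size_of::<Void>(), 0);
}
```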

As `bool` has both valid and invalid bit-representations, an uninitialized
`bool` cannot be known to be invalid until it is inspected. At this point, if
it is invalid, the compiler is free to invoke undefined behaviour. By contrast,
an uninitialized `!` can only possibly be invalid. Without even inspecting such
a value the compiler can assume that it's working in an impossible
state-of-affairs whenever such a value is in scope. This is the logical basis
for using a return type of `!` to represent diverging functions. If we call a
function which returns `bool`, we can't assume that the returned value is
invalid and we have to handle the possibility that the function returns.
However, if a function call returns `!`, we know that the function cannot
sensibly return. Therefore we can treat everything after the call as dead code
and we can write off the scenario where the function *does* return as being
undefined behaviour.
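
As a minimal illustration of a diverging function, the sketch below uses two hypothetical helpers, `fail` and `parse_or_fail`. Because `fail` returns `!`, the `Err` arm never has to produce an `i32`: the compiler treats control flow after the call as unreachable.

```rust
/// Hypothetical helper: the return type `!` promises that it never returns.
fn fail(msg: &str) -> ! {
    panic!("{}", msg)
}

/// Hypothetical caller: the `Err` arm type-checks without yielding an `i32`,
/// because everything after a call to `fail` is dead code.
fn parse_or_fail(s: &str) -> i32 {
    match s.parse::<i32>() {
        Ok(n) => n,
        Err(_) => fail("not a number"),
    }
}

fn main() {
    assert_eq!(parse_or_fail("7"), 7);
}
```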

The issue then is what to do about `uninitialized::<T>()` where `T = !`?
`uninitialized::<T>` is meaningless for uninhabited `T` and is currently
instant undefined behaviour when `T = !` - even if the "value of type `!`" is
never read. The type signature of `uninitialized::<!>` is, after all, that of a
diverging function:

```rust
fn mem::uninitialized::<!>() -> !
```

Yet calling this function does not diverge! It just breaks everything then eats
your laundry instead.

This problem is most prominent with `!` but also applies to other types that
have restrictions on the values they can carry. For example,
`Some(mem::uninitialized::<bool>()).is_none()` could actually return `true`
because uninitialized memory could violate the invariant that a `bool` is always
`[00000000]` or `[00000001]` -- and Rust relies on this invariant when doing
enum layout. So, `mem::uninitialized::<bool>()` is instantaneous undefined
behavior just like `mem::uninitialized::<!>()`. This also affects `mem::zeroed`
when considering types where the all-`0` bit pattern is not valid, like
references: `mem::zeroed::<&'static i32>()` is instantaneous undefined behavior.
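
The reliance of enum layout on these invariants can be observed through type sizes. This is a sketch of what current `rustc` does rather than a statement of guarantees: only the reference case is documented as guaranteed, while the `bool` and `u8` sizes are an assumption about today's layout algorithm.

```rust
use std::mem::size_of;

fn main() {
    // A `bool` has spare invalid byte values, and the compiler reuses one of
    // them to encode `None`, so the `Option` needs no extra tag byte.
    assert_eq!(size_of::<Option<bool>>(), size_of::<bool>());

    // A reference is never null, so the all-zero pattern can encode `None`.
    assert_eq!(size_of::<Option<&'static i32>>(), size_of::<&'static i32>());

    // Every byte is a valid `u8`, so `Option<u8>` needs a separate tag.
    assert_eq!(size_of::<Option<u8>>(), 2);
}
```

An uninitialized `bool` or a zeroed reference may land exactly on the bit pattern that the enclosing `Option` interprets as `None`, which is why merely producing such a value is already undefined behaviour.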

## Tracking uninitializedness in the type

An alternative way of representing uninitialized data is through a union type:

```rust
union MaybeUninit<T> {
    uninit: (),
    value: T,
}
```

Instead of creating an "uninitialized value", we can create a `MaybeUninit`
initialized with `uninit: ()`. Then, once we know that the value in the union
is valid, we can extract it with `my_uninit.value`. This is a better way of
handling uninitialized data because it doesn't involve lying to the type system
and pretending that we have a value when we don't. It also better represents
what's actually going on: we never *really* have a value of type `T` when we're
using `uninitialized::<T>`, what we have is some memory that contains either a
value (`value: T`) or nothing (`uninit: ()`), with it being the programmer's
responsibility to keep track of which state we're in. Notice that creating a
`MaybeUninit<T>` is safe for any `T`! Only when accessing `my_uninit.value` do
we have to be careful to ensure that it has been properly initialized.
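
The create, initialize, extract sequence described above can be sketched on stable Rust as follows. The names `Slot` and `make_greeting` are hypothetical, and the `value` field is wrapped in `ManuallyDrop` because stable Rust does not accept an arbitrary `T` as a union field.

```rust
use std::mem::ManuallyDrop;

// Hypothetical stand-in for the union described above. Stable Rust requires
// non-`Copy` union fields to be wrapped in `ManuallyDrop`.
union Slot<T> {
    uninit: (),
    value: ManuallyDrop<T>,
}

fn make_greeting() -> String {
    // Safe: no value of type `String` is claimed to exist yet.
    let mut slot: Slot<String> = Slot { uninit: () };

    // Writing a `ManuallyDrop` field is safe; nothing old is dropped.
    slot.value = ManuallyDrop::new(String::from("hello"));

    // Reading is the unsafe step: we assert that `value` is initialized.
    unsafe { ManuallyDrop::into_inner(slot.value) }
}

fn main() {
    assert_eq!(make_greeting(), "hello");
}
```

The only `unsafe` block sits at the read, which is the one place where the programmer's bookkeeping has to be trusted.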

To see how this can replace `uninitialized` and fix bugs in the process,
consider the following code:

```rust
fn catch_an_unwind<T, F: FnOnce() -> T>(f: F) -> Option<T> {
    let mut foo = unsafe {
        mem::uninitialized::<T>()
    };
    let mut foo_ref = &mut foo as *mut T;

    match std::panic::catch_unwind(|| {
        let val = f();
        unsafe {
            ptr::write(foo_ref, val);
        }
    }) {
        Ok(()) => Some(foo),
        Err(_) => None
    }
}
```

Naively, this code might look safe. The problem though is that by the time we
get to `let mut foo_ref` we're already saying we have a value of type `T`. But
we don't, and for `T = !` this is impossible. And so if this function is called
with a diverging callback it will invoke undefined behaviour before it even
gets to `catch_unwind`.

We can fix this by using `MaybeUninit` instead:

```rust
fn catch_an_unwind<T, F: FnOnce() -> T>(f: F) -> Option<T> {
    let mut foo: MaybeUninit<T> = MaybeUninit {
        uninit: (),
    };
    let mut foo_ref = &mut foo as *mut MaybeUninit<T>;

    match std::panic::catch_unwind(|| {
        let val = f();
        unsafe {
            ptr::write(&mut (*foo_ref).value, val);
        }
    }) {
        Ok(()) => {
            unsafe {
                Some(foo.value)
            }
        },
        Err(_) => None
    }
}
```

Note the difference: we've moved the unsafe block to the part of the code which is
actually unsafe - where we have to assert to the compiler that we have a valid
value. And we only ever tell the compiler we have a value of type `T` where we
know we actually do have a value of type `T`. As such, this is fine to use with
any `T`, including `!`. If the callback diverges then it's not possible to get
to the `unsafe` block and try to read the non-existent value.
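
For comparison, here is a sketch of the same function written against `std::mem::MaybeUninit` as it exists in the standard library today, where the method names are `uninit`, `write` and `assume_init`. The `AssertUnwindSafe` wrapper is an addition of this sketch: `catch_unwind` requires an unwind-safe closure, and a closure that mutably borrows the slot does not qualify by default.

```rust
use std::mem::MaybeUninit;
use std::panic::{self, AssertUnwindSafe};

fn catch_an_unwind<T, F: FnOnce() -> T>(f: F) -> Option<T> {
    // Safe for every `T`: nothing claims that a `T` exists yet.
    let mut slot = MaybeUninit::<T>::uninit();

    // The slot is only written if `f` returns normally.
    let result = panic::catch_unwind(AssertUnwindSafe(|| {
        slot.write(f());
    }));

    match result {
        // Sound only because the closure ran to completion and wrote the slot.
        Ok(()) => Some(unsafe { slot.assume_init() }),
        Err(_) => None,
    }
}

fn main() {
    assert_eq!(catch_an_unwind(|| 42), Some(42));
    assert_eq!(catch_an_unwind(|| -> i32 { panic!("boom") }), None);
}
```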

Given that it's so easy for code using `uninitialized` to hide bugs like this,
and given that there's a better alternative, this RFC proposes deprecating
`uninitialized` and introducing the `MaybeUninit` type into the standard
library as a replacement.

# Detailed design
[design]: #detailed-design

Add the aforementioned `MaybeUninit` type to the standard library:

```rust
pub union MaybeUninit<T> {
    uninit: (),
    value: ManuallyDrop<T>,
}
```

The type should have at least the following interface
([Playground link](https://play.rust-lang.org/?gist=81f5ab9a7e7107c9583de21382ef4333&version=nightly&mode=debug&edition=2015)):

```rust
impl<T> MaybeUninit<T> {
    /// Create a new `MaybeUninit` in an uninitialized state.
    ///
    /// Note that dropping a `MaybeUninit` will never call `T`'s drop code.
    /// It is your responsibility to make sure `T` gets dropped if it got initialized.
    pub fn uninitialized() -> MaybeUninit<T> {
        MaybeUninit {
            uninit: (),
        }
    }

    /// Create a new `MaybeUninit` in an uninitialized state, with the memory being
    /// filled with `0` bytes. It depends on `T` whether that already makes for
    /// proper initialization. For example, `MaybeUninit<usize>::zeroed()` is initialized,
    /// but `MaybeUninit<&'static i32>::zeroed()` is not because references must not
    /// be null.
    ///
    /// Note that dropping a `MaybeUninit` will never call `T`'s drop code.
    /// It is your responsibility to make sure `T` gets dropped if it got initialized.
    pub fn zeroed() -> MaybeUninit<T> {
        let mut u = MaybeUninit::<T>::uninitialized();
        unsafe { u.as_mut_ptr().write_bytes(0u8, 1); }
        u
    }

    /// Set the value of the `MaybeUninit`. This overwrites any previous value without dropping it.
    pub fn set(&mut self, val: T) {
        unsafe {
            self.value = ManuallyDrop::new(val);
        }
    }

    /// Extract the value from the `MaybeUninit` container. This is a great way
    /// to ensure that the data will get dropped, because the resulting `T` is
    /// subject to the usual drop handling.
    ///
    /// # Unsafety
    ///
    /// It is up to the caller to guarantee that the `MaybeUninit` really is in an initialized
    /// state, otherwise this will immediately cause undefined behavior.
    pub unsafe fn into_inner(self) -> T {
        std::ptr::read(&*self.value)
    }

    /// Get a reference to the contained value.
    ///
    /// # Unsafety
    ///
    /// It is up to the caller to guarantee that the `MaybeUninit` really is in an initialized
    /// state, otherwise this will immediately cause undefined behavior.
    pub unsafe fn get_ref(&self) -> &T {
        &*self.value
    }

    /// Get a mutable reference to the contained value.
    ///
    /// # Unsafety
    ///
    /// It is up to the caller to guarantee that the `MaybeUninit` really is in an initialized
    /// state, otherwise this will immediately cause undefined behavior.
    pub unsafe fn get_mut(&mut self) -> &mut T {
        &mut *self.value
    }

    /// Get a pointer to the contained value. Reading from this pointer will be undefined
    /// behavior unless the `MaybeUninit` is initialized.
    pub fn as_ptr(&self) -> *const T {
        unsafe { &*self.value as *const T }
    }

    /// Get a mutable pointer to the contained value. Reading from this pointer will be undefined
    /// behavior unless the `MaybeUninit` is initialized.
    pub fn as_mut_ptr(&mut self) -> *mut T {
        unsafe { &mut *self.value as *mut T }
    }
}
```
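
The drop and zeroing notes in the doc comments above can be exercised with a short sketch. It uses the type as it was later stabilized in `std::mem`, where the constructor, setter and extractor are named `uninit`, `write` and `assume_init` rather than the `uninitialized`, `set` and `into_inner` proposed here. The `Noisy` type and both helper functions are hypothetical.

```rust
use std::cell::Cell;
use std::mem::MaybeUninit;

// Hypothetical type that counts how often its destructor runs.
struct Noisy<'a>(&'a Cell<usize>);

impl Drop for Noisy<'_> {
    fn drop(&mut self) {
        self.0.set(self.0.get() + 1);
    }
}

/// Returns the destructor count after dropping the container, and after
/// dropping an extracted value.
fn drop_counts() -> (usize, usize) {
    let drops = Cell::new(0);

    // Dropping the container never runs `Noisy::drop`, even when initialized.
    let mut slot = MaybeUninit::uninit();
    slot.write(Noisy(&drops));
    drop(slot);
    let after_container_drop = drops.get();

    // Extracting the value hands it back to the usual drop handling.
    let mut slot = MaybeUninit::uninit();
    slot.write(Noisy(&drops));
    let value = unsafe { slot.assume_init() };
    drop(value);
    let after_value_drop = drops.get();

    (after_container_drop, after_value_drop)
}

fn zeroed_usize() -> usize {
    // All-zero bytes are a valid `usize`, so the zeroed slot is initialized.
    unsafe { MaybeUninit::<usize>::zeroed().assume_init() }
}

fn main() {
    assert_eq!(drop_counts(), (0, 1));
    assert_eq!(zeroed_usize(), 0);
}
```

The first count staying at zero is the leak the doc comments warn about: a value written into the container is only dropped if it is extracted, or dropped in place, by the caller.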

Deprecate `uninitialized` with a deprecation message that points people to the
`MaybeUninit` type. Make calling `uninitialized` on an empty type trigger a
runtime panic which also prints the deprecation message.

# How We Teach This
[how-we-teach-this]: #how-we-teach-this

Correct handling of uninitialized data is an advanced topic and should probably
be left to The Rustonomicon. There should be a paragraph somewhere therein
introducing the `MaybeUninit` type.

The documentation for `uninitialized` should explain the motivation for these
changes and direct people to the `MaybeUninit` type.

# Drawbacks
[drawbacks]: #drawbacks

This will be a rather large breaking change as a lot of people are using
`uninitialized`. However, much of this code likely already contains subtle
bugs.

# Alternatives
[alternatives]: #alternatives

* Not do this.
* Just make `uninitialized::<!>` panic instead (making `!`'s behaviour
  surprisingly inconsistent with all the other types).
* Introduce an `Inhabited` auto-trait for inhabited types and add it as a bound
  to the type argument of `uninitialized`.
* Disallow using uninhabited types with `uninitialized` by making it behave
  like `transmute` does today - by having restrictions on its type arguments
  which are enforced outside the trait system.

# Unresolved questions
[unresolved]: #unresolved-questions

None known.

# Future directions

Ideally, Rust's type system should have a way of talking about initializedness
statically. In the past there have been proposals for new pointer types which
could safely handle uninitialized data. We should seriously consider pursuing
one of these proposals.
