|
| 1 | +- Feature Name: `uninitialized_uninhabited` |
| 2 | +- Start Date: 2017-02-09 |
| 3 | +- RFC PR: [rust-lang/rfcs#1892](https://github.com/rust-lang/rfcs/pull/1892) |
| 4 | +- Rust Issue: [rust-lang/rust#53491](https://github.com/rust-lang/rust/issues/53491) |
| 5 | + |
| 6 | +# Summary |
| 7 | +[summary]: #summary |
| 8 | + |
| 9 | +Deprecate `mem::uninitialized::<T>` and `mem::zeroed::<T>` and replace them with |
| 10 | +a `MaybeUninit<T>` type for safer and more principled handling of uninitialized |
| 11 | +data. |
| 12 | + |
| 13 | +# Motivation |
| 14 | +[motivation]: #motivation |
| 15 | + |
| 16 | +The problems with `uninitialized` centre around its usage with uninhabited |
| 17 | +types, and its interaction with Rust's type layout invariants. The concept of |
| 18 | +"uninitialized data" is extremely problematic when it comes into contact with |
| 19 | +types like `!` or `Void`. |
| 20 | + |
| 21 | +For any given type, there may be valid and invalid bit-representations. For |
| 22 | +example, the type `u8` consists of a single byte and all possible bytes can be |
| 23 | +sensibly interpreted as a value of type `u8`. By contrast, a `bool` also |
| 24 | +consists of a single byte but not all bytes represent a `bool`: the |
| 25 | +bit vectors `[00000000]` (`false`) and `[00000001]` (`true`) are valid `bool`s |
| 26 | +whereas `[00101010]` is not. By further contrast, the type `!` has no valid |
| 27 | +bit-representations at all. Even though it's treated as a zero-sized type, the |
| 28 | +empty bit vector `[]` is not a valid representation and has no interpretation |
| 29 | +as a `!`. |
| 30 | + |
| 31 | +As `bool` has both valid and invalid bit-representations, an uninitialized |
| 32 | +`bool` cannot be known to be invalid until it is inspected. At this point, if |
| 33 | +it is invalid, the compiler is free to invoke undefined behaviour. By contrast, |
| 34 | +an uninitialized `!` can only possibly be invalid. Without even inspecting such |
| 35 | +a value the compiler can assume that it's working in an impossible |
| 36 | +state-of-affairs whenever such a value is in scope. This is the logical basis |
| 37 | +for using a return type of `!` to represent diverging functions. If we call a |
| 38 | +function which returns `bool`, we can't assume that the returned value is |
| 39 | +invalid and we have to handle the possibility that the function returns. |
| 40 | +However if a function call returns `!`, we know that the function cannot |
| 41 | +sensibly return. Therefore we can treat everything after the call as dead code |
| 42 | +and we can write-off the scenario where the function *does* return as being |
| 43 | +undefined behaviour. |
| 44 | + |
| 45 | +The issue then is what to do about `uninitialized::<T>()` where `T = !`? |
| 46 | +`uninitialized::<T>` is meaningless for uninhabited `T` and is currently |
| 47 | +instant undefined behaviour when `T = !` - even if the "value of type `!`" is |
| 48 | +never read. The type signature of `uninitialized::<!>` is, after all, that of a |
| 49 | +diverging function: |
| 50 | + |
| 51 | +```rust |
| 52 | +fn mem::uninitialized::<!>() -> ! |
| 53 | +``` |
| 54 | + |
| 55 | +Yet calling this function does not diverge! It just breaks everything then eats |
| 56 | +your laundry instead. |
| 57 | + |
| 58 | +This problem is most prominent with `!` but also applies to other types that |
| 59 | +have restrictions on the values they can carry. For example, |
| 60 | +`Some(mem::uninitialized::<bool>()).is_none()` could actually return `true` |
| 61 | +because uninitialized memory could violate the invariant that a `bool` is always |
| 62 | +`[00000000]` or `[00000001]` -- and Rust relies on this invariant when doing |
| 63 | +enum layout. So, `mem::uninitialized::<bool>()` is instantaneous undefined |
| 64 | +behavior just like `mem::uninitialized::<!>()`. This also affects `mem::zeroed` |
| 65 | +when considering types where the all-`0` bit pattern is not valid, like |
| 66 | +references: `mem::zeroed::<&'static i32>()` is instantaneous undefined behavior. |
| 67 | + |
| 68 | +## Tracking uninitializedness in the type |
| 69 | + |
| 70 | +An alternative way of representing uninitialized data is through a union type: |
| 71 | + |
| 72 | +```rust |
| 73 | +union MaybeUninit<T> { |
| 74 | + uninit: (), |
| 75 | + value: T, |
| 76 | +} |
| 77 | +``` |
| 78 | + |
| 79 | +Instead of creating an "uninitialized value", we can create a `MaybeUninit` |
| 80 | +initialized with `uninit: ()`. Then, once we know that the value in the union |
| 81 | +is valid, we can extract it with `my_uninit.value`. This is a better way of |
| 82 | +handling uninitialized data because it doesn't involve lying to the type system |
| 83 | +and pretending that we have a value when we don't. It also better represents |
| 84 | +what's actually going on: we never *really* have a value of type `T` when we're |
| 85 | +using `uninitialized::<T>`; what we have is some memory that contains either a |
| 86 | +value (`value: T`) or nothing (`uninit: ()`), with it being the programmer's |
| 87 | +responsibility to keep track of which state we're in. Notice that creating a |
| 88 | +`MaybeUninit<T>` is safe for any `T`! Only when accessing `my_uninit.value` |
| 89 | +do we have to be careful to ensure that it has been properly initialized. |
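
The create-then-initialize-then-extract flow described above can be sketched with `std::mem::MaybeUninit`, the type the standard library later stabilized from this idea. Its method names (`uninit`, `write`, `assume_init`) are those of the stabilized API, not of the union shown here, and the helper `build_greeting` is purely illustrative.

```rust
use std::mem::MaybeUninit;

/// Illustrative helper (not part of the RFC): build a `String` in two steps.
fn build_greeting() -> String {
    // Safe for any `T`: no value of type `String` is claimed to exist yet.
    let mut slot: MaybeUninit<String> = MaybeUninit::uninit();

    // Initialize the slot. `write` does not drop any previous contents.
    slot.write(String::from("hello"));

    // SAFETY: the slot was initialized on the line above.
    unsafe { slot.assume_init() }
}
```

The only `unsafe` step is the final extraction, which is exactly where the programmer asserts that the memory now holds a valid value.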
| 90 | + |
| 91 | +To see how this can replace `uninitialized` and fix bugs in the process, |
| 92 | +consider the following code: |
| 93 | + |
| 94 | +```rust |
| 95 | +fn catch_an_unwind<T, F: FnOnce() -> T>(f: F) -> Option<T> { |
| 96 | + let mut foo = unsafe { |
| 97 | + mem::uninitialized::<T>() |
| 98 | + }; |
| 99 | + let mut foo_ref = &mut foo as *mut T; |
| 100 | + |
| 101 | + match std::panic::catch_unwind(|| { |
| 102 | + let val = f(); |
| 103 | + unsafe { |
| 104 | + ptr::write(foo_ref, val); |
| 105 | + } |
| 106 | + }) { |
| 107 | + Ok(()) => Some(foo), |
| 108 | + Err(_) => None |
| 109 | + } |
| 110 | +} |
| 111 | +``` |
| 112 | + |
| 113 | +Naively, this code might look safe. The problem though is that by the time we |
| 114 | +get to `let mut foo_ref` we're already saying we have a value of type `T`. But |
| 115 | +we don't, and for `T = !` this is impossible. And so if this function is called |
| 116 | +with a diverging callback it will invoke undefined behaviour before it even |
| 117 | +gets to `catch_unwind`. |
| 118 | + |
| 119 | +We can fix this by using `MaybeUninit` instead: |
| 120 | + |
| 121 | +```rust |
| 122 | +fn catch_an_unwind<T, F: FnOnce() -> T>(f: F) -> Option<T> { |
| 123 | + let mut foo: MaybeUninit<T> = MaybeUninit { |
| 124 | + uninit: (), |
| 125 | + }; |
| 126 | + let mut foo_ref = &mut foo as *mut MaybeUninit<T>; |
| 127 | + |
| 128 | + match std::panic::catch_unwind(|| { |
| 129 | + let val = f(); |
| 130 | + unsafe { |
| 131 | + ptr::write(&mut (*foo_ref).value, val); |
| 132 | + } |
| 133 | + }) { |
| 134 | + Ok(()) => { |
| 135 | + unsafe { |
| 136 | + Some(foo.value) |
| 137 | + } |
| 138 | + }, |
| 139 | + Err(_) => None |
| 140 | + } |
| 141 | +} |
| 142 | +``` |
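
For comparison, here is a minimal sketch of the same function written against the stabilized `std::mem::MaybeUninit` API rather than the hand-written union. The `AssertUnwindSafe` wrapper is an assumption added so that the closure, which captures a mutable reference, is accepted by `catch_unwind`; it is not part of the original example.

```rust
use std::mem::MaybeUninit;
use std::panic::{self, AssertUnwindSafe};

fn catch_an_unwind<T, F: FnOnce() -> T>(f: F) -> Option<T> {
    // No value of type `T` is claimed to exist here, so this is fine
    // even when `T` is uninhabited.
    let mut slot: MaybeUninit<T> = MaybeUninit::uninit();

    // Assumption: asserting unwind safety is acceptable because `slot`
    // is never read if the callback panics.
    let result = panic::catch_unwind(AssertUnwindSafe(|| {
        slot.write(f());
    }));

    match result {
        // SAFETY: the closure returned normally, so `slot` was written.
        Ok(()) => Some(unsafe { slot.assume_init() }),
        Err(_) => None,
    }
}
```

Calling `catch_an_unwind(|| 5)` yields `Some(5)`, while a callback that panics yields `None` without the slot ever being read.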
| 143 | + |
| 144 | +Note the difference: we've moved the unsafe block to the part of the code which is |
| 145 | +actually unsafe - where we have to assert to the compiler that we have a valid |
| 146 | +value. And we only ever tell the compiler we have a value of type `T` where we |
| 147 | +know we actually do have a value of type `T`. As such, this is fine to use with |
| 148 | +any `T`, including `!`. If the callback diverges then it's not possible to get |
| 149 | +to the `unsafe` block and try to read the non-existent value. |
| 150 | + |
| 151 | +Given that it's so easy for code using `uninitialized` to hide bugs like this, |
| 152 | +and given that there's a better alternative, this RFC proposes deprecating |
| 153 | +`uninitialized` and introducing the `MaybeUninit` type into the standard |
| 154 | +library as a replacement. |
| 155 | + |
| 156 | +# Detailed design |
| 157 | +[design]: #detailed-design |
| 158 | + |
| 159 | +Add the aforementioned `MaybeUninit` type to the standard library: |
| 160 | + |
| 161 | +```rust |
| 162 | +pub union MaybeUninit<T> { |
| 163 | + uninit: (), |
| 164 | + value: ManuallyDrop<T>, |
| 165 | +} |
| 166 | +``` |
| 167 | + |
| 168 | +The type should have at least the following interface |
| 169 | +([Playground link](https://play.rust-lang.org/?gist=81f5ab9a7e7107c9583de21382ef4333&version=nightly&mode=debug&edition=2015)): |
| 170 | + |
| 171 | +```rust |
| 172 | +impl<T> MaybeUninit<T> { |
| 173 | + /// Create a new `MaybeUninit` in an uninitialized state. |
| 174 | + /// |
| 175 | + /// Note that dropping a `MaybeUninit` will never call `T`'s drop code. |
| 176 | + /// It is your responsibility to make sure `T` gets dropped if it got initialized. |
| 177 | + pub fn uninitialized() -> MaybeUninit<T> { |
| 178 | + MaybeUninit { |
| 179 | + uninit: (), |
| 180 | + } |
| 181 | + } |
| 182 | + |
| 183 | + /// Create a new `MaybeUninit` in an uninitialized state, with the memory being |
| 184 | + /// filled with `0` bytes. It depends on `T` whether that already makes for |
| 185 | + /// proper initialization. For example, `MaybeUninit<usize>::zeroed()` is initialized, |
| 186 | + /// but `MaybeUninit<&'static i32>::zeroed()` is not because references must not |
| 187 | + /// be null. |
| 188 | + /// |
| 189 | + /// Note that dropping a `MaybeUninit` will never call `T`'s drop code. |
| 190 | + /// It is your responsibility to make sure `T` gets dropped if it got initialized. |
| 191 | + pub fn zeroed() -> MaybeUninit<T> { |
| 192 | + let mut u = MaybeUninit::<T>::uninitialized(); |
| 193 | + unsafe { u.as_mut_ptr().write_bytes(0u8, 1); } |
| 194 | + u |
| 195 | + } |
| 196 | + |
| 197 | + /// Set the value of the `MaybeUninit`. This overwrites any previous value without dropping it. |
| 198 | + pub fn set(&mut self, val: T) { |
| 199 | + unsafe { |
| 200 | + self.value = ManuallyDrop::new(val); |
| 201 | + } |
| 202 | + } |
| 203 | + |
| 204 | + /// Extract the value from the `MaybeUninit` container. This is a great way |
| 205 | + /// to ensure that the data will get dropped, because the resulting `T` is |
| 206 | + /// subject to the usual drop handling. |
| 207 | + /// |
| 208 | + /// # Unsafety |
| 209 | + /// |
| 210 | + /// It is up to the caller to guarantee that the `MaybeUninit` really is in an initialized |
| 211 | + /// state, otherwise this will immediately cause undefined behavior. |
| 212 | + pub unsafe fn into_inner(self) -> T { |
| 213 | + std::ptr::read(&*self.value) |
| 214 | + } |
| 215 | + |
| 216 | + /// Get a reference to the contained value. |
| 217 | + /// |
| 218 | + /// # Unsafety |
| 219 | + /// |
| 220 | + /// It is up to the caller to guarantee that the `MaybeUninit` really is in an initialized |
| 221 | + /// state, otherwise this will immediately cause undefined behavior. |
| 222 | + pub unsafe fn get_ref(&self) -> &T { |
| 223 | + &*self.value |
| 224 | + } |
| 225 | + |
| 226 | + /// Get a mutable reference to the contained value. |
| 227 | + /// |
| 228 | + /// # Unsafety |
| 229 | + /// |
| 230 | + /// It is up to the caller to guarantee that the `MaybeUninit` really is in an initialized |
| 231 | + /// state, otherwise this will immediately cause undefined behavior. |
| 232 | + pub unsafe fn get_mut(&mut self) -> &mut T { |
| 233 | + &mut *self.value |
| 234 | + } |
| 235 | + |
| 236 | + /// Get a pointer to the contained value. Reading from this pointer will be undefined |
| 237 | + /// behavior unless the `MaybeUninit` is initialized. |
| 238 | + pub fn as_ptr(&self) -> *const T { |
| 239 | + unsafe { &*self.value as *const T } |
| 240 | + } |
| 241 | + |
| 242 | + /// Get a mutable pointer to the contained value. Reading from this pointer will be undefined |
| 243 | + /// behavior unless the `MaybeUninit` is initialized. |
| 244 | + pub fn as_mut_ptr(&mut self) -> *mut T { |
| 245 | + unsafe { &mut *self.value as *mut T } |
| 246 | + } |
| 247 | +} |
| 248 | +``` |
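
As a rough usage sketch of the pointer accessors and `zeroed`, the following uses the stabilized `std::mem::MaybeUninit`, whose `as_mut_ptr` and `zeroed` match the interface above while `uninit` and `assume_init` correspond to `uninitialized` and `into_inner`. The functions `fill_pair`, `make_pair`, and `zero_counter` are hypothetical names introduced only for illustration.

```rust
use std::mem::MaybeUninit;

/// Hypothetical C-style initializer that fills an out-pointer.
///
/// # Safety
/// `out` must be valid for writes and properly aligned.
unsafe fn fill_pair(out: *mut (u32, u32)) {
    // SAFETY: the caller guarantees `out` is valid for writes.
    unsafe { out.write((1, 2)) }
}

fn make_pair() -> (u32, u32) {
    let mut slot = MaybeUninit::<(u32, u32)>::uninit();
    unsafe {
        // Hand out a raw pointer; no reference to uninitialized data is created.
        fill_pair(slot.as_mut_ptr());
        // SAFETY: `fill_pair` wrote a complete value through the pointer.
        slot.assume_init()
    }
}

fn zero_counter() -> usize {
    // SAFETY: the all-zero bit pattern is a valid `usize`.
    unsafe { MaybeUninit::<usize>::zeroed().assume_init() }
}
```

The same `zeroed` call would not be sound to extract for a reference type such as `&'static i32`, since references must not be null.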
| 249 | + |
| 250 | +Deprecate `uninitialized` with a deprecation message that points people to the |
| 251 | +`MaybeUninit` type. Make calling `uninitialized` on an empty type trigger a |
| 252 | +runtime panic which also prints the deprecation message. |
| 253 | + |
| 254 | +# How We Teach This |
| 255 | +[how-we-teach-this]: #how-we-teach-this |
| 256 | + |
| 257 | +Correct handling of uninitialized data is an advanced topic and should probably |
| 258 | +be left to The Rustonomicon. There should be a paragraph somewhere therein |
| 259 | +introducing the `MaybeUninit` type. |
| 260 | + |
| 261 | +The documentation for `uninitialized` should explain the motivation for these |
| 262 | +changes and direct people to the `MaybeUninit` type. |
| 263 | + |
| 264 | +# Drawbacks |
| 265 | +[drawbacks]: #drawbacks |
| 266 | + |
| 267 | +This will be a rather large breaking change as a lot of people are using |
| 268 | +`uninitialized`. However, much of this code likely already contains subtle |
| 269 | +bugs. |
| 270 | + |
| 271 | +# Alternatives |
| 272 | +[alternatives]: #alternatives |
| 273 | + |
| 274 | +* Not do this. |
| 275 | +* Just make `uninitialized::<!>` panic instead (making `!`'s behaviour |
| 276 | + surprisingly inconsistent with all the other types). |
| 277 | +* Introduce an `Inhabited` auto-trait for inhabited types and add it as a bound |
| 278 | + to the type argument of `uninitialized`. |
| 279 | +* Disallow using uninhabited types with `uninitialized` by making it behave |
| 280 | + like `transmute` does today - by having restrictions on its type arguments |
| 281 | + which are enforced outside the trait system. |
| 282 | + |
| 283 | +# Unresolved questions |
| 284 | +[unresolved]: #unresolved-questions |
| 285 | + |
| 286 | +None known. |
| 287 | + |
| 288 | +# Future directions |
| 289 | + |
| 290 | +Ideally, Rust's type system should have a way of talking about initializedness |
| 291 | +statically. In the past there have been proposals for new pointer types which |
| 292 | +could safely handle uninitialized data. We should seriously consider pursuing |
| 293 | +one of these proposals. |
| 294 | + |