Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Deprecate uninitialized in favor of a new MaybeUninit type #1892

Merged
merged 20 commits into from
Aug 19, 2018
Merged
Changes from 5 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
233 changes: 233 additions & 0 deletions text/0000-uninitialized-uninhabited.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,233 @@
- Feature Name: uninitialized-uninhabited
- Start Date: 2017-02-09
- RFC PR: (leave this empty)
- Rust Issue: (leave this empty)

# Summary
[summary]: #summary

Deprecate the usage of `mem::uninitialized::<T>` for possibly-uninhabited `T`.
Specifically:
* Add a built-in `Inhabited` trait which is automatically implemented for
types known to have at least 1 possible value.
* Require an `Inhabited` bound on `T` for `uninitialized::<T>`.
* Add a `MaybeUninit<T>` union to the standard library for representing
possibly-initialized data in a more type-system-aware way.

# Motivation
[motivation]: #motivation

The concept of "uninitialized data" is extremely problematic when it comes into
contact with uninhabited types.

For any given type, there may be valid and invalid bit-representations. For
example, the type `u8` consists of a single byte and all possible bytes can be
sensibly interpreted as a value of type `u8`. By contrast, a `bool` also
consists of a single byte but not all bytes represent a `bool`: the
bit vectors `[00000000]` (`false`) and `[00000001]` (`true`) are valid `bool`s
whereas `[00101010]` is not. By further contrast, the type `!` has no valid
bit-representations at all. Even though it's treated as a zero-sized type, the
empty bit vector `[]` is not a valid representation and has no interpretation
as a `!`.

As `bool` has both valid and invalid bit-representations, an uninitialized
`bool` cannot be known to be invalid until it is inspected. At this point, if
it is invalid, the compiler is free to invoke undefined behaviour. By contrast,
an uninitialized `!` can only possibly be invalid. Without even inspecting such
a value the compiler can assume that it's working in an impossible
state-of-affairs whenever such a value is in scope. This is the logical basis
for using a return type of `!` to represent diverging functions. If we call a
function which returns `bool`, we can't assume that the returned value is
invalid and we have to handle the possibility that the function returns.
However if a function call returns `!`, we know that the function cannot
sensibly return. Therefore we can treat everything after the call as dead code
and we can write-off the scenario where the function *does* return as being
undefined behaviour.

The issue then is what to do about `uninitialized::<T>()` where `T = !`?
`uninitialized::<T>` is meaningless for uninhabited `T` and is currently
instant undefined behaviour when `T = !` - even if the "value of type `!`" is
never read. The type signature of `uninitialized::<!>` is, after all, that of a
diverging function:

```rust
fn mem::uninitialized::<!>() -> !
```

Yet calling this function does not diverge! It just breaks everything then eats
your laundry instead.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Somehow, this seems preferable to folding my laundry at the moment...


This RFC proposes restricting the use of `uninitialized` to types that are
known to be inhabited. I also propose an addition to the standard library of a
`MaybeUninit` type which offers a much more principled way of handling
uninitialized data and can be used to hold uninitialized uninhabited types
without lying to the type system.

# Detailed design
[design]: #detailed-design

Add the following trait as a lang item:

```rust
#[lang="inhabited"]
trait Inhabited {}
```

This trait is automatically implemented for all inhabited types.

Change the type of `uninitialized` to:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This conflicts somewhat with Inhabited being an auto-trait (a-la Sized), or rather T is already Inhabited unless specified otherwise (i.e. T: ?Inhabited)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question: Does ! impl Sized? Does that even make sense?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mark-i-m Yes, size_of::<!>() == 0, More generally though, for any given size every element of ! has that size. Similarly, any two elements of ! have the same size. These are both trivially true since ! has no elements.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So we are defining size_of::<T>() == n iff for all values v of T that v takes n bits to express, and since there are no values of T = !, this vacuously true?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Basically, yeah. There's two subtly different definitions you could give of Sized but ! satisfies both of them vacuously.


```rust
pub unsafe fn uninitialized<T: Inhabited>() -> T
```

Before enforcing this change we should have a future-compatibility warning
cycle and urge people to switch to `MaybeUninit` where possible.

Add a new type to the standard library:

```rust
union MaybeUninit<T> {
uninit: (),
value: T,
}
```

For an example of how this type can replace `uninitialized`, consider the
following code:

```rust
fn catch_an_unwind<T, F: FnOnce() -> T>(f: F) -> Option<T> {
let mut foo = unsafe {
mem::uninitialized::<T>()
};
let mut foo_ref = &mut foo as *mut T;

match std::panic::catch_unwind(|| {
let val = f();
unsafe {
ptr::write(foo_ref, val);
}
}) {
Ok(()) => Some(foo);
Err(_) => None
}
}
```

The problem here is, by the time we get to the third line we're already saying
we have a value of type `T`; which we don't, and which for `T = !` is
impossible. We can use `MaybeUninit` instead like this:

```rust
fn catch_an_unwind<T, F: FnOnce() -> T>(f: F) -> Option<T> {
let mut foo: MaybeUninit<T> = MaybeUninit {
uninit: (),
};
let mut foo_ref = &mut foo as *mut MaybeUninit<T>;

match std::panic::catch_unwind(|| {
let val = f();
unsafe {
ptr::write(&mut (*foo_ref).value, val);
}
}) {
Ok(()) => {
unsafe {
Some(foo.value)
}
},
Err(_) => None
}
}
```

Here, we've moved the `unsafe` block to the return position where we actually
know we have a `T`. This is fine for use with `!` because we can only reach
the line that extracts `value` from `foo` if `f` returned normally (which it
couldn't have).

# How We Teach This
[how-we-teach-this]: #how-we-teach-this

Correct handling of uninitialized data is an advanced topic and should probably be
left to The Rustonomicon.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are introducing a new trait Inhabited here. It should be explained what "inhabited" means in Rust.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can define a type as a set of values and operations valid on those values. For example, u32 is the set of unsigned 32-bit values and Option<bool> is the set containing None, Some(true), and Some(false).

Under this definition a type is inhabited if the set of values is non-empty.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Err I mean it should be explained in the "How We Teach This" section (and thus written in the 'nomicon if RFC is merged), not explain it to reviewers 😂

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, lol, I misunderstood 😂

Yes I agree 👍

The documentation for `uninitialized` should explain the motivation for these
changes and direct people to the `MaybeUninit` type.

# Drawbacks
[drawbacks]: #drawbacks

This could be a rather large breaking change depending on how many people are
currently calling `uninitialized::<T>` with a generic `T`. However all such
code is already somewhat future-incompatible as it will malfunction (or panic)
if used with `!`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If Inhabited is auto-trait, like Sized, I do not see how that’s problematic in any way.


Another drawback is that the `Inhabited` trait leaks private information about
types. Consider a type with the following definition:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not convinced this is more serious than already-existing leaks... for example, you can already find out the size of a type. Is there any fundamental difference with this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not really, it's the same as Sized in this regard.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, that makes sense 😄


```rust
pub struct AmIInhabited {
_priv: (),
}
```

If this type is exported from its crate or module it will also export an impl
for `Inhabited`. Now suppose the definition is changed to:

```rust
pub struct AmIInhabited {
_priv: !,
}
```

The author of the crate may expect this change to be private and its effects
contained to within the crate. But in making this change they've also stopped
exporting the `Inhabited` impl, causing potential breakages for downstream
users.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, that’s pretty much the same story with Sized. I.e. you cannot change pub struct Sized { a: [u8; 42] } to pub struct Sized { a: [u8] }.


Although this is a problem in principal it's unlikely to come up much in
practice. I doubt it's common for someone to change an inhabited exported type
to being uninhabited; and any library consumers would already be unable to use
`uninitialized` with a generic `T`, they'd have to be using it with the
exported type specifically to hit a regression.

# Alternatives
[alternatives]: #alternatives

* Not do this.
* Just make `uninitialized::<!>` panic instead (making `!`'s behaviour
surprisingly inconsistent with all the other types).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not a compile-time error / lint?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because then a lot of blanket impls could break because they happen to cover ! but use uninitialized. Rather, by making it a panic, things continue to compile, and only if you happen to use the unusual special case of uninitialized::<!> do things break.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, uninitialized::<!>() could be changed to panic without needing an RFC, right? (Since panicking, like anything else, is an allowed manifestation of UB.)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's true, but I think part of the intent of this RFC was to make uninitialized::<!>() not UB. Reducing the amount of possible UB in rust is a noble goal, IMHO, even if that reduction means explicitly defining things to panic.

* Adopt these rules but not the `Inhabited` trait. Instead make `uninitialized`
behave like `transmute` does today by having the restrictions on its type
arguments enforced outside the trait system.
* Not add the `Inhabited` trait. Instead add `MaybeUninit` as a lang item,
adopt the `Transmute` trait RFC, and change the signature of `uninitialized` to

```rust
pub unsafe fn uninitialized<T>() -> T
where MaybeUninit<T>: Transmute<T>
```
to get the same effect.
* Rename `Inhabited` to `Uninitialized` and move the `uninitialized` function
to being an unsafe method on the trait.

# Unresolved questions
[unresolved]: #unresolved-questions

None known.

# Future directions

Ideally, Rust's type system should have a way of talking about initializedness
statically. In the past there have been proposals for new pointer types which
could safely handle uninitialized data. We should seriously consider pursuing
one of these proposals.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 I totally agree! This would make static muts much more powerful while still being safe.


This RFC could be a possible stepping-stone to completely deprecating
`uninitialized`. We would need to see how `MaybeUninit` is used in practice
beforehand, and it would be nice (though not strictly necessary) to also
implement the additional pointer types beforehand so that users of
`uninitialized` have the best possible options available to migrate to.