Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New RFC: Collection Transmute #2756

Closed
wants to merge 3 commits into from
Closed
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
152 changes: 152 additions & 0 deletions text/0000-collection-transmute.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,152 @@
- Feature Name: collection-transmute
- Start Date: 2019-09-05
- RFC PR: [rust-lang/rfcs#0000](https://github.com/rust-lang/rfcs/pull/0000)
- Rust Issue: [rust-lang/rust#0000](https://github.com/rust-lang/rust/issues/0000)

# Summary
[summary]: #summary

Add a `transmute(self)` method to `Vec`, `VecDeque` and `BinaryHeap`.

# Motivation
[motivation]: #motivation

The mentioned types cannot safely be transmuted by `mem::transmute`. Adding a
method for that purpose will hopefully discourage users to try anyway.

E.g. the following code is UB:

```rust
let x = vec![0u32; 2];
let y = unsafe { std::mem::transmute::<_, Vec<[u8; 4]>>(x) };
```

This is explained in the docs for [`Vec`], but with an unsound solution. The
way to do this correctly turns out surprisingly tricky:

```rust
let x = vec![0u32; 2];
let y = unsafe {
let y: &mut Vec<_> = &mut *ManuallyDrop::new(x);
Vec::from_raw_parts(y.as_mut_ptr() as *mut [u8; 4],
y.len(),
y.capacity())
};
```

Though the code is not too large, there are a good number of things to get
wrong – this solution was iteratively created by soundness-knowledgeable
Rustaceans, with multiple wrong attempts. So this method seems like a good
candidate for inclusion into `std`.

This also applies to `VecDeque` and `BinaryHeap`, which are implemented in
terms of `Vec`.

[`Vec`]: https://doc.rust-lang.org/std/vec/struct.Vec.html

# Guide-level explanation
[guide-level-explanation]: #guide-level-explanation

The types `std::vec::Vec`, `std::collections::VecDeque` and
`std::collections::BinaryHeap` all get a new unsafe `transmute` method that
takes `self` by value and returns a new `Vec`/`VecDeque`/`BinaryHeap` with a
caller-chosen item type.

The API might look like this (exemplified for `Vec`):

```rust
impl<T> Vec<T> {
/// Transmute this `Vec` to a different item type
///
/// # Safety
///
/// Calling this function requires the target item type be compatible with
/// `Self::Item` (see [`mem::transmute`]).
Copy link
Contributor

@gnzlbg gnzlbg Sep 6, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be better to just say something like:

Safe if and only if transmute::<T,I> is safe and align_of::<T>() == align_of::<I>().

instead of using words like "compatible" which might not have a clear meaning for everybody.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cc @rust-lang/wg-unsafe-code-guidelines

///
/// # Examples
///
/// transmute a `Vec` of 32-bit integers to their byte representations:
///
/// ```
/// let x = vec![0u32; 5];
/// let y = unsafe { x.transmute::<[u8; 4]>() };
/// assert_eq!(5, y.len());
/// assert_eq!([0, 0, 0, 0], y[0]);
/// ```
///
/// [`mem::transmute`]: ../../std/mem/fn.transmute.html
unsafe fn transmute<I>(self) -> Vec<I> {
..
}
```

This would mean our example above would become:

```rust
let x = vec![0i32; 2];
let y = unsafe { x.transmute::<[u8; 4]>() };
```

The documentation of `mem::transmute` should link to the new methods.

A clippy lint can catch offending calls to `mem::transmute` and suggest using
the inherent `transmute` method where applicable.

# Reference-level explanation
[reference-level-explanation]: #reference-level-explanation

The implementation for `Vec` copies the above solution. The methods for
`VecDeque` and `BinaryHeap` use the `Vec` method on their data.

# Drawbacks
[drawbacks]: #drawbacks

Adding a new method to `std` increases code size and needs to be maintained.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While this increases the amount of code in std::collections, these generic methods should not incur a code-size penalty if they are not used.

However, this seems to be a minor inconvenience when compared to the safety
benefit.

# Rationale and alternatives
[rationale-and-alternatives]: #rationale-and-alternatives

As explained above, the method is useful, yet non-obvious. There were multiple
wrong implementation attempts before that were not chosen for being unsound.

We could do nothing, but this means users wanting to transmute the items of a
`Vec` will be left without an obvious solution which will lead to unsound code
if they get it wrong.

We could document the correct solution instead of putting it into `std`. This
would lead to worse ergonomics.

We could create a trait to hold the `transmute` method. This would allow more
generic usage, but might lead to worse ergonomics due to type inference
uncertainty.

It would even be possible to provide a default implementation using
`mem::transmute`, but having a default implementation that might be unsound for
some types is a footgun waiting to happen.

# Prior art
[prior-art]: #prior-art

`mem::transmute` offers the same functionality for many other types. We have
added similar methods to different types where useful, see the various
iterator-like methods in `std`. @Shnatsel and @danielhenrymantilla came up
with the solution together in a
[clippy issue](https://github.com/rust-lang/rust-clippy/issues/4484).

# Unresolved questions
[unresolved-questions]: #unresolved-questions

- are there other types that would benefit from such a method? What about
`HashSet`, `HashMap` and `LinkedList`?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the argument against doing this?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For HashMap and HashSet, you shouldn't be able to do this because it would likely break the hash and values would be in the wrong buckets. HashMap provides no api to fix this. LinkedList is more amenable to this change, as is VecDeque.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@KrishnaSannasi

For HashMap and HashSet, you shouldn't be able to do this because it would likely break the hash and values would be in the wrong buckets.

We define transmute between two types to be semantically equivalent to a bitwise move.

While reading the RFC it was unclear to me whether the semantics of the proposed methods are a bitwise move of the collection types, or of the elements themselves.

For the types covered by the RFC, even if the semantics are to perform a bitwise move of the element types, we can optimize that and make the operations O(1) in time and space, so there isn't really a difference in practice.

Whether we can do this for HashMap and HashSet would depend on what the semantics are. For example, if the semantics are to perform a bitwise transmute of each element, then we can just implement these transmutes as a .into_iter().map(|x| transmute(x)).collect(). OTOH if these should be O(1), then we can't.

Copy link

@RustyYato RustyYato Sep 6, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would expect these to be O(1), we could add other methods like HashMap::map which would try and reuse the allocation but be semantically equivalent to into_iter().map(...).collect()to , but that is a separate issue.

- would it make sense to add implementations to types where `mem::transmute` is
acceptable, to steer people away from the latter?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure what you mean by this. Could you elaborate what you mean by "types where mem::transmute is acceptable" ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean types where it isn't insta-UB like Vec.

- this RFC does not deal with collection types in crates such as
[`SmallVec`](https://docs.rs/smallvec), though it is likely an implementation
in `std` might motivate the maintainers to include similar methods.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW I don't think this is an unresolved question. It is kind of up to those crates to keep up with the libstd collection API. It is normal that they will lag a bit behind, but for the APIs proposed here keeping up is only one PR away.


# Future possibilities
[future-possibilities]: #future-possibilities

The author cannot think of anything not already outlined above.