New RFC: Collection Transmute #2756

llogiq · 2019-09-03T06:48:41Z

Centril · 2019-09-03T07:08:15Z

BurntSushi · 2019-09-03T11:31:05Z

Nice! Thanks for writing this up.

Though the code is not too large, there are a good number of things to get wrong – this solution was iteratively created by soundness-knowledgeable Rustaceans, with multiple wrong attempts.

Could you include text in the RFC that documents this progression? In particular, why is the naive approach unsound, and why is the final result correct?

In addition to that, I don't see alignment mentioned in this RFC, and I'd kind of expect that to be an issue somewhere. Or am I misunderstanding?

Additionally, are there any problems with the fact that you might create an allocation for Vec<T>, but wind up freeing that allocation as a Vec? It seems to me like that is not universally okay. I think size_of::<T>() == size_of::() and align_of::<T>() == align_of::() both need to hold, where I think the latter is an additional restriction over transmute, which AFAIK does not require the alignments to be equivalent. Are there more conditions that need to hold?

hanna-kruppe · 2019-09-03T12:04:33Z

I'd rather just declare mem::transmute to be OK. A new method with the same name and purpose as mem::transmute but subtly different meaning makes it difficult to remember which one to use, and it doesn't help people who (in the past or in the future) write it "incorrectly".

As far as I know, transmuting a Vec (if the item types are sufficiently compatible) is only UB today because we're not committed to Vec<T> and Vec having the same memory layout for all T and U, but we can easily guarantee that that if we so choose:

The current implementation (e.g., Vec's source code, rustc's layout algorithm) poses no obstacles to such transmutes, it's just a matter of guaranteeing it will remain so
There's no reason to expect that we'll ever want to not make it true (e.g. it might make sense to randomize layout of user-defined structs or determine it by PGO, but for Vec neither point seems likely to be useful)
If needed (e.g. because we add struct layout randomization), we can always add something extra to Vec to make it true, e.g., a repr(C) attribute or the hypothetical repr(in_order) (if the FFI implications of repr(C) are undesirable)

One remaining benefit of a specialized transmute method would that it could do more sanity checks, such as the item types having the same size and alignment. This is significant but I'm not so sure it outweighs the aforementioned drawbacks of having a separate but similar method.

burdges · 2019-09-03T12:38:59Z

I agree with @BurntSushi about enforcing size_of::<T>() == size_of::(). We must enforce this because mem::transmute specifies that it checks type sizes, which morally unsafe { mem::transmute::<_,Vec<u64>>(vec![0u8]) } violates.

I think however an inequality suffices for the alignment, probably align_of::() < align_of::<T>() no? If we dislike this, then Vec::transmute could reallocate whenever it must fix alignment, but maybe transmute users would prefer the inequality.

BurntSushi · 2019-09-03T12:49:34Z

I got the equality (as opposed to inequality) from the docs on GlobalAlloc::dealloc. Specifically:

layout must be the same layout that was used to allocate that block of memory

Where layout specifies the alignment. A strict reading of that suggests that the alignment you allocate with must be the same as the alignment that you deallocate with. Whether this is actually required for the underlying memory allocator(s) used, I don't know.

burdges · 2019-09-03T12:55:30Z

An inequality would prove useful, so maybe @RalfJung can say if it suffices.

RalfJung · 2019-09-03T13:19:16Z

You'll have to ask whoever wrote the allocator docs. I have no idea why alignment is even required to match.

(I'll only be able to read the full thing and respond properly to it when I am back from vacation.)

sfackler · 2019-09-03T13:30:21Z

The Windows system allocator implementation does rely on getting the same alignment when deallocating as allocating I believe.

Diggsey · 2019-09-03T14:57:52Z

This is not just a windows thing: any time that you need a greater alignment than the underlying allocator can provide, you have to allocate a slightly larger chunk of memory and then offset the returned pointer to the required alignment. When it comes to freeing that memory, you need some way to undo that offset.

RalfJung · 2019-09-03T15:04:52Z

Just the alignment is not enough information though to compute what the offset was.
When requesting a 8-byte-aligned allocation where we really need an alignment of 16, we need to know whether the offset-by-8 was necessary or not. Just knowing "oh, 16-aligned was requested" does not help.

llogiq · 2019-09-03T15:30:50Z

Thanks to all of you for this illuminating discussion! This convinces me that a) the current design isn't courageous enough, I should go with an unsafe Transmute trait instead, b) implementations should check what they can, preferrably at compile-time, and c) we should discourage mem::transmute usage, leaving it only as a last resort.

Regarding the alignment question, we'd need the information from the allocator if reducing alignment is acceptable.

Diggsey · 2019-09-03T15:48:48Z

Just the alignment is not enough information though to compute what the offset was.
When requesting a 8-byte-aligned allocation where we really need an alignment of 16, we need to know whether the offset-by-8 was necessary or not. Just knowing "oh, 16-aligned was requested" does not help.

You need the alignment to determine whether an offset was applied or not though, and that's exactly what libstd does: https://github.com/rust-lang/rust/blob/master/src/libstd/sys/windows/alloc.rs#L47-L56

Without that, you end up having to add overhead to every single allocation, not just those with unusually large alignment requirements.

clarfonthey · 2019-09-03T17:27:18Z

Kind of unsure how this could be used properly in the case of BinaryHeap.

burdges · 2019-09-03T20:43:28Z

How much do we know about when transmute gets used? I assume the most common usages are firstly to convert to/from a byte array, and secondly to violate another crate's visibility rules, like by adding or removing a private wrapper type. I think size_of::<T>() == size_of::() suffices for the second, so the most restrictive form sounds useful.

llogiq · 2019-09-03T21:29:01Z

@clarfon it's an unsafe operation. One could pair it with a heapify operation to restore the heap property if so desired, in cases where the types' orderings differ.

clarfonthey · 2019-09-03T21:40:37Z

@llogiq Right, but BinaryHeap doesn't offer that in any public APIs, and at that rate, you're mostly just reimplementing what BinaryHeap has to offer anyway. Either the internals of the heap should be exposed (seems out of scope for libstd) or anyone wanting to transmute their heap should just implement one on top of Vec.

comex · 2019-09-04T00:38:56Z

How exactly is the size/alignment check to be implemented? An extension of the special-case intrinsicck routine that currently checks std::mem::transmute? Is there any way to use const generics to integrate it properly with the trait system? (I tried, but it seems like the necessary functionality isn't implemented yet.)

djc · 2019-09-04T13:37:41Z

This may be obvious to most people involved here so far, but it might be useful to mention one or two use cases for having a transmute operation on these collections in the first place.

ExpHP · 2019-09-04T15:06:38Z

A Transmute trait sounds useless. Let me explain:

There are some cases where it would be highly desirable for straight-up transmutation to be well-defined. In particular, when switching a generic type argument between #[repr(transparent)] types; I'm not sure that #[repr(transparent)] is specified in a way such that ArbitraryStruct<usize> and ArbitraryStruct<Node> are guaranteed to have the same representation; but it ought to be!

// Newtype index wrapper for a graph node.
#[repr(transparent)]
#[derive(Debug, Copy, Clone, Hash, PartialEq, Eq)]
struct Node(usize);

// It is highly desirable for all of these to be valid operations
transmute::<Vec<Vec<usize>>, Vec<Vec<Node>>>(vecs);
transmute::<Vec<HashSet<usize>>, Vec<HashSet<Node>>>(sets);
transmute::<&HashSet<usize>, &HashSet<Node>>(set);

If we tried to solve the problem of UB Vec transmutes by introducing a Transmute trait, then

either: all of these transmutes have the same tricky UB that we'd be discouraging against mem::transmute for, and so we merely brushed a tiny bit of dirt under the rug.
or: the trait impls do not actually transmute. The first two transmutes would have to cost O(n), and the transmute behind references cannot be implemented.

RustyYato · 2019-09-04T15:51:46Z

I'm not sure that #[repr(transparent)] is specified in a way such that ArbitraryStruct and ArbitraryStruct are guaranteed to have the same representation; but it ought to be!

No it should not!

trait Foo {
    type Bar;
}

struct Baz<T: Foo>(T, T::Bar);

Baz can have different representations even for transparent types.

hanna-kruppe · 2019-09-04T16:07:48Z

@KrishnaSannasi

Baz can have different representations even for transparent types.

That is not a transparent newtype, though. repr(transparent) does guarantee equal layout to the sole non-ZST field (which may not be the same as the type parameter, if any, but that is a minor point).

However, @ExpHP, transmuting transparent newtypes may still not be safe because the newtype may have additional invariants over the wrapped type. For example, NonZeroU32 is a repr(transparent) struct containing u32.

danielhenrymantilla · 2019-10-25T17:37:35Z

Note that Vec::transmute had two things that made it error-prone:

#[repr(Rust)] transmute, "fixed" by Vec::{into,from}_raw_parts ✔️
Layout<T> == Layout<V> (including equal alignment) ⚠️ which is a check people may miss if they implement Vec::transmute as @shepmaster suggested.
- This point maybe can be addressed with good enough documentation 🤷‍♀️

Shnatsel · 2019-10-25T18:39:21Z

Layout<T> == Layout<V> is the especially tricky part. Outside owned collections it is totally fine to cast to a lower alignment, i.e. it's OK to view a u32 as [u8; 4], but that's not the case for collections that own their allocation.

Linting in rustc or Clippy for the specific pattern of obtaining a pointer from an owned collection, casting it to an incompatible alignment and creating another collection from it would solve the problem, but I'm not sure how feasible that is.

Clippy already has a lint for casts between pointers with incompatible layout, but it would not be triggered for casting a pointer to u32 as pointer to [u8; 4], and yet that triggers undefined behavior as soon as an owning collection constructed from the pointer with altered alignment is deallocated.

SimonSapin · 2019-10-25T18:45:19Z

That’s a good point, an explicit method would also be an opportunity to assert! the the layouts match. (And since the types are monomorphized, layouts should be constant and that branch should optimize away.)

llogiq · 2019-10-25T19:28:33Z

We could at least lint pointers that are used in Vec::from_raw_parts. Or other functions we identify.

joshtriplett · 2020-09-30T17:28:23Z

We discussed this in the @rust-lang/lang meeting today.

We don't think this is a lang concern; this is entirely up to @rust-lang/libs.

We do think there'd be value in project-safe-transmute taking a look at this, and ensuring there are appropriate mechanisms to do safe transmutes.

Apart from that, it seems reasonable to have mechanisms to do unsafe transmute here. Such a mechanism could use either from_raw_parts or a top-level transmute, depending on type layouts. Either way, this doesn't seem like a lang issue.

the8472 · 2020-10-02T21:01:06Z

Note that simple iterators already do almost the right thing for vec. https://rust.godbolt.org/z/537KEs
It does not call any external functions such as alloc/free. I'm not sure why the assembly still contains those loops, they seem to do nothing.

Mark-Simulacrum · 2021-06-22T17:27:26Z

Dropping T-lang per #2756 (comment)

dtolnay · 2021-06-23T03:14:18Z

We discussed this RFC in the most recent @rust-lang/libs-api meeting.

It was concerning that the code examples of "doing it correctly" in the RFC are all UB due to the alignment issue raised in the thread. You can't transmute by value to a vector with a different element alignment (like Vec<i32> to Vec<[u8; 4]>) since when dropping the resulting vector's heap allocation the Layout provided to the allocator is required to match the Layout it was allocated with.

We are interested in collection transmute APIs in general (both safe and unsafe) but would prefer to consider them only once there are trait bounds from #2981 to incorporate into the design.

For now, as noted in #2756 (comment), into_raw_parts + from_raw_parts is the recommended idiom for performing such transmute.

@rfcbot fcp close

rfcbot · 2021-06-23T03:14:19Z

Team member @dtolnay has proposed to close this. The next step is review by the rest of the tagged team members:

No concerns currently listed.

Once a majority of reviewers approve (and at most 2 approvals are outstanding), this will enter its final comment period. If you spot a major issue that hasn't been raised at any point in this process, please speak up!

See this document for info about what commands tagged team members can give me.

rfcbot · 2021-06-23T17:14:58Z

🔔 This is now entering its final comment period, as per the review above. 🔔

Lokathor · 2021-06-23T17:24:22Z

A comment: If you are willing to use a crate then many vec transmutations are already available: https://docs.rs/bytemuck/1.7.0/bytemuck/allocation/fn.try_cast_vec.html

rfcbot · 2021-07-03T17:27:07Z

The final comment period, with a disposition to close, as per the review above, is now complete.

As the automated representative of the governance process, I would like to thank the author for their work and everyone else who contributed.

The RFC is now closed.

llogiq force-pushed the collection-transmute branch from d11da3b to 4a44ee1 Compare September 3, 2019 06:49

llogiq mentioned this pull request Sep 3, 2019

New lint: transmute(Vec) is undefined behavior rust-lang/rust-clippy#4484

Closed

New RFC: Collection Transmute

15256d3

llogiq force-pushed the collection-transmute branch from 4a44ee1 to 15256d3 Compare September 3, 2019 07:00

Centril added A-collections Proposals about collection APIs A-impls-libstd Standard library implementations related proposals. A-unsafe Unsafe related proposals & ideas T-libs-api Relevant to the library API team, which will review and decide on the RFC. labels Sep 3, 2019

Centril added the T-lang Relevant to the language team, which will review and decide on the RFC. label Sep 3, 2019

RustyYato mentioned this pull request Nov 3, 2019

RFC: Vec::recycle #2802

Closed

mickvangelderen mentioned this pull request Feb 15, 2020

cast to lower-level buffer ref rustgd/cgmath#497

Open

KodrAus added the Libs-Tracked Libs issues that are tracked on the team's project board. label Jul 29, 2020

hanna-kruppe mentioned this pull request Aug 3, 2020

Mapping and collecting an owned Vec to an item with the same size should reuse its allocation rust-lang/rust#75087

Closed

yaahc added the I-nominated label Jun 8, 2021

Mark-Simulacrum removed the T-lang Relevant to the language team, which will review and decide on the RFC. label Jun 22, 2021

rfcbot added proposed-final-comment-period Currently awaiting signoff of all team members in order to enter the final comment period. disposition-close This RFC is in PFCP or FCP with a disposition to close it. labels Jun 23, 2021

m-ou-se removed the I-nominated label Jun 23, 2021

rfcbot added the final-comment-period Will be merged/postponed/closed in ~10 calendar days unless new substational objections are raised. label Jun 23, 2021

rfcbot removed the proposed-final-comment-period Currently awaiting signoff of all team members in order to enter the final comment period. label Jun 23, 2021

rfcbot added finished-final-comment-period The final comment period is finished for this RFC. to-announce and removed final-comment-period Will be merged/postponed/closed in ~10 calendar days unless new substational objections are raised. labels Jul 3, 2021

rfcbot added closed This FCP has been closed (as opposed to postponed) and removed disposition-close This RFC is in PFCP or FCP with a disposition to close it. labels Jul 3, 2021

rfcbot closed this Jul 3, 2021

nicholasbishop mentioned this pull request Nov 22, 2021

Change Handle representation to be non-nullable so that Option<Handle> is FFI-safe rust-osdev/uefi-rs#309

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

New RFC: Collection Transmute #2756

New RFC: Collection Transmute #2756

llogiq commented Sep 3, 2019 •

edited

Loading

Centril commented Sep 3, 2019

BurntSushi commented Sep 3, 2019 •

edited

Loading

hanna-kruppe commented Sep 3, 2019

burdges commented Sep 3, 2019

BurntSushi commented Sep 3, 2019

burdges commented Sep 3, 2019

RalfJung commented Sep 3, 2019 •

edited

Loading

sfackler commented Sep 3, 2019

Diggsey commented Sep 3, 2019

RalfJung commented Sep 3, 2019

llogiq commented Sep 3, 2019

Diggsey commented Sep 3, 2019

clarfonthey commented Sep 3, 2019 •

edited

Loading

burdges commented Sep 3, 2019 •

edited

Loading

llogiq commented Sep 3, 2019

clarfonthey commented Sep 3, 2019

comex commented Sep 4, 2019

djc commented Sep 4, 2019

ExpHP commented Sep 4, 2019 •

edited

Loading

RustyYato commented Sep 4, 2019

hanna-kruppe commented Sep 4, 2019 •

edited

Loading

danielhenrymantilla commented Oct 25, 2019

Shnatsel commented Oct 25, 2019

SimonSapin commented Oct 25, 2019

llogiq commented Oct 25, 2019

joshtriplett commented Sep 30, 2020

the8472 commented Oct 2, 2020

Mark-Simulacrum commented Jun 22, 2021

dtolnay commented Jun 23, 2021

rfcbot commented Jun 23, 2021 •

edited by yaahc

Loading

rfcbot commented Jun 23, 2021

Lokathor commented Jun 23, 2021

rfcbot commented Jul 3, 2021

New RFC: Collection Transmute #2756

New RFC: Collection Transmute #2756

Conversation

llogiq commented Sep 3, 2019 • edited Loading

Centril commented Sep 3, 2019

BurntSushi commented Sep 3, 2019 • edited Loading

hanna-kruppe commented Sep 3, 2019

burdges commented Sep 3, 2019

BurntSushi commented Sep 3, 2019

burdges commented Sep 3, 2019

RalfJung commented Sep 3, 2019 • edited Loading

sfackler commented Sep 3, 2019

Diggsey commented Sep 3, 2019

RalfJung commented Sep 3, 2019

llogiq commented Sep 3, 2019

Diggsey commented Sep 3, 2019

clarfonthey commented Sep 3, 2019 • edited Loading

burdges commented Sep 3, 2019 • edited Loading

llogiq commented Sep 3, 2019

clarfonthey commented Sep 3, 2019

comex commented Sep 4, 2019

djc commented Sep 4, 2019

ExpHP commented Sep 4, 2019 • edited Loading

RustyYato commented Sep 4, 2019

hanna-kruppe commented Sep 4, 2019 • edited Loading

danielhenrymantilla commented Oct 25, 2019

Shnatsel commented Oct 25, 2019

SimonSapin commented Oct 25, 2019

llogiq commented Oct 25, 2019

joshtriplett commented Sep 30, 2020

the8472 commented Oct 2, 2020

Mark-Simulacrum commented Jun 22, 2021

dtolnay commented Jun 23, 2021

rfcbot commented Jun 23, 2021 • edited by yaahc Loading

rfcbot commented Jun 23, 2021

Lokathor commented Jun 23, 2021

rfcbot commented Jul 3, 2021

llogiq commented Sep 3, 2019 •

edited

Loading

BurntSushi commented Sep 3, 2019 •

edited

Loading

RalfJung commented Sep 3, 2019 •

edited

Loading

clarfonthey commented Sep 3, 2019 •

edited

Loading

burdges commented Sep 3, 2019 •

edited

Loading

ExpHP commented Sep 4, 2019 •

edited

Loading

hanna-kruppe commented Sep 4, 2019 •

edited

Loading

rfcbot commented Jun 23, 2021 •

edited by yaahc

Loading