-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Safer Transmute #2981
Safer Transmute #2981
Conversation
Continuation of rust-lang/project-safe-transmute#5
This is a fantastic effort! My initial reaction is that the mechanism looks good, but the exact machinery appears to be reaching into the weirdness bucket a little in a way that external libraries sometimes need to, but might not be necessary or desirable for Annotating the stability guarantees of a datatype at its declaration with an attribute seems like a good approach, but using Perhaps the motivating example could become something like: // Let's assume `TransmuteFrom`/`TransmuteInto` are in the prelude...
#[stable_transmute] // declare `Foo` to be *stably* transmutable, as a shortcut for `#[stable_transmute(From, Into)]`
#[repr(C)]
pub struct Foo(pub u8, pub u16);
.. Having stability guarantees expressed as types seems like a good approach so you can use them in where clauses, but using unit structs and Having stability guarantees described by the I don't think any of these really have any bearing on the safe transmute mechanism itself, it's just the tools we want to use to express them. I do personally like the idea of making |
No, we should strive to avoid ever marking things as unsafe "just because", if they can't actually lead to danger. NeglectStability can at worst cause a broken build. |
This isn't really "just because", it's creating a clear distinction between transmutes that are fully stable on both sides, and those that have opted out of any guarantees. On another note, I'm not sure I understand why EDIT: Thinking about this more, I think my problem is that the impression I’m getting from the RFC is that |
@KodrAus Thanks for reading and reviewing!
Yes! You will need to refer to these traits any time you implement them manually. You might implement them manually because you wish to provide a more restrictive stability promise, or because the
I think you mean transmute options, not stability guarantees: the individual transmute options are represented as unit structs, and are specified in groups as tuples. We create a unit type for each of the individual transmute options, because transmute options must be usable at a type-level (because they affect the static behavior of You are correct that actually instantiating these individual options is rarely (if ever) useful. It therefore might make more sense to encode them as void enums, rather than unit structs; e.g.: pub enum NeglectStability {}
This RFC is already very long, and it didn't seem necessary or interesting to actually write out implementations of My preference would be to use ad hoc sum types to encode multiple options (since they aren't order-sensitive, like tuples) and the never type to encode neglecting nothing. Unfortunately, ad hoc sum types do not exist in Rust, and the never type remains unstable. If a better alternative for grouping options emerges in the future, we can simply implement
Yes! Anybody generic building abstractions over
I anticipate an early proliferation of I also expect some continued use by rustaceans on their own types: You might not want to make a public guarantee of your type's transmutability, but still wish to transmute it within the confines of your own crate.
I'm very hesitant about this suggestion. @Lokathor is right: |
Ah ok so it is expected that the
I think we shouldn’t understate this, since it seems like a library would never want to use But yes, that is a valid point. I was mostly trying to mentally group all the neglect options together to separate them from the fully safe, fully stable case. I would be interested to see if there’s a smaller initial surface area that we could commit to that was still broadly useful enough. As an example, how useful do you think an initial stable implementation that only:
Having said all that though I do want to say I think this is a really worthwhile, and very carefully and cleverly produced piece of API! |
Yes, you need to refer to these traits to implement them.
Yeah, this roadmap would probably be fine for casual users. However, a not-insignificant fraction of users will want to manually implement these traits, either because they want to provide a more fine-grained stability promise, or because the
Thanks! ❤️ |
The neglect options shouldn't be held back. For example, I would need them to replace the current |
@Lokathor why does that mean it needs to be stabilized together? |
I don't think you could possibly have an API with this level of inter-complexity and then say that some part of it is stable and other parts aren't fit to be stable. Because it's basically one single design that works as a whole, once you lock in part of it, you've locked in all of it. In other words, I don't think there can be a scenario where part of the concept is fit to be stable without it all being fit to be stable. And so holding back part of the design would make the API simply incomplete for no gain. |
text/0000-safer-transmute.md
Outdated
##### Preserve or Shrink Size | ||
[owned-size]: #Preserve-or-Shrink-Size | ||
|
||
It's completely sound to transmute into a type with fewer bytes than the source type; e.g.: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm concerned that this could be a major footgun. Currently, std::mem::transmute
enforces that the source and target types have the same size. This 'safe' transmute is actually providing fewer guarantees than an 'unsafe' transmute.
Concretely, if I have a type
#[derive(PromiseTransmutableFrom, PromiseTransmutableInto)]
#[repr(C)]
pub struct MyStruct {
pub field: u16
}
then changing field
to a u8
will make any safe transmutes in downstream crates silently truncate the source value. Of course, this kind of change is Semver-incompatible - but the normal breakage caused by this type of change is usually visible:
- Code that tries to construct a
MyStruct
with au16
value forfield
will give a type error. - Code that relies on
Fro/Into
impls on the original type (which are no longer present on the new type) will stop compiling. - Code that uses
TryFrom/TryInto
impl for an incompatible type will result in anErr
value.
Of course, code using an as
coercion will continue to compile, but using as
explicitly indicates possible truncation.
By contrast, a call to transmute_into()
will silently start producing a legal value of MyStruct
, but with a meaningless value for field
.
I'm not sure what the best way to resolve this is. However, I think it would be a good idea to require some kind of explicit opt-in for possible truncating transmutes at the call site.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This 'safe' transmute is actually providing fewer guarantees than an 'unsafe' transmute.
We use the term "transmute" in the broad sense of bit-reinterpretation conversions—a space that encompases pointer casts, union
, mem::transmute_copy
and mem::transmute
. Formally, we define a transmute as the behavior of this function:
#[inline(always)]
unsafe fn transmute<Src, Dst>(src: Src) -> Dst
{
#[repr(C)]
union Transmute<Src, Dst> {
src: ManuallyDrop<Src>,
dst: ManuallyDrop<Dst>,
}
ManuallyDrop::into_inner(Transmute { src: ManuallyDrop::new(src) }.dst)
}
The mem::transmute
intrinsic is the most restrictive of these bit-reinterpretation mechanisms: it cannot be used in generic contexts, and the sizes of its source and destination types must be known before monomorphization time. Accordingly, the red flags you describe are only apparent when mem::transmute
, in particular, was used to do the conversion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, a truncating transmute should be an Opt Out thing, not the default.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Accordingly, the red flags you describe are only apparent when mem::transmute, in particular, was used to do the conversion.
I disagree - I think silently introducing truncation is generally undesirable. Currently, this kind of silent truncation (AFAIK) only occurs through as
coercions, which only works with primitives. With safe transmutes, this could occur through nested types (e.g. struct Foo(Bar)
).
Additionally, the motiviation for this RFC states:
This RFC provides mechanisms that make many currently-unsafe transmutations entirely safe. For transmutations that are not entirely safe, this RFC's mechanisms make mistakes harder to make.
If the intention is to allow replacing many calls to std::mem::transmute
with a safe transmute_into()
, it's concerning to me that doing so will introduce new types of silent failure.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By contrast, a call to transmute_into() will silently start producing a legal value of MyStruct, but with a meaningless value for field.
The is scoped to safety and stability, but not sensibility. It does not attempt to stop users from breaking their stability promise in the way you describe, nor from reordering their fields (which will not break the stability promise, but will break end-user code even more subtly), nor from relying on endianness, etc.
There are three major reasons for this:
First, because establishing a default behavior can only be done once, and safe-and-stable is a reasonable default: both safety and stability have clear-cut definitions. There is no clear-cut definition of sensibility. Requiring consensus on which transmutations are sensible would pose a major obstacle to stabilization. Safe-and-stable is a line that can be drawn without additional fuss.
Second, because whether a transmutation is sensible depends on context. I think you might be surprised at how common space-adjusting reinterpretations are! They are exploited in a bunch of our RFC extensions, including this, this, and this.
Third, because it establishes an elegant correspondence between union
s and TransmuteFrom
: a union access is safe iff transmuting the union to that field type is safe.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Separately: Forbidding size-readjustment by default and providing an option to opt-out of the check also violates this RFC's notional machine.
In the current RFC, ()
denotes a default that's reasonable in all contexts. If you stray from that default, you are taking some responsibility upon yourself to ensure you aren't violating soundness, safety, or stability. For each option you provide, you are declaring your responsibility to ensure you aren't violating soundness, safety or stability.
I strongly dislike that a default no-size-readjustment does not follow this pattern. It is not a reasonable default in certain contexts, and specifying NeglectSize
does not connote any soundness, safety or stability hazard which the end-user must defend against. In these (rather common) contexts, it does not even denote a correctness hazard. It's just line noise.
@Aaron1011 I don't disagree that size-readjustment can be a footgun. I notice that the hazard you describe is associated with calling transmute_from
on a value. The zerocopy-like context I cite doesn't actually involve calling transmute_from
; TransmuteFrom
is just used as a where
bound. The same is true for slice casting..
The dangers of size-adjustments seem like they may only be germane to instances where a value is being transmuted. I therefore wonder if this footgun would better be mitigated by a warning lint. Lints walk the AST after type inference, and thus have the full ability to introspect the types associated with bindings. The hazard you describe is associated with calling transmute_from
on a value. A lint would achieve the goal of making your described breaking non-silent, and could localize the issue to the particular calls of transmute_from
that introduce it.
A lint could also point out other possible footguns, like endianness-related hazards.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jswrenn: Thanks for the extremely detailed replies! You've convinced me that adding a size-readjustment restriction at the type level has a number of issues. I think a lint would address all of my concerns.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the extremely detailed replies!
Thanks for the incisive question!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the documentation for safe transmute, we should try to steer people away from using it for "normal" type conversion.
I think there's some danger of this RFC being "too ergonomic" compared to other areas of the language: From/To traits are the "sensible" way to convert values, but they cannot be automatically derived, and implementing them can be verbose. I could see this leading to over-use of transmute_into()
simply because it's more effort to do the conversion properly!
Still I don't think that's a good enough reason not to go ahead with this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Aaron1011, bb9b988 implements the discussed lint.
I could see this leading to over-use of transmute_into() simply because it's more effort to do the conversion properly!
@Diggsey, I suspect the addition of the Scope
parameter to TransmuteInto
saps much of the ease of just calling .transmute_into()
. Still, your comment makes me wonder if we could simply delete the transmute_into
and unsafe_transmute_into
methods from this RFC without any drawbacks...
Thanks @Lokathor. I do think there is gain in finding an uncontentious starting point to stabilize that serves real usecases and avoids blocking bringing the benefits to anybody on the fringes of the design. I can appreciate that this API is really quite bound together though and that the actual feature is really the stability system to support safe transmutation. So to summarize my thoughts so far:
I’m going to stop actively commenting on this thread for a bit now just to avoid creating more noise so others can review and comment. Thanks for all your effort so far! ❤️ |
I don't have any substantive comments on the specific API other than that I read the entire RFC and really liked it. It seems well designed and thought out, and I've read the APIs of a number of the available safe transmutation proc macro crates. If the features described in this RFC are released it'll be my favorite new Rust feature ever. So many of my design ideas for high performance software revolve around being extremely careful with memory layout and being able to re-use the same layouts for manipulating values in the program and for storage or networking, in order to avoid loading and serialization time, which I think is often taken as a given precisely because the casting style is so unsafe and difficult. Example areas where this kind of stuff is useful include mmap-able file formats that allow the OS to page in needed sections on demand without having to re-implement paging and caches in userspace, high-performance networking using low-latency cards with message processing times in the microseconds, shared memory communication between multiple processes, and systems for dealing with huge data sets where every byte of overhead counts so data is packed together very cleverly in compact layouts in a byte array. I really like how the API in this RFC cleverly manages to cover so many useful cases. For example I'm a big fan of the availability of the Other than less safe transmutation or (hopefully carefully checked) libraries with proc macro wrappers for limited cases, the other available alternative is to generate wrapper structs for byte buffers with getters and setters. For example the Rust Cap'n'proto implementation works like this. There's a number of downsides of this approach: First of all working with these types can feel a lot less nice than normal structs, especially if you want to use them for all your internal logic to avoid ever having to copy them into an internal format. Secondly you have to generate a ton of code, which presumably generates lots of IR which LLVM has to then doggedly inline away and then optimize your byte buffer indexing logic down to something efficient, probably failing to optimize a lot of redundant bounds checks. I haven't benchmarked this but I'd guess that it leads to bad compile times and bounds checks on nearly every field access (unless you generate code using unsafe indexing, at which point you might as well just transmute). You might think that the circumstances where you can parse by casting are really limited, since your protocol has to match the alignments and layout of Rust's built in types, but there are tricks that allow it to be more broadly applicable. For example you can write wrapper structs around byte arrays of the correct size which have methods to convert them to a more Rust-safe form, for example a message protocol with 6-byte unaligned timestamp fields can use a One element of this RFC that's interesting and different than many of the crates is the ability to do transmutations where neither end is a buffer of bytes. Most of my ideas have only needed the buffer case, but I like the generality and I can see a few benefits. First, just generality to different slice alignments is nice, for the custom protocol or packing use case you can often decide that for performance/convenience everything will be aligned to 4 or 8 bytes, then you can do one allocation or cast up front into a So yah, I'm really excited about this RFC. See also the engagement around @BurntSushi tweeting about a draft of this for more people being excited about Rust adding this kind of functionality. |
Updates:
|
Author of zerocopy here. This is kind of a surreal experience for me. When I created zerocopy over two years ago, the goal was always to use zerocopy as a proof of concept to justify a more general proposal like the one presented in this RFC. I have to pinch myself to remind myself that this is actually real - that we've actually finally gotten to a concrete proposal for a full-fledged safe transmute system. The zerocopy crate can die happy now :) On a more concrete note, it will be possible to redefine the features of the zerocopy crate entirely in terms of the traits defined in this RFC. It's unclear whether zerocopy should go away entirely; there may still be utilities like those in the byteorder module that are useful for zerocopy to provide. But even if that happens, this RFC will do all of the heavy lifting, and zerocopy will just be a collection of simple utilities. At a minimum, I'll get to entirely deprecate the zerocopy-derive crate, and simplify much of the zerocopy crate itself. Massive kudos to Jack and the Safe Transmute Project Group. Thanks especially to Jack for the insane amount of work that went into this RFC. |
``` | ||
...because not all instances of [`u8`] are valid instances of `Bool`. | ||
|
||
Another example: While laying out certain types, Rust may insert padding bytes between the layouts of fields. In the below example `Padded` has two padding bytes, while `Packed` has none: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another couple of examples that may be worth point out:
- Transmuting from a
*T
to a&T
, because*T
can be null (0) but&T
can't. - Transmuting from a
*T
(or other pointer type) to a*U
if U has higher alignment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
all values of * T
are valid values of * U
assuming that T
and U
are Sized
, since a pointer is not required to be aligned.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Transmuting from a *T to a &T, because *T can be null (0) but &T can't.
And because &T
must be aligned and dereferencable and point to a valid T
. There are many conditions on references.^^
text/0000-safer-transmute.md
Outdated
|
||
|
||
|
||
### 📖 When is a transmutation safe? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This primarily talks about structs and enums, but what about primitive types?
Primitive types, specifically references, have some safety concerns that I don't see addressed in here anywhere (unless I'm missing something). For example:
- transmuting a
NonZeroUsize
to a&u8
is sound (both have well-defined layout, bit-validity is preserved, and alignment is preserved), but not safe (the value could point to invalid memory). - transmuting a
&[u8]
to a&str
is currently unsound due to the absence of a well-defined layout, but they did have well-defined layouts then the transmutation would be sound, but not safe, since the&[u8]
may have invalid byte sequences for a utf-8 encoding.
The easiest thing to do is to say that references can't be safely transmuted, but that is more restrictive than necessary. A more flexible approach would be to do something like only allow transmuting types containing references, if that segment of memory in the type is a reference in both the source and destination types, and the types pointed to are transmute compatible.
Meaning that this is fine:
#[repr(C)]
struct A(&u32);
#[repr(C)]
struct B(&[u8; 4]);
let b: B = A(1u32).transmute_into();
But this isn't:
#[repr(C)]
struct A(&u32);
#[repr(C)]
struct B(NonZeroUsize);
let a: A = unsafe {B(NonZeroUSize::new_unchecked(1)) }.transmute_into();
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Transmuting
NonZeroUsize
to&u8
is sound but not safe
That's an open question, actually. It's not decided whether &T
has a validity requirement to be dereferencable yet.
IIRC, the UWG's consensus currently seems to be that &T
's validity requirement should be just "aligned and nonnull" and the safety requirement be "dereferences to a T
that holds its safety requirements," but that the safety requirement of the reference is aggressively asserted whenever the reference undergoes a typed copy.
Or perhaps what the borrow model (Stacked Borrows) asserts on a copy of the reference is between those two? Just that the reference dereferences to the correct amount of memory with the correct provenance? Just that the pointee is valid, but not necessarily safe?
In any case, the exact validity/safety requirement split for references is not decided yet, so any transmutation provided by this document should keep that in mind.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Whenever I was unsure of whether a requirement was a soundness constraint or just a safety constraint, I erred on side of soundness. (Except in my introduction of NeglectAlignment
, which I've now made note of, and will correct quite soon.) A soundness constraint can be later relaxed into a safety constraint, but not the other way around.
Ultimately, I try to avoid getting too into the nitty-gritty of the transmutability rules. The primary goal of this RFC is to specify an API that's general enough to encompass whatever rules there happen to be, not to specify Rust's soundness and safety rules. The examples in this section are intended as illustrative descriptions, not prescriptions.
Types with exotic representations, like NonZero*
will require some special implementation attention.
A [sound transmutation] must: | ||
- [preserve or shrink size][reference-size], | ||
- [preserve or relax alignment][reference-alignment], | ||
- [preserve or shrink lifetimes][reference-lifetimes], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't this also include mutability? That is, it is unsound to transmute from a &T
to a &mut T
.
Also, these constraints must be followed for sound transmutations if there are references inside of the source or destination types, right? Should that be more explicitly mentioned?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't this also include mutability? That is, it is unsound to transmute from a
&T
to a&mut T
.
This is addressed by the heading "Preserve or Shrink Uniqueness".
Also, these constraints must be followed for sound transmutations if there are references inside of the source or destination types, right? Should that be more explicitly mentioned?
That's the gist, yes! I'm in the process of creating a Redex model of this section's semantics for use as an implementation reference.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wow, great work everyone involved! When reading the RFC, I was thoroughly impressed by the precise language and clear concepts the RFC establishes. I have hardly any comments on that front, which is a rare occasion. ;)
I do have a few comments, of course, which I added in this review. I focused on the high-level concepts and mostly skimmed over how they are actually realized using traits. I do not have strong opinions on how to design the concrete API and will happily leave that bikeshed to others.
text/0000-safer-transmute.md
Outdated
#[repr(C)] | ||
pub struct Foo(pub Bar, pub Baz); | ||
``` | ||
The alignment of `Foo` affects transmutability of `&Foo`. A `&Foo` cannot be safely transmuted from a `&Bar` if the alignment requirements of `Foo` exceed those of `Bar`. If we don't want to promise that `&Foo` is stably transmutable from virtually *any* `Bar`, we simply make `Foo`'s `PromiseTransmutableFrom::Archetype` a type with maximally strict alignment requirements: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this assume that there is a maximal alignment? I do not think the language spec has anything like that currently. Also that seems like a rather roundabout way to constrain reference transmutability, the "simply" in the text notwithstanding. ;) This is not simple at all, I feel.
Like, doesn't this still allow transmuting to types that do have this maximal alignment requirement? I guess the point is that that's okay; what we are really saying here is that the semver-stable guarantee is that alignment will be <= max
(and >= 1
for the second example). The text could explain this a bit more directly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this assume that there is a maximal alignment? I do not think the language spec has anything like that currently.
229 is stated as the maximal alignment in the reference.
I guess the point is that that's okay; what we are really saying here is that the semver-stable guarantee is that alignment will be <= max (and >= 1 for the second example).
Yep! In practice, the whole point of limited stability promises is giving yourself the leeway to make changes in the future without breaking anybody's builds. I think most people could pick an alignment far below 229 and still have plenty of wiggle room for future changes. ;-)
The text could explain this a bit more directly.
Agree! I'll revisit this in my next round of edits.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
229 is stated as the maximal alignment in the reference.
Oh, I did not know this, interesting.
Still seems a bit odd to hardcode that value in a file.^^ Maybe swap the two examples; the one that says "alignment will always be at east 1" is simpler and more convincing IMO.
text/0000-safer-transmute.md
Outdated
|---------------------|-------------|---------------------------------------------------------| | ||
| `NeglectStabilty` | Stability | `transmute_{from,into}`, `unsafe_transmute_{from,into}` | | ||
| `NeglectAlignment` | Safety | `unsafe_transmute_{from,into}` | | ||
| `NeglectValidity` | Soundness | `unsafe_transmute_{from,into}` | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe NeglectSafety
or NeglectSoundness
would be a better term?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The names are specific, because they communicate the specific responsibility you assume by neglecting each check. If you NeglectValidity
, the onus is on you to dynamically ensure Src
is a bit-valid instance of Dst
before transmuting. Likewise, if you NeglectAlignment
, the onus is on you to check alignment (before transmuting in the case of references, and before dereferencing in the case of raw pointers — the "Compromises" reality is perhaps a little more nuanced than the column suggests).
Using specific names also reduces the risk of "overloading" an option with meaning. For instance, a future NeglectConstructability
option would also compromise safety.
Finally, the boundary between soundness and safety may change over time; things which are soundness properties could graduate to safety properties. For instance, while it is currently to UB to create references that are mis-aligned, future versions of Rust plausibly phrase this as a dynamic safety property instead: that is is UB to de-reference misaligned referenced. I don't expect this will happen, in this instance, but it's not unheard of. I think I recall some recent conversations about whether the bit-validity of str
could relax from being a soundness issue to a safety issue...
Anyways, I think the "Compromises" column makes the model seem a bit neater than it really is, and should probably be omitted.
pub struct NeglectAlignment; | ||
|
||
impl TransmuteOptions for NeglectAlignment {} | ||
``` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The docs here should be very careful to say that it is not ever okay to neglect alignment without a dynamic check. Creating unaligned references is insta-UB (as y'all know but having this option available could make it seem like that's something unsafe code might legitimately do).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will do! I'll tackle this in my next round of edits.
Just to confirm: it is okay to construct a mis-aligned raw pointer, just not to dereference it. In other words, for references you must do the check before conversion; for raw pointers, it's okay to do it after?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to confirm: it is okay to construct a mis-aligned raw pointer, just not to dereference it. In other words, for references you must do the check before conversion; for raw pointers, it's okay to do it after?
Yes, it is safe to construct misaligned raw pointer. 1 as *const u32
is valid safe rust code today, and similarly pointer casts between types of different alignment is also safe. The alignment requirement must be upheld at deref time. See also https://doc.rust-lang.org/std/ptr/index.html#alignment
text/0000-safer-transmute.md
Outdated
By **safer transmutation** we mean: *what `where` bound could we add to `transmute` restricts its type parameters `Src` and `Dst` in ways that statically limit the function's misuse?* Our answer to this question will ensure that transmutations are, by default, *sound*, *safe* and *stable*. | ||
|
||
### 📖 Soundness | ||
A transmutation is ***sound*** if the mere act of transmuting a value from one type to another is not unspecified or undefined behavior. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I must have overlooked this... while the terminology here is self-consistent within the file, it is not consistent with the wider ecosystem.
Specifically, see the UCG definition of soundness, which corresponds to what you call "safety".
If you'd ask me to define safety I'd ask "safety of what". "safety of code" IMO means "no unsafe {}
" (Rustaceans say things like "you can do this in safe code"); "safety of a program execution" IMO means that there was no UB ("this run of the program is memory and thread safe").
I thought the UCG's use of "soundness" corresponds to the usual consensus, but maybe not -- I'd be interested in feedback. For this RFC, the odd one is the notion you call "soundness"; I don't think I have seen a good name for it before -- I would have called it "not UB"/"UB-free". Maybe "well-defined" would work -- the transmute itself has well-defined semantics, but unless it is sound (in the UCG sense) it can still cause UB later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would have called it "not UB"/"UB-free".
That isn't quite right though. Both "soundness" and "safety" as described in this RFC are about avoiding undefined behavior. The distinction is where that happens. "soundness" is about whether undefined behavior happens at the transmutation itself, whereas "safety" is about whether undefined behavior could occur at some later point after the transmutation (for example attempting to dereference an invalid pointer).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That isn't quite right though. Both "soundness" and "safety" as described in this RFC are about avoiding undefined behavior. The distinction is where that happens.
Yes, I am aware. My point is, a sound-but-not-safe (in your terminology) transmute is not UB itself. That's why I think it makes sense to call it "well-defined" or "UB-free".
Usually, "sound" means "can be called from safe code in arbitrary ways without causing UB", so my proposal is certainly not more wrong than what the RFC says. ;)
"soundness" is about whether undefined behavior happens at the transmutation itself, whereas "safety" is about whether undefined behavior could occur at some later point after the transmutation (for example attempting to dereference an invalid pointer).
I understand that that's how the RFC uses those words. However, I do not think that's how anyone else uses those words -- I certainly have not seen this use of "sound"/"safe" before (even though I am involved in many discussions around this topic), and the UCG glossary linked above represents consensus of the UCG WG, so quite a few people doing unsafe-related work agree that your use of "sound" is not what that word usually means.
Specifically, under what I consider the normal meaning of the word sound, it is not sound to transmute 3u8
to a EvenU8
library type, assuming that this library type has a method that crucially relies on evenness for safety.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In fact it's not just the UCG, even the Rust reference agrees with me (well, I wrote that text, but it passed review):
It is the programmer's responsibility when writing unsafe code to ensure that any safe code interacting with the unsafe code cannot trigger these behaviors. unsafe code that satisfies this property for any safe client is called sound; if unsafe code can be misused by safe code to exhibit undefined behavior, it is unsound.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not saying that this usage of the RFC's usage of soundness/safeness is right, just that I'm not sure "UB-free" is any better or less confusing. TBH, I'm not entirely sure what the value of distinguishing between these two kinds of safety is.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not saying that this usage of the RFC's usage of soundness/safeness is right, just that I'm not sure "UB-free" is any better or less confusing.
Fair. I think I personally prefer "well-defined" -- in the sense that a UB transmute literally has no meaning. A transmute that is not itself UB but might cause UB later has meaning (the Rust Abstract Machine says what happens when you run it); it is only later that the program could become meaningless.
Unsafe code frequently works with things that are not UB but would be UB if arbitrary safe code could now run. In this old blogpost I call such data "valid but not safe". I think that is basically the same distinction that the RFC is making. So I think the distinction is fundamental, but there is no consensus on how exactly to call it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the term "well defined". I'll switch to it in my next round of edits.
text/0000-safer-transmute.md
Outdated
|---------------------|-------------|---------------------------------------------------------| | ||
| `NeglectStabilty` | Stability | `transmute_{from,into}`, `unsafe_transmute_{from,into}` | | ||
| `NeglectAlignment` | Safety | `unsafe_transmute_{from,into}` | | ||
| `NeglectValidity` | Soundness | `unsafe_transmute_{from,into}` | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm wondering if there's a missing option here. As far as I can tell "Validity" refers specifically to bit validity - i.e., is the in-memory representation a valid representation of this type. But there's also higher-level validity concerns such as correctness. A NonZeroUsize
is a valid &NonZeroU8
, it has the correct alignment proprieties, and it is stable, but of course it might not point to a valid NonZeroU8
. Do we need to have separate neglect options for higher-level invariants? Or if validity is meant to capture this already, does it make sense to distinguish bit validity from invariant validity?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A NonZeroUsize is a valid &NonZeroU8, it has the correct alignment proprieties, and it is stable, but of course it might not point to a valid NonZeroU8.
That is not decided yet -- it might be that &NonZeroU8
actually must point to something non-zero immediately when constructed, making this part of the fundamental validity rules. In fact that is the status quo, as relaxing this in the future is easier than strengthening it. See rust-lang/unsafe-code-guidelines#77.
text/0000-safer-transmute.md
Outdated
impl<T, N> PromiseTransmutableFrom for GenericArray<T, N> | ||
where | ||
T: PromiseTransmutableFrom, | ||
N: ArrayLength<T>, | ||
|
||
// for lack of implied bounds, we must repeat the bounds on `Archetype`: | ||
GenericArray<T::Archetype, N> | ||
: TransmuteInto<Self, NeglectStability> | ||
+ PromiseTransmutableFrom, | ||
{ | ||
type Archetype = GenericArray<T::Archetype, N>; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it feasible to limit uses of NeglectStability
to only implementations of PromiseTransmutable*
? (Similar to how !
is currently stably usable in one context, but not others?)
@mahkoh @Lokathor the "safe only inside std" comment in If you have |
Ah, well of course, that would be a different situation. I thought that |
@mahkoh this RFC is concerned with transmutes, not with adjusting existing APIs that should use I propose we focus the discussion here on |
let _: NonEmptySlice<'static, u8> = transmute!([0usize; 2]); // Compile Error: `NonEmptySlice<_, _>` is not constructible here. | ||
``` | ||
If a field is private, then instantiating or modifying it via transmutation is not, generally speaking, safe. | ||
The `Muckable` marker trait signals that your type's fields may be safely initialized and modified to *any* value. If you would not be comfortable making your type's fields `pub`, you probably should not implement `Muckable` for your type. By implementing `Muckable`, you promise to treat *any* observable modification to your type's layout as a breaking change. (Unobservable changes, such as renaming a private field, are fine.) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The `Muckable` marker trait signals that your type's fields may be safely initialized and modified to *any* value. If you would not be comfortable making your type's fields `pub`, you probably should not implement `Muckable` for your type. By implementing `Muckable`, you promise to treat *any* observable modification to your type's layout as a breaking change. (Unobservable changes, such as renaming a private field, are fine.) | |
The `Muckable` marker trait signals that your type's fields may be safely initialized and modified to *any* value. If you would not be comfortable making your type's fields `pub`, you probably should not implement `Muckable` for your type. Think of it as adding a public constructor that takes a value for each field, and constructs a value of your type from this. | |
By implementing `Muckable`, you further promise to treat *any* observable modification to your type's layout as a breaking change. (Unobservable changes, such as renaming a private field, are fine.) |
If a field is private, then instantiating or modifying it via transmutation is not, generally speaking, safe. | ||
The `Muckable` marker trait signals that your type's fields may be safely initialized and modified to *any* value. If you would not be comfortable making your type's fields `pub`, you probably should not implement `Muckable` for your type. By implementing `Muckable`, you promise to treat *any* observable modification to your type's layout as a breaking change. (Unobservable changes, such as renaming a private field, are fine.) | ||
|
||
As a rule, the destination type of a transmutation must be `Muckable`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why just destination? When I transmute, e.g., Vec
to something, I rely on stability of Vec
's layout.
The "safe initialization" aspect of Muckable
is only relevant for the target type, but the "stable layout" aspect is relevant for both source and target.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since I think intertwining safety and stability is a mistake, my only hesitation is that this change imposes an all-or-nothing stability regime in yet another place. Total stability isn't always relevant for the source type. If you are using TransmuteFrom
to write bytes of Src
onto a wire (but not read them), the total stability of Src
doesn't necessarily matter. If you are using TransmuteFrom
to assert a type-level invariant but not actually transmuting, the total stability of Src
doesn't necessarily matter. Requiring an all-or-nothing stability regime on both the source and destination types for safety withholds safe transmutation from the use-cases where these requirements are too coarse.
However, now that safety and stability are intertwined and stability is mostly all-or-nothing, I agree it makes sense to require that Src
is layout-stable, too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, this is all a consequence of Muckable
saying two things at once.
|
||
For transmutations where the destination type involves mutate-able references, the constructability of the *source* type is also relevant. Consider: | ||
For transmutations where the destination type involves mutate-able references, the `Muckab`ility of the *source* type is also relevant. Consider: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For transmutations where the destination type involves mutate-able references, the `Muckab`ility of the *source* type is also relevant. Consider: | |
For transmutations where the destination type involves mutable references, the `Muckab`ility of the *source* type is also relevant. Consider: |
pub macro transmute($expr: expr) { | ||
core::convert::transmute::TransmuteFrom::<_>::transmute_from($expr) | ||
// ┯ | ||
// ┕ the destination type of the transmute (`_` used to infer the type from context) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The annotation seems misaligned (at least in GH's diff view).
type A = Here!(); | ||
type B = Here!(); | ||
#### `NeglectSafety` | ||
By default, `TransmuteFrom`'s methods require that all instantiations of the source type are `Muckable`. If the destination type is a mutate-able reference, the source type must *also* be `Muckable`. This precludes transmutations that are [well-defined][sound transmutation] but not [safe][safe-and-stable transmutation]. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does "all instantiations of the source type" mean? Is this any different from "requires that the source type be Muckable
"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should read:
By default, `TransmuteFrom`'s methods require that all instantiations of the source type are `Muckable`. If the destination type is a mutate-able reference, the source type must *also* be `Muckable`. This precludes transmutations that are [well-defined][sound transmutation] but not [safe][safe-and-stable transmutation]. | |
By default, `TransmuteFrom`'s methods require that the destination type is `Muckable`. If the destination type is a mutate-able reference, the source type must *also* be `Muckable`. This precludes transmutations that are [well-defined][sound transmutation] but not [safe][safe-and-stable transmutation]. |
|
||
| Transmute Option | Usable With | | ||
|---------------------|---------------------------------------------------------| | ||
| `NeglectAlignment` | `unsafe_transmute_{from,into}` | | ||
| `NeglectValidity` | `unsafe_transmute_{from,into}` | | ||
| `NeglectSafety` | `unsafe_transmute_{from,into}` | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The confusing part about this naming is that any of the Neglect
makes things unsafe (I assume), but only one of them is called NeglectSafety
. Maybe NeglectMuckable
or so would work better?
- `T` has no fields, or | ||
- `T`'s fields are fully implicitly constructible from the scope. | ||
## Implementation Guidance | ||
Two items in this RFC require special compiler support: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Two items in this RFC require special compiler support: | |
Three items in this RFC require special compiler support: |
A `Src` is *safely* transmutable into `Dst` in a given if: | ||
1. `Dst: Muckable` | ||
2. `Dst: TransmuteFrom<Src, Neglect>` | ||
3. `NeglectSafety` ∉ `Neglect` | ||
|
||
If `Src` is a mutatable reference, then additionally: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"is or contains", I assume?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"is or contains", I assume?
You assume correctly. I'll just use "is or contains" instead.
I agree completely and nobody has suggested this.
What is and isn't sound (at this high level) can be chosen by compiler writers. You're well aware of this. Compiler writers are also able to provide escape hatches to make operations that would otherwise be unsound sound. What people don't need are endless discussions about what is and isn't sound. What they need is clear guidance and the right tools to make the code they are writing anyway reliably sound. Otherwise Rust will go the way of C: [1], [2], [3]. People will not stop writing such code just because you need a few more years to come up with a usable model. Code that passes data with padding bytes to functions accepting byte arrays is used widely. (Maybe less so in Rust but certainly in C and C++.) In the context of this RFC I consider |
Language designers, not compiler writers, but -- sure. However, I didn't argue that this could not be made sound. I argued that it cannot be made sound by this RFC, based on its pretty clearly stated goals and scope. Thus, any discussion of making this sound is off-topic for this RFC.
Given that the RFC is specifically about safe transmute and not about safe unions, that makes little sense. This RFC is not about the underlying operational semantics of |
@RalfJung Discussing this particular use case is besides the point. My original post attacked the RFC as a whole including its stated goals and scope. a) The work of the working group should be based on a comprehensive survey of how transmute is used in practice and that includes unions and pointer casts. This survey should be published before the first RFC unless such an RFC is very small in scope (which this RFC is not: That the replies then tried to address my list of use cases constructively but in an ad-hoc fashion indicates to me that such a survey did not happen. b) Even if the working group chooses to first focus on use cases that they already know and understand. The mechanisms proposed to address these use cases do not seem to be well understood (Scopes, Archetype, visibility, etc.). For example, the problems with Archetype should have become clear long before the RFC was published. That the Muckable trait then goes in the complete opposite direction and completely restricts the type layout seems to be an overreaction. That such a restrictive trait is not sufficient would have become clearer if a survey had been performed. I'm filing all of this under "there is too much confusion." On a positive note: The core of the RFC (TransmuteFrom) seems to be a strict improvement on top of |
I feel uncertain about the addition of
I'm fairly confused about why
I agree that it would have been better for this very controversial aspect of the design to have been surfaced to the entire lang team sooner, but such is life. The role of the final RFC process is to allow the community and responsible team final review without obligating them to follow the discussion of the working group. |
Indeed, discussing anything that involves changing what is and is not UB is besides the point. That is exactly my point. You have been the one to ask the RFC to do that. The issue of uninit memory in My impression is that the RFC authors did a great job gathering usecases and not being distracted by orthogonal problems such as handling uninitialized memory. Trying to solve all problems at once is a great way of never getting anything done. Thus I think your criticism is unfair. Just because your favorite issue is not entirely solved by this RFC, does not mean the RFC authors did a bad job. |
You're twisting my words. I did not ask the RFC to change what is and isn't UB. I stated that providing a function/intrinsic that transforms
I see almost no connection between reading into
This is not an orthogonal problem. It's a core part of the RFC that padding bytes in the source must map to padding bytes in the destination. Overcoming this restriction is a natural next step that should be discussed (at least mentioned) in this RFC.
You're misrepresenting my comments. I did not ask for the RFC to solve all problems. I asked for it to display a comprehensive understanding of the problem space even if it only aims to address a small subset right now. In fact, nothing I asked the working group to address (uninitialized memory and unsized types among others) would prevent the goals of this particular RFC to be achieved independently. I do not understand why you are being so confrontational right now. Yes, many people are agreeing with you that this is a great and very comprehensive RFC. But they've been saying that since the RFC was initially posted. Yet, the RFC has had to receive significant revisions since then because the solutions it proposed were found to be insufficient. The author even agrees with me that the current version will probably have to be modified further to add more flexibility to the Muckable trait. Like I've said many times before, I do not disagree with an RFC that only tries to solve a limited number of problems. |
I wasn't the one to talk about "attacking" the RFC. But anyway, we seem to agree on the intended scope of the RFC. We can discuss elsewhere how to solve |
Do we have any idea of the status of this RFC? In particular whether there's interest in implementing something in the compiler with an eye towards stabilizing the minimal Maybe having some real tires to kick will help resolve some of these ongoing design questions? |
I think the project is stalled by normal winter obligation stuff. |
Sorry for the quiet, @KodrAus! I've been totally swamped navigating the home stretch of my PhD, but there has been movement on this! @joshtriplett and I recently discussed the shape of a minimal/initial/foundational proposal for a |
Wow, such a beautiful work! I don't know if someone already pointed that out, but regarding one of the last paragraphs:
We cannot automatically implement We may consider We may, however, automatically implement |
As a next step, Project Safe Transmute has filed a major change proposal to add an experimental, unstable API for checking transmutability. This low-level API will provide a foundation atop which we can experiment with various abstractions over transmutability. Eventually, we'll circle back around and file one or more RFCs based on what we've learned! For now, I'm going to close this RFC. Thank you everyone for your invaluable feedback! |
Rendered
This is the first RFC of Project Safe Transmute.
Additional discussion on the Project Safe Transmute zulip stream.
Related Reading:
For previous discussions of earlier versions of this proposal, see: