Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ref-wrapping #3382

Closed
wants to merge 2 commits into from
Closed

Ref-wrapping #3382

wants to merge 2 commits into from

Conversation

clarfonthey
Copy link

For #[repr(transparent)] structs and variants, allow &Wrapper(*reference) as a safe alternative to transmutation.

Rendered

@ehuss ehuss added the T-lang Relevant to the language team, which will review and decide on the RFC. label Feb 2, 2023
@QuineDot
Copy link

QuineDot commented Feb 3, 2023

Alternative: Safe transmute.

@Lokathor
Copy link
Contributor

Lokathor commented Feb 3, 2023

This is actually slightly superior to safe transmute because it allows the wrapping type to maintain any extra invariants by having checks within the construction function.

eg: str is a [u8] with the extra invariant that only valid utf-8 bytes are acceptable

# Prior art
[prior-art]: #prior-art

As far as this RFC is concerned, there really isn't much prior art for this. Discussions have been had about using `as`-casting for this purpose, although as mentioned in the alternatives section, that has downsides.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Prior Art: the ref_cast crate

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if we're listing crates we could mention bytemuck::TransparentWrapper

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also aliri_braid which is designed specifically for strongly-typed string wrappers.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also trapper

@shepmaster
Copy link
Member

I like the general idea (and have indeed done the unsafe version by hand before), but the proposed syntax feels too loosey-goosey to me. I don't have much of a deep explanation as to why, however. I really want there to be a keyword in there because I don't want the conversion to happen "by accident" (ignoring whatever the probably of that occurring would actually be).

I'd be happy if there was some kind of "visibility-limited as", so that the cast is only available in the same places that the proposed RFC allows.

@burdges
Copy link

burdges commented Feb 4, 2023

I suppose unsized types like dyn Trait and struct Foo(u64,str); work here too.

@QuineDot
Copy link

QuineDot commented Feb 4, 2023

unsize is reserved keyword... though I like the constructor has to be visible to enforce the "constructor must be usable" restriction.

let hmm = &Path( unsize *s );

@clarfonthey
Copy link
Author

I suppose unsized types like dyn Trait and struct Foo(u64,str); work here too.

I'm not really sure what you're going for here. dyn Trait would work if nested inside a transparent type, yes, but I wouldn't say this "supports dyn Trait" since that seems unrelated.

In terms of that second example, that actually would not be allowed, since u64 and str are can't be together inside a transparent wrapper. In the case of this feature, it would require having to write the u64 before the str in memory somehow, which is definitely not doable.

@clarfonthey
Copy link
Author

I like the general idea (and have indeed done the unsafe version by hand before), but the proposed syntax feels too loosey-goosey to me. I don't have much of a deep explanation as to why, however. I really want there to be a keyword in there because I don't want the conversion to happen "by accident" (ignoring whatever the probably of that occurring would actually be).

That's fair, although one counter to this is the fact that * can pierce through a wrapper on the outside via deref, and so being able to pierce through via the inside would just require an assertion that the wrapper is transparent. If you do have any keyword examples that seem more compelling though, I'd love to see them!

I'd be happy if there was some kind of "visibility-limited as", so that the cast is only available in the same places that the proposed RFC allows.

So, I did briefly discuss this in the RFC, and I brought up the point that it subtly creates the additional zero-sized fields, which can be a bit weird semantically. Even though conjuring PhantomData out of thin air is exactly the reason why it exists, it feels weird to support this directly via syntax. Would love to hear your opinion on that, though!

(Apologies if you did already read that part; I'm mostly mentioning it because I also have a tendancy to skim RFCs and miss stuff like that.)

@burdges
Copy link

burdges commented Feb 5, 2023

I meant all unsized types work, so

&Wrapper( *( y as &dyn Trait) )

and

struct Foo(u64,str);
let foo: &Foo = ...
&Wrapper( *foo )

but extern types remain a question.

@clarfonthey
Copy link
Author

Oh, I see what you mean now.

I don't believe there would be any problems with extern types considering how the wrapper is transparent, and thus any potential weirdness about how their references work is completely sidestepped by the fact that you already have a reference to one, and therefore can cast it with a no-op under the hood.

@TimNN
Copy link
Contributor

TimNN commented Feb 6, 2023

Is this change backwards-compatible? For a somewhat contrived example, consider this:

// Can only be (safely) constructed via `Inner::new` and
// `Outer::new` because `x` is private.
pub struct Inner { x: u32, }

impl Inner {
    pub fn new(x: u32) -> &'static Self {
        Box::leak(Box::new(Self { x }))
    }
}

// Can only be (safely) constructed via `Outer::new` because
// `Inner` is only exposed via immutable references.
#[repr(transparent)]
pub struct Outer {
    // `x` is guaranteed to be even.
    pub inner: Inner,
}

impl Outer {
    pub fn new(x: u32) -> &'static Self {
        assert!(x % 2 == 0);
        Box::leak(Box::new(Self { inner: Inner { x } }))
    }
}

Right now, unless I'm missing something, this code can assume that instances of Outer will only ever be constructed via Outer::new (because with the existing public API, it's impossible to created an owned / stack-allocated instance of Inner).

That will no longer be the case with this RFC, IIUC, which could be problematic, for example if Outer::new enforces some additional invariants on Inner (like the "is even" check).

The key items that interact here are that we have a #[repr(transparent)] type who's primary field is public, but where specifying a value for the primary field is currently impossible (in this case because it's impossible to construct an instance of the field's type that is not behind an immutable reference).

@clarfonthey
Copy link
Author

@TimNN that code example is kind of breaking my mind, and I don't have a good answer for it. My gut response is that assuming that things are impossible because the APIs don't exist feels like something you can't do in a backwards-compatible way, but I don't know enough of the details to be sure.

Since #[repr(transparent)] as an API guarantees that types are just thin wrappers, even though it's unsafe, it feels like you've given anyone the ability to rely on transmutes, although I don't know if anyone has properly defined that as a breaking change or not.

@TimNN
Copy link
Contributor

TimNN commented Feb 6, 2023

API guarantees that types are just thin wrappers, even though it's unsafe, it feels like you've given anyone the ability to rely on transmutes

They may be able to rely on the layout being the same, but I don't think that says anything about validity. Consider NonZeroUsize, which is a #[repr(transparent)] wrapper around usize. You can't just transmute usize into NonZeroUsize without checking for 0.

This is discussed a bit in rust-lang/compiler-team#411 ("Lang Item for Transmutability") when describing why the Context parameter is needed:

Dst must be fully implicitly constructible (i.e., can be instantiated only using the implicit constructors of the involved types, without resorting to user-defined constructors)

I believe such a restriction would be needed here as well.

@QuineDot
Copy link

QuineDot commented Feb 7, 2023

The example reminded me that there's no language-level distinction between "#[repr] is guaranteed" and "#[repr] is an implementation detail I may change", which is already somewhat problematic.

The example demonstrates that this RFC needs not just "field is public implies #[repr(transparent)] is an API guarantee" but also "field is public and #[repr(transparent)] implies all transmutes from &FieldType to &Self are valid".

@araruna
Copy link

araruna commented Feb 9, 2023

Going back to the proposed syntax for a moment, I'm also not that happy with it as it is, because it would be a surprise for me to see a function returning a local reference being accepted by the borrow checker. The reason why that would be allowed is not local to the function, so that would require investigation.

Also, it it confusing for people learning a language when things have rules with special casing...

@burdges
Copy link

burdges commented Feb 9, 2023

I think &mut *foo is already pretty common ala https://doc.rust-lang.org/src/core/borrow.rs.html#246 and mostly less clear than this. You dereference, but abstractly, convert the dereferenced form, and then return the the reference.

I think the only serous syntax criticism would be if this somehow diverged pedagogically from existing &* uses, but afaik they're really the same thing. There id type inference in choosing whether one applies Deref here, but that's probably fine.

@neoeinstein
Copy link
Contributor

The example demonstrates that this RFC needs not just "field is public implies #[repr(transparent)] is an API guarantee" but also "field is public and #[repr(transparent)] implies all transmutes from &FieldType to &Self are valid".

I think this is a big issue. These additional concerns are the reasons why such casts nowadays require some level of unsafe to perform, because there are additional safety invariants that the compiler cannot enforce for you. As a blanket concern, I think the current proposal introduces mechanisms for potential unsoundness as demonstrated in the Outer/Inner concept above.

@tmccombs
Copy link

tmccombs commented Feb 9, 2023

As an alternative, what if there was a trait thay looked something like:

trait BikeshedName<T> {
     fn bikeshed_name(&T) ->  &Self
}

That could be derived for transparent structs.

Or maybe even have a way to derive From<&str> for &Wrapper(str) and similar for transparent types.

@QuineDot
Copy link

QuineDot commented Feb 9, 2023

#[derive(AsRef<Self> for TheFieldType)]

But with a trait, you lose the "only public if the field is public" aspect, and thus the "I can preserve a logical invariant" aspect.

@tmccombs
Copy link

you lose the "only public if the field is public" aspect

If there was a "private impl" feature, that would go away.

Although I'm not sure if it would be worth blocking this on a different feature.

@clarfonthey
Copy link
Author

I finally got around to updating the RFC to include the existing crates & the privacy concerns first mentioned by @TimNN. I do want to more formally address that last point in the RFC, but for now at least everything is mentioned in the text of the RFC itself.

@dlight
Copy link

dlight commented Feb 27, 2023

@TimNN I think you demonstrated that this operation must, in the general case, be unsafe - it can violate validity invariants, after all.

However one could create an unsafe marker trait that roughly says "doing the ref wrapping thing isn't unsafe" and then it would be available for use in safe code.

@burdges
Copy link

burdges commented Feb 27, 2023

An unsafe repr attribute is less overhead than a marker trait, no? Ala #[repr(transparent,unsafe_ref_wrap)] or whatever.

@clarfonthey
Copy link
Author

Again, I'm not sure why it would have to be unsafe -- the issue here is that it would be a semver-breaking change, and thus has to be opt-in since repr(transparent) is already stable.

@CAD97
Copy link

CAD97 commented Mar 2, 2023

The Inner/Outer has the additional problem(?) that some way to work with unsized values has been discussed/planned for a good while, i.e. feature(unsized_locals) from RFC#1909 (accepted 2017, merged 2018).

Imo, feature(unsized_locals) is sufficient to say that the example is improperly relying on compiler limitations for soundness. However, it also makes the &Wrapper(*raw) syntax proposed by this RFC problematic, since that has meaning under feature(unsized_locals) (of moving the value by-value and taking a reference to the temporary). This could be resolved by saying that a #[repr(transparent)] literal around an unsized value maintains the same place as the raw value, but that's an uncomfortable special case and "spooky" behavior from #[repr(transparent)] to change code semantics.

Perhaps a larger question is that this syntax obviously cannot work for Sized types. As written, the RFC does not restrict the new semantics to ?Sized types, and changes the stable behavior of &Wrapper(*raw) for raw: &mut i32 from taking a reference to a temporary copy to doing a reference cast.

Because of this, we need either not to use a normal literal constructor, or some way to specifically say that we're giving the constructor a place and not the value read from that place.

@dlight
Copy link

dlight commented Mar 2, 2023

Again, I'm not sure why it would have to be unsafe -- the issue here is that it would be a semver-breaking change, and thus has to be opt-in since repr(transparent) is already stable.

Soundness of unsafe code is generally reliant on privacy, and unrestricted transmutes could work around privacy by building structs that can't normally be built with safe code, which could break invariants of unsafe code that depends on the struct not being buildable.

Because of that, the author of the unsafe code must explicitly allow this transmute to happen. An unsafe marker trait was an idea, the unsafe attribute that @burdges proposed was another, but there must be some way for the author of the code to signal that safe transmuting is okay.

@burdges
Copy link

burdges commented Mar 3, 2023

I'm curious @CAD97 where else does *foo mean to copy the referenced value to the stack? I'd just assumed *foo always dealt with the referenced place, but then some usages then implied copies like anywhere.

@CAD97
Copy link

CAD97 commented Mar 3, 2023

*foo does evaluate to a place, which does not necessarily mean that the value behind foo is copied/moved on its own (i.e. if you just discard the place via let _ =). However, the language does not provide many ways to manipulate a place; if the place is used in any context other than the $place ., $place [ $value ], &$mut $place, let $pat = $place, or match $place expressions1, it immediately decays to a value via copy/move. This can be forced by writing {*foo}.

Footnotes

  1. It's possible I've missed some place-preserving expressions.

@clarfonthey
Copy link
Author

Thank you for adding that clarification. I was struggling to see how such a semantic difference would be observable, since most of the time the distinction between mutating the temporary and the result seems like it wouldn't matter much.

As a very bad (imo) syntax for getting around the limitations we could require wrapping the expression in a custom macro, but I don't like this for hopefully obvious reasons.

Once I have more time to elaborate on this I'll take another look through the proposal and see if I can make it more resilient to this, somehow. Might involve a separate repr than transparent, might be different syntax, we'll see.

@scottmcm
Copy link
Member

scottmcm commented Mar 4, 2023

This syntax worries me, since it's making the whole struct literal a weird sort of place, and I don't know how to feel about that.

In particular, if I change one of the examples from str to u32, like

use std::marker::PhantomData;
pub struct WeirdType<T> {
    marker: PhantomData<T>,
    id: u32,
}
impl<T> WeirdType<T> {
    pub fn from_str(s: &u32) -> &WeirdType<T> {
        &WeirdType {
            marker: PhantomData,
            id: *s,
        }
    }
}

then how is the compiler supposed to know that it's supposed to do this place trick, rather than make a reference to a temporary (because the * read the value from the reference-to-Copy-type) and later hit a borrowing error?

Or what would be the syntax if I want to get a view like this of a local variable without moving it? Do I have to write &Foo(*&x), since &Foo(x) works today and moves x?

I think it should use some form that makes it easier to distinguish what's happening. Maybe ptr::addr_of! could be inspiration here? That's also something that emphasizes the "placeness" of what's happening inside it...

TBH, much as I dislike as for numbers, I think it could fit fine here, since it's a lossless conversion. And as a language thing it could still enforce privacy and allow for internal-only (non-semver-promised) use in a way that anything relating to traits and such would make much harder.

@clarfonthey
Copy link
Author

TBH, much as I dislike as for numbers, I think it could fit fine here, since it's a lossless conversion. And as a language thing it could still enforce privacy and allow for internal-only (non-semver-promised) use in a way that anything relating to traits and such would make much harder.

Yeah, no matter what I think that the chosen solution should use the visibility of the constructor to decide who's allowed to perform the cast. That way, the standard privacy rules will still apply, invariants can be upheld, etc.

@burdges
Copy link

burdges commented Mar 4, 2023

the language does not provide many ways to manipulate a place

Alright, but intuitively "zero cost" strongly suggests "places cannot change unless required to". In particular in let Foo(inner) = {*outer}, one expects a place change to not merely be unlikely, but to actually be forbidden.

I suppose the example by @TimNN suggests why this intuition breaks down for constructors.

@clarfonthey
Copy link
Author

Thinking more about this, I think that it would be possible to have a situation where we semantically separate "value transparency" and "referential transparency," where the latter is a stronger version of the former. Right now, we can think of #[repr(transparent)] as value transparency, and since changing this to referential transparency could be a breaking API change, we could potentially make it so that the semantics of #[repr(transparent)] are extended to referential transparency in a future edition. This means that a defined type would have the semantics based upon the edition the crate is compiled in, rather than all transparent types having the same semantics.

Value transparency can still be used for some future version of safe transmutes, but only for values. Meanwhile, referential transparency allows the kinds of operations detailed in this RFC.

I'll update the text later to reflect this, and potentially we could allow a #[repr(ref_transparent)] representation and a #[repr(value_transparent)] version to explicitly specify, avoiding the need for edition differences and instead just deprecating #[repr(transparent)]. But I'll have to work out the details and describe those later.

In terms of the issues with &Wrapper(*value), I think that these semantics also cover this case too-- referential transparency is opting into changing the semantics of this expression to instead "pierce through" the Wrapper constructor. This could be an argument in favour of the as syntax, but at least the semantics aren't ambiguous any more.

@programmerjake
Copy link
Member

Thinking more about this, I think that it would be possible to have a situation where we semantically separate "value transparency" and "referential transparency," where the latter is a stronger version of the former.

i think that's a good idea but I don't like referential transparency naming, since to me that implies something kinda like you can substitute the wrapped type for the wrapper type and vis versa and the program's behavior won't change, like a type alias, however that is not true because each type has a completely independent set of trait impls and they just happen to have identical ABI and be convertible to each other in a way that is a no-op at runtime.

@clarfonthey
Copy link
Author

Yeah, to be fair, I wrote that when I was sleep-deprived and agree with you. "Borrow transparency" is probably a way better name.

@clarfonthey
Copy link
Author

I'm going to close this since I initially thought this would be a pretty cut-and-dry feature, which is why I didn't go through any of the non-RFC forums before just writing this up, but it's clear now that this will need more discussion before the final version is proposed. I could just write up my own version and propose that, but that seems less and less like the path forward nowadays.

Please feel free to build off of the existing work when creating any future proposals, though, as this is still something I would love to see.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
T-lang Relevant to the language team, which will review and decide on the RFC.
Projects
None yet
Development

Successfully merging this pull request may close these issues.