[WIP] Forbid object lifetime changing pointer casts #136776

Draft
wants to merge 1 commit into
base: master

BoxyUwU
Member

@BoxyUwU BoxyUwU commented Feb 9, 2025

Fixes #136702

r? @ghost

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. T-libs Relevant to the library team, which will review and decide on the PR/issue. labels Feb 9, 2025
@BoxyUwU
Member Author

BoxyUwU commented Feb 9, 2025

@bors try

bors added a commit to rust-lang-ci/rust that referenced this pull request Feb 9, 2025
…, r=<try>

[WIP] Forbid object lifetime changing pointer casts

Fixes rust-lang#136702

r? `@ghost`
@bors
Contributor

bors commented Feb 9, 2025

⌛ Trying commit d5ebeac with merge 44f3504...

@bors
Contributor

bors commented Feb 9, 2025

☀️ Try build successful - checks-actions
Build commit: 44f3504 (44f3504e96c944ae54fc72b5f5008f53f7eda001)

@BoxyUwU
Member Author

BoxyUwU commented Feb 9, 2025

@craterbot check

@craterbot
Collaborator

👌 Experiment pr-136776 created and queued.
🤖 Automatically detected try build 44f3504
🔍 You can check out the queue and this experiment's details.

ℹ️ Crater is a tool to run experiments across parts of the Rust ecosystem. Learn more

@craterbot craterbot added S-waiting-on-crater Status: Waiting on a crater run to be completed. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Feb 9, 2025
@craterbot
Collaborator

🚧 Experiment pr-136776 is now running

@craterbot
Collaborator

🎉 Experiment pr-136776 is completed!
📊 169 regressed and 4 fixed (580506 total)
📰 Open the full report.

⚠️ If you notice any spurious failure please add them to the denylist!

@craterbot craterbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-crater Status: Waiting on a crater run to be completed. labels Feb 11, 2025
@RalfJung
Member

RalfJung commented Feb 11, 2025

Most of these are on github; in terms of crates.io regressions all we have is:

  • may
  • a bunch of crates using metrics, see e.g. this (for metrics-0.23) or this (for metrics-0.24, the latest version)

Overall, 142 regressions are caused by metrics and 14 by may; if we can get fixed versions of those crates out, that seems to mostly cover it.

EDIT: Ah, there's also cogo.

@traviscross traviscross added the I-lang-nominated Nominated for discussion during a lang team meeting. label Feb 12, 2025
@traviscross
Contributor

We discussed this in the lang triage call today. We wanted to think more about it, so we're leaving it nominated to discuss again.

@tmandry
Member

tmandry commented Feb 19, 2025

@BoxyUwU Do you think it would be possible to implement this as an FCW? We talked about this in lang triage today and would prefer to start with that if we can. If it's not feasible, a hard error can also work (I would say though that we should upstream PRs to any crates we break).

Another small thing I noticed is that the error message links to the Nomicon section on variance, but it would be ideal to link to a tracking issue or something describing this issue in particular.

@traviscross
Contributor

traviscross commented Feb 19, 2025

To add on to what tmandry said, in our discussions we did feel that the approach taken in this PR is generally the right way forward, and we're happy to see this progress so as to help clear the way for arbitrary_self_types and derive_coerce_pointee.

cc @rust-lang/lang

@BoxyUwU
Member Author

BoxyUwU commented Feb 26, 2025

@tmandry I do expect it to be possible to FCW this. We can likely do something hacky to fully emulate the fix (but as a lint), but if that doesn't work out, all the regressions we found were relatively "simple" cases that can probably be taken advantage of (if need be) to lint a subset of the actual cases we'd break with this PR.

edit: see compiler-errors' comment, I'm not so convinced this will be possible to FCW anymore and will likely investigate improving the diagnostics here. I've already filed PRs to the affected crates to migrate them over to a transmute to avoid the breakage if this lands
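
For readers unfamiliar with the migration mentioned above, here is a minimal sketch of replacing the lifetime-changing cast with a transmute. The trait `Recorder`, the type `Counter`, and the functions `erase_lifetime` and `record_through_erased` are hypothetical names chosen for illustration; they are not taken from any of the affected crates.

```rust
use std::mem;

// Hypothetical trait and type, for illustration only.
trait Recorder {
    fn record(&self) -> u32;
}

struct Counter<'a>(&'a u32);

impl Recorder for Counter<'_> {
    fn record(&self) -> u32 {
        *self.0
    }
}

/// Erases the object lifetime bound of a trait object pointer.
///
/// # Safety
/// The caller must not use the returned pointer after `'a` has ended.
unsafe fn erase_lifetime<'a>(
    ptr: *const (dyn Recorder + 'a),
) -> *const (dyn Recorder + 'static) {
    // The cast this PR is about would be written as
    //     ptr as *const (dyn Recorder + 'static)
    // The explicit transmute below does not depend on that cast being accepted.
    unsafe {
        mem::transmute::<*const (dyn Recorder + 'a), *const (dyn Recorder + 'static)>(ptr)
    }
}

fn record_through_erased(value: &u32) -> u32 {
    let counter = Counter(value);
    let short: *const (dyn Recorder + '_) = &counter;
    // SAFETY: `counter` is alive for every use of `erased` below.
    let erased = unsafe { erase_lifetime(short) };
    // SAFETY: `erased` still points to the live `counter`.
    let recorder: &dyn Recorder = unsafe { &*erased };
    recorder.record()
}

fn main() {
    let n = 7;
    println!("{}", record_through_erased(&n));
}
```

The transmute makes the lifetime extension visible at the call site, which is the property several later comments in this thread argue for.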

@compiler-errors
Member

I was thinking earlier that it may be possible to implement a lint to detect, but it seems to me that MIR borrowck is not equipped to implement such a lint.

Specifically, it seems near impossible to answer whether a region outlives constraint (like, 'a: 'b) would not hold in a way that doesn't actually commit to that constraint, at least not without tons of false positives based on how NLL computes lower bounds for all of the regions it deals with in the MIR.

To fix this would require some significant engineering effort to refactor how NLL processes its region graph to make it easier to clone and reprocess with new constraints.

workingjubilee added a commit to workingjubilee/rustc that referenced this pull request Mar 4, 2025
…uto_to_object-hard-error, r=oli-obk

Make `ptr_cast_add_auto_to_object` lint into hard error

In Rust 1.81, we added a FCW lint (including linting in dependencies) against pointer casts that add an auto trait to dyn bounds.  This was part of work making casts of pointers involving trait objects stricter, and was part of the work needed to restabilize trait upcasting.

We considered just making this a hard error, but opted against it at that time due to breakage found by crater.  This breakage was mostly due to the `anymap` crate which has been a persistent problem for us.

It's now a year later, and the fact that this is not yet a hard error is giving us pause about stabilizing arbitrary self types and `derive(CoercePointee)`.  So let's see about making a hard error of this.

r? ghost

cc `@adetaylor` `@Darksonn` `@BoxyUwU` `@RalfJung` `@compiler-errors` `@oli-obk` `@WaffleLapkin`

Related:

- rust-lang#135881
- rust-lang#136702
- rust-lang#136776

Tracking:

- rust-lang#127323
- rust-lang#44874
- rust-lang#123430
workingjubilee added a commit to workingjubilee/rustc that referenced this pull request Mar 5, 2025
…uto_to_object-hard-error, r=oli-obk

@rustbot rustbot removed the I-lang-easy-decision Issue: The decision needed by the team is conjectured to be easy; this does not imply nomination label Mar 13, 2025
@nikomatsakis
Contributor

@rustbot labels -I-lang-nominated

We discussed this in our meeting today. Meeting consensus is that, given that a warning is not feasible, we are in favor of going forward with this change with the proviso that we will have an error message with actionable instructions and open PRs against known regressions.

Side note, informal design axioms for breaking changes...

  • First, don't break.
  • If you must break, give a warning.
  • If you can't give a warning, give actionable advice.
  • No matter what, fix as many folks as you can.

@rustbot rustbot removed the I-lang-nominated Nominated for discussion during a lang team meeting. label Mar 19, 2025
@BoxyUwU
Member Author

BoxyUwU commented Mar 22, 2025

PRs against affected crates have been opened and can be seen here:

There were three regressions I've not filed PRs against:


It feels a bit awkward to bring up after having filed these PRs but regardless it seems like due diligence to ask anyway; is it worth considering an alternative fix to this problem with arbitrary self types? A couple options:

Allow lifetime casts in unsafe code only

Someone asked on one of the PRs whether it would be reasonable to allow this code to continue to work when the code is placed in an unsafe block. This would mean that behaviour of as casts changes between safe/unsafe rather than unsafe simply allowing more operations to be performed.

This feels somewhat dubious to me as it is not super clear that a safety invariant is being introduced when as casting. unsafe is also tricky to learn as-is and it's already a source of confusion as to whether unsafe "disables" the borrow checker, I think this would make that problem worse even if it's only a fringe edge case.

Regardless, it would solve the soundness bug and minimize the breakage to some extent. Looking at the regressions, this would only avoid breaking a few of the affected crates, but this does include the metrics crate, which was by far the most common cause of breakage, and would mean that people depending on old versions of the metrics crate will stay unbroken.

We could potentially only do this as a migration strategy by breaking this even in unsafe contexts across an edition where it's more "morally correct" to make a breaking change. (This would be my preference if we do this as having this as intentional behaviour would likely be quite bad for teachability of unsafe, see followup comments)

@RalfJung I imagine you would probably have opinions about muddling the waters around what unsafe code does in this way (?)

Require construction of smart pointers that implement DispatchFromDyn via raw pointers to be unsafe

If unsafe fields existed we could require #[derive(CoercePointee)] struct SmartPtr<T: ?Sized>{ ptr: *const T } to actually be written as

struct SmartPtr<T: ?Sized>{
    /// SAFETY: When `T` is a dyn-type it must not have a lifetime bound greater than the underlying type the vtable is for
    unsafe ptr: *const T
}

This would enforce that arbitrary user-defined smart pointers are correct in the presence of pointer casts of dyn type lifetime bounds in the same way that all the smart pointers in std are. E.g. Box/Arc all have unsafe from_raw functions whose safety invariants imply that the vtable is fine.

This would mean that arbitrary_self_types_pointers (which permits directly using raw pointers as receivers without an intermediate smart pointer type) would not be possible to stabilize at any point down the road without backing out of this choice and going back down the road that this PR currently takes.

Another problem would be that this would block arbitrary self types on unsafe fields being stabilized which would be quite unfortunate given the importance of stabilizing this feature. It would also require derive(CoercePointee) to be blocked on this as well.

However, it would remove the need to error on casting *const dyn Trait to *const dyn Trait + Send, which was approved by lang already in #136764, resulting in there being no breaking changes required to stabilize arbitrary self types.


Going to re-nominate for lang with a question of which of these y'all would prefer (options are more thoroughly elaborated above):

  1. Just break this immediately on stable
  2. Break some subset of the crates on stable, while some will continue to work due to the casts happening to be in unsafe code. Would we then want to break this over an edition?
  3. Block arbitrary self types on unsafe fields which would no longer require this change for arbitrary self types stabilization.

@BoxyUwU BoxyUwU added the I-lang-nominated Nominated for discussion during a lang team meeting. label Mar 22, 2025
@RalfJung
Member

Allow lifetime casts in unsafe code only

Purely conceptually, it seems fine to me to say that some as casts break library invariants and hence can only be done in an unsafe block. However, I don't know how hard this would be to teach. Cc @rust-lang/opsem

For this concrete question that would mean we have to allow such invalid-lifetime dyn trait values to exist temporarily (i.e., they satisfy the language invariant). Is that where we stand today, i.e., Miri would accept the as cast but then it can be used later to cause UB?

@RalfJung
Member

Break some subset of the crates on stable, while some will continue to work due to the casts happening to be in unsafe code which can then be broken over an edition

This is quite dubious. We're retroactively attaching more safety obligations to an existing operation, and then if there's UB somewhere we tell you it's your fault since you wrote unsafe and thereby promised you upheld the safety obligation that didn't even exist yet when you wrote the code?

Does metrics happen to actually be sound under the new semantics, i.e. the safety obligation is actually satisfied?

@Mark-Simulacrum
Member

Is it accurate to say that the UB being "added" here is not detectable within e.g. Miri, because we lack the information about the lifetimes present to enforce that you didn't mess this up? It seems unfortunate if that's true, because I could easily see there being code out there that didn't satisfy this safety obligation but is already using transmute for other reasons. It's pretty common I think to see casts to 'static to allow temporarily, unsafely storing objects in some place, typically with an argument that it's actually safe so long as you're careful to only call/do stuff with them that you'd be able to with the original lifetime.

I don't see a clear alternative to this -- I think we are sort of stuck given past decisions -- but I hadn't seen that question brought up so wanted to raise it here. Or maybe I've misunderstood, and we're actually not adding UB from violating this condition -- merely working to prevent it, and only if you happen to explicitly do something "wrong" does your code actually break. (Essentially saying that you shouldn't leak such a value to safe code, but there's no UB from just having it).

= help: consider adding the following bound: `'a: 'b`
= note: requirement occurs because of a mutable pointer to `dyn Trait<'_>`
= note: mutable pointers are invariant over their type parameter
= help: see <https://doc.rust-lang.org/nomicon/subtyping.html> for more information about variance
Member

Is it possible for us to note something in these errors pointing at some docs for why this is a bad idea? I could easily see someone just changing this to a transmute without realizing this is an intentional limitation of as casts.

I guess this falls under "dyn Trait metadata is invalid if it is not a pointer to a vtable for Trait that matches the actual dynamic trait the pointer or reference points to" in some sense (from https://doc.rust-lang.org/nightly/nomicon/what-unsafe-does.html) but maybe that should be clarified to say that it's not just "trait" but rather "trait and lifetime bounds on it" (or explicitly note this is a safety, not validity, invariant)...

Member Author

Yeah I'm gonna try to improve the diagnostics here before this can land, as it's not a very helpful error message to get, either from the POV of someone whose code just broke or from the POV of someone who just tried to ptr cast the lifetimes in new code.

@saethlin
Member

Someone asked on one of the PRs whether it would be reasonable to allow this code to continue to work when the code is placed in an unsafe block. This would mean that behaviour of as casts changes between safe/unsafe rather than unsafe simply allowing more operations to be performed.

With regard to Ralf's ping, I think that this would be a teaching disaster and a regrettable wart on the language unless this is just to provide a smoother deprecation period. as is already hard enough to teach because of the number of different operations that it can be depending on the types. And to the specific request, that author seems motivated by a desire to not use transmute because transmute is powerful and scary. In this case I think that desire is wrongheaded; as far as I can tell these as casts deserve the attention that a transmute would draw.

@RalfJung
Member

Is it accurate to say that the UB being "added" here is not detectable within e.g. Miri, because we lack the information about the lifetimes present to enforce that you didn't mess this up? It seems unfortunate if that's true, because I could easily see there being code out there that didn't satisfy this safety obligation but is already using transmute for other reasons. It's pretty common I think to see casts to 'static to allow temporarily, unsafely storing objects in some place, typically with an argument that it's actually safe so long as you're careful to only call/do stuff with them that you'd be able to with the original lifetime.

I don't see a clear alternative to this -- I think we are sort of stuck given past decisions -- but I hadn't seen that question brought up so wanted to raise it here. Or maybe I've misunderstood, and we're actually not adding UB from violating this condition -- merely working to prevent it, and only if you happen to explicitly do something "wrong" does your code actually break. (Essentially saying that you shouldn't leak such a value to safe code, but there's no UB from just having it).

My understanding is that there's no immediate UB when doing the wrong-lifetime cast, but there can be UB further down the road since now we can make virtual function calls we shouldn't have been able to make. So, the cast breaks a library/safety invariant, but not a language/validity invariant. Miri can only check language invariants.

@BoxyUwU
Member Author

BoxyUwU commented Mar 24, 2025

RalfJung: Is that where we stand today, i.e., Miri would accept the as cast but then it can be used later to cause UB?

On stable I don't think you can actually do anything "wrong" with these pointer casts as the only way to dispatch through the vtable requires going through unsafe code, either via reborrowing to get a reference instead of a pointer, or by going through an unsafe from_raw function on one of std's smart pointers.
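
As a rough illustration of the two routes named above, the sketch below dispatches through a raw trait object pointer on stable, once by reborrowing and once through `Box::from_raw`. The trait `Greet`, the type `English`, and the helper functions are hypothetical names used only for this example.

```rust
// Hypothetical trait and type, for illustration only.
trait Greet {
    fn greet(&self) -> String;
}

struct English;

impl Greet for English {
    fn greet(&self) -> String {
        String::from("hello")
    }
}

/// # Safety
/// `ptr` must point to a live, valid value.
unsafe fn greet_via_reborrow(ptr: *const dyn Greet) -> String {
    // Reborrowing the raw pointer as a reference is the unsafe step.
    let r: &dyn Greet = unsafe { &*ptr };
    r.greet()
}

/// # Safety
/// `ptr` must come from `Box::into_raw` and must not be used afterwards.
unsafe fn greet_via_box(ptr: *mut dyn Greet) -> String {
    // `Box::from_raw` is an unsafe function; its contract covers the vtable too.
    let boxed: Box<dyn Greet> = unsafe { Box::from_raw(ptr) };
    boxed.greet()
}

fn demo() -> (String, String) {
    let raw: *mut dyn Greet = Box::into_raw(Box::new(English));
    // Writing `raw.greet()` here does not compile on stable:
    // a raw pointer is not a method receiver for `&self` methods.
    // SAFETY: `raw` points to a live allocation created just above.
    let a = unsafe { greet_via_reborrow(raw) };
    // SAFETY: `raw` came from `Box::into_raw` and is not used again.
    let b = unsafe { greet_via_box(raw) };
    (a, b)
}

fn main() {
    let (a, b) = demo();
    println!("{a} {b}");
}
```

Both routes need an `unsafe` block before any method in the vtable can run, which is why the cast alone is not exploitable on stable.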

If you (incorrectly) used unsafe to do those operations that situation is somewhat analogous to the case with arbitrary self types where no language level UB has been reached but it's possible to perform a vtable call without where clauses being satisfied.

I'm not sure if you can really escalate this "without where clauses being satisfied" into language level UB. Even with arbitrary self types, it's a raw pointer so it's not safe to simply dereference and get a value out of it that is incorrectly believed to live for longer than it actually should.

If there's a safety invariant somewhere that the pointer is valid for reads of the pointee type then it's also not problematic as that rules out having these kinds of pointers passed in. This is the same kind of logic as to why these casts don't cause any problems for std's smart pointers as they all require unsafe to construct from a raw pointer.

So both on stable and with arbitrary_self_types I don't believe Miri can/should detect anything here and I'm also not confident you could actually escalate this into language level UB detectable by Miri. We can't not generate the methods with unsatisfiable where clauses for the vtable as they're only unsatisfiable due to lifetime bounds which we don't really have the ability to reason about in this way.

Regardless, it's certainly wrong for the type system to allow this in safe code...

saethlin: With regard to Ralf's ping, I think that this would be a teaching disaster and a regretful wart on the language unless this is just to provide a smoother deprecation period

This is roughly my opinion too 👍 I would be quite concerned about the teachability of this and would only want to go ahead with this if the intention was to follow up with making it hard error in future editions.

@WorldSEnder

WorldSEnder commented Mar 25, 2025

Allow lifetime casts in unsafe code only

I am the person who suggested this, and after mulling it over a bit more, I also think it would be a mistake and the transmute makes a lot more sense here (besides the teachability of it, which would also be incredibly complicated to understand). In my case, I need to go through a dyn Trait + 'static so I can push it through a channel which outlives the data passed through it, but I then end up using it as a dyn Trait + 'b for some lifetime that I know is outlived by the one the dyn object was originally created with. If that requires a transmute, so be it.
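
A minimal single-threaded sketch of that use case follows, assuming a transmute is used for the lifetime extension. The trait `Job`, the type `Borrowing`, and the function `run_through_channel` are hypothetical names; a real implementation would also need to justify sending the pointer across threads, which this sketch does not attempt.

```rust
use std::mem;
use std::sync::mpsc;

// Hypothetical trait and type, for illustration only.
trait Job {
    fn run(&self) -> usize;
}

struct Borrowing<'a>(&'a str);

impl Job for Borrowing<'_> {
    fn run(&self) -> usize {
        self.0.len()
    }
}

fn run_through_channel(text: &str) -> usize {
    // The channel outlives the data, so its item type uses a `'static` bound.
    let (tx, rx) = mpsc::channel::<*const (dyn Job + 'static)>();

    let job = Borrowing(text);
    let short: *const (dyn Job + '_) = &job;
    // SAFETY (assumed invariant of this sketch): the pointer is only
    // dereferenced while `job` is still alive.
    let erased: *const (dyn Job + 'static) = unsafe { mem::transmute(short) };
    tx.send(erased).unwrap();

    let received = rx.recv().unwrap();
    // Going back to a shorter bound needs no transmute.
    let back: *const (dyn Job + '_) = received;
    // SAFETY: `job` has not been dropped yet.
    let job_ref: &dyn Job = unsafe { &*back };
    job_ref.run()
}

fn main() {
    println!("{}", run_through_channel("abc"));
}
```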

Now, I do have a question about what will be allowed and what will be forbidden.

The test cases in this PR so far only mention *mut pointers which are in any case invariant wrt their type argument and the error mentions this explicitly

= note: mutable pointers are invariant over their type parameter

Supposedly though, the problem and change also touches *const pointers which are covariant. From a variance perspective, it should still be okay to cast from dyn Trait + 'a to dyn Trait + 'b if 'a outlives 'b (just not the other way round, definitely not with 'b = 'static)?
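
To make the covariant direction concrete, here is a small sketch in which the object lifetime bound of a `*const` pointer is shortened without any `as` cast. It only shows that the shortening direction is already an ordinary coercion; it says nothing about what this PR accepts for `as`. The trait `Shape`, the type `Square`, and the functions `shorten` and `area_via_shortened` are hypothetical names.

```rust
// Hypothetical trait and type, for illustration only.
trait Shape {
    fn area(&self) -> u32;
}

struct Square<'a>(&'a u32);

impl Shape for Square<'_> {
    fn area(&self) -> u32 {
        self.0 * self.0
    }
}

// `'long: 'short` reads "'long outlives 'short".
// Returning `ptr` unchanged is accepted: the bound only gets shorter.
fn shorten<'long: 'short, 'short>(
    ptr: *const (dyn Shape + 'long),
) -> *const (dyn Shape + 'short) {
    ptr
}

fn area_via_shortened(side: &u32) -> u32 {
    let square = Square(side);
    let ptr: *const (dyn Shape + '_) = &square;
    let shortened = shorten(ptr);
    // SAFETY: `square` is still alive here.
    let shape: &dyn Shape = unsafe { &*shortened };
    shape.area()
}

fn main() {
    println!("{}", area_via_shortened(&4));
}
```

The opposite direction, from a shorter bound to a longer one such as `'static`, is the one that has no safe justification.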

&& let ty::Dynamic(dst_tty, ..) = dst_tail.kind()
if let ty::Dynamic(src_tty, src_region_bound, ..) = src_tail.kind()
&& let ty::Dynamic(dst_tty, dst_region_bound, ..) =
dst_tail.kind()
&& src_tty.principal().is_some()
&& dst_tty.principal().is_some()
Member

Is the "when there aren't principal traits" still true? It seems like I could get into trouble with a fn (self: dyn Send + 'static), right?

Member Author

We forbid casting dyn Send to dyn Trait in general as we would have no way to construct a vtable. The opposite cast is also fine as the vtable for dyn Send (or other autotraits) necessarily doesn't contain potentially user-written functions with where clauses that could rely on the dyn type's lifetime bound. I should write this rationale down somewhere

@Mark-Simulacrum
Member

Thanks! Mostly trying to make sure we're actually fully closing the intended soundness hole...

IIUC, it sounds like when we say:

lang already FCP'd the decision to require VTables on raw pointers to be valid so this is effectively just enforcing that

I think we're referring to the decision in #101336... which defined the safety invariant as:

[...] the safety invariant for *dyn types must be that their metadata points to a fully valid vtable (i.e., a vtable created by the compiler)

That is actually somewhat ill-specified. In order for us to avoid bugs, I think it would help to make that a bit more specific. I believe what we are saying is that the safety invariant for the vtable is:

  1. soundly loadable at the compiler-expected type for whatever A is in dyn A, including lifetimes. This is mostly just that we have the right size and no null pointers for fn ptrs, etc. In practice, the change in principal lifetime (A + 'a -> A + 'b) never changes the vtable the compiler synthesizes (right?), so this should be mostly trivial to ensure we get right.
  2. every method on the vtable must be soundly callable at the type of the method. i.e., if a method requires 'static on some X, it violates the safety invariant of a vtable to produce dyn A such that the method is callable where X: 'static is not true. Presumably, the same holds for any other bound (e.g., dyn A<T> -> dyn A<U> where a method is fn foo(&self) where T: 'static would also not be sound).
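
As a sketch of the kind of method point 2 is about, the example below puts a `where Self: 'static` clause on a dyn-compatible method. The trait `Tagged`, the type `Owned`, and the function `tag_of_owned` are hypothetical names for illustration.

```rust
use std::any::TypeId;

// Hypothetical trait, for illustration only.
trait Tagged {
    // `TypeId::of` requires a `'static` type, so the method body relies on
    // the where clause. The method still ends up in the vtable.
    fn tag(&self) -> TypeId
    where
        Self: 'static,
    {
        TypeId::of::<Self>()
    }
}

struct Owned;

impl Tagged for Owned {}

fn tag_of_owned() -> TypeId {
    let owned = Owned;
    // The call type-checks because the object type carries a `'static` bound.
    // With `dyn Tagged + 'a` for a shorter `'a`, the where clause would not hold.
    let obj: &(dyn Tagged + 'static) = &owned;
    obj.tag()
}

fn main() {
    println!("{}", tag_of_owned() == TypeId::of::<Owned>());
}
```

A pointer cast that rewrites the bound from `'a` to `'static` would make the same call type-check for a value that is not actually `'static`, which is the gap described in this comment.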

I think (2) only has a gap with respect to the principal lifetime (i.e., what this PR is closing) because we'd previously (in #120248) closed the other gaps by saying:

  • adding auto traits is only allowed if the principal trait has the auto trait as a super trait (given trait T: Send {}, *dyn T -> *dyn T + Send is valid, but *dyn Debug -> *dyn Debug + Send is not)
  • Generics (including lifetimes) must match (*dyn T<'a, A> -> *dyn T<'b, B> requires 'a = 'b and A = B)
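
The first rule can be sketched as follows, assuming the behaviour described for #120248. The trait `Task`, the type `Unit`, and the functions `add_implied_send` and `id_after_cast` are hypothetical names.

```rust
// Hypothetical trait with `Send` as a supertrait, for illustration only.
trait Task: Send {
    fn id(&self) -> u8;
}

struct Unit;

impl Task for Unit {
    fn id(&self) -> u8 {
        1
    }
}

fn add_implied_send(ptr: *const dyn Task) -> *const (dyn Task + Send) {
    // Accepted: every implementor of `Task` is already `Send`,
    // so the cast does not claim anything new about the pointee.
    // By contrast, `*const dyn Debug as *const (dyn Debug + Send)` is rejected.
    ptr as *const (dyn Task + Send)
}

fn id_after_cast() -> u8 {
    let unit = Unit;
    let ptr: *const dyn Task = &unit;
    let with_send = add_implied_send(ptr);
    // SAFETY: `unit` is still alive here.
    let task: &(dyn Task + Send) = unsafe { &*with_send };
    task.id()
}

fn main() {
    println!("{}", id_after_cast());
}
```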

That PR notes that we still allow two cases:

  • We only care about the metadata/last field, so you can still cast *const dyn T to *const WithHeader<dyn T>, etc
    • is that actually OK? Do we have a test which confirms that the WithHeader cast can't add some requirement to methods (e.g., because with arbitrary self types I have two methods, one on self: *const dyn T and the other on self: *const WithHeader<dyn T>, which makes that cast load bearing)? (Possibly where the WithHeader method is adding a lifetime bound?)
  • The lifetime of the trait object itself (dyn A + 'lt) is not checked, so you can still cast *mut FnOnce() + '_ to *mut FnOnce() + 'static, etc
    • that's this PR -- we'd said "feels fishy, but I couldn't come up with a reason it must be checked", so I guess we found the reason :)

I think that implies that we should go edit the reference (after this PR merges) to update the as cast docs to strengthen the 2nd point, i.e., that in dyn A -> dyn B, it is actually the case that A and B must fully match (modulo moving an auto trait out of the "parent traits" of an existing trait in the list A, or dropping an auto trait from the list entirely).

(Also left one inline comment on a possibly missing case in the PR?)

@RalfJung
Member

@WorldSEnder what do you mean by dyn 'a Trait? Do you mean dyn Trait + 'a?

@WorldSEnder

@WorldSEnder what do you mean by dyn 'a Trait? Do you mean dyn Trait + 'a?

Oh yeah that's what I mean, syntax checking in github issues is not there yet ;D

@tmandry
Member

tmandry commented Mar 26, 2025

This was discussed in today's lang team meeting. We agreed that we have to disable this operation in safe code (a breaking change with a small impact) to preserve soundness while shipping arbitrary self types. The question we debated was what to do in unsafe, and the consensus was that we should avoid making unnecessary breaking changes and allow this to continue working in unsafe. This can be justified under the idea that as casts can do multiple kinds of operations; some are safe and others are unsafe. However, it introduces some problems for which there are mitigating steps.

First, @nikomatsakis realized that just about all of the breaking occurrences seem to involve the implicit 'static lifetime on dyn types. We had consensus that we should implement a warn-by-default lint on these kinds of pointer casts that implicitly change lifetimes, whether in safe or unsafe code.

Second, we still have the chance to change this rule over an edition, without breaking more code than we have to today. We can lean into the safe behavior and disable changing lifetimes even in unsafe, just as we disallow *const dyn Trait<'a> as *const dyn Trait<'b> when the lifetimes are unrelated, and migrate users to using transmute. Or we can decide to continue allowing some as casts in unsafe that are not allowed in safe code. We did not have consensus on what to do here, but agreed that if we went forward with this change it could be decided later.

FAQ

Why is it okay to allow this in unsafe? Making sure that future uses of a pointer are valid before changing a lifetime like this is exactly the sort of thing a user should be verifying in unsafe, so it fits into the mold of existing unsafe operations.

Is this a new requirement that must be upheld by unsafe code? When arbitrary_self_types stabilizes, technically yes – it must ensure that it doesn't leak values with lifetime-extended types that might get used as Receivers (so pointer-y types) to untrusted code. In a practical sense, not really; it would be strange to leak such values anyway. Most unsafe code that does any sort of lifetime extension on a pointer carefully controls the code that has access to it.

Won't users miss that they're supposed to be verifying new things now without syntax to remind them of it? I share this concern, but I think the mitigating factors described above will resolve it. The lint on implicit lifetimes alone may be enough to resolve it.

Why is it okay that this is an as cast, and not a transmute? Users can already change pointee types with as casts. It's true that none of those operations can lead to unsoundness without unsafe, and this would be an exception. Adding a requirement that such casts are done in unsafe seems to us like enough of a mitigation to prevent surprising UB for now.

What about the precedent that as casts are not themselves dangerous operations wrt memory safety, while dereferencing them is? What about the precedent that the same syntactic form cannot do more under unsafe than under safe? Personally I agree with @BoxyUwU and would like to preserve these precedents, as I think they make Rust simpler to learn and understand. But preserving them in the current edition would require making more breaking changes than we have to or waiting to ship arbitrary self types in the next edition, neither of which I want to do. We have the option to restore these precedents over an edition.

What are the arguments against restoring precedent in a new edition? It would mean adding complexity to the compiler, language reference, and edition guide to have edition-dependent behavior. An argument was also made that as casts on pointers are strictly more powerful than transmuting a value because they allow changing the size of the pointee, and it would be nice to preserve the property that as pointer casts are strictly more powerful than transmute.

Why not just break all the existing crates that do this; it's not that bad and avoids the edition complexity? This is the judgment call. metrics is a widely used crate, and personally I think breaking user code unnecessarily so we can ship a high priority feature is not a great look. There was some discussion that each affected crate might release a fixed version but that new releases might not be semver compatible with all the existing ones people are using, and that upstreaming fixes (which is a good thing we should continue doing) is not a panacea, even if we accept that users will need to cargo update. Unfortunately we don't have a standard yardstick to use to measure crates.io breakage by; I suspect people are using various heuristics and perhaps a slight change in those heuristics can lead to a different opinion on this judgment call. But this is the one we had consensus on in the meeting.

@compiler-errors
Member

The question we debated was what to do in unsafe, and the consensus was that we should avoid making unnecessary breaking changes and allow this to continue working in unsafe.

I disagree with the proposal that unsafe blocks should affect the way we do MIR borrow checking, which as far as I can tell is what this is proposing from an implementation perspective. So, the whole root cause of this issue is that we've implemented the wrong semantics for wide ptr-to-ptr casts, which today in MIR type-checking does not enforce any lifetime relationship between the source expression and casted type.

As far as I am aware, before this proposal, unsafe blocks are simply "lint" markers that can be localized to unsafety checking (morally they are equivalent to #[allow(..)] cfgs that disable otherwise deny-lint behavior of unsafe operations, if you see unsafety checking as a type of lint). After this proposal, they will have a concrete effect on the way that the type system operates on the MIR since they necessarily change the way we have to treat lifetimes in the borrow checker for NLL to work correctly. I'm not too keen to say that we can clean this up over an edition, since we need to maintain this behavior for ~approximately forever.

Before this proposal, it was pretty easy to teach unsafe to users by saying that it does not affect the way that the system works, but simply changes what operations we enforce as illegal (that is, what MIR operations we allow). It is even explained this way in the book:

It’s important to understand that unsafe doesn’t turn off the borrow checker or disable any other of Rust’s safety checks: if you use a reference in unsafe code, it will still be checked.

This definitely complicates that teachability, and IMO this is a pretty drastic new behavior to have to introduce just to avoid crate breakage from something we only recently started to allow, and from the type system perspective I am pretty inclined to push back on this proposal from a maintenance perspective.


First, @nikomatsakis realized that just about all of the breaking occurrences seem to involve the implicit 'static lifetime on dyn types. We had consensus that we should implement a warn-by-default lint on these kinds of pointer casts that implicitly change lifetimes, whether in safe or unsafe code.

I wanted to note again that I don't think it will be possible to implement such a lint in a way that doesn't either have too many false positives (e.g. triggers on every 'static raw wide pointer cast) or too many false negatives (e.g. doesn't detect the breakage that we want to lint here).

I thought I made sure that people were aware of the implementation difficulty, but I guess not 🤔 See https://rust-lang.zulipchat.com/#narrow/channel/144729-t-types/topic/lifetime.20extension.20from.20dyn.20casts.20.23136776 for the relevant discussion.

@Manishearth
Member

Manishearth commented Mar 27, 2025

From the perspective of an unsafe reviewer (and someone who tries to teach others to unsafe review), I'd like to state a strong opposition to forbidding this in safe code only and allowing it in unsafe, thereby introducing a new source of UB to as casts.

So far, as casts are UB based on the usage of their results, not based on the casts themselves [1]. They are not easy to review, especially with inferred types (and such), but ultimately you can kinda go "well we turned a thingy into a thangy and are now degreebling it, which is safe to do to the original type in this context". This introduces a new, much trickier, modality to the review.
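To make the current rule concrete, here is a minimal sketch (the function name bytes_via_cast is made up for illustration). The as casts compile outside of any unsafe block; only the dereference of the resulting pointer needs one, and that is the spot a reviewer has to justify.

```rust
/// Hypothetical helper: view a `u32` as its four bytes through raw pointer casts.
fn bytes_via_cast(x: &u32) -> [u8; 4] {
    // Safe: `as` casts between raw pointer types need no `unsafe`.
    let p = x as *const u32 as *const [u8; 4];
    // SAFETY: `p` comes from a live `&u32`; `[u8; 4]` has the same size as
    // `u32` and an alignment of 1, so this read is in bounds and aligned.
    unsafe { *p }
}

fn main() {
    let value = 0x0102_0304_u32;
    // Compare against the native-endian byte order so the check holds on any target.
    assert_eq!(bytes_via_cast(&value), value.to_ne_bytes());
}
```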

Currently, figuring out what the actual unsafe operation is is a major and tricky task when reviewing unsafe code. Fortunately, it mostly boils down to "look at the function calls" and "look at the *s", and in rarer cases "look at modifications of a MUTABLE_STATIC" and "look at field accesses". Multiple people have talked about teachability already: this new source is subtler than those, non-obvious, and kinda hard to notice in code. This is going to be a source of problems.

I definitely feel heard by @RalfJung's comment here:

This is quite dubious. We're retroactively attaching more safety obligations to an existing operation, and then if there's UB somewhere we tell you it's your fault since you wrote unsafe and thereby promised you upheld the safety obligation that didn't even exist yet when you wrote the code?

I think that's not a good precedent to set at all.

And by @compiler-errors's comment:

This definitely complicates that teachability, and IMO this is a pretty drastic new behavior to have to introduce just to avoid crate breakage from something we only recently started to allow, and from the type system perspective I am pretty inclined to push back on this proposal from a maintenance perspective.

Yep, this feels rather drastic to me as an unsafe reviewer.

I like @nikomatsakis' axioms here, and I'll note that making this UB is breakage from an unsafe-writer's POV too, even if it doesn't cause compilation failures, and I'd argue that that's worse since there's no way to detect it.

Minimizing the position on @nikomatsakis' breakage ladder should account for this type of breakage too, it shouldn't just be about "which code still compiles".


This is the judgment call. metrics is a widely used crate, and personally I think breaking user code unnecessarily so we can ship a high priority feature is not a great look. There was some discussion that each affected crate might release a fixed version but that new releases might not be semver compatible with all the existing ones people are using, and that upstreaming fixes (which is a good thing we should continue doing) is not a panacea, even if we accept that users will need to cargo update. Unfortunately we don't have a standard yardstick to use to measure crates.io breakage by; I suspect people are using various heuristics and perhaps a slight change in those heuristics can lead to a different opinion on this judgment call. But this is the one we had consensus on in the meeting.

@tmandry

It feels like talking about the semver issue is premature unless we know that the metrics crate does not want to do backports and stuff: do we know that?

When it comes to these kinds of breakages I feel like there's a lot of value in negotiating with our users. Some users have legitimate needs and will say no to stuff like this, but quite often the answer can be "...yeah, okay, not ideal but we can work with that". As a crate maintainer I've had to do that often enough, from both sides of the equation.

I understand that this doesn't cover private crates, which is a risk, but this is a problem that can be attacked from multiple angles, including the FCW.


I wanted to note again that I don't think it will be possible to implement such a lint in a way that doesn't either have too many false positives (e.g. triggers on every 'static raw wide pointer cast) or too many false negatives (e.g. doesn't detect the breakage that we want to lint here).

I do feel like this is a tradeoff that one can make a call on: either choice at least gives people something to work with. To me it feels like some pain now is worth avoiding perpetual pain in the long run.

Also, perhaps I'm missing something: but the FCW is being talked about for the situation where this becomes a hard error, yes? If it's possible to hard error, it should be possible to FCW for that case with no false negatives, no?

One thing I'll add, looking holistically at this: personally I don't particularly enjoy replacing as with transmute (I think transmute is too powerful and it's nice to be able to narrow it down to a specific, clear purpose), it would probably be nice to have an unsafe macro or something that narrowly does a lifetime pointer cast, but that might be too niche an API. Not actually trying to propose that here.

Footnotes

  1. Not fully clear on the situation here with provenance though. I know ptr-to-int transmutes are UB or otherwise verboten in const, but the as cast is forbidden by

@RalfJung
Member

RalfJung commented Mar 27, 2025

@tmandry

An argument was also made that as casts on pointers are strictly more powerful than transmuting a value because they allow changing the size of the pointee, and it would be nice to preserve the property that as pointer casts are strictly more powerful than transmute.

I don't understand what you mean by this. I can transmute *const dyn Send to *const dyn Debug but I cannot do an as cast here. So in which sense are as casts strictly more powerful on pointers? Also I can of course transmute *const u32 to *const u8, i.e., transmute can change the size of the pointee.

Transmute cannot change the size of the pointer, e.g. *const i32 to *const dyn Send. Is that what you mean? But then the claim that as is strictly more powerful is still wrong since as on wide ptrs ensures that the "unsized tail" remains of the same shape.
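A small sketch of that distinction, assuming a typical target where a pointer to a dyn type carries a data pointer plus a vtable pointer (the name thin_to_thin is invented for this example). Transmuting between two thin pointers compiles because the pointers have the same size, even though the pointees do not; a thin-to-wide transmute would be rejected for the size mismatch.

```rust
use std::mem::{size_of, transmute};

/// Hypothetical helper: reinterpret a thin pointer as another thin pointer.
fn thin_to_thin(p: *const u32) -> *const u8 {
    // SAFETY: both types are thin raw pointers of identical size, and the
    // resulting pointer is only returned here, never dereferenced.
    unsafe { transmute::<*const u32, *const u8>(p) }
}

fn main() {
    let x = 7u32;
    let p: *const u32 = &x;

    // The pointee size changed (4 bytes to 1 byte) but the address did not,
    // and the result matches what an `as` cast produces.
    assert_eq!(thin_to_thin(p), p as *const u8);

    // A wide pointer is larger than a thin one, so `transmute` from
    // `*const i32` to `*const dyn Send` would fail the compile-time size check.
    assert!(size_of::<*const dyn Send>() > size_of::<*const i32>());
}
```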

@Manishearth

So far, as casts are UB based on the usage of their results, not based on the casts themselves

This property is preserved. A bad as cast violates a library/"safety" invariant, which means that future safe operations can cause UB. But the bad cast itself cannot cause UB.

The difference to before is that so far as casts could cause UB only based on unsafe usage of their results. With the proposed change, even safe usage (of unsafe casts) can cause UB.

@BoxyUwU
Member Author

BoxyUwU commented Mar 27, 2025

I'll try not to re-cover what other people have already commented, e.g. feelings about teachability here, or pointer casts not being strictly more powerful (or equal to) transmutes.

First, reading the lang meeting minutes and the summarizing comment here I get the impression that lang is making decisions under the belief that writing *const dyn Trait means *const dyn Trait + 'static in all circumstances. This is not true.

In item signatures we have the dyn type lifetime default rules that do give this behaviour, e.g. static FOO: *const dyn Trait = ...; is equivalent to *const dyn Trait + 'static. However, in bodies (e.g. the initializer of a static, or inside the { ... } of a function), elided lifetimes of dyn types are simply unconstrained lifetimes that are inferred by the borrow checker. Example:

fn foo<T: Trait>(ptr: *const T) {
    let a: *const dyn Trait = ptr;
}

This currently compiles on stable and will continue to do so under this PR. If the type annotation on the let statement meant *const dyn Trait + 'static then this would not compile even on stable.
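A runnable variant of the same point, as a sketch only: Borrowed and read_through_dyn are names made up here, and the implementing type deliberately borrows a local so that it cannot be 'static. The let annotation inside the body still accepts it, because the elided dyn lifetime there is inferred rather than defaulted to 'static.

```rust
trait Trait {
    fn id(&self) -> u32;
}

/// Hypothetical non-'static implementor: it borrows from the caller's stack.
struct Borrowed<'a>(&'a u32);

impl Trait for Borrowed<'_> {
    fn id(&self) -> u32 {
        *self.0
    }
}

/// # Safety
/// `ptr` must point to a live, valid `T`.
unsafe fn read_through_dyn<T: Trait>(ptr: *const T) -> u32 {
    // In a body, the elided lifetime in `dyn Trait` is an inference variable,
    // so this coercion does not require `T: 'static`.
    let a: *const dyn Trait = ptr;
    // SAFETY: the caller guarantees `ptr` points to a live `T`.
    unsafe { (*a).id() }
}

fn main() {
    let n = 42u32;
    let b = Borrowed(&n); // `Borrowed<'_>` is not `'static`
    let got = unsafe { read_through_dyn(&b as *const Borrowed<'_>) };
    assert_eq!(got, 42);
}
```

By contrast, writing *const dyn Trait in the signature of an item (for example as a parameter type) would pick up the 'static default described above.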


I am somewhat confused by the proposal to start linting on code. The exact details of what we're supposed to lint on are unclear to me, especially given the prior context of having already ruled out being able to do a FCW. Having read the meeting notes my understanding is that lang is considering a lint that forbids eliding dyn type lifetimes altogether in as casts?

For example that the following would emit a lint:

fn foo<T: Trait>(ptr: *const T) {
    ptr as *const dyn Trait;
}

I would expect this to have a lot of false positives. I also don't believe this really helps alleviate the footguns involved with these pointer casts (which seemed to be a big point of focus in the meeting notes, that this removes a footgun). Even when explicitly writing out the lifetimes involved you can just write lifetimes that make it seem like no real lifetime changing has occurred.

Taking the example from the lang meeting notes:

fn foo(output: &dyn Write) {
  /* ... */
  unsafe { output as *const (dyn Write + 'static) as *mut (dyn Write + 'static) };
  /* ... */
}

This example still seems like a footgun; the explicitly written lifetime bounds almost make it worse, as you could easily believe that the pointee of output lives for 'static and that we then perform a pointer cast which does not change the lifetime bound.

In order to avoid a footgun here the type of output with the lifetime bound written explicitly needs to be present so that you can see the source and target types of the cast. e.g. my_ref as &(dyn Trait + 'a) as *const dyn Trait + 'static doesn't seem like a footgun as you can tell that a cast from + 'a to + 'static is occurring.
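One way to keep both the source and target lifetimes visible is to route the extension through a small helper whose signature spells them out. The sketch below uses transmute rather than an as cast, which is a swapped-in technique and not something this PR prescribes; extend_dyn_lifetime and Wrapper are hypothetical names.

```rust
use std::mem::transmute;

trait Trait {
    fn value(&self) -> u32;
}

/// Hypothetical implementor that borrows, so it is not `'static`.
struct Wrapper<'a>(&'a u32);

impl Trait for Wrapper<'_> {
    fn value(&self) -> u32 {
        *self.0
    }
}

/// Hypothetical helper: the signature shows the `'a` to `'static` change.
///
/// # Safety
/// The returned pointer must not be used after `'a` has ended.
unsafe fn extend_dyn_lifetime<'a>(
    ptr: *const (dyn Trait + 'a),
) -> *const (dyn Trait + 'static) {
    // SAFETY: the two pointer types differ only in the lifetime bound, so
    // they have the same layout; the caller upholds the validity contract.
    unsafe { transmute::<*const (dyn Trait + 'a), *const (dyn Trait + 'static)>(ptr) }
}

fn main() {
    let n = 7u32;
    let w = Wrapper(&n);
    let r: &dyn Trait = &w;
    let short: *const (dyn Trait + '_) = r;
    // SAFETY: `long` is only used while `w` and `n` are still alive.
    let long = unsafe { extend_dyn_lifetime(short) };
    assert_eq!(unsafe { (*long).value() }, 7);
}
```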


I am also unclear on where lang stands in regards to breaking such unsafe code over an edition. The summary comment here was not particularly clear, and reading the meeting notes I have been given the impression that no real consensus was arrived at and that the hypothetical lint obviates any need to break this unsafe code.

I am not sure if things have changed for lang now given my previous statements about footguns and dyn-type elision rules.

Before going through with this I would like for lang to fully commit to either supporting such casts as an actual language feature, or only as a migration hack with a commitment to break it going forward across the 2025 edition (with the understanding that it may not be possible to FCW or auto-fix).

Generally I would like to avoid going forward with this with a vague handwaving of "we could potentially break this over an edition" to assuage concerns of teachability/etc, and then potentially wind up with lang deciding not to do so.

On the general consistency of pointer casts here. Lang has already forbidden casting *const dyn Trait to *const dyn Trait + Send, and has also already forbidden casting *const dyn Trait<'a> to *const dyn Trait<'b> (or even *const dyn Trait<T> to *const dyn Trait<U>).
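For illustration, a sketch of the asymmetry around auto traits, using std::fmt::Debug as a stand-in trait (drop_send is a made-up name): removing an auto trait through an as cast is accepted, while the reverse direction is the forbidden one and is left as a comment so the block compiles.

```rust
use std::fmt::Debug;

/// Hypothetical helper: dropping the `Send` auto trait is an allowed cast.
fn drop_send(p: *const (dyn Debug + Send)) -> *const dyn Debug {
    p as *const dyn Debug
}

// The opposite direction is rejected by the compiler, so it stays commented out:
// fn add_send(p: *const dyn Debug) -> *const (dyn Debug + Send) {
//     p as *const (dyn Debug + Send)
// }

fn main() {
    let v = 5u32;
    let p: *const (dyn Debug + Send) = &v;
    let q = drop_send(p);
    // SAFETY: `v` is still alive and `q` points to it.
    let shown = format!("{:?}", unsafe { &*q });
    assert_eq!(shown, "5");
}
```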

If lang wishes to extend the power of as casts in unsafe code to be able to do more "transmute like" casts as a language feature, then I would expect us to also allow these arbitrary casts of dyn types. Is this something lang is on board with?

On the other hand if this is only intended as a migration strategy then this inconsistency seems fine to me as we expect all code to either be legacy (i.e. no longer maintained) or updated to a newer edition where there is no inconsistency.


Finally, reading the meeting notes I see that there were parallels drawn between raw pointer derefs being the same syntax as derefing references and how one is safe and the other not.

Admittedly I do see the parallel here but I think it's worth pointing out that it is significantly easier to determine whether one is dereferencing a pointer or reference, in comparison to determining whether a lifetime has been extended. I think lang is aware of this hence the discussion of the lint and avoiding footguns? I think this is worth revisiting in light of previous statements about footguns/the lint.

I would also like to note that r-a currently has the ability to highlight unsafe operations in a specific colour to make it more clear when safety invariants are introduced in unsafe code. I don't know how well that can be supported with this change where all as casts of pointers to dyn-types in unsafe blocks are sometimes morally-unsafe but only determinable by the specific lifetimes in play.

Do we expect r-a to be able to figure this out or just conservatively consider all of these as casts to be unsafe operations? Similar to the previous section of my comment, I don't think this is much of a problem if this change is only done as a migration method and broken over an edition. It's probably fine if legacy codebases have "overly liberal" application of "this operation is unsafe" highlighting.

I generally find it hard to go along with the idea that it's "just" allowing a new operation to be performed inside of unsafe code rather than "changing semantics/disabling checks" when both humans and tooling are going to struggle to even distinguish these two different operations. This to the extent that the compiler impl would literally be to unconditionally generate the "unsafe transmute cast" operation in unsafe blocks rather than the checked kind.

I think the fact that there is ~no world in which the compiler impl will align with this mental model of "it's a new operation allowed in unsafe" should be a good signal that it's not the right mental model.

@BoxyUwU
Member Author

BoxyUwU commented Mar 27, 2025

Separately from my previous comment I want to just say that the metrics author(s?) seem amenable to backporting the fix to previous releases, allowing crates that depend on older, semver-incompatible versions to painlessly upgrade. This crate is by far the largest source of regressions of this change other than the may crate.

The may crate however does not have its casts already present in an unsafe block so would not be affected by this migration strategy of "allow it in unsafe blocks". With backported versions of metrics released I think the outcome of insta-breaking would be a lot smaller/more palatable, or at least the benefit of continuing to allow this in unsafe code is significantly smaller as users of metrics would already have a trivial path to being unbroken.

(If you look at the crater regressions there are maybe a dozen due to may but maybe a hundred due to metrics)

@workingjubilee
Member

@Veykril Could you weigh in on the technical feasibility of both detecting this new class of unsafe-but-only-when-lifetime-changing as casts, using r-a's understanding of how Rust code works, and emitting the right highlighting suggestions to editors to identify this for users?

@tmandry
Member

tmandry commented Mar 27, 2025

Thanks for the feedback everyone. I was the one primarily pushing to avoid unnecessary breakage in this case. That is conditional on the feasibility and maintainability of a solution, and @compiler-errors makes a compelling case that it would be neither. No one thought we were proposing changes to MIR borrow check! @BoxyUwU may be able to confirm/deny if that is what she had in mind.

I think we've heard enough to convince me that this is not a path worth going down now. It sounds like the lint mitigation was based on some faulty assumptions. Upon reflection, I agree with @BoxyUwU that we should have consensus within the lang team to walk this safe/unsafe difference back over an edition before releasing it. Deciding this is infeasible or a maintenance burden, or being convinced (as others seem to be) that the breakage is very minimal, would be enough as well.

In this particular case I think the breakage is unfortunate but workable, so I am okay to move forward. I don't speak for the rest of the lang team in this comment, but no one spoke up about wanting to avoid the breakage as much as I did.

@Manishearth You mentioned an FCW; my understanding has been that an FCW in this case is infeasible. I do agree with your assertion that this would make unsafe reviews harder.

IMO this is a pretty drastic new behavior to have to introduce just to avoid crate breakage from something we only recently started to allow

@compiler-errors That wasn't my understanding from the discussion, which included a 6yr old version of diesel. Was that relying on this as an unstable feature until recently, or what am I missing?

@compiler-errors
Member

compiler-errors commented Mar 28, 2025

@tmandry: I guess I was a bit misled when I said we only recently started to allow this behavior. What I should have said is that we only recently made this behavior more relaxed beginning with #113262.

That PR made casts where nothing but the lifetime was changing work, i.e. expr as *const dyn Tr + '_ as *const dyn Tr + 'static even if expr's type doesn't outlive 'static. That's why most of the regressions are from ~1 year ago or so.

But there are rare cases where this behavior coincidentally worked when there was more than one cast chained together and, between the two casts, more than the lifetime changed in the casted type. That old diesel one is an example. Boxy and I worked out a few examples where this was allowed even pre 1.75, which is when that linked PR landed.

This is the moral equivalent of the diesel example:

trait Tr {}

struct Foo {
    ptr: dyn Tr + 'static
}

fn foo<T: Tr>(x: &dyn Tr) -> *const Foo {
    let x = x as *const dyn Tr as *const Foo;
    x
}

And this is the moral equivalent of the brainfuck one:

trait Tr {}

fn foo<T: Tr>(x: &dyn Tr) -> *mut (dyn Tr + 'static) {
    x as *const dyn Tr as *mut dyn Tr
}

Both of these examples worked before 1.75 because they were emitting something that was a non-trivial pointer cast. But they still are definitely violations of the vtable validity here.


I will note that we're still really funny with what we allow when doing raw pointer casts of wide pointers. For example, this errors today:

trait Tr {}

fn foo<T: Tr>(x: &dyn Tr) -> *const (dyn Tr + 'static) {
    x as *const dyn Tr
}

i.e. it was not fixed by #113262. You could argue that @BoxyUwU's PR here is making the language more consistent by always enforcing the correct lifetimes in wide pointers along these lines :)

@Veykril
Member

Veykril commented Mar 28, 2025

@Veykril Could you weigh in on the technical feasibility of both detecting this new class of unsafe-but-only-when-lifetime-changing as casts, using r-a's understanding of how Rust code works, and emitting the right highlighting suggestions to editors to identify this for users?

given rust-analyzer is still completely oblivious to lifetimes in our IRs I'd say very difficult. Though either way that feature is only meant to show what would be erroring if the unsafe block was missing, which reading from this, the as cast wouldn't even do in the first place? (as you said it is more of a new class of unsafe)

@traviscross
Contributor

traviscross commented Mar 28, 2025

On the substance here, we'll obviously take this back up on Wednesday. My estimate is that we're rather likely to readopt the original plan and to "stay the course" on that. We very nearly went that way last Wednesday.

Speaking for myself, I think what's being done here, in terms of making PRs to the affected public projects and working out a diagnostic that guides people in the right direction, is what we need to do, and combined with the soundness arguments we've been making, is sufficient to justify and support this change. I appreciate -- and I know we all on lang appreciate -- the work @BoxyUwU did to make those PRs and is doing to put this all together. It's a nice additional benefit that we apparently only allowed the most likely kinds of this semi-recently and that this probably does make the language more consistent in the way that CE mentioned.

Note about earlier notifications. (As an aside, if you're following notifications on this thread, you may have seen I earlier momentarily said some other things by way of analysis that turned out to not be correct and that I retracted soon after to try to avoid sending the thread on a tangent. Sorry about the noise there if you saw that; please disregard. Thought I was onto something; was not. Just was staring at it too long.)

Labels
I-lang-nominated Nominated for discussion during a lang team meeting. S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. T-libs Relevant to the library team, which will review and decide on the PR/issue.
Development

Successfully merging this pull request may close these issues.

arbitrary_self_types + derive_coerce_pointee allows calling methods whose where clauses are violated