[WIP] Forbid object lifetime changing pointer casts #136776
base: master
@bors try
…, r=<try> [WIP] Forbid object lifetime changing pointer casts Fixes rust-lang#136702 r? `@ghost`
☀️ Try build successful - checks-actions
@craterbot check
👌 Experiment ℹ️ Crater is a tool to run experiments across parts of the Rust ecosystem.
🚧 Experiment
🎉 Experiment
Most of these are on GitHub; in terms of crates.io regressions, all we have is: …
Overall, 142 regressions are caused by … EDIT: Ah, there's also …
We discussed this in the lang triage call today. We wanted to think more about it, so we're leaving it nominated to discuss again.
@BoxyUwU Do you think it would be possible to implement this as an FCW? We talked about this in lang triage today and would prefer to start with that if we can. If it's not feasible, a hard error can also work (I would say, though, that we should upstream PRs to any crates we break). Another small thing I noticed is that the error message links to the Nomicon section on variance, but it would be ideal to link to a tracking issue or something describing this issue in particular.
To add on to what tmandry said, in our discussions we did feel that the approach taken in this PR is generally the right way forward, and we're happy to see this progress so as to help clear the way for … cc @rust-lang/lang
@tmandry I do expect it to be possible to FCW this. We can likely do something hacky to fully emulate the fix (but as a lint), but if that doesn't work out, all the regressions we found were relatively "simple" cases that can probably be taken advantage of (if need be) to lint a subset of the actual cases we'd break with this PR.

edit: see compiler-errors' comment; I'm not so convinced this will be possible to FCW anymore and will likely investigate improving the diagnostics here. I've already filed PRs to the affected crates to migrate them over to a transmute to avoid the breakage if this lands.
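For readers wondering what "migrate them over to a transmute" looks like, here is a minimal sketch. The trait `Trait`, the type `Concrete` and the helper `extend_dyn_lifetime` are hypothetical names, not code from any of the affected crates. The idea is only that the lifetime extension an `as` cast used to perform silently becomes an explicit `transmute` inside `unsafe`, so the caller visibly takes on the obligation.

```rust
use std::mem;

// Hypothetical trait and implementor, for illustration only.
trait Trait {
    fn value(&self) -> u32;
}

struct Concrete(u32);

impl Trait for Concrete {
    fn value(&self) -> u32 {
        self.0
    }
}

/// Hypothetical helper: extends the object lifetime bound of a wide raw pointer.
///
/// With this PR, `ptr as *mut (dyn Trait + 'static)` is rejected unless `'a`
/// is known to outlive `'static`, so the lifetime change is spelled out as a
/// transmute instead.
///
/// # Safety
/// The caller must not use the returned pointer in a way that relies on the
/// pointee (or anything it borrows) living longer than `'a`.
unsafe fn extend_dyn_lifetime<'a>(ptr: *mut (dyn Trait + 'a)) -> *mut (dyn Trait + 'static) {
    // Both types are wide pointers (data pointer + vtable pointer) of the same
    // size; only the type-level lifetime bound changes.
    unsafe { mem::transmute::<*mut (dyn Trait + 'a), *mut (dyn Trait + 'static)>(ptr) }
}
```

The transmute does not make the extension any more correct than the old cast; it only moves the responsibility into an `unsafe` block where a reviewer can see it.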
I was thinking earlier that it may be possible to implement a lint to detect this, but it seems to me that MIR borrowck is not equipped to implement such a lint. Specifically, it seems near impossible to answer whether a region outlives constraint (like, … To fix this would require some significant engineering effort to refactor how NLL processes its region graph to make it easier to clone and reprocess with new constraints.
…uto_to_object-hard-error, r=oli-obk Make `ptr_cast_add_auto_to_object` lint into hard error In Rust 1.81, we added a FCW lint (including linting in dependencies) against pointer casts that add an auto trait to dyn bounds. This was part of work making casts of pointers involving trait objects stricter, and was part of the work needed to restabilize trait upcasting. We considered just making this a hard error, but opted against it at that time due to breakage found by crater. This breakage was mostly due to the `anymap` crate which has been a persistent problem for us. It's now a year later, and the fact that this is not yet a hard error is giving us pause about stabilizing arbitrary self types and `derive(CoercePointee)`. So let's see about making a hard error of this. r? ghost cc `@adetaylor` `@Darksonn` `@BoxyUwU` `@RalfJung` `@compiler-errors` `@oli-obk` `@WaffleLapkin` Related: - rust-lang#135881 - rust-lang#136702 - rust-lang#136776 Tracking: - rust-lang#127323 - rust-lang#44874 - rust-lang#123430
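The lint referenced above only concerns casts that *add* an auto trait to the `dyn` bounds. As a minimal sketch with a hypothetical `Trait`: dropping `Send` from a `dyn` pointer is accepted, while the commented-out reverse cast is the kind of cast `ptr_cast_add_auto_to_object` flags (and which that PR proposes to make a hard error).

```rust
// Hypothetical trait for illustration.
trait Trait {
    fn n(&self) -> u32;
}

impl Trait for u32 {
    fn n(&self) -> u32 {
        *self
    }
}

/// Removing an auto trait from the `dyn` bounds is fine: auto traits
/// contribute no methods to the vtable, so the existing vtable stays valid.
fn drop_send(p: *const (dyn Trait + Send)) -> *const dyn Trait {
    p as *const dyn Trait
}

// The reverse direction, adding an auto trait, is what the
// `ptr_cast_add_auto_to_object` lint flags:
//
// fn add_send(p: *const dyn Trait) -> *const (dyn Trait + Send) {
//     p as *const (dyn Trait + Send)
// }
```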
@rustbot labels -I-lang-nominated We discussed this in our meeting today. Meeting consensus is that, given that a warning is not feasible, we are in favor of going forward with this change, with the proviso that we will have an error message with actionable instructions and open PRs against known regressions. Side note, informal design axioms for breaking changes...
PRs against affected crates have been opened and can be seen here:
There were three regressions I've not filed PRs against:
It feels a bit awkward to bring up after having filed these PRs, but regardless it seems like due diligence to ask anyway: is it worth considering an alternative fix to this problem with arbitrary self types? A couple of options:

Allow lifetime casts in unsafe code only

Someone asked on one of the PRs whether it would be reasonable to allow this code to continue to work when the code is placed in an unsafe block. This would mean that behaviour of … This feels somewhat dubious to me, as it is not super clear that a safety invariant is being introduced when … Regardless, it would solve the soundness bug and minimize the breakage to some extent. Looking at the regressions, this would only avoid breaking a few of the affected crates, but this does include the …

We could potentially only do this as a migration strategy, by breaking this even in unsafe contexts across an edition where it's more "morally correct" to make a breaking change. (This would be my preference if we do this, as having this as intentional behaviour would likely be quite bad for the teachability of unsafe; see followup comments.) @RalfJung I imagine you would probably have opinions about muddling the waters around what unsafe code does in this way (?)

Require construction of smart pointers that implement …
Purely conceptually, it seems fine to me to say that some … For this concrete question, that would mean we have to allow such invalid-lifetime dyn trait values to exist temporarily (i.e., they satisfy the language invariant). Is that where we stand today, i.e., Miri would accept the …
This is quite dubious. We're retroactively attaching more safety obligations to an existing operation, and then if there's UB somewhere we tell you it's your fault since you wrote … Does …
Is it accurate to say that the UB being "added" here is not detectable within e.g. Miri, because we lack the information about the lifetimes present to enforce that you didn't mess this up? It seems unfortunate if that's true, because I could easily see there being code out there that didn't satisfy this safety obligation but is already using transmute for other reasons. It's pretty common, I think, to see casts to …

I don't see a clear alternative to this -- I think we are sort of stuck given past decisions -- but I hadn't seen that question brought up, so I wanted to raise it here. Or maybe I've misunderstood, and we're actually not adding UB from violating this condition -- merely working to prevent it, and only if you happen to explicitly do something "wrong" does your code actually break. (Essentially saying that you shouldn't leak such a value to safe code, but there's no UB from just having it.)
= help: consider adding the following bound: `'a: 'b`
= note: requirement occurs because of a mutable pointer to `dyn Trait<'_>`
= note: mutable pointers are invariant over their type parameter
= help: see <https://doc.rust-lang.org/nomicon/subtyping.html> for more information about variance
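The diagnostic above suggests adding `'a: 'b`. As a minimal sketch with a hypothetical `Trait` (using `*const` rather than the `*mut` from the test, to keep variance out of the picture), a cast that only *shortens* the object lifetime bound from `'a` to `'b` is accepted once that bound is declared, because nothing is being extended.

```rust
// Hypothetical trait for illustration.
trait Trait {
    fn get(&self) -> u32;
}

impl Trait for u32 {
    fn get(&self) -> u32 {
        *self
    }
}

/// `'a: 'b` says the original bound outlives the new one, so this cast
/// only shortens the object lifetime bound.
fn shorten<'a: 'b, 'b>(p: *const (dyn Trait + 'a)) -> *const (dyn Trait + 'b) {
    p as *const (dyn Trait + 'b)
}
```

With this PR, removing the `'a: 'b` bound should turn the cast into an error of the kind shown above, since nothing would then relate `'a` to `'b`.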
Is it possible for us to note something in these errors pointing at some docs for why this is a bad idea? I could easily see someone just changing this to a transmute without realizing this is an intentional limitation of `as` casts.
I guess this falls under "dyn Trait metadata is invalid if it is not a pointer to a vtable for Trait that matches the actual dynamic trait the pointer or reference points to" in some sense (from https://doc.rust-lang.org/nightly/nomicon/what-unsafe-does.html) but maybe that should be clarified to say that it's not just "trait" but rather "trait and lifetime bounds on it" (or explicitly note this is a safety, not validity, invariant)...
Yeah, I'm gonna try to improve the diagnostics here before this can land, as it's not a very helpful error message to get, both from the POV of someone whose code just broke and from the POV of someone who just tried to ptr-cast the lifetimes in new code.
With regard to Ralf's ping, I think that this would be a teaching disaster and a regretful wart on the language unless this is just to provide a smoother deprecation period.
My understanding is that there's no immediate UB when doing the wrong-lifetime cast, but there can be UB further down the road, since now we can make virtual function calls we shouldn't have been able to make. So, the cast breaks a library/safety invariant, but not a language/validity invariant. Miri can only check language invariants.
On stable I don't think you can actually do anything "wrong" with these pointer casts, as the only way to dispatch through the vtable requires going through unsafe code, either via reborrowing to get a reference instead of a pointer, or by going through an unsafe …

If you (incorrectly) used unsafe to do those operations, that situation is somewhat analogous to the case with arbitrary self types, where no language-level UB has been reached but it's possible to perform a vtable call without where clauses being satisfied. I'm not sure if you can really escalate this "without where clauses being satisfied" into language-level UB. Even with arbitrary self types, it's a raw pointer, so it's not safe to simply dereference and get a value out of it that is incorrectly believed to live for longer than it actually should. If there's a safety invariant somewhere that the pointer is valid for reads of the pointee type, then it's also not problematic, as that rules out having these kinds of pointers passed in. This is the same kind of logic as to why these casts don't cause any problems for std's smart pointers, as they all require unsafe to construct from a raw pointer.

So both on stable and with arbitrary_self_types I don't believe Miri can/should detect anything here, and I'm also not confident you could actually escalate this into language-level UB detectable by Miri. We can't not generate the methods with unsatisfiable where clauses for the vtable, as they're only unsatisfiable due to lifetime bounds, which we don't really have the ability to reason about in this way. Regardless, it's certainly wrong for the type system to allow this in safe code...
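To make the point concrete: on stable, creating or casting a `*const dyn Trait` is safe, but actually dispatching through its vtable needs `unsafe`, for example by reborrowing the pointer as a reference. A minimal sketch, with hypothetical `Speak` and `Dog` names:

```rust
// Hypothetical trait and type, for illustration only.
trait Speak {
    fn speak(&self) -> String;
}

struct Dog;

impl Speak for Dog {
    fn speak(&self) -> String {
        "woof".to_string()
    }
}

/// # Safety
/// `ptr` must be non-null, aligned, and point to a live `dyn Speak`
/// whose vtable matches the pointee.
unsafe fn call_through_raw(ptr: *const dyn Speak) -> String {
    // Reborrowing the raw pointer as a reference is the unsafe step;
    // the virtual call itself then goes through the vtable as usual.
    let r: &dyn Speak = unsafe { &*ptr };
    r.speak()
}
```

The `unsafe` block is where a wrong lifetime on the pointer could start to matter, which is why the cast alone cannot misbehave on stable.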
This is roughly my opinion too 👍 I would be quite concerned about the teachability of this and would only want to go ahead with this if the intention was to follow up with making it a hard error in future editions.
I am the person who suggested this, and after mulling it over a bit more, I also think it would be a mistake and the transmute makes a lot more sense here (besides the teachability of it, which would also be incredibly complicated to understand). In my case, I need to go through a …

Now, I do have a question about what will be allowed and what will be forbidden. The test cases in this PR so far only mention … Supposedly, though, the problem and change also touch …
&& let ty::Dynamic(dst_tty, ..) = dst_tail.kind()
if let ty::Dynamic(src_tty, src_region_bound, ..) = src_tail.kind()
    && let ty::Dynamic(dst_tty, dst_region_bound, ..) =
        dst_tail.kind()
    && src_tty.principal().is_some()
    && dst_tty.principal().is_some()
Is the "when there aren't principal traits" still true? It seems like I could get into trouble with a `fn (self: dyn Send + 'static)`, right?
We forbid casting `dyn Send` to `dyn Trait` in general, as we would have no way to construct a vtable. The opposite cast is also fine, as the vtable for `dyn Send` (or other auto traits) necessarily doesn't contain potentially user-written functions with where clauses that could rely on the dyn type's lifetime bound. I should write this rationale down somewhere.
Thanks! Mostly trying to make sure we're actually fully closing the intended soundness hole... IIUC, it sounds like when we say:
I think we're referring to the decision in #101336... which defined the safety invariant as:
That is actually somewhat ill-specified. In order for us to avoid bugs, I think it would help to make that a bit more specific. I believe what we are saying is that the safety invariant for the vtable is:
I think (2) only has a gap with respect to the principal lifetime (i.e., what this PR is closing) because we'd previously (in #120248) closed the other gaps by saying:
That PR notes that we still allow two cases:
I think that implies that we should go edit the reference (after this PR merges) to update the …

(Also left one inline comment on a possibly missing case in the PR?)
@WorldSEnder what do you mean by …
Oh yeah, that's what I mean; syntax checking in GitHub issues is not there yet ;D
This was discussed in today's lang team meeting. We agreed that we have to disable this operation in safe code (a breaking change with a small impact) to preserve soundness while shipping arbitrary self types. The question we debated was what to do in unsafe, and the consensus was that we should avoid making unnecessary breaking changes and allow this to continue working in unsafe. This can be justified under the idea that …

First, @nikomatsakis realized that just about all of the breaking occurrences seem to involve the implicit …

Second, we still have the chance to change this rule over an edition, without breaking more code than we have to today. We can lean into the safe behavior and disable changing lifetimes even in unsafe, just as we disallow …

FAQ

Why is it okay to allow this in unsafe? Making sure that future uses of a pointer are valid before changing a lifetime like this is exactly the sort of thing a user should be verifying in unsafe, so it fits into the mold of existing unsafe operations.

Is this a new requirement that must be upheld by unsafe code? When …

Won't users miss that they're supposed to be verifying new things now without syntax to remind them of it? I share this concern, but I think the mitigating factors described above will resolve it. The lint on implicit lifetimes alone may be enough to resolve it.

Why is it okay that this is an …

What about the precedent that …

What are the arguments against restoring precedent in a new edition? It would mean adding complexity to the compiler, language reference, and edition guide to have edition-dependent behavior. An argument was also made that …

Why not just break all the existing crates that do this; it's not that bad and avoids the edition complexity? This is the judgment call.
I disagree with the proposal that unsafe blocks should affect the way we do MIR borrow checking, which as far as I can tell is what this is proposing from an implementation perspective.

So, the whole root cause of this issue is that we've implemented the wrong semantics for wide ptr-to-ptr casts, which today, in MIR type-checking, do not enforce any lifetime relationship between the source expression and the cast type. As far as I am aware, before this proposal, unsafe blocks are simply "lint" markers that can be localized to unsafety checking (morally they are equivalent to …

Before this proposal, it was pretty easy to teach unsafe to users by saying that it does not affect the way the system works, but simply changes which operations we enforce as illegal (that is, which MIR operations we allow). It is even explained this way in the book:
This definitely complicates that teachability, and IMO this is a pretty drastic new behavior to have to introduce just to avoid crate breakage from something we only recently started to allow. From a type-system maintenance perspective, I am pretty inclined to push back on this proposal.
I wanted to note again that I don't think it will be possible to implement such a lint in a way that doesn't either have too many false positives (e.g. triggers on every … I thought I made sure that people were aware of the implementation difficulty, but I guess not 🤔 See https://rust-lang.zulipchat.com/#narrow/channel/144729-t-types/topic/lifetime.20extension.20from.20dyn.20casts.20.23136776 for the relevant discussion.
From the perspective of an … So far … Currently, working out what the actual unsafe operation is, is a major and tricky task when reviewing unsafe code. Fortunately, it mostly boils down to "look at the function calls" and "look at the …

I definitely feel heard by @RalfJung's comment here:
I think that's not a good precedent to set at all. And by @compiler-errors's comment:
Yep, this feels rather drastic to me as an unsafe reviewer. I like @nikomatsakis' axioms here, and I'll note that making this UB is breakage from an unsafe-writer's POV too, even if it doesn't cause compilation failures, and I'd argue that that's worse since there's no way to detect it. Minimizing the position on @nikomatsakis' breakage ladder should account for this type of breakage too; it shouldn't just be about "which code still compiles".
Talking about the semver issue feels premature unless we know that the …

When it comes to these kinds of breakages, I feel like there's a lot of value in negotiating with our users. Some users have legitimate needs and will say no to stuff like this, but quite often the answer can be "...yeah, okay, not ideal but we can work with that". As a crate maintainer I've had to do that often enough, from both sides of the equation. I understand that this doesn't cover private crates, which is a risk, but this is a problem that can be attacked from multiple angles, including the FCW.
I do feel like this is a tradeoff that one can make a call on: either choice at least gives people something to work with. To me it feels like some pain now is worth avoiding perpetual pain in the long run.

Also, perhaps I'm missing something, but the FCW is being talked about for the situation where this becomes a hard error, yes? If it's possible to hard error, it should be possible to FCW for that case with no false negatives, no?

One thing I'll add, looking holistically at this: personally I don't particularly enjoy replacing …
I don't understand what you mean by this. I can transmute …

Transmute cannot change the size of the pointer, e.g. …

This property is preserved. A bad … The difference to before is that so far …
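On the size point: a quick way to see that a lifetime-changing transmute cannot change a pointer's size is to compare the sizes directly. This is a sketch with a hypothetical `Trait`; the thin/wide comparison holds on any target, since a `*const dyn Trait` carries a data pointer plus a vtable pointer.

```rust
use std::mem::size_of;

// Hypothetical trait for illustration.
trait Trait {}

/// Returns (thin pointer size, wide `dyn` pointer size).
fn pointer_sizes() -> (usize, usize) {
    (size_of::<*const u8>(), size_of::<*const dyn Trait>())
}

/// The object lifetime bound does not affect layout, so a transmute between
/// these two pointer types always passes `transmute`'s size check.
fn lifetime_bound_is_layout_neutral<'a>() -> bool {
    size_of::<*const (dyn Trait + 'a)>() == size_of::<*const (dyn Trait + 'static)>()
}
```

A transmute from a wide pointer to a thin one, by contrast, is rejected at compile time because the sizes differ.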
I'll try not to re-cover what other people have already commented on, e.g. feelings about teachability here, or pointer casts not being strictly more powerful than (or equal to) transmutes.

First, reading the lang meeting minutes and the summarizing comment here, I get the impression that lang is making decisions under the belief that writing … In item signatures we have the dyn type lifetime default rules that do give this behaviour, e.g.

```rust
fn foo<T: Trait>(ptr: *const T) {
    let a: *const dyn Trait = ptr;
}
```

This currently compiles on stable and will continue to do so under this PR. If the type annotation on the let statement meant …

I am somewhat confused by the proposal to start linting on code. The exact details of what we're supposed to lint on are unclear to me, especially given the prior context of having already ruled out being able to do an FCW. Having read the meeting notes, my understanding is that lang is considering a lint that forbids eliding dyn type lifetimes altogether in … For example, that the following would emit a lint:

```rust
fn foo<T: Trait>(ptr: *const T) {
    ptr as *const dyn Trait;
}
```

I would expect this to have a lot of false positives. I also don't believe this really helps alleviate the footguns involved with these pointer casts (which seemed to be a big point of focus in the meeting notes, that this removes a footgun). Even when explicitly writing out the lifetimes involved, you can just write lifetimes that make it seem like no real lifetime changing has occurred. Taking the example from the lang meeting notes:

```rust
fn foo(output: &dyn Write) {
    /* ... */
    unsafe { output as *const (dyn Write + 'static) as *mut (dyn Write + 'static) };
    /* ... */
}
```

This example still seems like a footgun; the explicitly written lifetime bounds almost make it worse, as you could easily believe that the pointee of … In order to avoid a footgun here, the type of …

I am also unclear on where lang stands in regards to breaking such unsafe code over an edition. The summary comment here was not particularly clear, and reading the meeting notes I have been given the impression that no real consensus was arrived at and that the hypothetical lint obviates any need to break this unsafe code. I am not sure if things have changed for lang now, given my previous statements about footguns and dyn-type elision rules.

Before going through with this, I would like for lang to fully commit to either supporting such casts as an actual language feature, or only as a migration hack with a commitment to break it going forward across the 2025 edition (with the understanding that it may not be possible to FCW or auto-fix). Generally, I would like to avoid going forward with this with a vague handwaving of "we could potentially break this over an edition" to assuage concerns of teachability/etc, and then potentially wind up with lang deciding not to do so.

On the general consistency of pointer casts here: lang has already forbidden casting … If lang wishes to extend the power of … On the other hand, if this is only intended as a migration strategy, then this inconsistency seems fine to me, as we expect all code to either be legacy (i.e. no longer maintained) or updated to a newer edition where there is no inconsistency.

Finally, reading the meeting notes, I see that there were parallels drawn between raw pointer derefs being the same syntax as derefing references, and how one is safe and the other not. Admittedly I do see the parallel here, but I think it's worth pointing out that it is significantly easier to determine whether one is dereferencing a pointer or a reference than to determine whether a lifetime has been extended.
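For contrast with the lang-meeting example above, here is a sketch (the function name `as_raw` is hypothetical) in which the pointer type names the lifetime the object actually has instead of `'static`. No lifetime is extended, so the cast is accepted in safe code both before and after this PR.

```rust
use std::io::Write;

/// Hypothetical helper: turns a borrowed writer into a raw pointer whose
/// object lifetime bound is the real one (`'a`), not `'static`.
fn as_raw<'a>(output: &'a mut (dyn Write + 'a)) -> *mut (dyn Write + 'a) {
    // A reference-to-raw-pointer cast that keeps the object lifetime bound
    // unchanged; nothing is extended, so no `unsafe` is needed here.
    output as *mut (dyn Write + 'a)
}
```

Only the later use of the pointer (writing through it) needs `unsafe`, and the pointer's type no longer suggests the writer can outlive its borrow.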
I think lang is aware of this, hence the discussion of the lint and avoiding footguns? I think this is worth revisiting in light of previous statements about footguns/the lint.

I would also like to note that r-a currently has the ability to highlight unsafe operations in a specific colour to make it more clear when safety invariants are introduced in unsafe code. I don't know how well that can be supported with this change, where all … Do we expect r-a to be able to figure this out, or just conservatively consider all of these …

I generally find it hard to go along with the idea that it's "just" allowing a new operation to be performed inside of … I think the fact that there is ~no world in which the compiler impl will align with this mental model of "it's a new operation allowed in unsafe" should be a good signal that it's not the right mental model.
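The dyn-type lifetime default rules mentioned above can be sketched like this (hypothetical `Trait` with a blanket impl). In an item signature, an elided `*const dyn Trait` means `*const (dyn Trait + 'static)`, while inside a function body the elided bound is inferred, which is why the `let` in the first example compiles for a non-`'static` `T`.

```rust
// Hypothetical trait with a blanket impl, for illustration only.
trait Trait {}
impl<T> Trait for T {}

/// In an item signature, the elided object lifetime of a raw pointer
/// defaults to `'static`: this parameter is `*const (dyn Trait + 'static)`.
/// Passing a pointer to a non-`'static` value here would be rejected.
fn takes_static(p: *const dyn Trait) -> bool {
    !p.is_null()
}

/// Naming the lifetime explicitly lets non-`'static` pointees through.
fn takes_any<'a>(p: *const (dyn Trait + 'a)) -> bool {
    !p.is_null()
}

/// Inside a function body the elided lifetime is inferred, so this
/// compiles even though `T` is not required to be `'static`.
fn body_inference<T: Trait>(ptr: *const T) -> bool {
    let a: *const dyn Trait = ptr;
    !a.is_null()
}
```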
Separately from my previous comment, I want to just say that the … The … (If you look at the crater regressions, there are maybe a dozen due to …)
@Veykril Could you weigh in on the technical feasibility of both detecting this new class of …
Thanks for the feedback everyone. I was the one primarily pushing to avoid unnecessary breakage in this case. That is conditional on the feasibility and maintainability of a solution, and @compiler-errors makes a compelling case that it would be neither. No one thought we were proposing changes to MIR borrow check! @BoxyUwU may be able to confirm/deny if that is what she had in mind. I think we've heard enough to convince me that this is not a path worth going down now. It sounds like the lint mitigation was based on some faulty assumptions. Upon reflection, I agree with @BoxyUwU that we should have consensus within the lang team to walk this safe/unsafe difference back over an edition before releasing it. Deciding this is infeasible or a maintenance burden, or being convinced (as others seem to be) that the breakage is very minimal, would be enough as well. In this particular case I think the breakage is unfortunate but workable, so I am okay to move forward. I don't speak for the rest of the lang team in this comment, but no one spoke up about wanting to avoid the breakage as much as I did. @Manishearth You mentioned an FCW; my understanding has been that an FCW in this case is infeasible. I do agree with your assertion that this would make unsafe reviews harder.
@compiler-errors That wasn't my understanding from the discussion, which included a 6-year-old version of diesel. Was that relying on this as an unstable feature until recently, or what am I missing?
@tmandry: I guess I was a bit misleading when I said we only recently started to allow this behavior. What I should have said is that we only recently made this behavior more relaxed, beginning with #113262. That PR made casts where nothing but the lifetime was changing work, i.e. …

But there are rare cases where this behavior coincidentally worked when more than one cast was chained together and, between the two casts, more than the lifetime changed in the cast type. That old diesel one is an example. Boxy and I worked out a few examples where this was allowed even before 1.75, which is when that linked PR landed. This is the moral equivalent of the diesel example:
And this is the moral equivalent of the brainfuck one:
Both of these examples worked before 1.75 because they were emitting something that was a non-trivial pointer cast. But they still are definitely violations of the vtable validity here. I will note that we're still really funny with what we allow when doing raw pointer casts of wide pointers. For example, this errors today:
i.e. it was not fixed by #113262. You could argue that @BoxyUwU's PR here is making the language more consistent by always enforcing the correct lifetimes in wide pointers along these lines :)
Given that rust-analyzer is still completely oblivious to lifetimes in our IRs, I'd say very difficult. Though either way, that feature is only meant to show what would be erroring if the unsafe block was missing, which, reading from this, the …
On the substance here, we'll obviously take this back up on Wednesday. My estimate is that we're rather likely to readopt the original plan and to "stay the course" on that. We very nearly went that way last Wednesday. Speaking for myself, I think what's being done here, in terms of making PRs to the affected public projects and working out a diagnostic that guides people in the right direction, is what we need to do, and combined with the soundness arguments we've been making, is sufficient to justify and support this change. I appreciate -- and I know we all on lang appreciate -- the work @BoxyUwU did to make those PRs and is doing to put this all together. It's a nice additional benefit that we apparently only allowed the most likely kinds of this semi-recently and that this probably does make the language more consistent in the way that CE mentioned.

Note about earlier notifications

(As an aside, if you're following notifications on this thread, you may have seen I earlier momentarily said some other things by way of analysis that turned out to not be correct and that I retracted soon after to try to avoid sending the thread on a tangent. Sorry about the noise there if you saw that; please disregard. Thought I was onto something; was not. Just was staring at it too long.)
Fixes #136702
r? @ghost