RFC: new lifetime elision rules #141
* If there is exactly one input lifetime position (elided or not), that lifetime
is assigned to _all_ elided output lifetimes.

* If there are multiple input lifetime positions, but one of them is `&self` or
I find this rule a bit surprising (the others make perfect sense). I can intuitively see the motivation that self ought to be privileged, but I can't really justify why that is so. Looking at the examples below, the ones using this rule took me a lot longer to grok.
The rationale is that in several usage surveys, this was essentially the only pattern we saw when `&self` was involved.
I believe that the reason for this is that when you're borrowing something out of `self`, it makes sense to involve another ref for computation. In contrast, it's a very unusual pattern to borrow something out of a value as a method of some other object. It's just not really how people think about using methods and objects in general, so it doesn't happen (almost at all).
I suspect that in cases where this pattern could occur, people use standalone functions instead of methods.
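The pattern described here, borrowing out of `self` while a second reference is only used for computation, can be sketched as follows. This is a minimal illustration in present-day Rust, not code from the survey; `Registry`, `lookup`, and `lookup_explicit` are hypothetical names.

```rust
use std::collections::HashMap;

/// Hypothetical type, used only to illustrate the `&self` rule.
struct Registry {
    entries: HashMap<String, String>,
}

impl Registry {
    /// Elided form: there are two input lifetimes (`&self` and `key`),
    /// so the elided output lifetime is taken from `&self`.
    fn lookup(&self, key: &str) -> &str {
        self.entries.get(key).map(|v| v.as_str()).unwrap_or("")
    }

    /// The same signature with every lifetime written out.
    fn lookup_explicit<'a, 'b>(&'a self, key: &'b str) -> &'a str {
        self.entries.get(key).map(|v| v.as_str()).unwrap_or("")
    }
}
```

Because the result is tied to the borrow of `self` only, the `key` argument may go out of scope while the returned `&str` is still in use.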
@wycats, what proportion of the cited 87% would be lost if this rule were not accepted? I don't personally object to it, but I can see how it's a bit more flimsy than the others, and I would be willing to live without it if the statistics bore it out.
I think that it should use the lifetime of the first input parameter, regardless of whether it is self or not, and only if it is an elided lifetime.
This avoids issues with UFC and makes method and non-method functions work the same.
Supporting elision of lifetimes only in the return value when they are explicit on self seems a bad idea, since it is counterintuitive. Also, it doesn't work for multiple explicit lifetimes (e.g. &'a Block<'b>).
👍 |
* For `impl` headers, input refers to the lifetimes appears in the type
receiving the `impl`, while output refers to the trait, if any. So `impl<'a>
Foo<'a>` has `'a` in input position, while `impl<'a> SomeTrait<'a> Foo<'a>`
has `'a` in both input and output positions.
I think the word `for` is lacking from the second example. It’s not an obvious example of where the lifetimes are, either; it could be rewritten as the probably-fairly-nonsensical “`impl<'a, 'b> SomeTrait<'a> for Foo<'b>` has `'a` in [the] output position and `'b` in [the] input position”.
(As for the “the”, I think that should be there in all these cases, or “an” as the case may be in some places. This affects much of the document.)
I'm nervous about adding elision for output parameters since I'm slightly concerned that may make things less clear (a minor adjustment to a signature that otherwise compiles would make the compiler spew weird errors), but I am in favour of elision in input position in

    impl BufReader { ... }
    impl Reader for BufReader { ... }
    impl Reader for (&str, &str) { ... }
@huonw Do you mean output parameters in general, or just in impls?
I don't like this rule. The other rules have the property that there's no other way the signature could possibly make sense: i.e., the desugaring is unambiguous. Here we're making an arbitrary choice. I don't think we should do that.
There's an additional subtlety: lifetime parameters of
Here If you just take that into account when applying the rules, then I think they would keep working. But I'm not sure what the situation is with invariant or bivariant lifetime parameters, because I haven't thought about it yet. |
OK, so in plain English, I think the rule should be: If there's exactly one readable lifetime and N writable ones, all the writable lifetimes are assumed to be the same as the readable one. Lifetime parameters in covariant position are readable, in contravariant writable, invariant both, bivariant neither. |
@huonw I think the proposed error messages will go a long way toward avoiding "the compiler spewing weird error messages", no?
I was originally a bit nervous about this sort of thing, but now I have no objections. I'm slightly more nervous about the |
can avoid writing any lifetimes in ~87% of the cases where they are currently
required.

Doing so is a clear ergonomic win.
This is the biggest part of this proposal for me. (Well, combined with the data that shows that it is.)
A big 👍 from me. If the vast majority of code is doing something a certain way, then it's a good basis for making a rule. This should eliminate a lot of what is effectively boilerplate, and a good lifetimes tutorial / better errors will assist in the pedagogical sense. Also, if you like the lifetimes, you can keep writing them.
@glaebhoerl Great point about contravariance, which I hadn't thought about. I agree that a contravariant argument should not be considered as an input position.

Just to be clear, is the suggestion that contravariant positions swap the input/output distinction? (Which would be the typical type-theoretical thing to do.) Concretely, are you proposing that

    fn some_fn(&self, cb: Callback) -> int;
    fn other_fn(n: int) -> (&T, cb: Callback);

expands to

    fn some_fn<'a>(&'a self, cb: Callback<'a>) -> int;
    fn other_fn<'a>(n: int) -> (&'a T, cb: Callback<'a>)

The first case makes some sense, but the latter case is pretty surprising -- it would happen because the We could also simply disallow eliding contravariant lifetimes, since it may be preferable to be explicit in those (rare) cases. Finally, see @wycats's comment above re: the
Just posting to express my support for this well-written RFC. With the proposed error messages there should be little confusion when a user first encounters unelidable lifetimes.
Even thinking about this example makes my head hurt... I think the "logic" of it, as it were, is that when the caller of One distinction that I noticed, and I'm not sure if it has significance, is that while the return type of a function I basically agree with you that it seems reasonable-but-not-imperative to desugar your first example, but not so much the second one. I don't have any concrete rules in mind which might accomplish this.
To avoid getting caught up in debating the meaning of the word "arbitrary" (I wasn't assuming that you flipped a coin): For the first and second rules, there's only one way it can make sense. If the user were to explicitly annotate lifetimes, they would annotate the same ones we infer 100% of the time. For the third rule, there's more than one way it can make sense, and we'd be choosing to favor one of them. Even if our favoring rests on a stronger basis than a coin flip, I don't think this kind of "probably what you meant" inference is something we should be doing. |
@glaebhoerl Thanks for the thoughtful comments. My feeling about The rules are simple enough that it's easy to know, given the signature in your head, whether you can elide or not.

Put another way, the debate is whether

    fn foo(&self, t: &T) -> &U;

is simply not allowed/usable as a signature, or whether it has a useful meaning based on the most common lifetime patterns. Once you know the rules, you know immediately that the above would expand into

    fn foo<'a,'b>(&'a self, t: &'b T) -> &'a U;

and would only write the elided signature if that's what you wanted.

FWIW, I disagree that the other rules give the only sensible expansion. Not even today's rules do. If you write

    fn bar(t: &T, u: &U);

you get distinct lifetimes for the two parameters. But it can also make sense for them to share the same lifetime, and some uses would require it. In that situation, you know you can't leave off the lifetimes, and you write an explicit signature. I think the same would be true with the
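The point about `fn bar(t: &T, u: &U)` can be illustrated with a small sketch in present-day Rust. The function names `total_len` and `longer` are hypothetical and not taken from the RFC: the first relies on the default of distinct lifetimes per parameter, while the second needs a shared lifetime and therefore has to be written out.

```rust
/// Elided form: each reference parameter gets its own lifetime,
/// as if written `fn total_len<'a, 'b>(a: &'a str, b: &'b str) -> usize`.
fn total_len(a: &str, b: &str) -> usize {
    a.len() + b.len()
}

/// The result may borrow from either argument, so the two inputs must
/// share one lifetime. Elision cannot express this; the signature has
/// to be explicit.
fn longer<'a>(a: &'a str, b: &'a str) -> &'a str {
    if a.len() >= b.len() {
        a
    } else {
        b
    }
}
```

Dropping the annotations from `longer` is rejected by the compiler, since there are two input lifetimes and no `&self` to pick from.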

The error case on `impl` is exceedingly rare: it requires (1) that the `impl` is
for a trait with a lifetime argument, which is uncommon, and (2) that the `Self`
type has multiple lifetime arguments.
Does this example arise today in any known Rust codebase?
@bstrie I don't know of any cases offhand, which is why the error message here is probably not so important.
Above I draw a comparison between lifetime elision and type inference, and how the great thing is that people who choose to be explicit are still welcome to manually annotate lifetimes. However, there is one thing that would support the people who make such a decision and improve teachability for newcomers: make the |
Quick question: Would it be feasible to handle the multiple-input case by having something like
expand to
While such a shorthand would be mostly orthogonal to the elision rules of this RFC, I bring it up because it seems like it could impact whether we want to treat
Also, I realize the lookup rules would take some consideration if this were to be implemented, since lifetimes and parameter names are currently in different namespaces. |
@aturon You're right. Now we have the interesting situation that you've shown that my stated arguments against "the I think a large part of it is because of the fact that I don't think we should semantically/syntactically distinguish the |
@bstrie |
@glaebhoerl I didn't mean to turn this discussion into a defense of rule 3; I'm also trying to understand better the relationships between the rules. I'm sorry I didn't make that more clear. (Text is hard.) Let me try again.

The initial question was whether I see a qualitative difference between rule 3 and the others. I do not, myself. But I can see how someone with a different perspective on methods (which I think you have?) would feel differently.

My general perspective on the rules is that they are simply shorthand, providing carefully-chosen defaults. Defaults are always heuristic and connected to common patterns of thought and code. As with any defaults, in a purely semantic sense the rules are arbitrary, because there are other valid (and sometimes useful) lifetime assignments that the language allows.

As heuristics, the rules have a clear quantitative basis. I think what's up for grabs is the qualitative basis -- how do they "feel", how well do they match our intuitions?

The intuitions that @wycats and I were most interested in come from borrowing/ownership, as opposed to lifetimes. If you write `fn foo(x: &Foo) -> &Bar` you know the function takes in borrowed data and produces borrowed data. The simplest intuition is that the output borrow takes its ownership from the input borrow. It's then not a hard conceptual leap to say that the borrowed ownership of the output is only good for as long as the input's was -- we hope that the elided form can build intuitions about borrowing that lead naturally into the mechanics of lifetimes.

I feel similarly about methods. I'm using a method to access or otherwise manipulate the receiver, so all things being equal I expect any output borrows to flow from my borrow of the receiver.

Does that help?
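The `fn foo(x: &Foo) -> &Bar` intuition can be made concrete with a short sketch in present-day Rust. The field layout of `Foo` and `Bar` is assumed purely for illustration, and `foo_explicit` is a hypothetical name for the desugared form.

```rust
/// Illustrative types; the fields are assumptions, not from the RFC.
struct Bar {
    value: i32,
}

struct Foo {
    bar: Bar,
}

/// Elided form: one input lifetime, so the output borrow takes it.
fn foo(x: &Foo) -> &Bar {
    &x.bar
}

/// What the elided signature stands for once the lifetime is written out.
fn foo_explicit<'a>(x: &'a Foo) -> &'a Bar {
    &x.bar
}
```

In both forms the returned `&Bar` is only usable for as long as the borrow of `Foo` that produced it.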
What was the argument against @bill-myers's suggestion of using the first input lifetime? That covers more cases for regular functions and rule 3 falls out for free. It's not a particularly deep or profound unifying principle, but it's simple and seems less ad-hoc.
First input lifetime seems a bit more ad-hoc, as strange as it sounds. Methods are special, and "First input lifetime" will also cause some possibly surprising behavior in Overall, "lifetime of |
Under the currently proposed rules, |
@jfager Hrm, you're right. I hadn't considered the I think that, due to rule 3, it may be reasonable to adjust the rules such that That said, this particular case is I think something of an edge case, and I would not consider it a serious problem if the rules are left unchanged. |
It's an edge case but now that it's come up I think it gets right at the discomfort of the current set of rules. The justification for rule 3 is 'methods are special', but this interaction with rule 2 says 'but maybe not that special'. They should either be uniform, or they should be different; it's straddling the fence that feels odd. "You may elide lifetimes; output lifetimes are assigned the first input lifetime" is arbitrary and there's not a great intuitive reason it should be true, but it's uniform between fns and methods, and despite its arbitrariness it's simple and easy to understand, and it ends up giving you the same code and behavior in all but one of the examples given in this RFC. "Elided output lifetimes take the lifetime of self for methods, or the lifetime of a sole input lifetime for functions" is similarly straightforward and simple, but treats methods and fns clearly differently. I could get behind either. *Edit: sorry, posted early.
@aturon Yes, that's closer to what I was trying to get at. (Though I was also wondering if there might be some drier, more formal formulation of our intuitions.) How does rule 1 fit into these intuitions about borrowing, i.e. why is it more intuitive for each input lifetime to be different rather than tied together?
result types. So `fn foo(s: &str) -> (&str, &str)` has elided one lifetime in
input position and two lifetimes in output position.

* For `impl` headers, input refers to the lifetimes appears in the type
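The signature `fn foo(s: &str) -> (&str, &str)` quoted in the hunk above has one input lifetime and two elided output lifetimes. A minimal sketch in present-day Rust, using the hypothetical function name `split_at_space`, shows both outputs borrowing from the single input:

```rust
/// Elided form of
/// `fn split_at_space<'a>(s: &'a str) -> (&'a str, &'a str)`:
/// the single input lifetime is assigned to both output positions.
fn split_at_space(s: &str) -> (&str, &str) {
    match s.find(' ') {
        // Slice on either side of the first space.
        Some(i) => (&s[..i], &s[i + 1..]),
        // No space: the whole input, plus an empty remainder.
        None => (s, ""),
    }
}
```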
Trait definitions themselves are also a form that offers lifetime positions. That may or may not be relevant (I'll be posting a question about that soon -- see a few lines up), but should probably be addressed explicitly.
Explicitly note that lifetimes from the `impl` (and `trait`/`struct`) are not considered "input positions" for the purposes of expanded `fn` definitions. Added a collection of examples illustrating this. Drive-by: Addressed a review comment from @chris-morgan [here](rust-lang#141 (comment)).
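The distinction made in this commit message can be sketched in present-day Rust as follows. `Parser`, `peek`, and `rest` are hypothetical names: the `'a` on the `impl` header does not count as an input position when a method's elided lifetimes are expanded, so an elided output is tied to the borrow of `self`, and `'a` has to be named explicitly when it is the lifetime actually wanted.

```rust
/// Hypothetical type with a lifetime parameter on the `impl` header.
struct Parser<'a> {
    input: &'a str,
}

impl<'a> Parser<'a> {
    /// Elided form of `fn peek<'s>(&'s self) -> &'s str`.
    /// The `'a` from the `impl` header plays no part in the expansion.
    fn peek(&self) -> &str {
        self.input
    }

    /// To hand out data that outlives the borrow of `self`,
    /// `'a` must be written explicitly.
    fn rest(&self) -> &'a str {
        self.input
    }
}
```

A value returned by `rest` stays usable after the `Parser` itself is gone, while one returned by `peek` does not.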
@glaebhoerl In the absence of any lifetime variables in return types, the assignment of distinct lifetime parameters is the most general type that can be given. That is probably the intuition that is at play here. Of course, once return types with lifetime variables get involved, then this no longer applies, but everyone agreed that this case was broken today anyways. |
I was thinking that maybe elided lifetimes in arguments of higher-order function parameters should be desugared to higher-rank lifetimes, because that's usually what you want:
=>
The question is, given that closures are going to be merely trait objects, how could we properly generalize this? (There may or may not be an easy answer; I've spent approximately two minutes thinking about it.) |
@glaebhoerl Why does it matter in that particular case? The only lifetimes that can ever be passed to I assume you were thinking of another case where it does matter? |
Maybe the example was bad. But:
This is not true, because But to amend, imagine this:
Now there are two But the point is really that in general, do you ever want the lifetimes of the arguments of an argument function to be pre-determined by lifetime parameters on the outer HOF, instead of the (strictly-)more-general formulation where the argument function itself is parameterized over them? |
@glaebhoerl My (potentially mistaken) assumption is that the legacy closures actually have higher-rank lifetimes, even though it isn't a feature exposed independently in the type system, and rust-lang/rust#15067 is tracking exposing that to the new unboxed closures. This code type-checks:

    fn print_two_with<'a, 'b>(text1: &'a str, text2: &'b str, printer: |&str|) {
        if true {
            printer(text1);
        } else {
            printer(text2);
        }
    }

whereas this code does not:

    fn print_two_with<'a, 'b>(text1: &'a str, text2: &'b str, printer: |&'a str|) {
        if true {
            printer(text1);
        } else {
            printer(text2);
        }
    }
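The `|&str|` closure types above no longer exist. A rough present-day analogue, offered as a sketch rather than a translation the thread itself gives, uses an `FnMut` bound, where an elided lifetime inside the bound is higher-ranked. The name `visit_two_with` is hypothetical.

```rust
/// `F: FnMut(&str)` is shorthand for `F: for<'x> FnMut(&'x str)`,
/// so `visitor` accepts a string slice of any lifetime.
fn visit_two_with<'a, 'b, F>(text1: &'a str, text2: &'b str, mut visitor: F)
where
    F: FnMut(&str),
{
    // Both calls are accepted even though `'a` and `'b` are unrelated.
    visitor(text1);
    visitor(text2);
}

// Writing the bound as `F: FnMut(&'a str)` instead would reject
// `visitor(text2)`, because `'b` is not known to outlive `'a`.
```

A call such as `visit_two_with("ab", "cde", |s| println!("{}", s))` then works with two unrelated borrows.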
@glaebhoerl The current plan is that the elision rules apply recursively for the sugared form of unboxed closure types (i.e., the |
👍 |
After this change, would there be any lifetime elision rules for lifetimes that appear in the head of a lambda expression (anonymous function)? |
Rendered (draft)
text/
tracking issue: rust-lang/rust#15552
Note: the core idea for this RFC and the initial survey both came from @wycats.