RFC: new lifetime elision rules #141

Merged
merged 4 commits into from
Jul 9, 2014
279 changes: 279 additions & 0 deletions active/0000-lifetime-elision.md
@@ -0,0 +1,279 @@
- Start Date: (2014-06-24)
- RFC PR #: (leave this empty)
- Rust Issue #: (leave this empty)

# Summary

This RFC proposes to

1. Expand the rules for eliding lifetimes in `fn` definitions, and
2. Follow the same rules in `impl` headers.

By doing so, we can avoid writing lifetime annotations ~87% of the time that
they are currently required, based on a survey of the standard library.

# Motivation

In today's Rust, lifetime annotations make code more verbose, both for methods

```rust
fn get_mut<'a>(&'a mut self) -> &'a mut T
```

and for `impl` blocks:

```rust
impl<'a> Reader for BufReader<'a> { ... }
```

In the vast majority of cases, however, the lifetimes follow a very simple
pattern.

By codifying this pattern into simple rules for filling in elided lifetimes, we
can avoid writing any lifetimes in ~87% of the cases where they are currently
required.

Doing so is a clear ergonomic win.
Member

This is the biggest part of this proposal for me. (well, combined with the data that shows that it is)


# Detailed design

## Today's lifetime elision rules

Rust currently supports eliding lifetimes in functions, so that

```rust
fn print(s: &str);
fn get_str() -> &str;
```

become
Member

becomes. and isn't this backwards? To elide is to remove, so the ones with the rules become the ones without the rules.


```rust
fn print<'a>(s: &'a str);
fn get_str<'a>() -> &'a str;
```

The ellision rules work well for functions that consume references, but not for
Member

s/ellision/elision/

functions that produce them. The `get_str` signature above, for example,
promises to produce a string slice that lives arbitrarily long, and is
either incorrect or should be replaced by

```rust
fn get_str() -> &'static str;
```

Returning `'static` is relatively rare, and it has been proposed to make leaving
off the lifetime in output position an error for this reason.
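
To make the point about output position concrete, here is a minimal sketch. It is written with explicit lifetimes so that it does not depend on which elision rules the compiler implements, and it assumes a recent stable Rust compiler. The names `get_str_any`, `get_str_static`, and `needs_static` are invented for illustration.

```rust
// Explicit spelling of what `fn get_str() -> &str` means under the rules
// described above: the caller may pick any lifetime for 'a.
fn get_str_any<'a>() -> &'a str {
    // Only 'static data can satisfy every possible choice of 'a.
    "constant"
}

// The more direct signature for the same body.
fn get_str_static() -> &'static str {
    "constant"
}

// Hypothetical consumer that insists on a 'static string.
fn needs_static(s: &'static str) -> usize {
    s.len()
}
```

Because the caller chooses `'a`, `needs_static(get_str_any())` type-checks just as `needs_static(get_str_static())` does, which is why the fresh-parameter reading of an elided output lifetime amounts to promising `'static`.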

Moreover, lifetimes cannot be elided in `impl` headers.

## The proposed rules

### Overview

This RFC proposes two changes to the lifetime elision rules:

1. Since eliding a lifetime in output position is usually wrong or undesirable
under today's elision rules, interpret it in a different and more useful way.

2. Interpret elided lifetimes for `impl` headers analogously to `fn` definitions.

### Lifetime positions

A _lifetime position_ is anywhere you can write a lifetime in a type:

```rust
&'a T
&'a mut T
T<'a>
```

As with today's Rust, the proposed elision rules do _not_ distinguish between
different lifetime positions. For example, both `&str` and `Ref<uint>` have
elided a single lifetime.

Lifetime positions can appear as either "input" or "output":

* For `fn` definitions, input refers to argument types while output refers to
result types. So `fn foo(s: &str) -> (&str, &str)` has elided one lifetime in
input position and two lifetimes in output position.
Member

For an fn method definition, i.e. one that occurs in the scope of an impl block or as the default method in a trait item, are the lifetimes that occur in the implementing type (in the former case) or the trait (in the latter case) also considered to be input positions? (Or perhaps all of the lifetimes bound by impl<'a,'b,...> are part of the input positions? Or perhaps none of them are?)

In other words, is a method considered to be in the scope of its impl header for the purposes of lifetime elision?

(I will follow up to this comment with a concrete set of examples elaborating my question in a moment.)

Member

Okay, here is a gist with my attempt to survey the space here: https://gist.github.com/pnkfelix/a4054e51400152c63714

It could well be that the intent is (and has always been) to not consider an impl header in scope for lifetime elision on methods. But if so, this needs to be spelled out explicitly in the RFC itself.

Member

Hmm I guess since this was already merged I should instead open an issue against it.

Member

Ah and now I just saw @aturon 's comment here which explicitly confirms that the intent has been to not consider an impl header in scope for lifetime elision on methods.


* For `impl` headers, input refers to the lifetimes appearing in the type
Member

Trait definitions themselves are also a form that offers lifetime positions. That may or may not be relevant (I'll be posting a question about that soon -- see a few lines up), but should probably be addressed explicitly.

receiving the `impl`, while output refers to the trait, if any. So `impl<'a>
Foo<'a>` has `'a` in input position, while `impl<'a> SomeTrait<'a> Foo<'a>`
has `'a` in both input and output positions.
Member

I think the word for is lacking from the second example. It’s not an obvious example of where the lifetimes are, either—it could be rewritten as the probably-fairly-nonsensical “impl<'a, 'b> SomeTrait<'a> for Foo<'b> has 'a in [the] output position and 'b in [the] input position”.

(As for the “the”, I think that should be there in all these cases, or “an” as the case may be in some places. This affects much of the document.)


### The rules

* Each elided lifetime in input position becomes a distinct lifetime
parameter. This is the current behavior for `fn` definitions.

* If there is exactly one input lifetime position (elided or not), that lifetime
is assigned to _all_ elided output lifetimes.

* If there are multiple input lifetime positions, but one of them is `&self` or
Member

I find this rule a bit surprising (the others make perfect sense). I can intuitively see the motivation that self ought to be privileged but, I can't really justify why that is so. Looking at the examples below, the ones using this rule took me a lot longer to grok.

Contributor

The rationale is that in several usage surveys, this was essentially the only pattern we saw when &self was involved.

I believe that the reason for this is that when you're borrowing something out of self, it makes sense to involve another ref for computation. In contrast, it's a very unusual pattern to borrow something out of a value as a method of some other object. It's just not really how people think about using methods and objects in general, so it doesn't happen (almost at all).

I suspect that in cases where this pattern could occur, people use standalone functions instead of methods.

Contributor

@wycats, what proportion of the cited 87% would be lost if this rule were not accepted? I don't personally object to it, but I can see how it's a bit more flimsy than the others, and I would be willing to live without it if the statistics bore it out.

I think that it should use the lifetime of the first input parameter, regardless of whether it is self or not, and only if it is an elided lifetime.

This avoids issues with UFC and makes method and non-method functions work the same.

Supporting elision of lifetimes only in the return value when they are explicit on self seems a bad idea, since it is counterintuitive. Also, it doesn't work for multiple explicit lifetimes (e.g. &'a Block<'b>).

`&mut self`, the lifetime of `self` is assigned to _all_ elided output lifetimes.

* Otherwise, it is an error to elide an output lifetime.

Notice that the _actual_ signature of a `fn` or `impl` is based on the expansion
rules above; the elided form is just a shorthand.

### Examples

```rust
fn print(s: &str); // elided
fn print<'a>(s: &'a str); // expanded

fn get_str() -> &str; // ILLEGAL

fn frob(s: &str, t: &str) -> &str; // ILLEGAL

fn get_mut(&mut self) -> &mut T; // elided
fn get_mut<'a>(&'a mut self) -> &'a mut T; // expanded

fn args<T:ToCStr>(&mut self, args: &[T]) -> &mut Command // elided
fn args<'a, 'b, T:ToCStr>(&'a mut self, args: &'b [T]) -> &'a mut Command // expanded

fn new(buf: &mut [u8]) -> BufWriter; // elided
fn new<'a>(buf: &'a mut [u8]) -> BufWriter<'a> // expanded

impl Reader for BufReader { ... } // elided
impl<'a> Reader for BufReader<'a> { .. } // expanded

impl Reader for (&str, &str) { ... } // elided
impl<'a, 'b> Reader for (&'a str, &'b str) { ... } // expanded

impl StrSlice for &str { ... } // elided
impl<'a> StrSlice<'a> for &'a str { ... } // expanded
```
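
The `fn` cases above can be tried out with a small sketch like the following. It assumes a recent stable Rust compiler, which accepts these elided `fn` signatures. The `Command` type here is a stand-in invented for the example, as are `first_word` and `skip`. Each elided signature is paired with its expanded form where one is shown.

```rust
// Hypothetical stand-in type, used only for this illustration.
struct Command {
    args: Vec<String>,
}

impl Command {
    // Multiple input lifetime positions, one of them `&mut self`:
    // the elided output lifetime is the lifetime of `self`.
    fn arg(&mut self, arg: &str) -> &mut Command {
        self.args.push(arg.to_string());
        self
    }

    // Expanded form of the signature above.
    fn arg_expanded<'a, 'b>(&'a mut self, arg: &'b str) -> &'a mut Command {
        self.args.push(arg.to_string());
        self
    }
}

// Exactly one input lifetime position: it is assigned to the elided output.
fn first_word(s: &str) -> &str {
    s.split(' ').next().unwrap_or("")
}

// Expanded form of the signature above.
fn first_word_expanded<'a>(s: &'a str) -> &'a str {
    s.split(' ').next().unwrap_or("")
}

// A by-value argument is not a lifetime position, so there is still exactly
// one input lifetime and the output may be elided.
fn skip(s: &str, n: usize) -> &str {
    s.get(n..).unwrap_or("")
}
```

Calls can be chained across the elided and expanded methods, for example `cmd.arg("-a").arg_expanded("-b")`, since both return a borrow of `self`.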

A by-value arg is not a lifetime position, so the following is legal?

fn foo(a: &str, b: int) -> &str

That is, it would be possible to use multiple args and still have the lifetimes elided, right? I think the answer is yes but it's not shown in these examples.

Member Author

@jfager Yes, that's right, and good point about the examples. Will update.

Member

Please also add examples of methods within impls where the implementing trait and/or type has lifetime parameters itself, just to underline the scenarios I brought up in my comment here

## Error messages

Since the shorthand described above should eliminate most uses of explicit
lifetimes, there is a potential "cliff". When a programmer first encounters a
situation that requires explicit annotations, it is important that the compiler
gently guide them toward the concept of lifetimes.

An error can arise with the above shorthand only when the program elides an
output lifetime and neither of the rules can determine how to annotate it.

### For `fn`

The error message should guide the programmer toward the concept of lifetimes by
talking about borrowed values:

> This function's return type contains a borrowed value, but the signature does
> not say which parameter it is borrowed from. It could be one of a, b, or
> c. Mark the input parameter it borrows from using lifetimes,
> e.g. [generated example]. See [url] for an introduction to lifetimes.

This message is slightly inaccurate, since the presence of a lifetime parameter
does not necessarily imply the presence of a borrowed value, but there are no
known use-cases of phantom lifetime parameters.
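
As a rough illustration of the fix such a message points toward, here is the `frob` signature from the examples with explicit lifetimes, in two variants. The function bodies and the names `frob_first` and `frob_either` are invented for this sketch; the choice of annotation depends on which parameter the result actually borrows from.

```rust
// The result borrows from `s` only, so `t` gets its own unrelated lifetime.
fn frob_first<'a, 'b>(s: &'a str, t: &'b str) -> &'a str {
    // Strip `t` from the front of `s` if present; the result is a slice of `s`.
    s.strip_prefix(t).unwrap_or(s)
}

// The result may borrow from either input, so both share one lifetime.
fn frob_either<'a>(s: &'a str, t: &'a str) -> &'a str {
    // Return whichever input is longer (ties go to `s`).
    if s.len() >= t.len() { s } else { t }
}
```

With `frob_first`, the second argument may be a short-lived temporary, because the returned slice is tied to the first argument alone. With `frob_either`, both arguments must outlive the result.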

### For `impl`

The error case on `impl` is exceedingly rare: it requires (1) that the `impl` is
for a trait with a lifetime argument, which is uncommon, and (2) that the `Self`
type has multiple lifetime arguments.
Contributor

Does this example arise today in any known Rust codebase?

Member Author

@bstrie I don't know of any cases offhand, which is why the error message here is probably not so important.


Since there are no clear "borrowed values" for an `impl`, this error message
speaks directly in terms of lifetimes. This choice seems warranted given that a
programmer implementing a trait with lifetime parameters will almost certainly
already understand lifetimes.

> TraitName requires lifetime arguments, and the impl does not say which
> lifetime parameters of TypeName to use. Mark the parameters explicitly,
> e.g. [generated example]. See [url] for an introduction to lifetimes.

## The impact

To assess the value of the proposed rules, we conducted a survey of the code
defined _in_ `libstd` (as opposed to the code it reexports). This corpus is
large and central enough to be representative, but small enough to easily
analyze.

We found that of the 169 lifetimes that currently require annotation for
`libstd`, 147 would be elidable under the new rules, or 87%.

_Note: this percentage does not include the large number of lifetimes that are
already elided with today's rules._

The detailed data is available at:
https://gist.github.com/aturon/da49a6d00099fdb0e861
Contributor

Of the 13% of functions which still require explicit lifetimes, do any seem particularly notable for their nonconformity to the usual patterns? It would also be really great if you could select one of these real-world functions and use it in the example error message above.

Member Author

Almost all of the remaining cases are situations like:

impl<'a> AsciiCast<&'a[Ascii]> for &'a [u8] {
    unsafe fn to_ascii_nocheck(&self) -> &'a[Ascii] { ... }
    ...
}

where the impl involves types with lifetimes, and the fns within refer to those lifetimes directly. That counts against us in two ways:

  1. The impl header has to be annotated so that you can name the lifetime, even though it would otherwise follow the standard pattern, and
  2. The fn definitions have to be annotated to use the outer lifetime.

Note that this kind of example does not require an annotation according to the rules (so you wouldn't get an annotation error if you elided the lifetime). Rather, the annotation is needed to go beyond the patterns provided by the rule.
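
A reduced sketch of this situation, using a made-up `Holder` type rather than the `AsciiCast` impl quoted above: the elided method signature is accepted, but it ties the result to the borrow of `self` instead of to the lifetime named in the `impl` header. It assumes a recent stable Rust compiler.

```rust
// Hypothetical type holding borrowed data.
struct Holder<'a> {
    text: &'a str,
}

impl<'a> Holder<'a> {
    // Elided: the output is tied to the borrow of `self`, not to 'a.
    fn text_elided(&self) -> &str {
        self.text
    }

    // Annotated: the output uses the lifetime named in the impl header.
    fn text_outer(&self) -> &'a str {
        self.text
    }
}

// This compiles only with `text_outer`: `holder` is a local value, so a
// result tied to the borrow of `holder` could not be returned from here.
fn unwrap_text<'a>(s: &'a str) -> &'a str {
    let holder = Holder { text: s };
    holder.text_outer()
}
```

Swapping `text_outer` for `text_elided` inside `unwrap_text` would be rejected, since the returned reference would then borrow from the local `holder`.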

Member Author

@bstrie The other predominant case is:

fn difference<'a>(&'a self, other: &'a HashSet<T, H>) -> SetAlgebraItems<'a, T, H>;

where the two input lifetimes are required to match.

@glaebhoerl Take note -- this is a case where even the rules for input positions don't give you what you want.
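
A small sketch of the same shape as the `difference` signature quoted above, with invented types `Set` and `Both` standing in for `HashSet` and `SetAlgebraItems`. The result holds borrows from both inputs, so the two input lifetimes have to be the same named lifetime; elision would instead tie the output to `self` alone.

```rust
// Hypothetical container type for the illustration.
struct Set {
    items: Vec<i32>,
}

// Hypothetical result type that borrows from two sets at once.
struct Both<'a> {
    left: &'a [i32],
    right: &'a [i32],
}

impl Set {
    // Both inputs share 'a because the result borrows from each of them.
    fn both<'a>(&'a self, other: &'a Set) -> Both<'a> {
        Both {
            left: &self.items,
            right: &other.items,
        }
    }
}
```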


# Drawbacks

Member

Another drawback: I find full specification of lifetime parameters makes it easier to understand what is going on. Even today, I often write the lifetimes where they could be elided because I think it makes code easier to reason about if you can name things. If I have a lifetime error, the first thing I do is add explicit lifetimes wherever they are missing.

I get the impression I'm in the minority with this though.

To me, these extra rules trade off easier reading (and writing) when you don't need to think about lifetimes too much against greater cognitive overhead when you do have to think about them. I guess that since reading code is more common than debugging lifetime errors, this trade off is worthwhile. I certainly like the idea of reducing lifetime noise.

Member

👍 to full specs making things clearer.

Contributor

@nick29581 It might be worth considering having the compiler optionally show you all of the inferred lifetimes when there are error messages that involve lifetimes: rustc --errors=expanded or something.

That said, I think the error message improvements in this proposal go a long way to making it obvious what has happened when you inappropriately elided a lifetime. Similar error message work around other lifetime errors would go a long way to improving the general ergonomics of explicit lifetimes as well, and we should work on that!

Contributor

@nick29581, that same argument can be made for type inference. Just like type inference, nothing is stopping you from being fully explicit with lifetimes if you deem it's better for readability.

## Learning lifetimes

The main drawback of this change is pedagogical. If lifetime annotations are
rarely used, newcomers may encounter error messages about lifetimes long before
encountering lifetimes in signatures, which may be confusing. Counterpoints:

* This is already the case, to some extent, with the current elision rules.

* Most existing error messages are geared to talk about specific borrows not
living long enough, pinpointing their _locations_ in the source, rather than
talking in terms of lifetime annotations. When the errors do mention
annotations, it is usually to suggest specific ones.

* The proposed error messages above will help programmers transition out of the
fully elided regime when they first encounter a signature requiring it.

* When combined with a good tutorial on the borrow/lifetime system (which should
be introduced early in the documentation), the above should provide a
reasonably gentle path toward using and understanding explicit lifetimes.
Member

Yup, I care about this.


Programmers learn lifetimes once, but will use them many times. Better to favor
long-term ergonomics, if a simple elision rule can cover 87% of current lifetime
uses (let alone the currently elided cases).

## Subtlety for non-`&` types

While the rules are quite simple and regular, they can be subtle when applied to
types with lifetime positions. To determine whether the signature

```rust
fn foo(r: Bar) -> Bar
```

is actually using lifetimes via the elision rules, you have to know whether
`Bar` has a lifetime parameter. But this subtlety already exists with the
current elision rules. The benefit is that library types like `Ref<'a, T>` get
the same status and ergonomics as built-ins like `&'a T`.
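
To illustrate the subtlety, here is a sketch in which `Bar` is assumed to carry a lifetime parameter. The definition of `Bar` and the function bodies are invented for the example. It assumes a recent stable Rust compiler with default lint settings. Nothing in the elided signature of `foo` reveals that a borrow flows from the argument to the result; only the definition of `Bar` does.

```rust
// Hypothetical definition: `Bar` wraps borrowed data.
struct Bar<'a> {
    text: &'a str,
}

// Elided: reads like a by-value signature, but a lifetime is involved.
fn foo(r: Bar) -> Bar {
    Bar { text: r.text.trim() }
}

// Expanded form of the signature above.
fn foo_expanded<'a>(r: Bar<'a>) -> Bar<'a> {
    Bar { text: r.text.trim() }
}
```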

# Alternatives

* Do not include _output_ lifetime elision for `impl`. Since traits with lifetime
parameters are quite rare, this would not be a great loss, and would simplify
the rules somewhat.

* Only add elision rules for `fn`, in keeping with current practice.

* Only add elision for explicit `&` pointers, eliminating one of the drawbacks
mentioned above. Doing so would impose an ergonomic penalty on abstractions,
though: `Ref` would be more painful to use than `&`.

# Unresolved questions

The `fn` and `impl` cases tackled above offer the biggest bang for the buck for
lifetime elision. But we may eventually want to consider other opportunities.

## Double lifetimes

Another pattern that sometimes arises is types like `&'a Foo<'a>`. We could
consider an additional elision rule that expands `&Foo` to `&'a Foo<'a>`.

However, such a rule could be easily added later, and it is unclear how common
the pattern is, so it seems best to leave that for a later RFC.
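
For comparison, the following sketch spells out three explicit signatures for a function taking a reference to a type with its own lifetime parameter. `Foo`, its `name` field, and the function names are invented here. The last variant is the `&'a Foo<'a>` pattern discussed in this section; the first two keep the two lifetimes distinct.

```rust
// Hypothetical type with a lifetime parameter.
struct Foo<'a> {
    name: &'a str,
}

// Output tied to the outer borrow of `Foo`.
fn name_short<'r, 'a>(f: &'r Foo<'a>) -> &'r str {
    f.name
}

// Output tied to the data that `Foo` itself borrows.
fn name_data<'r, 'a>(f: &'r Foo<'a>) -> &'a str {
    f.name
}

// The "double lifetime" pattern: one lifetime in both positions.
fn name_same<'a>(f: &'a Foo<'a>) -> &'a str {
    f.name
}
```

Only `name_data` lets the result outlive the `Foo` value itself, which is one reason the choice between these spellings is not purely cosmetic.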

## Lifetime elision in `struct`s

We may want to allow lifetime elision in `struct`s, but the cost/benefit
analysis is much less clear. In particular, it could require chasing an
arbitrary number of (potentially private) `struct` fields to discover the source
of a lifetime parameter for a `struct`. There are also some good reasons to
treat elided lifetimes in `struct`s as `'static`.

Again, since shorthand can be added backwards-compatibly, it seems best to wait.
Contributor

Agreed, I'm fine with leaving structs as they are.