- Feature Name: autoref_match
- Start Date: 2016-08-12
- RFC PR: (leave this empty)
- Rust Issue: (leave this empty)
Better ergonomics for match (and similar expressions).
Prevent some of the boilerplate required for match expressions by auto- dereferencing the target expression, and 'auto-referencing' pattern variables. The rules for the latter are based closely on those for captured variables in closures.
For example, current:
match *x {
Foo(ref x) => { ... }
Bar(ref mut y, z) => { ... }
}
proposed:
match x {
Foo(x) => { ... }
Bar(y, z) => { ... }
}
Rust is usually strict when distinguishing between value and reference types. In particular, distinguishing borrowed and owned data. However, there is often a trade-off between explicit-ness and ergonomics, and Rust errs on the side of ergonomics in some carefully selected places. Notably when using the dot operator to call methods and access fields, and when declaring closures.
The match expression is an extremely common expression and arguably, the most
important control flow mechanism in Rust. Borrowed data is probably the most
common form in the language. However, using match expressions and borrowed data
together can be frustrating: getting the correct combination of *
, &
, and
ref
to satisfy the type and borrow checkers is a common problem, and one which
is often encountered early by Rust beginners. It is especially frustrating since
it seems that the compiler can guess what is needed but gives you error messages
instead of helping.
For example, consider the following program:
enum E { Foo(...), Bar }
fn f(e: &E) {
match e { ... }
}
It is clear what we want to do here - we want to check which variant e
is a
reference to. Annoyingly, we have two valid choices:
match e {
&E::Foo(...) => { ... }
&E::Bar => { ... }
}
and
match *e {
E::Foo(...) => { ... }
E::Bar => { ... }
}
The former is more obvious, but requires more noisey syntax (an &
on every
arm). The latter can appear a bit magical to newcomers - the type checker treats
*e
as a value, but the borrow checker treats the data as borrowed for the
duration of the match. It also does not work with nested types, match (*e,) ...
for example is not allowed.
In either case if we further bind variables, we must ensure that we do not attempt to move data, e.g.,
match *e {
E::Foo(x) => { ... }
E::Bar => { ... }
}
If the type of x
does not have the Copy
bound, then this will give a borrow
check error. We must use the ref
keyword to take a reference: E::Foo(ref x)
(or &E::Foo(ref x)
if we match e
rather than *e
).
The ref
keyword is a pain for Rust beginners, and a bit of a wart for everyone
else. It violates the rule of patterns matching declarations, it is not found
anywhere outside of patterns, and it is often confused with &
. (See for
example, rust-lang/rust-by-example#390).
Match expressions are an area where programmers often end up playing 'type Tetris': adding operators until the compiler stops complaining, without understanding the underlying issues. This serves little benefit - we can make match expressions much more ergonomic without sacrificing safety or readability.
Match ergonomics has been highlighted as an area for improvement in 2017: internals thread and Rustconf keynote.
I propose two mostly orthogonal improvements: match target auto-deref removes
the need to write *e
in the above example; match arm 'auto-ref' infers use
of ref
in patterns in match arms, allowing them to be elided. Both features
aim to make match
expressions easier to read and write by reducing
boilerplate.
The target expression (aka discriminant) in a match expression is expr
in
match expr { ... }
.
If the target expression is borrowed data, then the compiler will implicitly
dereference the target expression as many times as necessary to make every
pattern in every arm type check. Dereference here is equivalent to using the *
operator: either a primitive pointer dereference or calling the deref
method,
if a type implements the Deref
trait.
Auto-deref allows the programmer to access borrowed data within a limited scope without having to explicitly dereference the data. This is analogous to the implicit dereferencing performed by the dot operator, however, the scope of dereference is inline, rather than defined by a function.
More formally, let T0...Tn
be the types of the patterns of each arm in the
match expression. If the pattern is a wildcard or variable, we ignore its type.
Let expr
be the target expression.
I will use A <: B
to mean A
is a subtype of B
, and e: T
to mean that
expression e
has type T
. I use *^m
to mean applying the *
operator m
times where m >= 0
.
forall i in [0..n] Ti <: U
*^m expr: U
...
--------------------------
match expr { ... }: ...
If this rule can be satisfied with different values of m
(e.g., if none of the
arms constrain the type), then the compiler will choose the smallest value.
Note that this dereferencing allows the compiler to insert calls to the deref
method (if there is a Deref
implementation) for any intermediate type found
whilst dereferencing.
We do not permit auto-referencing. Since deref
is implemented for &self
, to
apply the auto-deref, expr
must be a borrowed reference. E.g., if expr
has
type Rc<Foo>
we will not implicitly dereference to Foo
, but we will if the
type is &Rc<Foo>
. This means that we never implicitly borrow owned data, which
should make this conversion easier to reason about. It also matches the
behaviour of deref coercions (although it is weaker than the dot operator
conversions).
A less conservative alternative would be to allow auto-referencing. I believe there are no technical obstacles. That would be more flexible and match the behaviour of the dot operator. However, I believe constraining to borrowed data makes reasoning easier. We can always adopt auto-referencing later.
The above implies that the patterns in each arm must have the same type. For example, the following match would not type check:
match x: &Option<_> {
Some(_) => {}
&None => {}
}
Currently, we might infer the type of the target expression from the type of the
patterns. In order to be backwards compatible, we must be able to fall back to
the type known from the patterns if type inference is stuck without that
information. I think this will only require choosing m = 0
if type inference
can't make progress.
The programmer should no longer have to annotate variables in
patterns with ref
or ref mut
; the compiler will infer that information from
the arm body following the same rules as for captured variables in closures (see
RFCs 114
and 231).
For each variable declared in a pattern, the compiler will inspect how that variable is used. If the variable is required to be treated by value, then the value is moved or copied. If the value is mutated (or a mutable reference to the variable is taken), then the variable is mutably borrowed. Otherwise, the variable is immutably borrowed. Similarly to closures, the type of the variable is not affected, only its move/borrow semantics at runtime and in the borrow checker.
In designing this feature we must be careful to ensure backwards compatibility
where the user does write ref
, ref mut
, or mut
.
If there is a ref
or ref mut
annotation on a variable, then that annotation
is obeyed and no further auto-referencing is performed. The compiler will not
infer mut
for a by-value variable even if it is mutated.
Example with mutability:
let mut i: i32 = 0;
match i {
j => j += 1, // `j` is treated as `ref mut j` since `+=` takes `j` as `&mut i32`.
}
println!("{}", i); // prints `1`.
match i {
j => j = j + 1, // ERROR: j is not mutable (since `j` is used by value here,
// the compiler will not treat `j` as `mut`).
}
To opt out of all implicit dereferencing in a pattern, we could allow the
programmer to prefix the pattern with the move
keyword. This would follow
closures and indicates that all variables are to be treated strictly as values.
I don't think that the move
opt-out is necessary and I don't think we should
support it unless it is required. It is necessary for closures where the
closure escapes the stack frame and therefore inference cannot be complete. An
analogous situation for match arms seems impossible.
Currently, to type check a match expression we:
- check the target expression,
- check the patterns on each arm (and check that the type matches the target expression (subtyping is kind of complex here, but it is close enough to assume that we do not take subtyping into account and that types must match exactly)),
- check each arm (and check that all arms have a common super- type and that that type is a sub-type of any expected type for the whole match).
To accommodate match arm auto-ref, step 3 is extended with a step to determine the move/borrow semantics of variables. After type checking the arm body, uses of variables are inspected if auto-ref should take place.
To see how a variable is treated, the compiler follows this algorithm:
- any variable declared
ref
is borrowed and no further adjustment is made. - any variable declared
mut
is by-value (and mutable) and no further adjustment is made. - if the type of a variable is
Copy
, it is not borrowed and is copied. - for any remaining variables, the match arm is inspected:
- if the target of the
match
expression is mutated or a mutable reference is taken in the body, all variables in the pattern have move semantics (we could also issue a lint warning in this case). - If the variable is ever mutated or found to have mutable reference type, then it is treated as borrowed and mutable.
- If the variable is ever used by value (and is not
Copy
), then it has move semantics. - Otherwise it is treated as borrowed.
- if the target of the
'Borrowed' here means that the variable is compiled to a pointer and the borrow
checker treats the variable as a reference. One can imagine a desugaring where
the variable declaration (x
) is desugared to ref x
and uses of the variable
to *x
.
In every case, the type of the variable does not change (it has the type as expected when treating the pattern as written). The behaviour of the variable changes only with respect to borrow checking, not type checking.
if let
, while let
, and for
expressions all have patterns similar to match
arms and suffer from the same ergonomic friction around ref
. In fact,
internally, the compiler treats all these expressions in the same way (they are
desugared early in compilation). It would therefore be trivial to extend both
proposed features to these other expressions. I believe it also makes intuitive
sense - if let
in particular is usually thought of like a single arm match
expression.
Alternatively, we could just support match
expressions. This would be more
conservative, but the only advantage is that if let
remains similar to let
and patterns in function declarations.
Since there is no opt-in to these features, we must be sure that there are no backwards compatibility concerns.
Target auto-deref is backwards compatible, since it will only apply to programs that do not type check today.
My reasoning that arm 'auto-ref' is backwards compatible is as follows:
- We are only concerned with programs that type check today - they must continue to do so, and give the same results.
- For any variable annotated with
ref
, there is no change to type checking, so there cannot be issues. - The concern is thus for variables declared without
ref
, where they were previously treated as by-value, but would be treated by-reference. If a variable has copy semantics, then it is not referenced, so we only need concern ourselves with moved variables. This means that in the worrying cases, the matched value cannot be accessed after the match (since we currently move the variable out of it).
Given these steps, we have considerably reduced the scope for differing semantics.
First, let us consider only immutable data. Following the rules for inferring ref-ness, we know that every use of a variable must be a reference. What if a variable use could be either a move or a reference? I believe the only ways this is possible is as the receiver of a method call or used in a match expression (as proposed in this RFC).
In the first case, the variable will be treated as by-value rather than by-
reference due to the method lookup rules and we won't infer a ref
. For
example, consider x: T
and m
where m
is declared for self: T
and self: &T
:
fn foo(x: T) {
match x {
x => x.m(),
}
}
Where x.m()
is used in an arm body we type check this with x: T
(we don't
consider any auto-referencing at this stage); when we type check the method
call, we use the implementation where self: T
because it is found before the
other implementation. That means that we consider x
to be used by value and we
do not infer any auto-borrowing.
In the second case, consider the following code where there are two different
implementations of foo
for T
and &T
. Current code will call T::foo
, we
must ensure that we do not infer a ref
for x
and thus call &T::foo
. In any
case where the compiler can type check without auto-deref'ing the target
expression it will do so (see the 'smallest m
' rule), therefore we will be
backwards compatible here.
match a: T {
x => {
match x {
y => y.foo(),
}
}
}
We might cause backwards incompatibility if we can observe the ref-ness via
mutation. By the same reasoning as above, such a situation would involve a
variable being moved, and so a difference could only be observed inside the match
arm. Mutation could occur by mutating the target expression or a pattern
variable. In the former case, we are protected by the special case about not
inferring ref
if the target expression is mutated directly or indirectly. In
the latter case, the variable must have been marked mut
, so again the ref-ness
will not be changed.
For example,
struct S(i32);
fn foo(x: S) {
match x {
y => {
x.0 += 1; // ERROR - x has been moved (because cannot infer `ref y`)
y.0 += 1; // ERROR - y is not mutable
}
}
// s is moved and cannot be accessed.
}
Finally, we might cause backwards incompatibility if a program that currently compiles will not do so with the proposed changes. We can apply much of the same reasoning as above. I believe the only way this could happen is where we reference a variable which was previously moved and this means that it (and any structure it belongs to) is borrowed. This could change the outcome of borrow checking. However, I believe the only way this could be observed is if the target expression is mutated, and this is covered by a special case rule.
To see why that rule is necessary, consider the following code:
let mut i = String::new();
match i {
j => {
g(&j);
i = "Hello".to_owned();
}
}
Under this RFC, &j
in g(&j)
would mean that we infer that j
is used by-
reference. However, this would mean that i
is borrowed for the duration of the
match, and thus the assignment to i
would be a borrow-check error. The special
case rule about the discriminant being mutated or mutably borrowed means we take
j
by-value.
Whenever considering relaxing Rust's rules around borrowing, we must consider safety. Could either of the proposed rules make Rust code less safe?
Target auto-dereference introduces implicit dereferencing. However, there is no
safety concern in the strict sense because the compiler still treats the data as
borrowed and therefore, for example, won't let data move out of the match
. We
do not permit implicit dereferencing of owned data (e.g., Box<_>
).
With match arm auto-ref, there is no hard threat to safety - the compiler is merely inferring some annotations, the borrow checker remains unchanged and cannot be compromised.
One risk is that errors become even more confusing - the compiler might complain
about borrowing violations without there being explicit dereferencing of the
target expression. I believe that these errors are less likely due to arm auto-
ref, and that when they do happen, there must be an explicit dereference in the
arm body to indicate an error. Furthermore, since the target must always be
borrowed, either its type or an explicit borrow (&
) should indicate that the
data is borrowed.
It is reasonable to argue that arm auto-ref makes match
more confusing
(because there is an additional subtle rule to understand). I don't think this
has proven to be a problem with closures, and I don't see why it should be with
match arms, but some experience with the feature when implemented will help us
decide.
Note that explicit borrows where pattern variables are used are still required, as they would be if this feature did not exist. I.e., there is still a clear indication in the source code that owned data becomes borrowed.
The major downside of this proposal is that it introduces implicit referencing and dereferencing, and in particular implicit borrowing of otherwise un-borrowed data (in the arm auto-ref case). This is generally something we avoid, however, there is some precedent in the dot operator and closures.
This is a fairly major change to the semantics of a common feature with no opt- in. Whilst I believe the proposal is backwards compatible (see above) and mostly only affects code which would not compile today, it is a bit unnerving.
This RFC introduces differences in the semantics of patterns in match
compared
with let
. On the other hand, there are differences already - allowing multiple
patterns separated with |
and if
guards in match arms.
As an alternative to this proposal, we could apply the same rules to let
as to
match arms. I think this should not be done because ref
in let
statements is
much rarer than in matches, and there is no explicit scope to limit checking of
uses to. These reasons make the facility less necessary and less appealing.
- We could support either target auto-deref or arm auto-ref without the other.
- We could opt-in to arm auto-ref, for example by using closure syntax:
match x {
Foo(x) => { ... } // No auto-ref
|Bar(x)| => { ... } // Auto-ref of x
move |Baz(x)| => { ... } // No auto-ref, equivalent to the first arm
// and so probably unnecessary.
}
There might be parsing issues since we use |
to separate patterns in match
arms. Note that this is a bit misleading - this RFC follows the rules for
variables implicitly captured by closures, not closure parameters.
- Target auto-deref could be done in a slightly different way: intuitively,
rather than inferring
*
s on the target, we infer&
s on the patterns in match arms. This is more flexible in that it allows different arms to be deref'ed a different number of times, and could be made to work with nested references. However, it cannot work with smart pointers such asRc
, at least not in any obvious way. It is also likely to be more complex to implement. - Ubiquitous auto-ref/deref. Rather than restricting auto-ref/deref to a few places in Rust, one could consider allowing it anywhere (or nearly anywhere). This might work, however, it would be a huge change, difficult to design, and close to impossible to ensure backwards compatibility. It therefore seems impractical.
The biggest open question is if the implementation will actually work. Some of the details in this RFC, particularly around backwards compatibility are subtle, and today's implementation of type checking for match is already complex.
The interaction of target auto-deref and closures (described above) or the analogous interaction with arm auto-ref (i.e., nested matches) is one area that makes me nervous.
Although I believe this proposal is backwards compatible, the fact that we are making big changes to how match arms are type-checked means there is scope for problems.
Can we extend auto-deref of the target expression to nested types?