
Remove variable-length slice patterns from the language #144

Closed · wants to merge 2 commits

Conversation

@ghost commented Jun 27, 2014

  • Start Date: 2014-06-27
  • RFC PR #: (leave this empty)
  • Rust Issue #: (leave this empty)

Summary

Remove variable-length slice patterns from the language.

    fn is_symmetric(list: &[uint]) -> bool {
        match list {
            [] | [_]                   => true,
            [x, ..inside, y] if x == y => is_symmetric(inside),
            _                          => false
        }
    }

Motivation

  1. Slice patterns do not fit in well with the rest of the supported pattern types. Slices are currently the only non-algebraic data type that can be destructured in a pattern.
  2. To the best of my knowledge and understanding, in all but trivial cases (such as match expressions containing only Cons-like patterns ([first, second, ..tail]) or only Snoc-like patterns ([..init, last])), they are also impossible to reason about when analyzing match arms for exhaustiveness and reachability without incorporating an SMT solver.

Detailed design

Support for variable-length slice patterns will be removed from match expressions. It will still be possible to pattern match fixed-length vectors, as they do not suffer from the two issues mentioned in the Motivation section.
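
For illustration, a fixed-length match along these lines (a hypothetical example, not taken from existing code) would still be accepted:

    // Hypothetical illustration: only fixed-length patterns, no `..rest` part.
    fn sum3(v: [uint, ..3]) -> uint {
        match v {
            [0, 0, 0] => 0,
            [a, b, c] => a + b + c
        }
    }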

Drawbacks

Slice patterns can be a convenient way of working with slices and vectors. In particular, they are handy when implementing tail-recursive algorithms over slices. Making this change will break users' code; however, there do not appear to be many uses of the feature in the wild. rustc contains 10 occurrences of slice patterns in match expressions, Servo uses a slice pattern once (a fixed-length one), and coreutils uses them twice.

Alternatives

To address the second limitation without removing the syntax, slice patterns could be restricted to allow only Cons-like and Snoc-like patterns, where the variable-length part appears at the very beginning or the very end of the pattern:

match x { [first, second, ..tail] => () }
match x { [..init, first, second] => () }

In addition, the two could not be intermixed in a single column of a match expression.
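
For example, a column like the following (a hypothetical illustration) would be rejected under this alternative:

    match x {
        [first, ..tail] => (),  // Cons-like
        [..init, last]  => (),  // Snoc-like: not allowed alongside the arm above
        _               => ()
    }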

The impact of leaving the feature intact is that slice patterns would, in some scenarios, have to be excluded from exhaustiveness analysis, resulting in match expressions that can fail at runtime and introducing a new class of runtime errors.

Unresolved questions

If the syntax were to be reintroduced later, or preserved now to its full extent, what would be the best approach to implementing an exhaustiveness/reachability analysis pass that differs from the current ML-like approach and correctly supports this flavour of patterns?

@lilyball (Contributor)

Why are slice patterns bad? You said they can't be reasoned about, but why not? A slice pattern can only have one wildcard, which means that any given slice pattern is known to either be length == n or length >= n (depending on the presence of the wildcard). I don't see why the >= is any harder to reason about than the ==.

@lilyball (Contributor)

I suppose supporting specific values in a variable-length slice pattern might be the hard part, e.g. [3, ..middle, 4] but that's not given as an example in the RFC. Is that where the complexity comes in?

If this RFC is accepted, I would very much like to see Cons (and preferably Snoc) patterns still allowed, as mentioned in the alternative.

In the second paragraph of the alternative, it says not removing the feature will require runtime failure in some cases. Is that if Cons patterns are still allowed, or is that if the RFC isn't accepted? If it applies even when only Cons patterns are allowed, isn't runtime failure the same issue as making no changes at all? And in that case, what does restricting to Cons patterns buy us?

@ghost (Author) commented Jun 27, 2014

@kballard It's easy with flat patterns but once you start matching substructures of slices, such as:

match x {
    [Some(Foo { a: false, .. }), None, ..xs, x] => (),
    [None, ..xs, None, None] => ()
}

we're in SMT land.

Now, restricting to Cons/Snoc patterns allows us to use the current, simplified algorithm, in which a slice pattern [..xs, a, b] is really treated as Snoc(b, Snoc(a, Wild)) and [a, b, c] is treated as Cons(a, Cons(b, Cons(c, Nil))).
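
To make that concrete, here is a purely illustrative sketch (these are not actual compiler data structures) of the shapes the simplified algorithm works with:

// Purely illustrative; not rustc internals.
struct ElemPat; // stands in for an arbitrary element pattern

enum SlicePat {
    Nil,                          // matches only the empty slice
    Wild,                         // matches any remaining slice
    Cons(ElemPat, Box<SlicePat>), // [p, ..rest]  becomes  Cons(p, rest)
    Snoc(ElemPat, Box<SlicePat>), // [..rest, p]  becomes  Snoc(p, rest)
}

// [a, b, c]    becomes  Cons(a, Cons(b, Cons(c, Nil)))
// [..xs, a, b] becomes  Snoc(b, Snoc(a, Wild))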

In the second paragraph of the alternative, it says not removing the feature will require runtime failure in some cases. Is that if Cons patterns are still allowed, or is that if the RFC isn't accepted?

No: if only Cons/Snoc patterns were allowed, we could reliably determine the exhaustiveness/reachability properties of a match expression, so no failure at runtime would ever be necessary. You're right, the wording isn't very clear there.

@zwarich commented Jun 27, 2014

In normal patterns over algebraic data types, the patterns only match things that are part of the structure of the value being matched, and any bindings directly bind those components. This simplifies both exhaustiveness checking and match compilation.

I haven't seen a use of variable length slice patterns that looks like idiomatic Rust code.

@lilyball (Contributor)

If both Cons and Snoc patterns are supportable, why can't we support [a, b..., c] by treating it as the union of both a Cons and a Snoc pattern? It should be equivalent to [a, _, _...] & [_..., _, c] (with a theoretical & operator) with some special handling of the binding of the b... slice (which is fine, that just requires knowing the range of the subslice and we definitely know that).
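
As a rough illustration of that union reading (a hand-written equivalent, not proposed compiler output), matching [a, ..b, c] on a slice amounts to one length check plus checks on the two ends:

// Hand-written equivalent of [a, ..b, c] on a slice `x`; illustrative only.
fn ends(x: &[uint]) -> Option<(uint, uint)> {
    if x.len() >= 2 {
        let a = x[0];           // the leading element, from the Cons view
        let c = x[x.len() - 1]; // the trailing element, from the Snoc view
        // `..b` would bind the middle sub-slice, x[1] through x[len - 2]
        Some((a, c))
    } else {
        None
    }
}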

@ghost (Author) commented Jun 27, 2014

@kballard So the codegen for that is trivial, agreed. But I understand that for the exhaustiveness check, the & operator you're proposing is what requires an SMT solver. But maybe someone with an actual PLT background could chime in and confirm. :-)

@zwarich commented Jun 27, 2014

@kballard You then end up with a disjunction of conjunctions of patterns, which makes exhaustiveness checking difficult. In the case where every conjunction contains only 2 patterns, there is probably a way to develop an efficient algorithm based on 2-SAT, but it would be more complex than the current exhaustiveness checker in its entirety.

@lilyball (Contributor)

I see.

Well, I suppose I'm down with this as long as Cons/Snoc patterns are still supported. They may be uncommon, but if we can do exhaustiveness checking with them, then I think we should keep them.

@zwarich commented Jun 27, 2014

@kballard Is there a use of Cons/Snoc patterns that would be considered idiomatic Rust code? I haven't seen one.

@zwarich commented Jun 27, 2014

(And you also can't stay in the nice world of 2-SAT if you have nested slice patterns, since the conjunctions may grow beyond 2 formulas.)

@bstrie (Contributor) commented Jun 27, 2014

@zwarich, back when ~[T] patterns still existed, every command-line program that I wrote used these sorts of patterns to grab arguments out of os::args(). If, one day, we happen to bring vector patterns back, would the same restrictions as here apply?
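
A hedged sketch of that style in 2014-era syntax, with made-up behaviour (os::args() returning Vec<String> is the era's API; the program itself is hypothetical):

fn main() {
    let args = std::os::args();
    match args.as_slice() {
        // args[0] is the program name; dispatch on how many real arguments follow
        [_, ref input]             => println!("input: {}", input.as_slice()),
        [_, ref input, ref output] => println!("{} -> {}", input.as_slice(), output.as_slice()),
        _                          => println!("usage: prog INPUT [OUTPUT]")
    }
}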

But I suppose I don't understand your suggestion that these patterns are not idiomatic. There's a reason that we all applauded their addition in the first place, and lamented their loss as a result of the ~[T] fallout.

@zwarich commented Jun 27, 2014

@bstrie, the people I asked about this on IRC didn't seem to have many compelling use cases for slice patterns, and @jakub-'s stats from rustc, Servo, and coreutils seem to agree with this. Is it that they are only really useful in the bind-by-move case?

@bstrie (Contributor) commented Jun 27, 2014

I certainly care more about hypothetical vector patterns than slice patterns, yes.

@SimonSapin (Contributor)

For what it’s worth, I’ve recently used (guided a new contributor to use) a slice pattern in Servo to dispatch on the length of the slice and destructure the elements at the same time. This code is very nice and terse.

Without slice patterns, this code would have to be something like if harness_args.len() >= 1 { harness_args[0] …, which is much more awkward.
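
Roughly the shape being described, with made-up names (this is not the actual Servo code):

// Hypothetical sketch of "dispatch on length and destructure at once".
fn dispatch(harness_args: &[String]) -> String {
    match harness_args {
        []                 => "no arguments".to_string(),
        [ref name]         => format!("running {}", name.as_slice()),
        [ref name, ..rest] => format!("running {} plus {} more", name.as_slice(), rest.len())
    }
}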

@SimonSapin (Contributor)

From the RFC:

Slices are currently the only non-algebraic data type that can be destructured in a pattern.

Are they? What does non-algebraic mean here? What other types are non-algebraic?

@SimonSapin (Contributor)

they're also impossible to reason about […] when analyzing match arms for exhaustiveness and reachability, without incorporating an SMT solver.

What does the compiler do now? How is that bad?

@lilyball (Contributor)

@zwarich As suggested by @bstrie, I think most of the uses of this sort of pattern were removed when we lost vector patterns. I certainly remember having to do that myself, and being saddened by the uglier code that resulted.

Slice patterns may be less common, but I would love to see vector patterns resurface, and they would presumably follow the semantics of slice patterns. So if slice patterns were removed now we'd just have to end up re-adding them anyway when we got vector patterns (or else be curiously inconsistent).

@zwarich commented Jun 28, 2014

@bstrie @kballard Is it even possible to have vector patterns with exhaustiveness checking now that vectors aren't a primitive in the language? As far as I can tell, no trait can express the logical property that the primitive matching functions must satisfy for correctness.

@lilyball (Contributor)

@zwarich Well, we have no way right now to represent the ability to destructure an arbitrary data structure in order to move data out of it. But that's something we'd like to have. In the absence of that ability, we can certainly design traits to support being able to copy data, or being able to reference data.

One possible answer might look like the following:

trait Destructure<T> {
    fn len(&self) -> uint;
    fn subscript<'a>(&'a self, x: uint) -> &'a T;
}

trait DestructureMut<T>: Destructure<T> {
    fn mut_subscript<'a>(&'a mut self, x: uint) -> &'a mut T;
}

This is a bit handwavy, though, because mutable destructuring (which would let you use ref mut x) would have issues with borrowck: there would need to be some way to subscript multiple indexes simultaneously and get multiple non-overlapping output references, and to do so safely. This could just be handled specially by the compiler, with documentation that the implementation of mut_subscript needs to return a reference that doesn't alias any other subscript. Ideally, if we had generic integral parameters, we could do

fn mut_subscript<'a, int N>(&'a mut self, indexes: [uint, ..N]) -> [&'a mut T, ..N];

(with the guarantee that the same index will not occur twice in the input array).


Another approach is an Iterator-based one, which would allow for moving out of the data structure, but some care would have to go into it to deal with the variable-length slice binding (although when talking about custom data structures, that would presumably actually bind to a new instance of the data structure, rather than binding to a slice). This also has the downside that it may have to iterate twice, once for the matching and then a second time if there are any move or mut bindings (because the first time would just use normal references).

@zwarich commented Jun 28, 2014

@kballard The problem of how to safely move multiple elements out of a vector without breaking the soundness of the language is interesting. I don't think that vector patterns are nearly important enough to warrant a soundness hole.

However, I was thinking of a more subtle problem, which occurs even if you extend by-ref matches to user-defined data structures. In the code

match v {
    [Some(_), ...] => { ... }
    (arbitrary patterns)
    [None, ...] => { ... }
}

how do you ensure that the match is exhaustive? If the generated code is calling a function to get each array element, it may not get the same element the second time it calls the function. You could try caching the results of each function call, but then that means independent match arms don't have independent semantics.

This actually gets worse when combined with the problem that you mentioned. Consider a hypothetical by-move binding out of the interior of a value of sum type in a vector, e.g.

match v {
    [Some(a), Some(b), ..xs] => { ... },
    _ => { ... }
}

When matching the first arm, the generated code has to capture a pointer to the memory it is going to move if the arm is successful, since if it calls a function to get a reference again, it may not even be a pointer to a Some(_) value. Moreover, this memory can potentially change value (e.g. it could just be implemented as a Cell) when evaluating the match for the second element.

The more I think about vector patterns in a world where vectors are implemented as a library, the more broken they seem.

@bill-myers

Is exhaustiveness really hard?

Let a be the maximum number of terms before the variable-length pattern among match arms, b the maximum number of terms after the variable-length pattern among match arms, and c the maximum number of terms among match arms without variable-length patterns.

Now transform the match to first match on the length, and generate a case for each length <= max(a + b, c), which of course can be treated as matching on a tuple.

Then, for larger lengths, only the arms with variable-length patterns can match, and you can reduce all of those arms to exactly a terms, a variable-length pattern, and b terms (by adding _ terms as needed), and then treat the match as matching on an (a + b + 1)-tuple.
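
A hedged sketch of that transformation on a tiny example (hypothetical; not an implemented algorithm). Suppose the original match has the arms [first, ..mid, last], [x], and _, so a = 1, b = 1, c = 1 and max(a + b, c) = 2:

fn classify(xs: &[uint]) -> uint {
    if xs.len() <= 2 {
        // Each length up to max(a + b, c) = 2 becomes a fixed-length match,
        // which behaves like a match on a tuple:
        match xs {
            []     => 3u, // only the `_` arm covers the empty slice
            [_]    => 2u, // the `[x]` arm
            [_, _] => 1u, // length 2: `[first, ..mid, last]` with an empty mid
            _      => unreachable!() // longer slices are handled below
        }
    } else {
        // For longer slices only arms with a variable-length part remain;
        // padded to exactly `a` leading and `b` trailing terms, they can be
        // analyzed like a match on an (a + b + 1)-tuple:
        match xs {
            [_first, .._mid, _last] => 1u,
            _ => unreachable!() // impossible: xs.len() > 2 here
        }
    }
}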

This is of course exponential in the nesting levels (where nesting means that the terms themselves are slices and are matched), but nesting is rare.

@lilyball (Contributor)

@zwarich

If the generated code is calling a function to get each array element, it may not get the same element the second time it calls the function.

That's a good point, one I hadn't considered before. I can think of 3 solutions:

  1. Require unsafe to implement this functionality, and put the burden on the implementor to make sure they follow the rules. Not really very good, but it has the benefit of no runtime overhead.
  2. After finding a match, re-validate that it matches when moving out of the data structure. If a function is written badly to return a different value the second time, this could trigger a runtime failure, which seems like a reasonable response to a malicious implementation.
  3. Provide a stack array of the appropriate size and move the elements into it before testing. This allows the compiler to make guarantees itself, e.g. that the item won't change from under it. I'm not sure if this approach can support Cons/Snoc patterns though (at least, not patterns that need to bind the wildcard).

Note that this isn't really an exhaustiveness issue. Exhaustiveness concerns whether the switch covers all possible values that could be produced. This is determined at compile-time without actually running any code. The problem with the data structure returning a different value the second time it's asked for a subscript really is only a problem for actually destructuring the value at runtime; if it returns a Some(_) the first time, and a None the second time, the contained value can't be extracted from the None. But I think it's reasonable to simply fail at runtime under these circumstances.

Note that if we're talking about actually matching a specific value, it doesn't really matter if it returns a different nonmatching value at runtime, because that exact same problem could occur if you merely asked it manually for that same value in the body of the case, and that's not related to the correctness of the switch. The only real problem is if the structure of the value changes.

@pcwalton (Contributor)

+1, too much complexity. We need the language to stay simple.

@zwarich commented Jun 28, 2014

@bill-myers If you have an n-tuple of Option values, then you can view patterns of the form (Some(_), None, ...) as being P1 and (not P2) and ... with an atomic proposition for each entry in the tuple. This means that a pattern match is a disjunction of these conjunctions, and to check exhaustiveness, you can check the satisfiability of the negation, a conjunction of disjunctions of possibly negated atomic propositions. Thus the problem is NP-complete in the worst case, even without slice patterns, and the superpolynomial growth occurs with the size of the tuple, not the amount of nesting. I guess the approach that is used for checking matches never hits this worst case, and it probably wouldn't hit it for slice patterns either. Not sure why I missed that this applies to the current matches...

@bill-myers

@zwarich You are right, as long as you include _ in addition to Some(_) and None (also, it's simpler to use bool instead of Option): determining the exhaustiveness of pattern matching on n-tuples of bool is co-NP-complete...
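
To make the reduction concrete, a small hypothetical example (not taken from the thread): give the match one arm per clause of (x1 or x2) and (not x1 or x3), with each arm matching exactly the assignments that falsify its clause. The final wildcard arm is then reachable precisely when the formula is satisfiable:

fn check(v: (bool, bool, bool)) -> &'static str {
    match v {
        // arm for clause (x1 or x2): matches assignments where x1 and x2 are both false
        (false, false, _) => "falsifies (x1 or x2)",
        // arm for clause (not x1 or x3): matches assignments where x1 is true and x3 is false
        (true, _, false)  => "falsifies (not x1 or x3)",
        // reachable iff some assignment satisfies every clause,
        // e.g. (true, false, true), so deciding this means deciding SAT
        _                 => "satisfies the formula"
    }
}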

This means that including variable-length slice patterns should not make things much worse, so they shouldn't be removed because of that.

@zwarich commented Jun 28, 2014

@kballard In other languages with pattern matching, exhaustiveness is synonymous with absence of failure. It is also important that the compiler's exhaustiveness checking guarantees that at least some case will match at runtime, not merely in some idealized world that might not match actual program behavior.

@glaebhoerl (Contributor)

I would prefer to make our support for pattern matching arrays more expressive, not less. With something like RFC PR #101, native language support for manipulating mutable and immutable arrays and array slices in the safe subset of the language could be considered complete. That seems kind of important for a systems language and a competitor to C. It would obviate the need for, and be more flexible and convenient than, current unsafely implemented functions like mut_split_at.

Slices are currently the only non-algebraic data type that can be destructured in a pattern.

On its own, this argument is utterly unpersuasive to me. Does this fact have any consequences? If not, the question should be whether the feature is useful and well-behaved, not whether it fits into an arbitrary categorization.

If supporting the feature caused the implementation to become intractable, that would be a persuasive argument, but per some of the above comments, I'm not sure if that holds up, either.

@ghost (Author) commented Jun 28, 2014

@glaebhoerl Yeah, I agree. I think it was a mistake to drive a language discussion with implementation details. Apologies. In any case, my PR rust-lang/rust#15186 now supports sub slices of any kind properly so the whole argument on my part is moot anyway.

@ghost (Author) commented Jun 30, 2014

Actually, at this point I don't really have good reasons for why they should be removed, so I'll close this; if someone does, feel free to reopen and drive it forward.

@ghost closed this Jun 30, 2014