
Remove variable-length slice patterns from the language #144

Closed · wants to merge 2 commits

Conversation

@ghost commented Jun 27, 2014

  • Start Date: 2014-06-27
  • RFC PR #: (leave this empty)
  • Rust Issue #: (leave this empty)

Summary

Remove variable-length slice patterns from the language.

    fn is_symmetric(list: &[uint]) -> bool {
        match list {
            [] | [_]                   => true,
            [x, ..inside, y] if x == y => is_symmetric(inside),
            _                          => false
        }
    }

Motivation

  1. Slice patterns do not fit in well with the rest of the supported pattern types. Slices are currently the only non-algebraic data type that can be destructured in a pattern.
  2. To the best of my knowledge and understanding, in all but trivial cases (such as match expressions containing only Cons-like patterns ([first, second, ..tail]) or only Snoc-like patterns ([..init, last])), they are also impossible to reason about when analyzing match arms for exhaustiveness and reachability without incorporating an SMT solver.

Detailed design

Support for variable-length slice patterns will be removed from match expressions. It will still be possible to pattern match fixed-length vectors, as they do not suffer from the two issues mentioned in the Motivation section.
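
For illustration, a fixed-length match along these lines (a hypothetical example, not taken from existing code) would still be accepted:

    // Hypothetical illustration: only fixed-length patterns, no `..rest` part.
    fn sum3(v: [uint, ..3]) -> uint {
        match v {
            [0, 0, 0] => 0,
            [a, b, c] => a + b + c
        }
    }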

Drawbacks

Slice patterns can be a convenient way of working with slices and vectors. In particular, they are handy when implementing tail-recursive algorithms over slices. Making this change will break users' code; however, there do not appear to be many uses of the feature in the wild. rustc contains 10 occurrences of slice patterns in match expressions, Servo uses a slice pattern once (a fixed-length one), and coreutils uses them twice.

Alternatives

To address the second limitation without removing the syntax, slice patterns could be restricted to allow only Cons-like and Snoc-like patterns, where the variable-length part appears at the very beginning or the very end of the pattern:

match x { [first, second, ..tail] => () }
match x { [..init, first, second] => () }

In addition, the two could not be intermixed in a single column of a match expression.
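
For example, a column like the following (a hypothetical illustration) would be rejected under this alternative:

    match x {
        [first, ..tail] => (),  // Cons-like
        [..init, last]  => (),  // Snoc-like: not allowed alongside the arm above
        _               => ()
    }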

The impact of leaving the feature intact is that slice patterns would, in some scenarios, have to be excluded from exhaustiveness analysis, resulting in match expressions that can fail at runtime and introducing a new class of runtime errors.

Unresolved questions

If the syntax were to be reintroduced later, or preserved now to its full extent, what would be the best approach to implementing an exhaustiveness/reachability analysis pass that differs from the current ML-like approach and correctly supports this flavour of patterns?

@lilyball (Contributor)

Why are slice patterns bad? You said they can't be reasoned about, but why not? A slice pattern can only have one wildcard, which means that any given slice pattern is known to either be length == n or length >= n (depending on the presence of the wildcard). I don't see why the >= is any harder to reason about than the ==.

@lilyball (Contributor)

I suppose supporting specific values in a variable-length slice pattern might be the hard part, e.g. [3, ..middle, 4] but that's not given as an example in the RFC. Is that where the complexity comes in?

If this RFC is accepted, I would very much like to see Cons (and preferably Snoc) patterns still allowed, as mentioned in the alternative.

In the second paragraph of the alternative, it says not removing the feature will require runtime failure in some cases. Is that if Cons patterns are still allowed, or is that if the RFC isn't accepted? If it applies even when only Cons patterns are allowed, isn't runtime failure the same issue as making no changes at all? And in that case, what does restricting to Cons patterns buy us?

@ghost (Author) commented Jun 27, 2014

@kballard It's easy with flat patterns but once you start matching substructures of slices, such as:

match x {
    [Some(Foo { a: false, .. }), None, ..xs, x] => (),
    [None, ..xs, None, None] => ()
}

we're in SMT land.

Now, restricting to Cons/Snoc patterns allows us to use the current, simplified algorithm, in which a slice pattern [..xs, a, b] is really treated as Snoc(b, Snoc(a, Wild)) and [a, b, c] is treated as Cons(a, Cons(b, Cons(c, Nil))).
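
To make that concrete, here is a purely illustrative sketch (these are not actual compiler data structures) of the shapes the simplified algorithm works with:

// Purely illustrative; not rustc internals.
struct ElemPat; // stands in for an arbitrary element pattern

enum SlicePat {
    Nil,                          // matches only the empty slice
    Wild,                         // matches any remaining slice
    Cons(ElemPat, Box<SlicePat>), // [p, ..rest]  becomes  Cons(p, rest)
    Snoc(ElemPat, Box<SlicePat>), // [..rest, p]  becomes  Snoc(p, rest)
}

// [a, b, c]    becomes  Cons(a, Cons(b, Cons(c, Nil)))
// [..xs, a, b] becomes  Snoc(b, Snoc(a, Wild))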

In the second paragraph of the alternative, it says not removing the feature will require runtime failure in some cases. Is that if Cons patterns are still allowed, or is that if the RFC isn't accepted?

No: if only Cons/Snoc patterns were allowed, we could reliably determine the exhaustiveness/reachability properties of a match expression, so no failure at runtime would ever be necessary. You're right, the wording isn't very clear there.

@zwarich commented Jun 27, 2014

In normal patterns over algebraic data types, the patterns only match things that are part of the structure of the value being matched, and any bindings directly bind those components. This simplifies both exhaustiveness checking and match compilation.

I haven't seen a use of variable length slice patterns that looks like idiomatic Rust code.

@lilyball (Contributor)

If both Cons and Snoc patterns are supportable, why can't we support [a, b..., c] by treating it as the union of both a Cons and a Snoc pattern? It should be equivalent to [a, _, _...] & [_..., _, c] (with a theoretical & operator) with some special handling of the binding of the b... slice (which is fine, that just requires knowing the range of the subslice and we definitely know that).
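
As a rough illustration of that union reading (a hand-written equivalent, not proposed compiler output), matching [a, ..b, c] on a slice amounts to one length check plus checks on the two ends:

// Hand-written equivalent of [a, ..b, c] on a slice `x`; illustrative only.
fn ends(x: &[uint]) -> Option<(uint, uint)> {
    if x.len() >= 2 {
        let a = x[0];           // the leading element, from the Cons view
        let c = x[x.len() - 1]; // the trailing element, from the Snoc view
        // `..b` would bind the middle sub-slice, x[1] through x[len - 2]
        Some((a, c))
    } else {
        None
    }
}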

@ghost (Author) commented Jun 27, 2014

@kballard So the codegen for that is trivial, agreed. But I understand that for the exhaustiveness check, the & operator you're proposing is what requires an SMT solver. But maybe someone with an actual PLT background could chime in and confirm. :-)

@zwarich commented Jun 27, 2014

@kballard You then end up with a disjunction of conjunctions of patterns, which makes exhaustiveness checking difficult. In the case where every conjunction contains only 2 patterns, there is probably a way to develop an efficient algorithm based on 2-SAT, but it would be more complex than the current exhaustiveness checker in its entirety.

@lilyball (Contributor)

I see.

Well, I suppose I'm down with this as long as Cons/Snoc patterns are still supported. They may be uncommon, but if we can do exhaustiveness checking with them, then I think we should keep them.

@zwarich commented Jun 27, 2014

@kballard Is there a use of Cons/Snoc patterns that would be considered idiomatic Rust code? I haven't seen one.

@zwarich commented Jun 27, 2014

(And you also can't stay in the nice world of 2-SAT if you have nested slice patterns, since the conjunctions may grow beyond 2 formulas.)

@bstrie (Contributor) commented Jun 27, 2014

@zwarich, back when ~[T] patterns still existed, every command-line program that I wrote used these sorts of patterns to grab arguments out of os::args(). If, one day, we happen to bring vector patterns back, would the same restrictions as here apply?
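
A hedged sketch of that style in 2014-era syntax, with made-up behaviour (os::args() returning Vec<String> is the era's API; the program itself is hypothetical):

fn main() {
    let args = std::os::args();
    match args.as_slice() {
        // args[0] is the program name; dispatch on how many real arguments follow
        [_, ref input]             => println!("input: {}", input.as_slice()),
        [_, ref input, ref output] => println!("{} -> {}", input.as_slice(), output.as_slice()),
        _                          => println!("usage: prog INPUT [OUTPUT]")
    }
}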

But I suppose I don't understand your suggestion that these patterns are not idiomatic. There's a reason that we all applauded their addition in the first place, and lamented their loss as a result of the ~[T] fallout.

@zwarich commented Jun 27, 2014

@bstrie, the people I asked about this on IRC didn't seem to have many compelling use cases for slice patterns, and @jakub-'s stats from rustc, Servo, and coreutils seem to agree with this. Is it that they are only really useful in the bind-by-move case?

@bstrie (Contributor) commented Jun 27, 2014

I certainly care more about hypothetical vector patterns than slice patterns, yes.

@SimonSapin (Contributor)

For what it’s worth, I’ve recently used (guided a new contributor to use) a slice pattern in Servo to dispatch on the length of the slice and destructure the elements at the same time. This code is very nice and terse.

Without slice patterns, this code would have to be something like if harness_args.len() >= 1 { harness_args[0] …, which is much more awkward.
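
Roughly the shape being described, with made-up names (this is not the actual Servo code):

// Hypothetical sketch of "dispatch on length and destructure at once".
fn dispatch(harness_args: &[String]) -> String {
    match harness_args {
        []                 => "no arguments".to_string(),
        [ref name]         => format!("running {}", name.as_slice()),
        [ref name, ..rest] => format!("running {} plus {} more", name.as_slice(), rest.len())
    }
}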

@SimonSapin (Contributor)

From the RFC:

Slices are currently the only non-algebraic data type that can be destructured in a pattern.

Are they? What does non-algebraic mean here? What other types are non-algebraic?

@SimonSapin (Contributor)

they're also impossible to reason about […] when analyzing match arms for exhaustiveness and reachability, without incorporating an SMT solver.

What does the compiler do now? How is that bad?

@lilyball (Contributor)

@zwarich As suggested by @bstrie, I think most of the uses of this sort of pattern were removed when we lost vector patterns. I certainly remember having to do that myself, and being saddened by the uglier code that resulted.

Slice patterns may be less common, but I would love to see vector patterns resurface, and they would presumably follow the semantics of slice patterns. So if slice patterns were removed now we'd just have to end up re-adding them anyway when we got vector patterns (or else be curiously inconsistent).

@zwarich commented Jun 28, 2014

@bstrie @kballard Is it even possible to have vector patterns with exhaustiveness checking now that vectors aren't a primitive in the language? As far as I can tell, no trait can express the logical property that the primitive matching functions must satisfy for correctness.

@lilyball (Contributor)

@zwarich Well, we have no way right now to represent the ability to destructure an arbitrary data structure in order to move data out of it. But that's something we'd like to have. In the absence of that ability, we can certainly design traits to support being able to copy data, or being able to reference data.

One possible answer might look like the following:

trait Destructure<T> {
    fn len(&self) -> uint;
    fn subscript<'a>(&'a self, x: uint) -> &'a T;
}

trait DestructureMut<T>: Destructure<T> {
    fn mut_subscript<'a>(&'a mut self, x: uint) -> &'a mut T;
}

This is a bit handwavy, though, because mutable destructuring (which would let you use ref mut x) would have issues with borrowck: there would need to be some way to subscript multiple indexes simultaneously and get multiple non-overlapping output references, and to do so safely. This could just be handled specially by the compiler, with documentation that the implementation of mut_subscript needs to return a reference that doesn't alias any other subscript. Ideally, if we had generic integral parameters, we could do

fn mut_subscript<'a, int N>(&'a mut self, indexes: [uint, ..N]) -> [&'a mut T, ..N];

(with the guarantee that the same index will not occur twice in the input array).


Another approach is an Iterator-based one, which would allow for moving out of the data structure, but some care would have to go into it to deal with the variable-length slice binding (although when talking about custom data structures, that would presumably actually bind to a new instance of the data structure, rather than binding to a slice). This also has the downside that it may have to iterate twice, once for the matching and then a second time if there are any move or mut bindings (because the first time would just use normal references).

@zwarich commented Jun 28, 2014

@kballard The problem of how to safely move multiple elements out of a vector without breaking the soundness of the language is interesting. I don't think that vector patterns are nearly important enough to warrant a soundness hole.

However, I was thinking of a more subtle problem, which occurs even if you extend by-ref matches to user-defined data structures. In the code

match v {
    [Some(_), ...] => { ... }
    (arbitrary patterns)
    [None, ...] => { ... }
}

how do you ensure that the match is exhaustive? If the generated code is calling a function to get each array element, it may not get the same element the second time it calls the function. You could try caching the results of each function call, but then that means independent match arms don't have independent semantics.

This actually gets worse when combined with the problem that you mentioned. Consider a hypothetical by-move binding out of the interior of a value of sum type in a vector, e.g.

match v {
    [Some(a), Some(b), ..xs] => { ... },
    _ => { ... }
}

When matching the first arm, the generated code has to capture a pointer to the memory it is going to move if the arm is successful, since if it calls a function to get a reference again, it may not even be a pointer to a Some(_) value. Moreover, this memory can potentially change value (e.g. it could just be implemented as a Cell) when evaluating the match for the second element.

The more I think about vector patterns in a world where vectors are implemented as a library, the more broken they seem.

@bill-myers

Is exhaustiveness really hard?

Let a be the maximum number of terms before the variable-length pattern among match arms, b the maximum number of terms after the variable-length pattern among match arms, and c the maximum number of terms among match arms without variable-length patterns.

Now transform the match to first match on the length, and generate a case for each length <= max(a + b, c), which of course can be treated as matching on a tuple.

Then, for larger lengths, only the arms with variable-length patterns can match, and you can reduce all of those arms to exactly a terms, a variable-length pattern, and b terms (by adding _ terms as needed), and then treat the match as matching on an (a + b + 1)-tuple.
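
A hedged sketch of that transformation on a tiny example (hypothetical; not an implemented algorithm). Suppose the original match has the arms [first, ..mid, last], [x], and _, so a = 1, b = 1, c = 1 and max(a + b, c) = 2:

fn classify(xs: &[uint]) -> uint {
    if xs.len() <= 2 {
        // Each length up to max(a + b, c) = 2 becomes a fixed-length match,
        // which behaves like a match on a tuple:
        match xs {
            []     => 3u, // only the `_` arm covers the empty slice
            [_]    => 2u, // the `[x]` arm
            [_, _] => 1u, // length 2: `[first, ..mid, last]` with an empty mid
            _      => unreachable!() // longer slices are handled below
        }
    } else {
        // For longer slices only arms with a variable-length part remain;
        // padded to exactly `a` leading and `b` trailing terms, they can be
        // analyzed like a match on an (a + b + 1)-tuple:
        match xs {
            [_first, .._mid, _last] => 1u,
            _ => unreachable!() // impossible: xs.len() > 2 here
        }
    }
}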

This is of course exponential in the nesting levels (where nesting means that the terms themselves are slices and are matched), but nesting is rare.

@lilyball (Contributor)

@zwarich

If the generated code is calling a function to get each array element, it may not get the same element the second time it calls the function.

That's a good point, one I hadn't considered before. I can think of 3 solutions:

  1. Require unsafe to implement this functionality, and put the burden on the implementor to make sure they follow the rules. Not really very good, but it has the benefit of no runtime overhead.
  2. After finding a match, re-validate that it matches when moving out of the data structure. If a function is written badly to return a different value the second time, this could trigger a runtime failure, which seems like a reasonable response to a malicious implementation.
  3. Provide a stack array of the appropriate size and move the elements into it before testing. This allows the compiler to make guarantees itself, e.g. that the item won't change from under it. I'm not sure if this approach can support Cons/Snoc patterns though (at least, not patterns that need to bind the wildcard).

Note that this isn't really an exhaustiveness issue. Exhaustiveness concerns whether the switch covers all possible values that could be produced. This is determined at compile-time without actually running any code. The problem with the data structure returning a different value the second time it's asked for a subscript really is only a problem for actually destructuring the value at runtime; if it returns a Some(_) the first time, and a None the second time, the contained value can't be extracted from the None. But I think it's reasonable to simply fail at runtime under these circumstances.

Note that if we're talking about actually matching a specific value, it doesn't really matter if it returns a different nonmatching value at runtime, because that exact same problem could occur if you merely asked it manually for that same value in the body of the case, and that's not related to the correctness of the switch. The only real problem is if the structure of the value changes.

@pcwalton (Contributor)

+1, too much complexity. We need the language to stay simple.

@zwarich commented Jun 28, 2014

@bill-myers If you have an n-tuple of Option values, then you can view patterns of the form (Some(_), None, ...) as being P1 and (not P2) and ... with an atomic proposition for each entry in the tuple. This means that a pattern match is a disjunction of these conjunctions, and to check exhaustiveness, you can check the satisfiability of the negation, a conjunction of disjunctions of possibly negated atomic propositions. Thus the problem is NP-complete in the worst case, even without slice patterns, and the superpolynomial growth occurs with the size of the tuple, not the amount of nesting. I guess the approach that is used for checking matches never hits this worst case, and it probably wouldn't hit it for slice patterns either. Not sure why I missed that this applies to the current matches...

@bill-myers

@zwarich You are right, as long as you include _ in addition to Some(_) and None (also, it's simpler to use bool instead of Option): determining the exhaustiveness of pattern matching on n-tuples of bool is co-NP-complete...
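
To make the reduction concrete, a small hypothetical example (not taken from the thread): give the match one arm per clause of (x1 or x2) and (not x1 or x3), with each arm matching exactly the assignments that falsify its clause. The final wildcard arm is then reachable precisely when the formula is satisfiable:

fn check(v: (bool, bool, bool)) -> &'static str {
    match v {
        // arm for clause (x1 or x2): matches assignments where x1 and x2 are both false
        (false, false, _) => "falsifies (x1 or x2)",
        // arm for clause (not x1 or x3): matches assignments where x1 is true and x3 is false
        (true, _, false)  => "falsifies (not x1 or x3)",
        // reachable iff some assignment satisfies every clause,
        // e.g. (true, false, true), so deciding this means deciding SAT
        _                 => "satisfies the formula"
    }
}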

This means that including variable-length slice patterns should not make things much worse, so they shouldn't be removed because of that.

@zwarich commented Jun 28, 2014

@kballard In other languages with pattern matching, exhaustiveness is synonymous with absence of failure. It is also important that the compiler's exhaustiveness checking guarantees that at least some case will match at runtime, not merely in some idealized world that might not match actual program behavior.

@glaebhoerl (Contributor)

I would prefer to make our support for pattern matching arrays more expressive, not less. With something like RFC PR #101, native language support for manipulating mutable and immutable arrays and array slices in the safe subset of the language could be considered complete. That seems kind of important for a systems language and a competitor to C. It would obviate the need for, and be more flexible and convenient than, current unsafely implemented functions like mut_split_at.

Slices are currently the only non-algebraic data type that can be destructured in a pattern.

On its own, this argument is utterly unpersuasive to me. Does this fact have any consequences? If not, the question should be whether the feature is useful and well-behaved, not whether it fits into an arbitrary categorization.

If supporting the feature caused the implementation to become intractable, that would be a persuasive argument, but per some of the above comments, I'm not sure if that holds up, either.

@ghost (Author) commented Jun 28, 2014

@glaebhoerl Yeah, I agree. I think it was a mistake to drive a language discussion with implementation details. Apologies. In any case, my PR rust-lang/rust#15186 now supports sub slices of any kind properly so the whole argument on my part is moot anyway.

@ghost (Author) commented Jun 30, 2014

Actually, at this point I don't really have good reasons for why they should be removed, so I'll close this; if someone does, feel free to reopen and drive it forward.

@ghost closed this Jun 30, 2014