-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Remove variable-length slice patterns from the language #144
Conversation
Why are slice patterns bad? You said they can't be reasoned about, but why not? A slice pattern can only have one wildcard, which means that any given slice pattern is known to either be |
I suppose supporting specific values in a variable-length slice pattern might be the hard part, e.g. If this RFC is accepted, I would very much like to see Cons (and preferably Snoc) patterns still allowed, as mentioned in the alternative. In the second paragraph of the alternative, it says not removing the feature will require runtime failure in some cases. Is that if Cons patterns are still allowed, or is that if the RFC isn't accepted? Because if that applies to if Cons patterns are still allowed, isn't runtime failure the same issue with making no changes at all? And in that case, what does restricting to Cons patterns buy us? |
@kballard It's easy with flat patterns but once you start matching substructures of slices, such as: match x {
[Some(Foo { a: false, .. }), None, ..xs, x] => (),
[None, ..xs, None, None] => ()
} we're in the SMT land. Now, restricting to Cons/Snoc patterns allows us to use the current, simplified algorithm, in which a slice pattern
No, if only Cons/Snoc patterns were to be allowed, then we can reliably determine the exhaustiveness/reachability properties of a match expression so no failure at runtime would ever be necessary. You're right, the wording isn't very clear there. |
In normal patterns over algebraic data types, the patterns only match things that are part of the structure of the value being matched, and any bindings directly bind those components. This simplifies both exhaustiveness checking and match compilation. I haven't seen a use of variable length slice patterns that looks like idiomatic Rust code. |
If both Cons and Snoc patterns are supportable, why can't we support |
@kballard So the codegen for that is trivial, agreed. But I understand that for the exhaustiveness check, the |
@kballard You then end up with a disjunction of conjunctions of patterns, which makes exhaustiveness checking difficult. In the case where all of the conjunctions are only 2 patterns, there is probably a way to develop an efficient algorithm based on 2SAT, but it would be more complex than the current exhaustiveness checker in entirety. |
I see. Well, I suppose I'm down with this as long as Cons/Snoc patterns are still supported. They may be uncommon, but if we can do exhaustiveness checking with them, then I think we should keep them. |
@kballard Is there a use of Cons/Snoc patterns that would be considered idiomatic Rust code? I haven't seen one. |
(And you also can't stay in the nice world of 2-SAT if you have nested slice patterns, since the conjunctions may grow beyond 2 formulas.) |
@zwarich, back when ~[T] patterns still existed, every command-line program that I wrote used these sorts of patterns to grab arguments out of But I suppose I don't understand your suggestion that these patterns are not idiomatic. There's a reason that we all applauded their addition in the first place, and lamented their loss as a result of the ~[T] fallout. |
@bstrie, the people I asked about this on IRC didn't seem to have many compelling use cases for slice patterns, and @jakub-'s stats from rustc, Servo, and coreutils seem to agree with this. Is it that they are only really useful in the bind-by-move case? |
I certainly care more about hypothetical vector patterns than slice patterns, yes. |
For what it’s worth, I’ve recently used (guided a new contributor to use) a slice patterns in Servo to dispatch on the length of the slice and deconstruct the elements at the same time. This code is very nice and terse. Without slice patterns, this code would have to be something like |
From the RFC:
Are they? What does non-algebraic mean here? What other types are non-algebraic? |
What does the compiler do now? How is that bad? |
@zwarich As suggested by @bstrie, I think most of the uses of this sort of pattern were removed when we lost vector patterns. I certainly remember having to do that myself, and being saddened by the uglier code that resulted. Slice patterns may be less common, but I would love to see vector patterns resurface, and they would presumably follow the semantics of slice patterns. So if slice patterns were removed now we'd just have to end up re-adding them anyway when we got vector patterns (or else be curiously inconsistent). |
@zwarich Well, we have no way right now to represent the ability to destructure an arbitrary data structure in order to move data out of it. But that's something we'd like to have. In the absence of that ability, we can certainly design traits to support being able to copy data, or being able to reference data. One possible answer might look like the following: trait Destructure<T> {
fn len(&self) -> uint;
fn subscript<'a>(&'a self, x: uint) -> &'a T;
}
trait DestructureMut<T>: Destructure<T> {
fn mut_subscript<'a>(&'a mut self, x: uint) -> &'a mut T;
} Although this is a bit handwavy, because mutable destructuring (which would let you use fn mut_subscript<'a, int N>(&'a mut self, indexes: [uint, ..N]) -> [&'a mut T, ..N]; (with the guarantee that the same index will not occur twice in the input array). Another approach is an |
@kballard The problem of how to safely move multiple elements out of a vector without breaking the soundness of the language is interesting. I don't think that vector patterns are nearly important enough to warrant a soundness hole. However, I was thinking of a more subtle problem, which occurs even if you extend by-ref matches to user-defined data structures. In the code match v {
[Some(_), ...] => { ... }
(arbitrary patterns)
[None(_), ...] => { ... }
} how do you ensure that the match is exhaustive? If the generated code is calling a function to get each array element, it may not get the same element the second time it calls the function. You could try caching the results of each function call, but then that means independent match arms don't have independent semantics. This actually gets worse when combined with the problem that you mentioned. Consider a hypothetical by-move binding out of the interior of a value of sum type in a vector, e.g. match v {
[Some(a), Some(b), ..xs] => { ... },
_ => { ... }
} When matching the first arm, the generated code has to capture a pointer to the memory it is going to move if the arm is successful, since if it calls a function to get a reference again, it may not even be a pointer to a The more I think about vector patterns in a world where vectors are implemented as a library, the more broken they seem. |
Is exhaustiveness really hard? Let a be the maximum number of terms before the variable length pattern among match arms, b be the maximum number of terms after the variable length pattern among match arms and c the maximum number of terms among match arms without variable length patterns. Now transform the match to first match on the length, and generate a case for each length <= max(a + b, c), which of course can be treated as matching on a tuple. Then, for larger lengths only the patterns with variable length patterns can match, and you can reduce all match arms to have exactly a terms, a variable length pattern, and b terms (by adding _ terms as needed), and then treat the match as matching on an (a + b + 1)-tuple. This is of course exponential in the nesting levels (where nesting means that the terms themselves are slices and are matched), but nesting is rare. |
That's a good point, one I hadn't considered before. I can think of 3 solutions:
Note that this isn't really an exhaustiveness issue. Exhaustiveness concerns whether the switch covers all possible values that could be produced. This is determined at compile-time without actually running any code. The problem with the data structure returning a different value the second time it's asked for a subscript really is only a problem for actually destructuring the value at runtime; if it returns a Note that if we're talking about actually matching a specific value, it doesn't really matter if it returns a different nonmatching value at runtime, because that exact same problem could occur if you merely asked it manually for that same value in the body of the case, and that's not related to the correctness of the switch. The only real problem is if the structure of the value changes. |
+1, too much complexity. We need the language to stay simple. |
@bill-myers If you have an n-tuple of |
@zwarich You are right, as long as you include _ in addition to Some(_) and None (also, it's simpler to use bool instead of Option): determining the exhaustiveness of pattern matching on n-tuples of bool is co-NP-complete... This means that including variable-length slice patterns should not make things much worse, so they shouldn't be removed because of that. |
@kballard Exhaustiveness of pattern matching in other languages with pattern matching is synonymous with absence of failure. It is also important that the compiler's exhaustiveness checking guarantees that at least some case will match at runtime, not in some idealized world that might not match actual program behavior. |
I would prefer to make our support for pattern matching arrays more expressive, not less. With something like RFC PR #101, native language support for manipulating mutable and immutable arrays and array slices in the safe subset of the language could be considered complete. That seems kind of important for a systems language and a competitor to C. It would obviate the need for, and be more flexible and convenient than, current
On its own, this argument is utterly unpersuasive to me. Does this fact have any consequences? If not, the question should be whether the feature is useful and well-behaved, not whether it fits into an arbitrary categorization. If supporting the feature caused the implementation to become intractable, that would be a persuasive argument, but per some of the above comments, I'm not sure if that holds up, either. |
@glaebhoerl Yeah, I agree. I think it was a mistake to drive a language discussion with implementation details. Apologies. In any case, my PR rust-lang/rust#15186 now supports sub slices of any kind properly so the whole argument on my part is moot anyway. |
Actually, at this point, I don't really have good reasons for why they should be removed, in which case I'll close this and if someone does, feel free to reopen and drive. |
…n-tutorial Rewording a sentence in TUTORIAL.md
Add Ember issue to 0136-contains-to-includes.md
Summary
Remove variable-length slice patterns from the language.
Motivation
Detailed design
Support for variable length slice patterns will be removed from match expressions. It will still be possible to pattern match fixed-length vectors as they do not suffer from the same two issues mentioned in the Motivation section.
Drawbacks
Slice patterns can be a convenient way of working with slices and vectors. In particular, they can be convenient when implementing tail-recursive algorithms that work on slices. Making this change will break users' code, however, there do not appear to be many uses of the feature in the wild. rustc contains 10 occurrences of slice patterns in a match expression. Servo uses a slice pattern once and it's a fixed-length slice pattern. coreutils uses them twice.
Alternatives
To address the second limitation without removing the syntax, slice patterns can be restricted to only allow Cons-like an Snoc-like patterns, where the variable-length part of the pattern appears at the very end or the very beginning of the pattern:
In addition, the two could not be intermixed together in a single column of a match expression.
The impact of leaving the feature intact will be that slice patterns will have to, in some scenarios, be excluded from the exhaustiveness analysis, which will result in match expressions that can fail at runtime, introducing a new class of runtime errors.
Unresolved questions
If the syntax was to be introduced again or preserved now to its full extent, what would be the best approach to implement an exhaustiveness/reachability analysis pass that's different than the current ML-like approach and correctly supports this flavour of patterns?