-
Notifications
You must be signed in to change notification settings - Fork 4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Discussion thread for pattern-matching #10153
Comments
Every time I try to come up with a reasonably expressive syntax for this, any sample pattern explodes into a huge mess. On one hand, every extracted element must be assignable to an identifier, but on the other hand, we might want to apply a pattern to it which might not allow us to get one. I feel like restricting the pattern to extracting head(s) and a tail is a reasonable solution. For example, what would "the token stream should contain a reserved word
We're lucky that we can abandon the reserved word token and that identifiers are identified using inheritance and not a tag field. Otherwise this could look like
At this point it still doesn't look too bad, but any further features turn the patterns into an undecipherable mess worse than programs by someone trying to use every single feature of scalaz, e.g. adding regular expression-like features or recursive matches. The changes to the grammar look like this: complex-pattern
: type identifier
| recursive-pattern
| recursive-pattern identifier
| property-pattern
| property-pattern identifier
| list-pattern
;
list-pattern
: cons-pattern
| empty-list-pattern
;
cons-pattern
: car-pattern '::' cdr-pattern
;
car-pattern
: complex-pattern
| identifier
;
cdr-pattern
: list-pattern
| identifier
;
empty-list-pattern
: '[]'
; P.S. Does anyone else experience semantic satiation after reading the word 'pattern' thousands of times? |
@orthoxerox I know that the "cons" operator I do think that "cons" patterns would apply well to C#, particularly for if (list is { ReservedWordToken{ Keyword is "class" }, IdentifierToken className }) { }
if (list is [ ReservedWordToken{ Keyword is "class" }, IdentifierToken className ]) { } I like the former because it feels like a collection initializer, but I think it would preclude the option of having an implicitly typed property pattern. The latter smells like it has something to do with arrays and it is the list pattern in F# but it's not the initializer syntax. As for the verbosity of any of these, I think that's just inherent with subpatterns, whether you're dealing with property patterns or list/array/cons/whatever patterns. |
This is a lot how I feel when looking at the current spec. In my opinion, it feels like the spec is trying too hard to do too much. It tries to deal with both how the object was constructed, what the object looks like internally, and conditionals.
|
I think allowing "property patterns" to omit the type name when the compiler already knows the type would be useful both in keeping the verbosity down as well as allowing the property pattern to work with anonymous types: Rectangle rectangle = new Rectangle(100, 100);
var anonymous = new { Width = 100, Height = 100 };
if (rectangle is Rectangle { Width is 100, Height is var height }) { }
if (rectangle is { Width is 100, Height is var height }) { }
if (anonymous is { Width is 100, Height is var height }) { }
object obj = rectangle;
if (obj is Rectangle { Width is 100, Height is var height }) { }
if (obj is { Width is 100, Height is var height }) { } // compiler error This syntax might also be able to apply to dynamic types. What I don't know is whether the compiler can emit code to attempt to bind to the properties to test for their existence: dynamic dyn = rectangle;
if (dyn is { Width is 100, Height is var height }) {
// succeeds if both the Width and Height properties exist
// Width must also be successfully compared to the constant pattern 100
// height is of type dynamic and could be any value
} |
In that case I would much rather see people use a |
@HaloFour That couldn't be ambiguous. I'm assuming the syntax proposed in #5811 works,
o is { 1, 2 }
o is { P is 1 } @orthoxerox I don't know why there is a need for "cons" operator, because list is { int first, .. int[:] rest } or something like that. |
If that's not considered ambiguous either visually or by the parser then it works for me. Would it be worth having different patterns for arrays, lists, enumerables, etc. or just have the compiler emit what it thinks would be best given what it knows about the type of the operand? Also, 🍝: // list must contain at least two elements
// remaining elements will be copied/sliced to rest, which can be empty
if (list is { 1, 2, params int[] rest }) { ... } Perhaps that could be |
@HaloFour What difference does it make? All you want is to match the target with this pattern, the type which the pattern will be matched against doesn't matter (to user) — analogous to array and collection initializers. In your example, you should use // matches an array that begins with values `1` and `2` with Length >= 2
if(arr is { 1, 2, *** }) {
int[:] rest = arr[2:]; // skips the first two items
} It has nothing to do with parameters though. A syntax similar to this |
@HaloFour @alrz perhaps operator Lists, on the other hand, can be pattern-matched reasonably efficiently without side effects (if you have side effects on the indexer, blame no one but yourself) by introducing an array slice/list view/etc type that can be the type of public class ListView<T> : IReadOnlyList<T>
{
IReadOnlyList<T> innerList;
int offset;
public ListView(IReadOnlyList<T> innerList, int offset)
{
var innerView = innerList as ListView<T>;
if (innerView != null) {
this.innerList = innerView.InnerList;
this.offset = offset + innerView.Offset;
} else {
this.innerList = innerList;
this.offset = offset;
}
}
public T this[int index] {
get { return innerList[index + offset]; }
}
public int Count {
get { return innerList.Count - offset; }
}
public IEnumerator<T> GetEnumerator() {
if (offset == 0)
return innerList.GetEnumerator();
for (int i = offset; i < innerList.Count; i++) {
yield return innerList[i];
}
}
System.Collections.IEnumerator System.Collections.IEnumerable.GetEnumerator() {
return (System.Collections.IEnumerator)GetEnumerator();
}
} |
@orthoxerox In F# |
Slices aren't on the list of features to be implemented for C# 7.0, so I doubt that a syntax that involves that feature would be considered for list patterns in this time frame. While there could definitely be mutation side effects the compiler could minimize exposure to them by memoizing the results of the I'm not saying that this is a good idea, just that it's possible. Mutability issues aside an |
@HaloFour well, everything's possible, but I think this kind of matching obscures the concept of I understand the rationale. I've had to write algorithms that chained together functions that transformed enumerables (e.g., "not a number"::"minus"::"positive number" -> "not a number"::"negative number") and I really wanted to have a built-in declarative way to express it, but I now think having an explicit |
This spec still makes little sense. It starts with:
Which is a recursive definition (
Yet e isn't defined in the definition. |
static class Sequence
{
public static operator is(IEnumerable data, out object first, out IEnumerable rest)
{
if (data.Any())
{
first = data.FirstOrDefault();
rest = data.Skip(1);
return true;
}
else
{
first = null;
rest = null;
return false;
}
}
} |
@gafter Ick, that could be made a lot better. But algorithmic issues aside, I hope that public static class SequenceExtensions {
public static operator bool is Sequence<T>(IEnumerable<T> data, out F first, out IEnumerable<T> rest) {
IEnumerator<T> enumerator = data.GetEnumerator();
if (enumerator.MoveNext()) {
first = enumerator.Current;
rest = new EnumeratorEnumerable(enumerator);
return true;
}
enumerator.Dispose();
first = default(T);
rest = null;
return false;
}
} |
I wouldn't put just |
@gafter Well thank goodness you guys didn't go that route when designing LINQ, otherwise Are you also suggesting that an operator like this would obviate the need for a separate form of list/cons pattern? I'd be really worried that any custom operator couldn't be as efficient or careful at stepping into the enumerable without causing side effects or other issues. |
I'm still wondering what would be the benefit of head-tail deconstruction of an |
@alrz Because it's an incredibly common type? I think that it could be done as efficiently as walking an enumerable, particularly if the compiler can keep track of how many elements it has already retrieved in order to prevent re-enumeration. Given the following (using array/list pattern, but this is just for illustration): IEnumerable<int> GetInfiniteZeros() {
while (true) {
Console.Write("zero ");
yield return 0;
}
}
var numbers = GetInfiniteZeroes();
switch (numbers) {
case { 1, 2, 3, 4, 5 }:
Console.WriteLine("foo");
break;
case { 9, 8, 7, 6, 5, 4, 3, 2, 1 }:
Console.WriteLine("bar");
break;
case { 0, 0, 0 }:
Console.WriteLine("baz");
break;
} should output:
I think for any of these patterns to be worthwhile that they should work with the common existing types. Even if Update: Changed the output, I think that the compiler could eliminate potential patterns earlier on as soon as a single element doesn't match the subpattern. |
@HaloFour That is not a head-tail deconstruction. I'm not against a pattern that could be used for an F# |
You mean like the following (again, syntax is strictly illustrative)? int Sum(IEnumerable<int> numbers) {
return numbers match (
case { }: 0,
case { var value, ** IEnumerable<int> rest }: value + Sum(rest)
);
} |
@HaloFour No, I'm not talking about the syntax (that is what you changed here). If you want to skip the let sum (source : int list) : int =
match source with
| [] -> 0
| head::tail -> head + sum tail
let sum (source: int seq) : int =
use e = source.GetEnumerator()
let mutable acc = 0
while e.MoveNext() do
acc <- acc + e.Current
acc But when your type is not suited for this style you should consider using imperative version. |
Using @alrz |
@DavidArno Those "separate types" are the types that you would use to hold the data in a discriminated union, not merely placeholders for the pattern-matching operation. A typical DU language feature defines the union'ed types as part of the union: using static DU;
class DU
{
public class Option1 : DU
{
public string Value1, Value2;
public static void operator is(Option1 self, out string Value1, out string Value2)
{
Value1 = self.Value1;
Value2 = self.Value2;
}
}
public class Option2 : DU
{
public int Value1, Value2;
public static void operator is(Option2 self, out int Value1, out int Value2)
{
Value1 = self.Value1;
Value2 = self.Value2;
}
}
}
class Program
{
static void Main(string[] args)
{
Dispatch(new Option1 { Value1 = "hello", Value2 = "there" });
Dispatch(new Option2 { Value1 = 1, Value2 = 2 });
}
static void Dispatch(DU du)
{
switch (du)
{
case Option1(string s1, string s2):
Console.WriteLine("option1");
break;
case Option2(int s1, int s2):
Console.WriteLine("option2");
break;
}
}
} |
Surely that's the complete opposite of F#? For example on F# for fun and profit's DU page, it states:
|
Having said that, with code generators, I can define something like: [union] partial struct IntOrBool
{
class Int { int i; }
class Bool { bool b; }
} It then makes sense to put the type declarations inside the union. That can then be turned into some form of union type declaration by a code generator. The |
@DavidArno From https://msdn.microsoft.com/en-us/library/dd233226.aspx we see an example in F# type Shape =
| Rectangle of width : float * length : float
| Circle of radius : float
| Prism of width : float * float * height : float As you can see, the discriminated union has three "cases" named enum class Shape
{
Rectangle(float Width, float Length),
Circle(float Radius),
Prism((float, float) width, float height)
} |
Yep, I think you are right: whilst those "cases" in F# aren't real types (as far as I can work out), types would be the way to go with C#. Using the code generation feature planned for C# 7, it'll be possible to define something like: [union] public partial class Shape
{
public partial class Square { double sideLength; }
public partial class Circle { double radius; }
} and generate the DU from that, so that it can be pattern-matched in a similar way to eg F#. The only concern I then have around that, for C# 7, the issue of completeness comes into it. For example, you or I could examine the following to know it's complete, but I don't see how the compiler could do so: double AreaOfShape(Shape shape) =>
shape match (
case Circle(var radius) : radius * radius * Math.Pi
case Square(var side) : side * side
); Having the compiler complain that it's not complete will be a pain. Not having it complain though could be a source of bugs. I guess that's why what to do about completeness is an open question as there's no easy answer. |
@DavidArno No, in case of DUs, they are still derived types of a common base in F#. But active patterns are just currying functions. |
A poke around with ILSpy has shown me you are completely correct. Interestingly, F# not only generates subtypes of the union for each of the cases, it also generates a static |
I really like the pattern matching spec for the most part, but I think I agree that the "user-defined operator is" is a bit confusing. Am I correct in understanding that for the Cartesian c, "c is Polar" is false but "c is Polar(*, *)" is true? If so, that seems surprising to me. My intuition would have been that "c is Polar" would be equivalent to "c is Polar(*, *)", and then that "c is Polar(5, *)" would be true for a subset of those cases. (That said, maybe my intuition here is lacking due to insufficient experience. In F# or Scala is it the case that adding only wildcard arguments causes a match to succeed that would otherwise fail?) In other words, I would have thought the current rule -- "is" only considers reference conversions and boxing/unboxing -- would still be true when there's a type to the right of "is", and that the positional pattern syntax would just be adding additional conditions on top of the reference-conversion check. But it seems instead the proposal only retains the above rule when the positional arguments are omitted. |
I have (independently from the current feature progress) opened an issue about more uses of the |
The changes to What would be the advantage over I really hope that the C# team reconsiders and leaves |
@ngbrown Completeness check is one possible future benefit for switches. I think non-pattern switches are still optimized as jumptables. |
How does pattern matching works with structs? For classes, okay, we can just write
especially when we don't know if we have here struct or class:
How is it handled in new language version? I mean when I write |
Hi While testing I came across 2 things related with
I get a It makes sense as there is nothing to do if
It works fine.
I get: I thought that even if I don't know if these are known issues or even issues at all but just wanted to share. |
@uTeisT I think I can answer to your questions:
I think, in both cases compiler should error |
@Pzixel 1) Using Using VS15P5 btw, if that helps. |
@uTeisT The scoping rules were changed, int j;
if (o is int) {
j = (int)i;
}
j *= 5; // compiler error, j is not definitely assigned |
@uTeisT Another way to deal with that is to just invert the check: |
@uTeisT Eric treats itself the empty statement as design mistake, so I anyway recommend to not use it at all. @HaloFour I think this is mistake. I understand why they decided do to such thing, but IMHO |
In your example, the pattern-matching operation must always succeed: void TestFunc(int i)
{
if (i is int j) { }
Console.WriteLine(j);
} But this results in a definite assignment error on the use of j. That is because the current spec says that pattern variables are definitely assigned when true for the pattern-matching operation. The implication is that they are not definitely assigned when then pattern match fails (even though that cannot happen here), and so they are not definitely assigned after the if statement. We could modify the specification so that a pattern-matching operation that always succeeds results in the pattern variables being definitely assigned. That would allow this code to compile without error. I've filed this as a separate issue #14651 |
//Old style
//New style
Is the new style more efficient because there is no unboxing? |
@andrew-vandenbrink Neither of those is boxing.
There are several other reference conversion operations available between
the new style: object value = getValue();
switch(value)
{
case string x:
// use x here
break;
} is roughly equivalent to the old style: object value = getValue();
var x = value as string;
if (x != null) { // use x here If a type pattern would need to box/unbox: object value = getValue();
switch(value)
{
case int y: // unboxing is the conversion from the reference type `object` to the value type `int`
// use y here
break;
} then it would perform a [un]boxing conversion. Roughly equivalent old style: object value = getValue();
if (value is int) {
var y = (int) value;
// use y here |
Using string is bad example of intended question that I would ask, thank you for the example with int. Looks like there is still unboxing, too bad, why the compiler would not just read it as int as at that point you are sure its int (like in dynamic type) |
Further discussion can be moved to the csharplang repo, please. |
This is a discussion thread for pattern-matching, as specified in https://github.com/dotnet/roslyn/blob/features/patterns/docs/features/patterns.md
Please check the spec before adding your comment - it might have been addressed in the latest draft.
The text was updated successfully, but these errors were encountered: