Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Proposed changes for deconstruction, declaration expressions, and discards #14794

Closed
gafter opened this issue Oct 28, 2016 · 91 comments · Fixed by #15842
Closed

Proposed changes for deconstruction, declaration expressions, and discards #14794

gafter opened this issue Oct 28, 2016 · 91 comments · Fixed by #15842

Comments

@gafter
Copy link
Member

gafter commented Oct 28, 2016

This week's LDM meetings (2016-10-25 and 2016-10-26) proposed some possible changes to the handling of deconstruction, declaration expressions, and wildcards that, even if done later, would affect the shape of compiler APIs today. That suggests we would want to consider what to do today so that we are compatible with that possible future. This is a summary of the proposed changes and their API impact.

Wildcards

The LDM is proposing to change the character that we would use to represent a "wildcard" from * to _. We also considered some alternatives, but both * and ? have a logical (if not technical) ambiguity, because int? and int* are valid types. _ doesn't likely have the same kind of syntactic ambiguity, because it is already an identifier in all of the places where we want to support wildcards. But it may be a semantic ambiguity for that same reason. The reason we like _ is that users already introduce variables, for example parameters, named _ when their intention is to ignore them.

We want to support both the "short form" _ and the "long form" var _ just in case there is an ambiguity with, for example, a field in scope. We'll return to this later.

Declaration Expressions

We currently represent an out variable declaration using a "declaration expression" node in the syntax model. That was done because we were thinking that we may want to generalize declaration expressions in the future.

There is some slight discomfort with declaration expressions as they appear in the current API and proposed language spec, because while they are expressions in the syntax model, they are not expressions in the draft spec, and do not have types, and therefore there are special rules called out for them wherever they may appear in the specification. Besides the mismatch between the spec and model, we expect we may want to allow declaration expressions in more contexts in the future, in which case we will want them to be treated as expressions. In that case we will be better served by treating them as expressions (i.e. they can have a type) today.

Deconstruction

Possibly generalizing declaration expressions in the future makes us want to reconsider our syntax model for deconstruction today. For example, given the statement

(int x, int y) = e;

We can think of this as a special deconstruction declaration statement (as it is today), or alternatively we can think of it as a statement expressions containing an assignment expression with a tuple expression on the left-hand-side. In that case the tuple expression contains two declaration expressions. The latter formulation makes more sense in the possible future in which we generalize declaration expressions.

Similarly, given the statement

var (x, y) = e;

We can think of this as a special deconstruction declaration statement (as it is today), or alternatively we can think of it as a statement expressions containing an assignment expression with a declaration expression on the left-hand-side. The latter formulation makes more sense in the possible future in which we generalize declaration expressions.

This reformulation of deconstruction allows us to remove from the syntax model the new statement form for a deconstruction declaration. It also allows us to generalize what we allow in the future:

int x;
(x, int y) = e;

Here, the left-hand-side contains a mixture of already-existing variables (in this case x) and newly declared variables (int y). And it can be used in an expression context as well (e.g. as the body of an expression-bodied method).

Wildcards (revisited)

Given this new understanding of the direction of the syntax, there are four forms that wildcards can take. First, it can take the place of an identifier in a designator (i.e. in a declaration expression):

var (x, _) = e;
(int x, int _) = e;
M(out int _);

Since _ is already an identifier, no syntax model change is required. However, semantically we want this to create an anonymous variable, and shadow any true variable (e.g. parameter or field) from an enclosing scope named _. There is no name conflict error if wildcards are declared this way more than once in a scope.

Second, it can similarly be used to declare a pattern variable:

switch (o)
{
    case int _:
    case long _:
        Console.WriteLine("int or long");
        break;
}

Third, it can take the place of an identifier in a simple expression where an lvalue is expected and is used as a target in a deconstruction assignment or out parameter, but in that case its special behavior as a wildcard only occurs if looking up _ doesn't find a variable of that name

M(out _);
(x, _) = e;

This special name lookup is similar to the way we handle var.

Finally, it can be used where a parameter can be declared today. However, we relax the single-definition rule to allow multiple conflicting declarations (same scope or nested scopes), in which case the identifier _ binds as a wildcard.

Func<int, int, int> f = (_,_) => 3;

We have to be careful with these changes so that any program that uses _ as an identifier and is legal today continues to compile with the same meaning under these revised rules.

Syntax model changes

This allows us to simplify the handling of the for loop to handle deconstruction. Now the deconstruction is just one of the expressions in the expression list of the initializer part, and doesn't require its own placeholder in the syntax. That means that the syntax node for the for loop remains unchanged from the C# 6 version.

This requires a change to the way we handle the deconstruction form of the foreach loop. Because we want the left-hand-side to be capable of representing all of these forms

foreach ((int x, int y) in e) ...
foreach ((int x, _) in e) ...
foreach (var (x, _) in e) ...

we now need to use expression for the syntax node before the in keyword.

We can remove the syntax node for the deconstruction declaration statement, because that is just an assignment statement in this model.

Syntax.xml changes

The following changes are proposed compared to the current implementation in master. We remove

  <AbstractNode Name="VariableComponentSyntax" Base="CSharpSyntaxNode">
  </AbstractNode>
  <Node Name="TypedVariableComponentSyntax" Base="VariableComponentSyntax">
    <Kind Name="TypedVariableComponent"/>
    <Field Name="Type" Type="TypeSyntax"/>
    <Field Name="Designation" Type="VariableDesignationSyntax"/>
  </Node>
  <Node Name="ParenthesizedVariableComponentSyntax" Base="VariableComponentSyntax">
    <Kind Name="ParenthesizedVariableComponent"/>
    <Field Name="OpenParenToken" Type="SyntaxToken">
      <Kind Name="OpenParenToken"/>
    </Field>
    <Field Name="Variables" Type="SeparatedSyntaxList&lt;VariableComponentSyntax&gt;"/>
    <Field Name="CloseParenToken" Type="SyntaxToken">
      <Kind Name="CloseParenToken"/>
    </Field>
  </Node>
  <Node Name="DeconstructionDeclarationStatementSyntax" Base="StatementSyntax">
    <Kind Name="DeconstructionDeclarationStatement"/>
    <Field Name="Modifiers" Type="SyntaxList&lt;SyntaxToken&gt;"/>
    <Field Name="Assignment" Type="VariableComponentAssignmentSyntax"/>
    <Field Name="SemicolonToken" Type="SyntaxToken">
      <Kind Name="SemicolonToken"/>
    </Field>
  </Node>
  <Node Name="VariableComponentAssignmentSyntax" Base="CSharpSyntaxNode">
    <Kind Name="VariableComponentAssignment"/>
    <Field Name="VariableComponent" Type="VariableComponentSyntax"/>
    <Field Name="EqualsToken" Type="SyntaxToken">
      <Kind Name="EqualsToken"/>
    </Field>
    <Field Name="Value" Type="ExpressionSyntax"/>
  </Node>

and we remove the Deconstruction field from the ForStatementSyntax

We change the VariableComponent field of ForEachComponentStatementSyntax to be an ExpressionSyntax, and probably change the name of ForEachComponentStatementSyntax.

And we change

  <Node Name="DeclarationExpressionSyntax" Base="ExpressionSyntax">
    <Kind Name="DeclarationExpression"/>
    <Field Name="VariableComponent" Type="VariableComponentSyntax">
    </Field>
  </Node>

to

  <Node Name="DeclarationExpressionSyntax" Base="ExpressionSyntax">
    <Kind Name="DeclarationExpression"/>
    <Field Name="Type" Type="TypeSyntax"/>
    <Field Name="Designation" Type="VariableDesignationSyntax"/>
  </Node>

We leave unchanged

  <AbstractNode Name="VariableDesignationSyntax" Base="CSharpSyntaxNode">
  </AbstractNode>
  <Node Name="SingleVariableDesignationSyntax" Base="VariableDesignationSyntax">
    <Kind Name="SingleVariableDesignation"/>
    <Field Name="Identifier" Type="SyntaxToken">
      <Kind Name="IdentifierToken"/>
    </Field>
  </Node>
  <Node Name="ParenthesizedVariableDesignationSyntax" Base="VariableDesignationSyntax">
    <Kind Name="ParenthesizedVariableDesignation"/>
    <Field Name="OpenParenToken" Type="SyntaxToken">
      <Kind Name="OpenParenToken"/>
    </Field>
    <Field Name="Variables" Type="SeparatedSyntaxList&lt;VariableDesignationSyntax&gt;"/>
    <Field Name="CloseParenToken" Type="SyntaxToken">
      <Kind Name="CloseParenToken"/>
    </Field>
  </Node>

SemanticModel changes

We may want to change the behavior of GetTypeInfo on a declaration expression, depending on how the shape of the specification evolves.

We probably need to consider what the behavior of SemanticModel APIs should be on wildcards.

Summary

The changes to declaration expressions and deconstruction should be done today so that we don't have an incompatible change later.

Wildcards are an interesting problem. Even if we don't want to implement them for C# 7, we want to wall off the semantic space so that valid C# 7 programs don't change meaning or become invalid in a later language version. I suspect the simplest way to do that is to implement wildcards today.

/cc @dotnet/ldm @dotnet/roslyn-compiler

@gafter
Copy link
Member Author

gafter commented Oct 28, 2016

There is an interesting syntactic ambiguity arising from declaration expressions. Consider the code

(A < B > C, X) = e;

Here the "tuple expression" on the left has a declaration for a variable C of type A<B>. However, the same "tuple expression" elsewhere

e = (A < B > C, X);

has a first subexpression equivalent to (A < B) > C. Writing the specification (and implementing the parser) to distinguish these two contexts will be fun.

@gafter
Copy link
Member Author

gafter commented Oct 28, 2016

Similarly, consider the code

(A < B, D > C, X) = e;

Here the "tuple expression" on the left has a declaration for a variable C of type A<B,D>. However, the same "tuple expression" elsewhere

e = (A < B, D > C, X);

has a first subexpression equivalent to (A < B) and a second equivalent to (D > C).

@gafter
Copy link
Member Author

gafter commented Oct 28, 2016

Another advantage of adding support for wildcards today rather than later is that we can warn for unused pattern and out variables.

@HaloFour
Copy link

While I understand the ambiguity issues of * it still seems like a better choice than _. I can just imagine the massive pit of failure that will be people accidentally deconstructing into legally declared and used variables.

@CyrusNajmabadi
Copy link
Member

@gafter Can you show hte changes you intend to make to the foreach-syntax model?

Thanks!

@CyrusNajmabadi
Copy link
Member

Actually, n/m. I can see you describe it as:

We change the VariableComponent field of ForEachComponentStatementSyntax to be an ExpressionSyntax, and probably change the name of ForEachComponentStatementSyntax.

@alrz
Copy link
Contributor

alrz commented Oct 28, 2016

@HaloFour You can't accidentally do anything. because basically any mistake is a compile-time error. And more to the point of this thread, I can understand how it's easier to represent in AST because it would be merely a semantical change. In fact, just because it's an identifier, it allows you to use T _ to represent type-check patterns or _ to ignore lamba args without introducing any new syntax and further ambiguities.

@alrz
Copy link
Contributor

alrz commented Oct 28, 2016

@gafter Regarding declaration expression ambiguities, I think it's better to require them to always be initialized. Any use case involving an uninitialized declaration expression can use a pattern instead.

@gafter
Copy link
Member Author

gafter commented Oct 29, 2016

@gafter Regarding declaration expression ambiguities, I think it's better to require them to always be initialized. Any use case involving an uninitialized declaration expression can use a pattern instead.

That was our intent. Declaration expressions can appear in an out argument, or in a tuple expression that is in a deconstruction context such as an out argument or the left-hand-side of an assignment expression or the target of a foreach loop.

@HaloFour
Copy link

@alrz

Sure you can. It would be impossible for the compiler to tell if you intend to deconstruct into a local variable or field in scope that happens to be named _ or if you intend to use a wildcard:

int _ = 123;
var pt = new Point(0, 456);
int y = 789;
// later
(_, y) = pt; // oops, what did I really intend to do here?

Is this necessarily a common scenario? Probably not. But as a legal identifier it could never be ruled out.

@vbcodec
Copy link

vbcodec commented Oct 29, 2016

@HaloFour

for pt as 2D point
(_, y) = pt;
change _ for 0, but for 3D point, variable _ will remain unchanged

@HaloFour
Copy link

@vbcodec

Considering that parameters of deconstructor methods are always out it doesn't matter what method is resolved the _ variable would always be overwritten. I imagine that Point3D won't have a two parameter deconstructor, though.

@vbcodec
Copy link

vbcodec commented Oct 29, 2016

@HaloFour
I meant that Point3D has only one method with three outs. If it has also second method with two outs, then compiler probably will pick that method, and variable _ will be changed. But method with two outs (for Point3D), will lead to messy results.

@HaloFour
Copy link

@vbcodec

I'm not entirely sure what that has to do with this discussion. Either the compiler will find a suitable deconstructor method and _ will be overwritten, or the compiler will not find a suitable deconstructor method and that will result in a compiler error.

@vbcodec
Copy link

vbcodec commented Oct 29, 2016

@HaloFour
Wildcards allow to use methods with higher number outs than target need (two in your example). So without support for wildcards compiler generate error, but with support for wildcards, there won't be error if last (or first) parameter will be _.

@HaloFour
Copy link

@vbcodec

Wildcards allow to use methods with higher number outs than targed need

I'm quite sure that this is not the case. If you wanted to deconstruct Point3D and it only offered a 3 parameter deconstructor you'd be required to specify a combination of 3 legal identifiers and wildcards combined:

(x, _) = pt; // illegal
(x, _, _) = pt; // legal
(_, y, _) = pt; // legal
(_, _, z) = pt; // legal

@vbcodec
Copy link

vbcodec commented Oct 29, 2016

@HaloFour
Ok, I got it,, concluded my version from https://blogs.msdn.microsoft.com/dotnet/2016/08/24/whats-new-in-csharp-7-0/
If it do not allow to skip unwanted data (in term of quantity, not names), then there is low benefit.

@alrz
Copy link
Contributor

alrz commented Oct 29, 2016

@gafter I'm saying that in all those places, instead of a declaration expression we use a pattern so that the following would be possible,

F(out {X:var x});
foreach(let {Result:var result} in tasks)

Or something like that. Perhaps, only complete patterns would be allowed in these contexts.

@HaloFour This is also applicable to var. That's a legal identifier and it could never be ruled out, right? But you'd never use a lowercase identifier for class name — ok, you shouldn't but you can, and when you do you should be aware of the consequences. Your example uses a new feature (deconstruction) so we'd never have to deal with such ambiguity. In all other existing code, backward compatibility is a must and as mentioned in OP the code using _ would never stop compiling. In fact, this is just another reason to implement wildcards now, otherwise we could never use _ for wildcards.

@orthoxerox
Copy link
Contributor

@HaloFour The easiest way out is to forbid (or warn against) tuple deconstruction, out var and pattern matching into variables named _ now in v7, before wildcards are implemented. Then only out _ remains syntactically valid and the vNext compiler can emit a warning if variable _ is shadowed by a wildcard of a different type.

@HaloFour
Copy link

@alrz

The difference would be much more subtle. The declaration of var as a valid type only prevents use of var as a contextual keyword, which at worst would lead to a compiler error when trying to use them both together. That's not the case here. Accidental use of both together could easily lead to perfectly legal code that results in unexpected overwriting of existing values in variables. Types also have much simpler scopes than variables. Because _ is used as both a wildcard and as a property selector shorthand relatively commonly today there are more scenarios where this will collide with wildcards.

@orthoxerox

That's a pretty reasonable suggestion but how much it limit where wildcards could be used within the language? For example there are proposals to allow wildcards to ignore lambda arguments as well as to declare ignored variables. In both of those cases it's already perfectly legal today to use _ as the identifier name.

@gafter @MadsTorgersen

What I don't get is where the ambiguity lies with *. Yes, there is the potential collision with pointer types, but you can't use pointer types in type switch (is int* has always been illegal syntax). And in out var the lack of a following identifier would immediately disambiguate it.

M(out int *) // wildcard out declaration
M(out int * x) // pointer out declaration

if (o is int *) // illegal in C# 1.0 - 6.0, wildcard in C# 7.0.

Is it because that last case would make it seem like is supported pointers?

@AdamSpeight2008
Copy link
Contributor

Another alternative is ... (triple dot) .

var (x, ...) = e;
(int x, int ...) = e;
M(out int ...);
switch (o)
{
    case int  ...:
    case long ...:
        Console.WriteLine("int or long");
        break;
}

@jaredpar jaredpar changed the title Proposed changes for deconstruction, declaration expressions, and wildcards Proposed changes for deconstruction, declaration expressions, and discards Nov 16, 2016
@jcouv jcouv self-assigned this Nov 16, 2016
@gafter
Copy link
Member Author

gafter commented Nov 17, 2016

This work is being placed into the branch https://github.com/dotnet/roslyn/tree/features/wildcard during development.

@aluanhaddad
Copy link

Actually, in your example the _ field wouldn't be affected because you're using the "long form" of wildcards where you're declaring a new scope. In that case the compiler can unambiguously determine that _ is supposed to be a wildcard. I do believe this to be the prototypical case of using wildcards with deconstruction. The ambiguities only arise from the "short form" where the syntax might allow deconstruction or assignment into an existing identifier called _ if it happens to exist.

@HaloFour Thank you for explaining this.

@jcouv jcouv modified the milestones: 2.0 (RC.3), 2.0 (RC.2) Nov 22, 2016
@ErikSchierboom
Copy link

@gafter The link to the wildcard branch is incorrect. It should be: https://github.com/dotnet/roslyn/tree/features/wildcard

@gafter gafter modified the milestones: 2.0 (RTM), 2.0 (RC.3) Dec 2, 2016
jcouv pushed a commit that referenced this issue Dec 12, 2016
…ds. (#15548)

* Combine deconstruction assignment and declaration, and support discards.

- Combine deconstruction assignment and declaration, and support discards.
- Add wildcards.work.md to track outstanding work.
- Bind each type syntax once in a deconstruction.
- Because tuples may contain declarations, adjust lambda disambiguation
  and adjust parsing of argument lists.
- Diagnose tuple element names on the left of a deconstruction.
- Add relational operators to disambiguating tokens in 7.5.4.2

* Disallow deconstruction declarations except at the statement level.
This is now a semantic restriction (until we decide to remove it).

* Revise logic to detect `var` in a declaration expression.
Plus other changes per code review.

* Add a test that GetTypeInfo on a discard expression doesn't crash.
* Small changes per code review.
* Add (skipped) test for var invocation in parens.
* Rename "Discarded" to "Discard"
* Changes recommended via code review.
* Minor changes to the handling of declaration expressions per code review.
* Addressing blocking feedback while Neal is OOF

Fixes #14794
Fixes #14832
@jcouv jcouv reopened this Dec 16, 2016
@jcouv jcouv removed their assignment Dec 18, 2016
@gafter
Copy link
Member Author

gafter commented Jan 14, 2017

This work has been completed; closing.

@gafter gafter closed this as completed Jan 14, 2017
@dsaf
Copy link

dsaf commented Mar 10, 2017

scala/scala3#2041

@gafter
Copy link
Member Author

gafter commented Mar 27, 2017

Issue moved to dotnet/csharplang #365 via ZenHub

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment