Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TokenType: Tests and optimization for pointer types #8060

Merged
merged 8 commits into from
Sep 28, 2023
Original file line number Diff line number Diff line change
Expand Up @@ -196,6 +196,8 @@ _ when IsValueParameterOfSetter(name) => TokenType.Keyword,
// that can not bind to a type. The only things that can bind to a type are SimpleNames (Identifier or GenericName)
// or pre-defined types. None of the pre-defined types have a nested type, so we can exclude these as well.
{ Parent: MemberAccessExpressionSyntax x } when AnyMemberAccessLeftIsNotASimpleName(x) => TokenType.UnknownTokentype,
// The left side of a pointer member access must be a pointer and can not be a type
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Educational: this assertion still holds when using using unsafe from C# 12, right?

using System;
using unsafe Ptr = int*;

internal class Program
{
    private unsafe static void Main(string[] args)
    {
        int x = 3;
        Ptr y = &x;
        Console.WriteLine($"x = {y->ToString()}");
    }
}

Ptr should still be considered TokenType.UnknownTokentype, and not a type. Correct?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It just works. Ptr is in a type position and is therefore classified as type (There is a distinguishing happening before ClassifyMemberAccess is called. An IdentfierSymtax can either be in an expression or a type context. In a type context, a single IdentfierName resolves to TokenType.TypeName and in an expression context to UnknownTokentype).

{ Parent: MemberAccessExpressionSyntax { RawKind: (int)SyntaxKind.PointerMemberAccessExpression } } => TokenType.UnknownTokentype,
_ => ClassifyIdentifierByModel(name),
};

Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -1329,4 +1329,50 @@ public void M({{parameterDeclaration}})
}
}
""", allowSemanticModel);

[DataTestMethod]
[DataRow("var d = [u:dateTimePointer]->Date;", false)]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Educational: could you please briefly explain when a token is of an unknown type (i.e. TokenType.UnknownTokentype), and why the method parameter dateTimePointer is of such type?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unknown is the default tokentype and is used for all tokens that are not of any of the other types:

  public enum TokenType {
    [pbr::OriginalName("UNKNOWN_TOKENTYPE")] UnknownTokentype = 0,
    [pbr::OriginalName("TYPE_NAME")] TypeName = 1,
    [pbr::OriginalName("NUMERIC_LITERAL")] NumericLiteral = 2,
    [pbr::OriginalName("STRING_LITERAL")] StringLiteral = 3,
    [pbr::OriginalName("KEYWORD")] Keyword = 4,
    [pbr::OriginalName("COMMENT")] Comment = 5,
  }

"Unknown" tokens are skipped when the info is written to the disc because there is no difference.

[DataRow("var d = dateTimePointer->[u:Date];", false)]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Educational: I guess Date is of unknown type because dateTimePointer is of unknown type. Is that correct?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

UnknownTokentype here means the classification of the token isn't any of the other TokenTypes. It doesn't have anything to do with the actual .Net types involved. The enum is badly named. It should be called TokenClassification instead of TokenType.

Also:
For all identifier tokens, there can only be two possible TokenType outcomes: "TypeName" or "UnknownTokentype" *). If we can conclude that the token can never bind to a type, we are done. A pointer can never be a type. (On a normal member access, we can not be sure: Math.Pi here Math can bind to a type or a property. Here we need to consult the semantic model for Math and see what it binds to.

*) One exception is "value" in property setters, which is classified as a keyword.

[DataRow("var d = (*[u:dateTimePointer]).Date;", false)]
[DataRow("var d = (*dateTimePointer).[u:Date];", false)]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Educational: [u:dateTimePointer] and [u:Date] could be assessed in the same test, correct? Decorating the token dateTimePointer with the token type assertion syntax doesn't impact the token Date.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure. I try to assert only single tokens as it eases the test debugging. Also, the allowSemanticModel value may differ between the tokens. I just prefer the test isolation here (test a single thing).

[DataRow("var d = [u:dateTimePointer][0];", false)]
[DataRow("[u:dateTimePointer][0] = *(&[u:dateTimePointer][0]);", false)]
[DataRow("[t:Int32]* iPointer;", false)]
[DataRow("[t:Int32]?* iPointer;", false)]
[DataRow("[t:Int32]?** iPointerPointer;", false)]
[DataRow("[t:Nullable]<[t:Int32]>** iPointerPointer;", false)]
[DataRow("[u:System].Int32* iPointer;", true)]
[DataRow("System.[t:Int32]* iPointer;", false)]
[DataRow("[k:void]* voidPointer;", false)]
[DataRow("DateTime d = default; M(&[u:d]);", false)]
[DataRow("[t:DateTime]** dt = &[u:dateTimePointer];", false)]
[DataRow("_ = (*(&[u:dateTimePointer]))->Date;", false)]
[DataRow("_ = (**(&[u:dateTimePointer])).Date;", false)]
public void IdentifierToken_Unsafe_Pointers(string statement, bool allowSemanticModel) =>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are some additional test cases that come to mind:

int?* x1; // Pointer of a nullable value type ("?" shouldn't be part of the token, right?
Nullable<int>* x2; // Nullable is of type t, whereas int is of type k, both are recognized
int** x3; // ** is not included in the token int, which is of type k

And relative assignments:

_ = &x1; // x1 of type u
_ = (*(&x1))->Value; // x1 and Value of type u
_ = (*x1).Value; // x1 and Value of type u

There are also ways of mixing @ and &, and we may want to make sure we only highlight the right thing:

var @int = (int*)null; // 2nd int and null of type k, 1st int of type u (?) with @ included
_ = *@int; // @int of type u (?), * not included, @ included
_ = &@int; // @int of type u (?), & not included, @ included

You can also similarly mix [] with * and &.

Hereafter is my understanding of one of the cases mentioned above (to be verified):

[DataRow("var [u:@int] = ([k:int]*)[k:null];_ = *[u:@int];", false)]

Would it make sense to cover (some of) them?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a couple of tests. The @ tests are a bit unrelated and I separated those. The @ is just parsed as part of the identifier token, so it doesn't really change anything.

ClassifierTestHarness.AssertTokenTypes($$"""
using System;
public class C
{
public unsafe void M(DateTime* dateTimePointer)
{
{{statement}}
}
}
""", allowSemanticModel);

[DataTestMethod]
[DataRow("[k:int] @int;", false)]
[DataRow("[k:volatile] [t:@volatile] [u:@volatile];", false)]
[DataRow("[t:Int32] [u:@someName];", false)]
public void IdentifierToken_KeywordEscaping(string fieldDeclaration, bool allowSemanticModel) =>
ClassifierTestHarness.AssertTokenTypes($$"""
using System;

public class @volatile { }

public class C
{
{{fieldDeclaration}}
}
""", allowSemanticModel);
}
Loading