UtilityAnalyzer: Move to a syntax based classification of identifiers in the token type utility analyzer #7369

Closed
martin-strecker-sonarsource opened this issue Jun 12, 2023 · 1 comment
Labels
Type: Performance It takes too long.
Comments

@martin-strecker-sonarsource
Contributor

martin-strecker-sonarsource commented Jun 12, 2023

Blocked by #7289, #7368

The token type analyzer calls TokenClassifierBase.ClassifyIdentifier for each identifier.

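private TokenInfo ClassifyIdentifier(SyntaxToken token)
{
    // Declarations: the parent node declares a symbol (e.g. class D, method M).
    if (semanticModel.GetDeclaredSymbol(token.Parent) is { } declaration)
    {
        return ClassifyIdentifier(token, declaration);
    }
    // References: bind the token's bindable parent and classify the resulting symbol.
    else if (GetBindableParent(token) is { } parent && semanticModel.GetSymbolInfo(parent).Symbol is { } symbol)
    {
        return ClassifyIdentifier(token, symbol);
    }
    // No declared or bound symbol: the identifier stays unclassified.
    else
    {
        return null;
    }
}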

This method calls semanticModel.GetDeclaredSymbol(token.Parent) for every identifier token and, if that returns no symbol, additionally calls semanticModel.GetSymbolInfo on the token's bindable parent. As a result, an ISymbol is created and a mapping from SyntaxNode to ISymbol is added to the semantic model. This puts a lot of pressure on any shared semantic model, because both the ISymbol and the mapping must be cached by the semantic model in a thread-safe manner. The snippet below shows how many identifiers even simple code contains; the comment on each line gives the number of identifiers on that line.

using System;                     // +1
using System.Collections.Generic; // +3

namespace A.B.C;                  // +3

public class D                    // +1
{
    public D()                    // +1
    {
    }
    public void M()               // +1
    {
        List<D> myList;           // +3
    }
}

TokenClassifierBase.ClassifyIdentifier can only have two outcomes:

  • The identifier is considered TokenType.TypeName (with some special casing for types that are classified as keywords)
  • The identifier is considered to be unknown.

This classification can often be done on the syntax level. In the sample above, all identifiers can be classified without querying the semantic model. This saves 20 calls to the semantic model (one GetDeclaredSymbol call for each of the 13 identifiers, plus one GetSymbolInfo call for each of the 7 identifiers that are not declarations: 13 + 7 = 20) and the allocation of 11 symbols.
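A minimal sketch of the syntax-first idea, assuming a toy model instead of Roslyn: each identifier is represented only by a string naming its parent syntax context, and the context names and their TypeName/unknown mapping are illustrative assumptions, not the analyzer's actual rules. QuerySemanticModel and SemanticOnlyCalls are hypothetical helpers; the first stands in for the GetDeclaredSymbol/GetSymbolInfo path, the second reproduces the call-count arithmetic from the paragraph above. The constructor name from the sample is left out because its exact classification is not spelled out here.

```csharp
using System;
using System.Collections.Generic;

// Toy model, NOT Roslyn: an identifier is described by a string naming its parent
// syntax context. Context names and their mapping are assumptions for illustration.
var typeNameContexts = new HashSet<string> { "ClassDeclaration", "TypeArgument", "VariableDeclarationType" };
var unknownContexts = new HashSet<string> { "UsingDirectiveName", "NamespaceName", "MethodDeclaration", "VariableDeclarator" };

int semanticCalls = 0; // number of simulated semantic-model queries

// Hypothetical stand-in for GetDeclaredSymbol/GetSymbolInfo; it only counts the call
// and pretends the symbol is not a type.
string QuerySemanticModel(string identifier)
{
    semanticCalls++;
    return "Unknown";
}

// Syntax first: decide from the parent context alone; fall back to the semantic
// model only for contexts that syntax cannot decide (e.g. the right side of x.Name).
string Classify(string identifier, string parentContext)
{
    if (typeNameContexts.Contains(parentContext)) return "TypeName";
    if (unknownContexts.Contains(parentContext)) return "Unknown";
    return QuerySemanticModel(identifier);
}

// Semantic-only cost: one GetDeclaredSymbol call per identifier, plus one
// GetSymbolInfo call per identifier that is not a declaration.
int SemanticOnlyCalls(int identifiers, int declarations) => identifiers + (identifiers - declarations);

var sample = new (string Name, string Context)[]
{
    ("System", "UsingDirectiveName"),
    ("D", "ClassDeclaration"),
    ("M", "MethodDeclaration"),
    ("List", "VariableDeclarationType"),
    ("D", "TypeArgument"),
    ("myList", "VariableDeclarator"),
};

foreach (var (name, context) in sample)
{
    Console.WriteLine($"{name} ({context}): {Classify(name, context)}");
}
Console.WriteLine($"Semantic calls: {semanticCalls}");                 // prints "Semantic calls: 0"
Console.WriteLine($"Semantic-only calls: {SemanticOnlyCalls(13, 6)}"); // prints "Semantic-only calls: 20"
```

Only contexts that syntax alone cannot resolve, such as x.Name (a property or a nested type), would still reach the semantic model in this scheme.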

To classify correctly on the syntax level, the test infrastructure must first be extended and made more powerful. #7289 describes how, and #7108 implements this infrastructure; this issue is therefore blocked by #7289.

Related:
#4217
#7288
#6674

@martin-strecker-sonarsource
Contributor Author

Closed as fixed by #7788 and #7775
