Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improvements to classification perf with large string literals. #72217

Merged
merged 4 commits into from
Feb 22, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -123,16 +123,28 @@ private static async Task<bool> TryClassifyContainingMemberSpanAsync(
return false;
}

var subTextSpan = service.GetMemberBodySpanForSpeculativeBinding(member);
if (subTextSpan.IsEmpty)
var memberBodySpan = service.GetMemberBodySpanForSpeculativeBinding(member);
if (memberBodySpan.IsEmpty)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

view with whitespace off.

{
// Wasn't a member we could reclassify independently.
return false;
}

var subSpanToTag = new SnapshotSpan(
snapshotSpan.Snapshot,
subTextSpan.Contains(changedSpan) ? subTextSpan.ToSpan() : member.FullSpan.ToSpan());
// TODO(cyrusn): Unclear what this logic is for. It looks like it's just trying to narrow the span down
// slightly from the full member, just to its body. Unclear if this provides any substantive benefits. But
// keeping for now to preserve long standing logic.
var memberSpanToClassify = memberBodySpan.Contains(changedSpan)
? memberBodySpan.ToSpan()
: member.FullSpan.ToSpan();

// Take the subspan we know we want to classify, and intersect that with the actual span being asked for.
// That way if we're only asking for a portion of a method, we still only classify that, and not the whole
// method.
var finalSpanToClassify = memberSpanToClassify.Intersection(snapshotSpan.Span);
if (finalSpanToClassify is null)
return false;

var subSpanToTag = new SnapshotSpan(snapshotSpan.Snapshot, finalSpanToClassify.Value);

// re-classify only the member we're inside.
await ClassifySpansAsync(
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -106,17 +106,20 @@ public void VisitTokens(SyntaxNode node)
private void ProcessToken(SyntaxToken token)
{
_cancellationToken.ThrowIfCancellationRequested();
ProcessTriviaList(token.LeadingTrivia);

// Directives need to be processes as they can contain strings, which then have escapes in them.
if (token.ContainsDirectives)
ProcessTriviaList(token.LeadingTrivia);

ClassifyToken(token);
ProcessTriviaList(token.TrailingTrivia);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we were also previous walking down into trailing trivia (which never has structure). we were also walking into all structure, even though we don't have any embedded-lang classification in structured-trivia except for the case of string literals in directives. so this means no more walking into things like doc comments, which is nice.

}

private void ClassifyToken(SyntaxToken token)
{
if (token.Span.IntersectsWith(_textSpan) && _owner.SyntaxTokenKinds.Contains(token.RawKind))
{
var context = new EmbeddedLanguageClassificationContext(
_solutionServices, _project, _semanticModel, token, _options, _owner.Info.VirtualCharService, _result, _cancellationToken);
_solutionServices, _project, _semanticModel, token, _textSpan, _options, _owner.Info.VirtualCharService, _result, _cancellationToken);

var classifiers = _owner.GetServices(_semanticModel, token, _cancellationToken);
foreach (var classifier in classifiers)
Expand Down Expand Up @@ -146,7 +149,7 @@ private void ProcessTriviaList(SyntaxTriviaList triviaList)

private void ProcessTrivia(SyntaxTrivia trivia)
{
if (trivia.HasStructure && trivia.FullSpan.IntersectsWith(_textSpan))
if (trivia.IsDirective && trivia.FullSpan.IntersectsWith(_textSpan))
VisitTokens(trivia.GetStructure()!);
}
}
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -8,52 +8,63 @@
using Microsoft.CodeAnalysis.Host;
using Microsoft.CodeAnalysis.Text;

namespace Microsoft.CodeAnalysis.Classification
namespace Microsoft.CodeAnalysis.Classification;

internal readonly struct EmbeddedLanguageClassificationContext
{
internal readonly struct EmbeddedLanguageClassificationContext
internal readonly SolutionServices SolutionServices;

private readonly SegmentedList<ClassifiedSpan> _result;

/// <summary>
/// The portion of the string or character token to classify.
/// </summary>
private readonly TextSpan _spanToClassify;

public Project Project { get; }

/// <summary>
/// The string or character token to classify.
/// </summary>
public SyntaxToken SyntaxToken { get; }

/// <summary>
/// SemanticModel that <see cref="SyntaxToken"/> is contained in.
/// </summary>
public SemanticModel SemanticModel { get; }

public CancellationToken CancellationToken { get; }

internal readonly ClassificationOptions Options;
internal readonly IVirtualCharService VirtualCharService;

internal EmbeddedLanguageClassificationContext(
SolutionServices solutionServices,
Project project,
SemanticModel semanticModel,
SyntaxToken syntaxToken,
TextSpan spanToClassify,
ClassificationOptions options,
IVirtualCharService virtualCharService,
SegmentedList<ClassifiedSpan> result,
CancellationToken cancellationToken)
{
SolutionServices = solutionServices;
Project = project;
SemanticModel = semanticModel;
SyntaxToken = syntaxToken;
_spanToClassify = spanToClassify;
Options = options;
VirtualCharService = virtualCharService;
_result = result;
CancellationToken = cancellationToken;
}

public void AddClassification(string classificationType, TextSpan span)
{
internal readonly SolutionServices SolutionServices;

private readonly SegmentedList<ClassifiedSpan> _result;

public Project Project { get; }

/// <summary>
/// The string or character token to classify.
/// </summary>
public SyntaxToken SyntaxToken { get; }

/// <summary>
/// SemanticModel that <see cref="SyntaxToken"/> is contained in.
/// </summary>
public SemanticModel SemanticModel { get; }

public CancellationToken CancellationToken { get; }

internal readonly ClassificationOptions Options;
internal readonly IVirtualCharService VirtualCharService;

internal EmbeddedLanguageClassificationContext(
SolutionServices solutionServices,
Project project,
SemanticModel semanticModel,
SyntaxToken syntaxToken,
ClassificationOptions options,
IVirtualCharService virtualCharService,
SegmentedList<ClassifiedSpan> result,
CancellationToken cancellationToken)
{
SolutionServices = solutionServices;
Project = project;
SemanticModel = semanticModel;
SyntaxToken = syntaxToken;
Options = options;
VirtualCharService = virtualCharService;
_result = result;
CancellationToken = cancellationToken;
}

public void AddClassification(string classificationType, TextSpan span)
=> _result.Add(new ClassifiedSpan(classificationType, span));
// Ignore characters that don't intersect with the requested span. That avoids potentially adding lots of
// classifications for portions of a large string that are out of view.
if (span.IntersectsWith(_spanToClassify))
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

note: in the case the user provided, while the string contains json, it's not a JsonTree. All we ahve are the embedded classifications for escapes like "" in the middle of the string. But there are literally thousands of these classifications. So this takes us down from many thousands of classifications to a few dozen.

_result.Add(new ClassifiedSpan(classificationType, span));
}
}
Loading