Improvements to classification perf with large string literals. #72217

Merged
merged 4 commits into from
Feb 22, 2024


CyrusNajmabadi
Member

Followup to #72216.

Takes classification cost down from:

image

to

image

Basically, we were producing lots of tags in the case of some very large strings. This ended up being costly in later processing. This change has us produce far fewer tags (limited to just what is in view).

--

There are still further optimizations we can do. Specifically, for regex/json classification, we can limit how we walk those language subtrees to avoid even classifying portions of the embedded language tree that are not in view.
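A rough sketch of what that follow-up pruning could look like, not code from this PR. `EmbeddedNode` and `PrunedWalk` are hypothetical stand-ins for an embedded-language tree (regex/json) and its walker; the only idea shown is skipping any subtree whose span falls entirely outside the view.

```csharp
using System.Collections.Generic;

// Hypothetical node of an embedded-language tree; not a Roslyn type.
public sealed class EmbeddedNode
{
    public EmbeddedNode(int start, int end, params EmbeddedNode[] children)
    {
        Start = start;
        End = end;
        Children = children;
    }

    public int Start { get; }
    public int End { get; }
    public IReadOnlyList<EmbeddedNode> Children { get; }
}

public static class PrunedWalk
{
    // Collects the leaves that touch [viewStart, viewEnd], in document order.
    // Subtrees that lie wholly outside the view are never descended into.
    public static List<EmbeddedNode> CollectVisibleLeaves(EmbeddedNode root, int viewStart, int viewEnd)
    {
        var result = new List<EmbeddedNode>();
        var stack = new Stack<EmbeddedNode>();
        stack.Push(root);

        while (stack.Count > 0)
        {
            var node = stack.Pop();

            // Prune: nothing under this node can be in view.
            if (node.End < viewStart || node.Start > viewEnd)
                continue;

            if (node.Children.Count == 0)
            {
                result.Add(node);
                continue;
            }

            // Push in reverse so children are popped left to right.
            for (var i = node.Children.Count - 1; i >= 0; i--)
                stack.Push(node.Children[i]);
        }

        return result;
    }
}
```

The work done then scales with the part of the tree that overlaps the view rather than with the size of the whole string.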

@CyrusNajmabadi CyrusNajmabadi requested a review from a team as a code owner February 21, 2024 22:58
ClassifyToken(token);
ProcessTriviaList(token.TrailingTrivia);
Member Author

We were also previously walking down into trailing trivia (which never has structure). We were also walking into all structure, even though we don't have any embedded-language classification in structured trivia except for string literals in directives. So this means no more walking into things like doc comments, which is nice.
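To make the shape of that change concrete, here is a toy model of the walk as described in the comment above. It is a sketch under assumptions, not the PR's code: `ToyToken`, `ToyTrivia`, and `TriviaWalker` are hypothetical types, and the real walker works on Roslyn syntax tokens and trivia.

```csharp
using System.Collections.Generic;

// Hypothetical trivia: Structure is empty for unstructured trivia.
public sealed class ToyTrivia
{
    public bool IsDirective { get; set; }
    public List<ToyToken> Structure { get; } = new List<ToyToken>();
}

// Hypothetical token with leading and trailing trivia lists.
public sealed class ToyToken
{
    public ToyToken(string text) { Text = text; }

    public string Text { get; }
    public List<ToyTrivia> LeadingTrivia { get; } = new List<ToyTrivia>();
    public List<ToyTrivia> TrailingTrivia { get; } = new List<ToyTrivia>();
}

public static class TriviaWalker
{
    // Returns the text of every token the walk actually visits.
    public static List<string> Visit(IEnumerable<ToyToken> tokens)
    {
        var visited = new List<string>();
        foreach (var token in tokens)
            VisitToken(token, visited);
        return visited;
    }

    private static void VisitToken(ToyToken token, List<string> visited)
    {
        foreach (var trivia in token.LeadingTrivia)
        {
            // Only directives can hold string literals worth classifying,
            // so other structured trivia (e.g. doc comments) is skipped.
            if (!trivia.IsDirective)
                continue;

            foreach (var inner in trivia.Structure)
                VisitToken(inner, visited);
        }

        visited.Add(token.Text);

        // Trailing trivia is deliberately not walked: per the comment above,
        // it never has structure.
    }
}
```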

var subTextSpan = service.GetMemberBodySpanForSpeculativeBinding(member);
if (subTextSpan.IsEmpty)
var memberBodySpan = service.GetMemberBodySpanForSpeculativeBinding(member);
if (memberBodySpan.IsEmpty)
Member Author

view with whitespace off.

=> _result.Add(new ClassifiedSpan(classificationType, span));
// Ignore characters that don't intersect with the requested span. That avoids potentially adding lots of
// classifications for portions of a large string that are out of view.
if (span.IntersectsWith(_spanToClassify))
Member Author

Note: in the case the user provided, while the string contains json, it's not a JsonTree. All we have are the embedded classifications for escapes like "" in the middle of the string. But there are literally thousands of these classifications. So this takes us down from many thousands of classifications to a few dozen.
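A self-contained illustration of the effect, as a minimal sketch rather than the actual classifier. `ViewSpan` and `EscapeClassifier` are hypothetical; `IntersectsWith` here treats touching spans as intersecting, which is assumed to match the behavior of the span type used in the diff.

```csharp
using System.Collections.Generic;

// Hypothetical stand-in for a text span; not the Roslyn TextSpan type.
public readonly struct ViewSpan
{
    public ViewSpan(int start, int length)
    {
        Start = start;
        Length = length;
    }

    public int Start { get; }
    public int Length { get; }
    public int End => Start + Length;

    // Assumption: spans that merely touch at an edge count as intersecting.
    public bool IntersectsWith(ViewSpan other)
        => other.Start <= End && other.End >= Start;
}

public static class EscapeClassifier
{
    // Finds every "" escape pair in the text, but only records the ones
    // that intersect the span being classified.
    public static List<ViewSpan> ClassifyEscapes(string text, ViewSpan spanToClassify)
    {
        var result = new List<ViewSpan>();
        for (var i = 0; i + 1 < text.Length; i++)
        {
            if (text[i] != '"' || text[i + 1] != '"')
                continue;

            var span = new ViewSpan(i, 2);
            if (span.IntersectsWith(spanToClassify))
                result.Add(span);

            i++; // Skip the second quote of the pair.
        }

        return result;
    }
}
```

Note that the scan itself still visits the whole string; only the number of spans handed to later processing shrinks, which is where the cost described above was.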

@CyrusNajmabadi
Member Author

@sharwell @ToddGrun this is ready for review.

Contributor

@ToddGrun ToddGrun left a comment

LGTM

@CyrusNajmabadi CyrusNajmabadi merged commit de2db03 into dotnet:main Feb 22, 2024
27 checks passed
@CyrusNajmabadi CyrusNajmabadi deleted the classificationPerf2 branch February 22, 2024 21:11
jjonescz added a commit to jjonescz/roslyn that referenced this pull request Feb 26, 2024
…ationPerf2"

This reverts commit de2db03, reversing
changes made to 8f67d64.
@jjonescz jjonescz added this to the 17.10 P2 milestone Feb 27, 2024
3 participants