HtmlTokenizer: Cache tokens with well-known text #11924

DustinCampbell · 2025-06-06T17:43:20Z

Razor only caches a limited number of green SyntaxTokens, specifically those whose text can vary, such as identifiers. However, it doesn't cache green SyntaxTokens with well-known text, even though those can always be shared if they don't have any diagnostics. This PR adds a cache that maps SyntaxKind to SyntaxToken to the HTML tokenizer for tokens with well-known text.
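A minimal sketch of the idea, assuming a simplified token model: the `GreenToken` record, the `SyntaxKind` subset, and the `WellKnownTokenCache` class below are hypothetical stand-ins, not Razor's actual types. Because a green token carries no position or parent, a token whose text is fully determined by its kind and that has no diagnostics can be created once and handed out every time it is needed.

```csharp
using System;
using System.Collections.Generic;

// Usage: two requests for the same well-known kind return the same instance.
var cache = new WellKnownTokenCache();
var a = cache.GetOrCreate(SyntaxKind.OpenAngle, "<", hasDiagnostics: false);
var b = cache.GetOrCreate(SyntaxKind.OpenAngle, "<", hasDiagnostics: false);
Console.WriteLine(ReferenceEquals(a, b)); // True

// Hypothetical subset of HTML token kinds.
enum SyntaxKind { OpenAngle, CloseAngle, ForwardSlash, Text }

// Hypothetical immutable "green" token: no position, no parent, just kind + text.
sealed record GreenToken(SyntaxKind Kind, string Text, bool HasDiagnostics);

sealed class WellKnownTokenCache
{
    // Kinds whose text is fixed in the HTML tokenizer (assumed mapping for illustration).
    private static readonly Dictionary<SyntaxKind, string> s_wellKnownText = new()
    {
        [SyntaxKind.OpenAngle] = "<",
        [SyntaxKind.CloseAngle] = ">",
        [SyntaxKind.ForwardSlash] = "/",
    };

    // One shared token per kind, created lazily on first use.
    private readonly Dictionary<SyntaxKind, GreenToken> _cache = new();

    public GreenToken GetOrCreate(SyntaxKind kind, string text, bool hasDiagnostics)
    {
        // Share only tokens without diagnostics whose text matches the kind's well-known text.
        if (!hasDiagnostics &&
            s_wellKnownText.TryGetValue(kind, out var wellKnown) &&
            wellKnown == text)
        {
            if (!_cache.TryGetValue(kind, out var token))
            {
                token = new GreenToken(kind, wellKnown, HasDiagnostics: false);
                _cache[kind] = token;
            }

            return token;
        }

        // Tokens with variable text, or with diagnostics, get a fresh instance.
        return new GreenToken(kind, text, hasDiagnostics);
    }
}
```

Keying by SyntaxKind alone works here because, within a single tokenizer, each cached kind has exactly one possible text.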

Note: This only addresses the HTML tokenizer; C# will be handled separately. Because some SyntaxKind values are shared across HTML and C# but have different text (for example, SyntaxKind.Equals is "=" in HTML but "==" in C#), solving this in a unified way will take a more significant change.
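To make the conflict concrete, here is a hypothetical sketch (the enums, the `WellKnownText` class, and its per-language tables are illustrative, not Razor's code). A kind-to-text table is unambiguous inside one tokenizer, but a single table shared by both languages would need to map `SyntaxKind.Equals` to two different strings.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

Console.WriteLine(WellKnownText.Get(Language.Html, SyntaxKind.Equals));   // =
Console.WriteLine(WellKnownText.Get(Language.CSharp, SyntaxKind.Equals)); // ==

// Hypothetical kind enum shared by both tokenizers.
enum SyntaxKind { Equals, OpenAngle }
enum Language { Html, CSharp }

static class WellKnownText
{
    // Each tokenizer has its own kind-to-text table; the same kind maps to different text.
    private static readonly Dictionary<SyntaxKind, string> s_html = new()
    {
        [SyntaxKind.Equals] = "=",
        [SyntaxKind.OpenAngle] = "<",
    };

    private static readonly Dictionary<SyntaxKind, string> s_csharp = new()
    {
        [SyntaxKind.Equals] = "==",
    };

    // Returns null when the kind has no fixed text in that language.
    public static string? Get(Language language, SyntaxKind kind)
    {
        var table = language == Language.Html ? s_html : s_csharp;
        return table.TryGetValue(kind, out var text) ? text : null;
    }
}
```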

CI Build: https://dev.azure.com/dnceng/internal/_build/results?buildId=2724900&view=results
Toolset Run: https://dev.azure.com/dnceng/internal/_build/results?buildId=2724901&view=results

Most of the green SyntaxTokens produced by the HtmlTokenizer have well-known text and can be shared.


jjonescz · 2025-06-09T09:07:26Z

src/Compiler/Microsoft.CodeAnalysis.Razor.Compiler/src/Language/Legacy/HtmlTokenizer.cs

-                        return " ";
-                    }
-                    if (Buffer[0] == '\t')
+                    switch (Buffer[0])


nit: would a switch expression (rather than statement) be more concise here?

I considered that, but a switch expression would require adding another call to base.GetTokenContent(type) to preserve the current fall-through behavior. I opted for a regular switch statement so the fall-through continues to work as before. A switch expression would realistically save only two lines or so.
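A hypothetical sketch of the trade-off (the `TokenizerBase`, `StatementVersion`, and `ExpressionVersion` classes and the single-character whitespace condition are assumptions; the real HtmlTokenizer code may differ). In the statement form, an unmatched character simply leaves the switch and reaches the one `base.GetTokenContent(type)` call after it. A switch expression must produce a value (an unmatched input throws `SwitchExpressionException`), so keeping the old behavior needs a second fallback call in its discard arm.

```csharp
using System;
using System.Text;

var statement = new StatementVersion();
var expression = new ExpressionVersion();

foreach (var input in new[] { " ", "\t", "\n", "  " })
{
    // Both versions produce the same content; prints True for each input.
    Console.WriteLine(
        statement.GetContent(SyntaxKind.Whitespace, input) ==
        expression.GetContent(SyntaxKind.Whitespace, input));
}

enum SyntaxKind { Whitespace, Text }

abstract class TokenizerBase
{
    protected StringBuilder Buffer { get; } = new();

    public string GetContent(SyntaxKind type, string text)
    {
        Buffer.Clear().Append(text);
        return GetTokenContent(type);
    }

    // Hypothetical default: allocates a new string from the buffer.
    protected virtual string GetTokenContent(SyntaxKind type) => Buffer.ToString();
}

sealed class StatementVersion : TokenizerBase
{
    protected override string GetTokenContent(SyntaxKind type)
    {
        if (type == SyntaxKind.Whitespace && Buffer.Length == 1)
        {
            switch (Buffer[0])
            {
                case ' ':
                    return " ";  // interned literal, no allocation
                case '\t':
                    return "\t";
            }
            // Any other single character falls out of the switch...
        }

        // ...and reaches the single shared fallback.
        return base.GetTokenContent(type);
    }
}

sealed class ExpressionVersion : TokenizerBase
{
    protected override string GetTokenContent(SyntaxKind type)
    {
        if (type == SyntaxKind.Whitespace && Buffer.Length == 1)
        {
            // The discard arm duplicates the fallback call below.
            return Buffer[0] switch
            {
                ' ' => " ",
                '\t' => "\t",
                _ => base.GetTokenContent(type),
            };
        }

        return base.GetTokenContent(type);
    }
}
```

The two versions behave identically; the expression form trades a couple of lines for a duplicated fallback call.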

DustinCampbell · 2025-06-10T15:41:49Z

Thanks everyone! 💖

DustinCampbell added 3 commits June 6, 2025 10:12

HtmlTokenizer: Add nullability annotations

f9c5400

HtmlTokenizer: Minor clean up

cc41dbe

HtmlTokenizer: Introduce cache of SyntaxKind to SyntaxToken

68ab224


DustinCampbell requested a review from a team as a code owner June 6, 2025 17:43

ToddGrun reviewed Jun 6, 2025


src/Compiler/Microsoft.CodeAnalysis.Razor.Compiler/src/Language/Legacy/HtmlTokenizer.cs

jjonescz approved these changes Jun 9, 2025


chsienki approved these changes Jun 9, 2025


DustinCampbell merged commit 6acba03 into dotnet:main Jun 10, 2025
11 checks passed

DustinCampbell deleted the cache-html-tokens branch June 10, 2025 15:41

dotnet-policy-service bot added this to the Next milestone Jun 10, 2025

This was referenced Jun 14, 2025

[Automated] PRs inserted in VS build main-10713.59 #11951

Closed

[Automated] PRs inserted in VS build feature.debugger.main-10716.51 #11955

Closed

[Automated] PRs inserted in VS build feature.Wix5-10716.138 #11956

Closed

RikkiGibson modified the milestones: Next, 18.0 P1 Aug 20, 2025

5 participants