Extract common lexer code into helpers #79214
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -4,8 +4,6 @@ | |
|
|
||
| using System; | ||
| using System.Collections.Generic; | ||
| using Microsoft.CodeAnalysis.CSharp.Symbols; | ||
| using Microsoft.CodeAnalysis.CSharp.Syntax; | ||
| using Microsoft.CodeAnalysis.Text; | ||
|
|
||
| namespace Microsoft.CodeAnalysis.CSharp.Syntax.InternalSyntax | ||
|
|
@@ -21,6 +19,8 @@ protected AbstractLexer(SourceText text) | |
| this.TextWindow = new SlidingTextWindow(text); | ||
| } | ||
|
|
||
| protected int LexemeStartPosition => this.TextWindow.LexemeStartPosition; | ||
|
|
||
| public virtual void Dispose() | ||
| { | ||
| this.TextWindow.Dispose(); | ||
|
|
@@ -131,9 +131,18 @@ protected XmlSyntaxDiagnosticInfo MakeError(int position, int width, XmlParseErr | |
|
|
||
| private int GetLexemeOffsetFromPosition(int position) | ||
| { | ||
| return position >= TextWindow.LexemeStartPosition ? position - TextWindow.LexemeStartPosition : position; | ||
| return position >= LexemeStartPosition ? position - LexemeStartPosition : position; | ||
| } | ||
|
|
||
| protected string GetNonInternedLexemeText() | ||
| => TextWindow.GetText(intern: false); | ||
|
|
||
| protected string GetInternedLexemeText() | ||
| => TextWindow.GetText(intern: true); | ||
|
Member
Author
These helpers are here because `GetText` implicitly uses `LexemeStartPosition`. Once that is removed from the text window itself, the start position will need to be passed in (as the position to read from, up to the text window's current position). With these helpers, only this one site needs updating instead of a huge number of call sites.
Member
Author
Note: I wanted all lexeme-oriented operations to have that in their name. It's not at all evident what `TextWindow.GetText` or `TextWindow.Width` even means. A name like `CurrentLexemeWidth` makes it much clearer that it refers to the length of the current token being lexed out. |
||
|
|
||
| protected int CurrentLexemeWidth | ||
| => this.TextWindow.Position - LexemeStartPosition; | ||
|
|
||
| protected static SyntaxDiagnosticInfo MakeError(ErrorCode code) | ||
| { | ||
| return new SyntaxDiagnosticInfo(code); | ||
|
|
||
The intent is to move `LexemeStartPosition` into the lexer, so that only the lexer cares about lexemes and the text window only cares about being a fast streaming sequence of chars.
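To make the intended split concrete, here is a minimal sketch. It is not Roslyn's code: `CharWindow`, `SketchLexer`, `StartLexeme`, `GetLexemeText`, and `ScanWord` are hypothetical names invented for illustration, and only `LexemeStartPosition` and `CurrentLexemeWidth` echo names from the diff. It assumes the window would expose a `GetText` that takes an explicit start position, as the comment above describes.

```csharp
using System;

// Hypothetical stand-in for a text window that knows nothing about lexemes:
// it only tracks a current position over the source characters.
public sealed class CharWindow
{
    private readonly string _text;

    public CharWindow(string text) => _text = text;

    public int Position { get; private set; }

    public bool IsAtEnd => Position >= _text.Length;

    // Returns '\0' once the end of the text has been reached.
    public char PeekChar() => IsAtEnd ? '\0' : _text[Position];

    public void AdvanceChar() => Position++;

    // The caller supplies the start; the window reads up to its current position.
    public string GetText(int start) => _text.Substring(start, Position - start);
}

// Hypothetical lexer that owns all lexeme bookkeeping.
public class SketchLexer
{
    protected readonly CharWindow Window;

    protected int LexemeStartPosition { get; private set; }

    public SketchLexer(string text) => Window = new CharWindow(text);

    // Marks the window's current position as the start of a new lexeme.
    protected void StartLexeme() => LexemeStartPosition = Window.Position;

    protected int CurrentLexemeWidth => Window.Position - LexemeStartPosition;

    // The single place that passes the lexeme start down to the window.
    protected string GetLexemeText() => Window.GetText(LexemeStartPosition);

    // Example rule: lex a run of letters and return its text.
    public string ScanWord()
    {
        StartLexeme();
        while (char.IsLetter(Window.PeekChar()))
        {
            Window.AdvanceChar();
        }
        return GetLexemeText();
    }
}
```

In this arrangement the window has no lexeme state that could fall out of sync, and lexer rules such as `ScanWord` only go through the lexeme-named helpers.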
What is a lexeme? Is that like a token?
Sort of, and I can probably doc it. It's "the entity the lexer is currently producing". This is commonly the text of BOTH trivia AND tokens (for a token, the text without its trivia).
It's what you generally expect to get back if you ask the Token/Trivia for its `.Text` property (not `.FullText`, and not `.ValueText`).
Ignoring things like directives, the lexer generally is pointing at some position in the source. And it will 'start' lexing a 'lexeme' at that point. It consumes forward, based on certain rules about what it is currently consuming, until it 'finishes' that lexeme. At which point it generates a result (token or trivia in the majority case). That result is given a `Kind`, `Text`, and potentially other bits and bobs attached to it.
The goal here is to make the sliding-text-window care absolutely not one whit about lexer concepts, and keep itself only in the domain of making character-retrieval efficient. So lexemes and the like move up entirely to the lexer. This actually simplifies a bunch, and makes it harder to get things wrong.
For example, in the last year, there was a tweak to the sliding text window to allow it to look backwards. However, because the window itself was tracking lexemes, it could get into a corrupt state when it did that, leading to bad results being returned upwards in edge-case scenarios. This split would help avoid that.
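The start/consume/finish cycle described above can be sketched as a toy loop. This is only an illustration under simplifying assumptions, not the Roslyn lexer: `LexemeKind`, `LexemeDemo`, and `Lex` are hypothetical names, and the rules (whitespace, identifiers, digit runs, single-character fallback) are far cruder than real C# lexing. The point it shows is that whitespace trivia and tokens alike come out as lexemes, each with a kind and its exact text.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical kinds for this sketch; real syntax kinds are far more detailed.
public enum LexemeKind { Whitespace, Identifier, Number, Other }

public static class LexemeDemo
{
    // Splits the input into consecutive lexemes; each result is (Kind, Text).
    public static List<(LexemeKind Kind, string Text)> Lex(string source)
    {
        var results = new List<(LexemeKind Kind, string Text)>();
        int position = 0;

        while (position < source.Length)
        {
            // 'Start' a lexeme at the current position.
            int lexemeStart = position;
            char first = source[position];
            LexemeKind kind;

            // Consume forward according to the rule chosen by the first char.
            if (char.IsWhiteSpace(first))
            {
                kind = LexemeKind.Whitespace;
                while (position < source.Length && char.IsWhiteSpace(source[position]))
                    position++;
            }
            else if (char.IsLetter(first) || first == '_')
            {
                kind = LexemeKind.Identifier;
                while (position < source.Length
                       && (char.IsLetterOrDigit(source[position]) || source[position] == '_'))
                    position++;
            }
            else if (char.IsDigit(first))
            {
                kind = LexemeKind.Number;
                while (position < source.Length && char.IsDigit(source[position]))
                    position++;
            }
            else
            {
                kind = LexemeKind.Other;
                position++;
            }

            // 'Finish' the lexeme: its text runs from the start to the current position.
            results.Add((kind, source.Substring(lexemeStart, position - lexemeStart)));
        }

        return results;
    }
}
```

For instance, `LexemeDemo.Lex("x = 42")` yields five lexemes: the identifier `x`, a whitespace run, `=`, another whitespace run, and the number `42`. Concatenating their texts gives back the original source.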
Thanks!
TLDR:
It's the smallest piece of `Text` the lexer grabs out as an individual string to jam into either a `Token` or `Trivia`. It is indivisible.