-
Notifications
You must be signed in to change notification settings - Fork 75
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Extend Lexer interface to expose diagnostics and data as a lexing report #1668
Extend Lexer interface to expose diagnostics and data as a lexing report #1668
Conversation
c1e9a71
to
73d3461
Compare
If we functionally agree, I'm also more than happy to discuss naming which is notoriously difficult: I opted for |
@aabounegm @msujew FYI |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you cherry pick my latest commit from https://github.com/eclipse-langium/langium/tree/msujew/indentation-diagnostics? Since you opened this PR from an organization repository, I can't push to it myself.
I'm not happy with the API by the way, but we kind of boxed ourselves into this, by simply reusing all the types that chevrotain provides us. I'll probably perform a refactoring of the whole lexer/token builder infrastructure for 4.0, since we also have a project that struggles with the way the existing API works (i.e. no known URI during the lexing phase, etc.). |
- Add support to report diagnostics during lexing process - Properly map diagnostic severities - Mark method and report as optional for backwards-compatibility For indentation: - Add dedent tokens to report until consumed for state management
73d3461
to
a84fad3
Compare
@msujew I rebased the commit and cherry-picked yours on top of it. Unfortunately, I couldn't open a branch in this repository directly. I agree with what you said about the API. I aimed to stay backwards-compatible but having the Thanks again for the very fast turnaround on this! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me, thanks 👍
Using report
is fine for now, I might revisit this decision at a later point.
Unfortunately, I couldn't open a branch in this repository directly.
People usually fork into their personal account. When a PR is done from a personal account, the maintainers of the target repo can (optionally) push directly to the target branch. This is not possible from organization repos/accounts due to security reasons.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for the delay. Nothing critical, just mostly opinions and questions regarding the code style. Nothing important
@@ -191,7 +205,7 @@ export class IndentationAwareTokenBuilder<Terminals extends string = string, Key | |||
* @param offset The current position at which to attempt a match | |||
* @returns The current and previous indentation levels and the matched whitespace | |||
*/ | |||
protected matchWhitespace(text: string, offset: number) { | |||
protected matchWhitespace(text: string, offset: number, _tokens: IToken[], _groups: Record<string, IToken[]>): { currIndentLevel: number, prevIndentLevel: number, match: RegExpExecArray | null } { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you also update the JSDoc with the new parameters?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point!
} | ||
|
||
export interface LexingDiagnostic extends ILexingError { | ||
severity?: 'error' | 'warning' | 'info' | 'hint'; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should probably have some global type DiagnosticSeverity = 'error' | 'warning' | 'info' | 'hint';
since it's used a lot
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, that might be a good idea, I'll look into it.
@@ -42,6 +64,16 @@ export class DefaultTokenBuilder implements TokenBuilder { | |||
return tokens; | |||
} | |||
|
|||
popLexingReport(_text: string): LexingReport { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think a name like finalize
or finalizeLexing
would probably be more user-friendly
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, the difficulty of good naming. I was debating over this and I previously had flushLexingReport
to indicate that afterwards the state is reset. My issue with finalizeX
or endX
or stopX
was the asymmetry because we do not have the opposing startX
or beginX
. In the end, I chose popLexingReport
as this is what was used in the indendation-based token provider with the popRemainingDedents
. But as usual, no hard opinions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah that makes sense, but pop
also implies a corresponding push
function. I like the flush
variant more 👍
errors: ILexingError[]; | ||
report?: LexingReport; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Personally, I would mark errors
as @deprecated
and include them in the report. They can point to the same array for backwards compatibility, only to be remove in the next major version.
Also, it would be great if you can add JSDoc
protected tokenTypes: TokenTypeDictionary; | ||
|
||
constructor(services: LangiumCoreServices) { | ||
const tokens = services.parser.TokenBuilder.buildTokens(services.Grammar, { | ||
constructor( services: LangiumCoreServices) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why the extra space after (
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, this was a pure oversight ;-) Thanks for cacthing that!
lexerErrors: ILexingError[], | ||
lexerReport?: LexingReport |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These 2 can also be merged, with one being marked as deprecated, as mentioned in LexingReport
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I didn't wanna mark parts of the API as deprecated and @msujew already mentioned that he may re-work those parts when he does a 4.x release. But I fully agree that this is an odd API now.
for (const lexerError of parseResult.lexerErrors) { | ||
const lexerDiagnostics = [...parseResult.lexerErrors, ...parseResult.lexerReport?.diagnostics ?? []] as LexingDiagnostic[]; | ||
for (const lexerDiagnostic of lexerDiagnostics) { | ||
const severity = lexerDiagnostic?.severity ?? 'error'; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think the diagnostic element itself can be undefined
const severity = lexerDiagnostic?.severity ?? 'error'; | |
const severity = lexerDiagnostic.severity ?? 'error'; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Absolutely correct!
switch (severity) { | ||
case 'error': | ||
return diagnosticData(DocumentValidator.LexingError); | ||
case 'warning': | ||
return diagnosticData(DocumentValidator.LexingWarning); | ||
case 'info': | ||
return diagnosticData(DocumentValidator.LexingInfo); | ||
case 'hint': | ||
return diagnosticData(DocumentValidator.LexingHint); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I personally prefer a Record<DiagnosticSeverity, DocumentValidator>
over a switch
, and also using an enum instead of a namespace (to make this type possible)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I aligned this with the toDiagnosticSeverity
function in the same file so I'd say we should at least be consistent even though I do not have any hard feelings regarding either way.
@aabounegm Thank you for your feedback! I am finalizing #1669 right now and I'll add the non-controversial changes there. For anything else (like marking API as deprecated or particular types) I'd suggest to open a new PR since this may require some discussion. |
This is based on a discussion that started in #1653 and was already marked as a TODO in-code:
For indentation:
Please note this PR is based on #1664 which was already approved but has yet to be merged.