Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Extend Lexer interface to expose diagnostics and data as a lexing report #1668

Merged
merged 2 commits into from
Sep 6, 2024

Conversation

martin-fleck-at
Copy link
Contributor

This is based on a discussion that started in #1653 and was already marked as a TODO in-code:

// TODO: find a way to report error diagnostics message
  • Add support to report diagnostics during lexing process
  • Properly map diagnostic severities
  • Mark method and report as optional for backwards-compatibility

For indentation:

  • Add dedent tokens to report until consumed by lexer

Please note this PR is based on #1664 which was already approved but has yet to be merged.

@martin-fleck-at
Copy link
Contributor Author

If we functionally agree, I'm also more than happy to discuss naming which is notoriously difficult: I opted for ILexingReport but also considered ILexingData or similar terms. Not sure what is best in the Langium realm ;-)

@martin-fleck-at
Copy link
Contributor Author

@aabounegm @msujew FYI

Copy link
Member

@msujew msujew left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you cherry pick my latest commit from https://github.com/eclipse-langium/langium/tree/msujew/indentation-diagnostics? Since you opened this PR from an organization repository, I can't push to it myself.

@msujew
Copy link
Member

msujew commented Sep 6, 2024

I'm not happy with the API by the way, but we kind of boxed ourselves into this, by simply reusing all the types that chevrotain provides us. I'll probably perform a refactoring of the whole lexer/token builder infrastructure for 4.0, since we also have a project that struggles with the way the existing API works (i.e. no known URI during the lexing phase, etc.).

martin-fleck-at and others added 2 commits September 6, 2024 13:09
- Add support to report diagnostics during lexing process
- Properly map diagnostic severities
- Mark method and report as optional for backwards-compatibility

For indentation:
- Add dedent tokens to report until consumed for state management
@martin-fleck-at
Copy link
Contributor Author

@msujew I rebased the commit and cherry-picked yours on top of it. Unfortunately, I couldn't open a branch in this repository directly.

I agree with what you said about the API. I aimed to stay backwards-compatible but having the lexerErrors and a separate but optional lexerReport is indeed a bit unexpected. But I'm happy to leave the API breakages to you where also the overall terminology (e.g., is "report" really the best term?) could be re-worked.

Thanks again for the very fast turnaround on this!

Copy link
Member

@msujew msujew left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me, thanks 👍

Using report is fine for now, I might revisit this decision at a later point.

Unfortunately, I couldn't open a branch in this repository directly.

People usually fork into their personal account. When a PR is done from a personal account, the maintainers of the target repo can (optionally) push directly to the target branch. This is not possible from organization repos/accounts due to security reasons.

@msujew msujew merged commit 51d99a6 into eclipse-langium:main Sep 6, 2024
4 checks passed
@msujew msujew added this to the v3.2.0 milestone Sep 6, 2024
Copy link
Member

@aabounegm aabounegm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the delay. Nothing critical, just mostly opinions and questions regarding the code style. Nothing important

@@ -191,7 +205,7 @@ export class IndentationAwareTokenBuilder<Terminals extends string = string, Key
* @param offset The current position at which to attempt a match
* @returns The current and previous indentation levels and the matched whitespace
*/
protected matchWhitespace(text: string, offset: number) {
protected matchWhitespace(text: string, offset: number, _tokens: IToken[], _groups: Record<string, IToken[]>): { currIndentLevel: number, prevIndentLevel: number, match: RegExpExecArray | null } {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you also update the JSDoc with the new parameters?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point!

}

export interface LexingDiagnostic extends ILexingError {
severity?: 'error' | 'warning' | 'info' | 'hint';
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should probably have some global type DiagnosticSeverity = 'error' | 'warning' | 'info' | 'hint'; since it's used a lot

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that might be a good idea, I'll look into it.

@@ -42,6 +64,16 @@ export class DefaultTokenBuilder implements TokenBuilder {
return tokens;
}

popLexingReport(_text: string): LexingReport {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think a name like finalize or finalizeLexing would probably be more user-friendly

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, the difficulty of good naming. I was debating over this and I previously had flushLexingReport to indicate that afterwards the state is reset. My issue with finalizeX or endX or stopX was the asymmetry because we do not have the opposing startX or beginX. In the end, I chose popLexingReport as this is what was used in the indendation-based token provider with the popRemainingDedents. But as usual, no hard opinions.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah that makes sense, but pop also implies a corresponding push function. I like the flush variant more 👍

Comment on lines 24 to +25
errors: ILexingError[];
report?: LexingReport;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Personally, I would mark errors as @deprecated and include them in the report. They can point to the same array for backwards compatibility, only to be remove in the next major version.
Also, it would be great if you can add JSDoc

protected tokenTypes: TokenTypeDictionary;

constructor(services: LangiumCoreServices) {
const tokens = services.parser.TokenBuilder.buildTokens(services.Grammar, {
constructor( services: LangiumCoreServices) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why the extra space after (?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, this was a pure oversight ;-) Thanks for cacthing that!

Comment on lines +27 to +28
lexerErrors: ILexingError[],
lexerReport?: LexingReport
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These 2 can also be merged, with one being marked as deprecated, as mentioned in LexingReport

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I didn't wanna mark parts of the API as deprecated and @msujew already mentioned that he may re-work those parts when he does a 4.x release. But I fully agree that this is an odd API now.

for (const lexerError of parseResult.lexerErrors) {
const lexerDiagnostics = [...parseResult.lexerErrors, ...parseResult.lexerReport?.diagnostics ?? []] as LexingDiagnostic[];
for (const lexerDiagnostic of lexerDiagnostics) {
const severity = lexerDiagnostic?.severity ?? 'error';
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think the diagnostic element itself can be undefined

Suggested change
const severity = lexerDiagnostic?.severity ?? 'error';
const severity = lexerDiagnostic.severity ?? 'error';

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Absolutely correct!

Comment on lines +252 to +260
switch (severity) {
case 'error':
return diagnosticData(DocumentValidator.LexingError);
case 'warning':
return diagnosticData(DocumentValidator.LexingWarning);
case 'info':
return diagnosticData(DocumentValidator.LexingInfo);
case 'hint':
return diagnosticData(DocumentValidator.LexingHint);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I personally prefer a Record<DiagnosticSeverity, DocumentValidator> over a switch, and also using an enum instead of a namespace (to make this type possible)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I aligned this with the toDiagnosticSeverity function in the same file so I'd say we should at least be consistent even though I do not have any hard feelings regarding either way.

@martin-fleck-at
Copy link
Contributor Author

@aabounegm Thank you for your feedback! I am finalizing #1669 right now and I'll add the non-controversial changes there. For anything else (like marking API as deprecated or particular types) I'd suggest to open a new PR since this may require some discussion.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants