Extend Lexer interface to expose diagnostics and data as a lexing report #1668

martin-fleck-at · 2024-09-06T08:52:41Z

This is based on a discussion that started in #1653 and was already marked as a TODO in-code:

// TODO: find a way to report error diagnostics message

- Add support to report diagnostics during the lexing process
- Properly map diagnostic severities
- Mark method and report as optional for backwards compatibility

For indentation:

- Add dedent tokens to report until consumed by lexer

Please note this PR is based on #1664, which was already approved but has yet to be merged.
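A minimal TypeScript sketch of the backwards-compatibility idea in the description: the report-producing method is optional on the token builder, and the lexer only asks for a report when the builder implements it. The types are simplified stand-ins, not Langium's actual declarations; `tokenize`, the whitespace-splitting "lexer", and both builders are hypothetical, while `popLexingReport(text)` mirrors the name used in this PR's diff.

```typescript
// Hypothetical, simplified stand-ins for the types discussed in this PR.
type DiagnosticSeverity = 'error' | 'warning' | 'info' | 'hint';

interface LexingDiagnostic {
    offset: number;
    length: number;
    message: string;
    severity?: DiagnosticSeverity; // a missing severity is treated as 'error'
}

interface LexingReport {
    diagnostics: LexingDiagnostic[];
}

interface TokenBuilder {
    buildTokens(): string[];
    // Optional, so existing TokenBuilder implementations keep compiling.
    popLexingReport?(text: string): LexingReport;
}

interface LexerResult {
    tokens: string[];
    report?: LexingReport; // optional for the same reason
}

function tokenize(builder: TokenBuilder, text: string): LexerResult {
    // Placeholder lexing: split on whitespace.
    const tokens = text.split(/\s+/).filter(t => t.length > 0);
    // Only builders that implement the new hook contribute a report.
    const report = builder.popLexingReport?.(text);
    return { tokens, report };
}

// A pre-existing builder that knows nothing about reports.
const legacyBuilder: TokenBuilder = { buildTokens: () => [] };

// A builder that reports a warning while lexing.
const reportingBuilder: TokenBuilder = {
    buildTokens: () => [],
    popLexingReport: (text) => ({
        diagnostics: [{ offset: 0, length: text.length, message: 'Inconsistent indentation', severity: 'warning' }]
    })
};
```

Because the hook is called with optional chaining, a legacy builder simply yields `report === undefined` instead of failing.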

martin-fleck-at · 2024-09-06T09:14:39Z

If we functionally agree, I'm also more than happy to discuss naming, which is notoriously difficult: I opted for ILexingReport but also considered ILexingData or similar terms. Not sure what is best in the Langium realm ;-)

martin-fleck-at · 2024-09-06T09:40:04Z

@aabounegm @msujew FYI

msujew

Can you cherry-pick my latest commit from https://github.com/eclipse-langium/langium/tree/msujew/indentation-diagnostics? Since you opened this PR from an organization repository, I can't push to it myself.

msujew · 2024-09-06T10:48:06Z

I'm not happy with the API, by the way, but we kind of boxed ourselves into this by simply reusing all the types that chevrotain provides us. I'll probably refactor the whole lexer/token builder infrastructure for 4.0, since we also have a project that struggles with the way the existing API works (e.g., no known URI during the lexing phase).

- Add support to report diagnostics during lexing process
- Properly map diagnostic severities
- Mark method and report as optional for backwards-compatibility

For indentation:

- Add dedent tokens to report until consumed for state management
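The "dedent tokens in the report" point can be illustrated with a small indentation tracker: any indentation levels still open at the end of the input need a closing dedent, and those pending tokens stay in the report until the lexer consumes them, after which the state is reset. This is a hypothetical sketch (`IndentationTracker`, `processLine`, and `popReport` are illustrative names), not the implementation in `indentation-aware.ts`.

```typescript
// Hypothetical sketch; names are illustrative, not Langium's API.
interface IndentToken {
    type: 'INDENT' | 'DEDENT';
    offset: number;
}

interface IndentationReport {
    // Dedents still owed at the end of the input, kept until the lexer consumes them.
    remainingDedents: IndentToken[];
}

class IndentationTracker {
    // Widths of the currently open indentation levels; 0 is the top level.
    private indentationStack: number[] = [0];

    /** Returns the INDENT/DEDENT tokens produced by a line with the given indentation width. */
    processLine(width: number, offset: number): IndentToken[] {
        const top = this.indentationStack[this.indentationStack.length - 1];
        if (width > top) {
            this.indentationStack.push(width);
            return [{ type: 'INDENT', offset }];
        }
        const dedents: IndentToken[] = [];
        // Close every level that is deeper than the new line.
        while (width < this.indentationStack[this.indentationStack.length - 1]) {
            this.indentationStack.pop();
            dedents.push({ type: 'DEDENT', offset });
        }
        return dedents;
    }

    /** Emits one DEDENT per level still open at the end of input and resets the state. */
    popReport(endOffset: number): IndentationReport {
        const remainingDedents: IndentToken[] = [];
        while (this.indentationStack.length > 1) {
            this.indentationStack.pop();
            remainingDedents.push({ type: 'DEDENT', offset: endOffset });
        }
        return { remainingDedents };
    }
}
```

Since popping the report empties the stack back to the top level, a second call returns no dedents, which is the "until consumed" behavior in a nutshell.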

martin-fleck-at · 2024-09-06T11:13:08Z

@msujew I rebased the commit and cherry-picked yours on top of it. Unfortunately, I couldn't open a branch in this repository directly.

I agree with what you said about the API. I aimed to stay backwards-compatible but having the lexerErrors and a separate but optional lexerReport is indeed a bit unexpected. But I'm happy to leave the API breakages to you where also the overall terminology (e.g., is "report" really the best term?) could be re-worked.

Thanks again for the very fast turnaround on this!

msujew

Looks good to me, thanks 👍

Using report is fine for now, I might revisit this decision at a later point.

Unfortunately, I couldn't open a branch in this repository directly.

People usually fork into their personal account. When a PR is done from a personal account, the maintainers of the target repo can (optionally) push directly to the target branch. This is not possible from organization repos/accounts due to security reasons.

aabounegm

Sorry for the delay. Nothing critical, mostly opinions and questions regarding code style.

aabounegm · 2024-09-06T12:16:52Z

packages/langium/src/parser/indentation-aware.ts

@@ -191,7 +205,7 @@ export class IndentationAwareTokenBuilder<Terminals extends string = string, Key
     * @param offset The current position at which to attempt a match
     * @returns The current and previous indentation levels and the matched whitespace
     */
-    protected matchWhitespace(text: string, offset: number) {
+    protected matchWhitespace(text: string, offset: number, _tokens: IToken[], _groups: Record<string, IToken[]>): { currIndentLevel: number, prevIndentLevel: number, match: RegExpExecArray | null } {


Can you also update the JSDoc with the new parameters?

Good point!
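For reference, a sketch of what the updated JSDoc could look like, attached to a self-contained stand-in for the method. `IToken` is reduced to two fields, the method is public so it can be called directly, and the body (an anchored whitespace regex compared against an indentation stack) is an assumption about how such a matcher can work, not a copy of `IndentationAwareTokenBuilder`.

```typescript
// Minimal stand-in for Chevrotain's IToken; only used for parameter types here.
interface IToken {
    image: string;
    startOffset: number;
}

class IndentationMatcherSketch {
    // Indentation widths of the currently open blocks; 0 is the top level.
    protected indentationStack: number[] = [0];

    /**
     * Matches the whitespace at the given offset and compares its width with the
     * indentation level currently on top of the stack.
     *
     * @param text The full input text that is being lexed
     * @param offset The current position at which to attempt a match
     * @param _tokens The tokens matched so far (unused here, available to overrides)
     * @param _groups The token groups matched so far (unused here, available to overrides)
     * @returns The current and previous indentation levels and the matched whitespace
     */
    matchWhitespace(
        text: string,
        offset: number,
        _tokens: IToken[],
        _groups: Record<string, IToken[]>
    ): { currIndentLevel: number, prevIndentLevel: number, match: RegExpExecArray | null } {
        // Anchored at the start of the remaining input, so only whitespace at `offset` matches.
        const match = /^[ \t]+/.exec(text.slice(offset));
        return {
            currIndentLevel: match ? match[0].length : 0,
            prevIndentLevel: this.indentationStack[this.indentationStack.length - 1],
            match
        };
    }
}
```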

aabounegm · 2024-09-06T12:18:48Z

packages/langium/src/parser/token-builder.ts

+}
+
+export interface LexingDiagnostic extends ILexingError {
+    severity?: 'error' | 'warning' | 'info' | 'hint';


We should probably have some global type DiagnosticSeverity = 'error' | 'warning' | 'info' | 'hint'; since it's used a lot

Yeah, that might be a good idea, I'll look into it.
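A sketch of the suggested shared type and how `LexingDiagnostic` would use it. `ILexingError` is redeclared locally to mirror Chevrotain's fields so the snippet compiles on its own; `DIAGNOSTIC_SEVERITIES` and `isDiagnosticSeverity` are hypothetical helpers showing that a single alias can also back runtime checks.

```typescript
/** Shared severity type suggested in the review (name and location hypothetical). */
type DiagnosticSeverity = 'error' | 'warning' | 'info' | 'hint';

// Local stand-in mirroring the fields of Chevrotain's ILexingError.
interface ILexingError {
    offset: number;
    line: number | undefined;
    column: number | undefined;
    length: number;
    message: string;
}

interface LexingDiagnostic extends ILexingError {
    severity?: DiagnosticSeverity;
}

// Runtime counterpart of the type, e.g. for validating externally supplied values.
const DIAGNOSTIC_SEVERITIES: readonly DiagnosticSeverity[] = ['error', 'warning', 'info', 'hint'];

function isDiagnosticSeverity(value: string): value is DiagnosticSeverity {
    return (DIAGNOSTIC_SEVERITIES as readonly string[]).indexOf(value) >= 0;
}
```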

aabounegm · 2024-09-06T12:20:41Z

packages/langium/src/parser/token-builder.ts

@@ -42,6 +64,16 @@ export class DefaultTokenBuilder implements TokenBuilder {
        return tokens;
    }

+    popLexingReport(_text: string): LexingReport {


I think a name like finalize or finalizeLexing would probably be more user-friendly

Ah, the difficulty of good naming. I was debating this and previously had flushLexingReport to indicate that the state is reset afterwards. My issue with finalizeX, endX, or stopX was the asymmetry, because we do not have the opposing startX or beginX. In the end, I chose popLexingReport as this is what was used in the indentation-based token provider with popRemainingDedents. But as usual, no hard opinions.

Yeah that makes sense, but pop also implies a corresponding push function. I like the flush variant more 👍
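To illustrate the semantics that make "flush" a fitting name: the builder accumulates data while lexing, and the call hands it over and resets the internal state so the next document starts clean. `ReportingTokenBuilder` and `recordDiagnostic` are hypothetical; only the `flushLexingReport(text)` shape follows the naming discussed here.

```typescript
type DiagnosticSeverity = 'error' | 'warning' | 'info' | 'hint';

interface LexingDiagnostic {
    offset: number;
    length: number;
    message: string;
    severity?: DiagnosticSeverity;
}

interface LexingReport {
    diagnostics: LexingDiagnostic[];
}

// Hypothetical builder that collects diagnostics while tokens are being matched.
class ReportingTokenBuilder {
    private diagnostics: LexingDiagnostic[] = [];

    // Would be called from custom token matchers during lexing.
    recordDiagnostic(diagnostic: LexingDiagnostic): void {
        this.diagnostics.push(diagnostic);
    }

    // Hands the collected data to the caller and resets the internal state,
    // so that lexing the next document starts from a clean slate.
    flushLexingReport(_text: string): LexingReport {
        const report: LexingReport = { diagnostics: this.diagnostics };
        this.diagnostics = [];
        return report;
    }
}
```

Replacing the array instead of clearing it in place means a report already returned to a caller is not mutated by the next lexing run.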

aabounegm · 2024-09-06T12:22:42Z

packages/langium/src/parser/lexer.ts

    errors: ILexingError[];
+    report?: LexingReport;


Personally, I would mark errors as @deprecated and include them in the report. They can point to the same array for backwards compatibility, only to be removed in the next major version.
Also, it would be great if you could add JSDoc.
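One way to realize the deprecation suggestion, sketched with simplified types: `errors` stays on the result but is marked `@deprecated`, and it references the very same array as `report.diagnostics`. `createLexerResult` and the trimmed-down types are hypothetical.

```typescript
interface LexingDiagnostic {
    offset: number;
    length: number;
    message: string;
    severity?: 'error' | 'warning' | 'info' | 'hint';
}

interface LexingReport {
    /** All diagnostics produced while lexing, including plain errors. */
    diagnostics: LexingDiagnostic[];
}

interface LexerResult {
    tokens: string[];
    /**
     * Errors that occurred during lexing.
     * @deprecated Use `report.diagnostics` instead; kept for backwards compatibility.
     */
    errors: LexingDiagnostic[];
    /** Diagnostics and additional data collected during lexing. */
    report: LexingReport;
}

function createLexerResult(tokens: string[], diagnostics: LexingDiagnostic[]): LexerResult {
    // Both properties reference the same array, so old and new consumers see the same entries.
    return { tokens, errors: diagnostics, report: { diagnostics } };
}
```

A trade-off of sharing one array is that legacy consumers reading `errors` would also see warnings or hints once the report carries them; filtering by severity avoids that but means maintaining two arrays.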

aabounegm · 2024-09-06T12:25:11Z

packages/langium/src/parser/lexer.ts

    protected tokenTypes: TokenTypeDictionary;

-    constructor(services: LangiumCoreServices) {
-        const tokens = services.parser.TokenBuilder.buildTokens(services.Grammar, {
+    constructor( services: LangiumCoreServices) {


Why the extra space after (?

Ah, this was a pure oversight ;-) Thanks for catching that!

aabounegm · 2024-09-06T12:26:15Z

packages/langium/src/parser/langium-parser.ts

+    lexerErrors: ILexingError[],
+    lexerReport?: LexingReport


These 2 can also be merged, with one being marked as deprecated, as mentioned in LexingReport

Yeah, I didn't wanna mark parts of the API as deprecated and @msujew already mentioned that he may re-work those parts when he does a 4.x release. But I fully agree that this is an odd API now.

aabounegm · 2024-09-06T12:35:19Z

packages/langium/src/validation/document-validator.ts

-        for (const lexerError of parseResult.lexerErrors) {
+        const lexerDiagnostics = [...parseResult.lexerErrors, ...parseResult.lexerReport?.diagnostics ?? []] as LexingDiagnostic[];
+        for (const lexerDiagnostic of lexerDiagnostics) {
+            const severity = lexerDiagnostic?.severity ?? 'error';


I don't think the diagnostic element itself can be undefined

Suggested change

const severity = lexerDiagnostic?.severity ?? 'error';

const severity = lexerDiagnostic.severity ?? 'error';

Absolutely correct!
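The merge in this hunk, as a standalone sketch with simplified types (`LexingErrorLike`, `ParseResultLike`, and `collectLexerDiagnostics` are hypothetical). In this sketch, annotating the merged array as `LexingDiagnostic[]` is enough without an `as` cast, because plain lexing errors simply lack the optional `severity`, which then defaults to `'error'`.

```typescript
type DiagnosticSeverity = 'error' | 'warning' | 'info' | 'hint';

// Simplified shape of a plain lexing error (no severity).
interface LexingErrorLike {
    offset: number;
    length: number;
    message: string;
}

interface LexingDiagnostic extends LexingErrorLike {
    severity?: DiagnosticSeverity;
}

type ResolvedDiagnostic = LexingDiagnostic & { severity: DiagnosticSeverity };

// Hypothetical subset of a parse result.
interface ParseResultLike {
    lexerErrors: LexingErrorLike[];
    lexerReport?: { diagnostics: LexingDiagnostic[] };
}

function collectLexerDiagnostics(parseResult: ParseResultLike): ResolvedDiagnostic[] {
    // LexingErrorLike is assignable to LexingDiagnostic because `severity` is optional.
    const all: LexingDiagnostic[] = [...parseResult.lexerErrors, ...(parseResult.lexerReport?.diagnostics ?? [])];
    // Elements are never undefined, so only the missing severity needs a fallback.
    return all.map((diagnostic): ResolvedDiagnostic => ({ ...diagnostic, severity: diagnostic.severity ?? 'error' }));
}
```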

aabounegm · 2024-09-06T12:38:01Z

packages/langium/src/validation/document-validator.ts

+    switch (severity) {
+        case 'error':
+            return diagnosticData(DocumentValidator.LexingError);
+        case 'warning':
+            return diagnosticData(DocumentValidator.LexingWarning);
+        case 'info':
+            return diagnosticData(DocumentValidator.LexingInfo);
+        case 'hint':
+            return diagnosticData(DocumentValidator.LexingHint);


I personally prefer a Record<DiagnosticSeverity, DocumentValidator> over a switch, and also using an enum instead of a namespace (to make this type possible)

I aligned this with the toDiagnosticSeverity function in the same file so I'd say we should at least be consistent even though I do not have any hard feelings regarding either way.
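For comparison, both styles side by side. The diagnostic code strings are assumptions standing in for the `DocumentValidator` constants. In this sketch, the compiler checks both variants for completeness: the `Record` fails to compile if a severity key is missing, and the switch uses a `never` assignment in its default branch.

```typescript
type DiagnosticSeverity = 'error' | 'warning' | 'info' | 'hint';

// Assumed code values standing in for the DocumentValidator.Lexing* constants.
const LexingCodes: Record<DiagnosticSeverity, string> = {
    error: 'lexing-error',
    warning: 'lexing-warning',
    info: 'lexing-info',
    hint: 'lexing-hint'
};

// Record style: a plain lookup; omitting a key is a compile error.
function lexingCodeByRecord(severity: DiagnosticSeverity): string {
    return LexingCodes[severity];
}

// Switch style: the `never` assignment fails to compile if a case is missing.
function lexingCodeBySwitch(severity: DiagnosticSeverity): string {
    switch (severity) {
        case 'error':
            return LexingCodes.error;
        case 'warning':
            return LexingCodes.warning;
        case 'info':
            return LexingCodes.info;
        case 'hint':
            return LexingCodes.hint;
        default: {
            const unreachable: never = severity;
            throw new Error(`Unknown severity: ${String(unreachable)}`);
        }
    }
}
```

Both give the same compile-time safety, so the choice mostly comes down to consistency with the surrounding code.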

martin-fleck-at · 2024-09-06T13:20:53Z

@aabounegm Thank you for your feedback! I am finalizing #1669 right now and I'll add the non-controversial changes there. For anything else (like marking API as deprecated or particular types) I'd suggest to open a new PR since this may require some discussion.

martin-fleck-at force-pushed the indentation-diagnostics branch from c1e9a71 to 73d3461 September 6, 2024 09:01

msujew reviewed Sep 6, 2024


martin-fleck-at and others added 2 commits September 6, 2024 13:09

Minor issues

a84fad3

martin-fleck-at force-pushed the indentation-diagnostics branch from 73d3461 to a84fad3 September 6, 2024 11:10

msujew approved these changes Sep 6, 2024


msujew merged commit 51d99a6 into eclipse-langium:main Sep 6, 2024
4 checks passed

msujew added this to the v3.2.0 milestone Sep 6, 2024

aabounegm reviewed Sep 6, 2024


martin-fleck-at mentioned this pull request Sep 6, 2024

Introduce tokenizing options for full and partial mode #1669

Merged

aabounegm mentioned this pull request Sep 19, 2024

Perform parser optimizations in production mode #1688

Merged
