Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

A "fail" statement might cause its branch statement to be aborted #3457

Closed
deathaxe opened this issue Jul 10, 2020 · 8 comments
Closed

A "fail" statement might cause its branch statement to be aborted #3457

deathaxe opened this issue Jul 10, 2020 · 8 comments

Comments

@deathaxe
Copy link
Collaborator

Description

The Java.sublime-syntax at deathaxe/sublime-packages@d1ad03d which used to work in ST 4074 now

  1. causes branching related syntax tests to fail
  2. results in 2 to 3 times higher lexing lexing duration of the same files

The first failing test is at line 987.

public enum EnumTest { int {} }
//^^^^^^^^^ meta.enum.java
//         ^^^^^^^^^^ meta.enum.identifier.java
//                   ^^^^^^^^^^ meta.enum.java meta.block.java
//     ^^^^ storage.type.enum.java
//         ^ - entity - keyword - storage
//          ^^^^^^^^ entity.name.enum.java
//                  ^ - entity - punctuation
//                   ^ punctuation.section.block.begin.java
//                     ^^^ storage.type.primitive.java
//                         ^ punctuation.section.block.begin.java
//                          ^ punctuation.section.block.end.java
//                            ^ punctuation.section.block.end.java

The int { } expression is not highlighted at all. It should be matched by the expression-pop branch at Java.sublime-syntax line 223...232.

  declarations:
    - include: import
    - include: module
    - include: package
    - match: (?=\S)
      branch_point: declarations
      branch:
        - constructor-declaration
        - method-declaration
        - field-declaration
        - class-declaration
        - enum-declaration
        - interface-declaration
        - expression-pop    # <- should match the `int { }`

It seems the fail statement of the field-declaration branch at line 710 causes the whole branch statement to be canceled. If it is executed, the branches class-declaration to expression-pop are not evaluated.

  field-names:
    - meta_scope: meta.field.type.java
    - match: (?=[{(]|{{declaration_keywords}}\b)   # <- the culprit
      fail: declarations
    - match: ''
      set:
        - field-list
        - field-value
        - field-name-meta
        - field-name

As a result the rest of the file is handled as part of the enum body (meta.enum meta.block).

If the { is removed from the match the int is matched as storage type of a field. It's not correct, but causes the the syntax to look less broken.

May it be caused by the numerous none-consuming matches which are evaluated one after another?

Expected behavior

The fail statement should cause the following branches to be evaluated

grafik

Actual behavior

Syntax breaks at...

grafik

Lexing takes 2 to 3 times longer than before. Running the Syntax Test - Performance against syntax_test_java.java used to take about 27ms with ST 4074. It requires 86ms in ST 4076.

Environment

  • Build: 4076
  • Operating system and version: Windows 10 x64 2004
@deathaxe
Copy link
Collaborator Author

deathaxe commented Jul 12, 2020

After some more investigation I found the real root cause of this issue. It is most likely the fix for issue #3158: As of ST 4075 non-consuming rules (lookaheads) are matched correctly against the first token after pushing a new context on stack by a non-consuming match (Nice to see that working now).

As a result the expression-terminator context now matches int and pops the expression-pop, which leaves declarations unmatched. I fixed the issue in sublimehq/Packages@4e28444 (This commit does not have any impact on parsing performance though)

I can only guess all lookaheads being evaluated on the first token correctly to cause the higher lexing times - especially the reserved-pop context which is included very often.

I was able to increase parsing performance of the syntax_test_java.java from 85ms to 33ms by reworking the annotations contexts (see sublimehq/Packages@ecc7bed).

It does not have any effect on my real world benchmark file though, which still needs 33 instead of 13ms to parse.
see: https://github.com/eclipse/lemminx/blob/master/org.eclipse.lemminx/src/main/java/org/eclipse/lemminx/services/XMLCompletions.java

The only thing left to check is performance. Otherwise ST's lexer is working as expected. Thus this issue may be closed.

@wbond wbond self-assigned this Jul 20, 2020
@deathaxe
Copy link
Collaborator Author

With regards to performance fluctuations - here is another strange thing I found:

The related commit is deathaxe/sublime-packages@396adf2#diff-f7fcf52cd87d3ec05e5df7f8b58c5ba1

With the lines below, parsing the syntax_test_java.java takes 32ms. Without them 14ms.

https://github.com/deathaxe/sublime-packages/blob/ec2ee127f375ad6815cfb19e4caadfa706af2a2b/Java/syntax_test_java.java#L3327-L3365

It seems some of the test case comments trigger the issue. At some point removing them in a bisect way the parsing time switches from 33 to14ms.

What's going on there? How can a dozen of simple lines have such an impact?

@deathaxe
Copy link
Collaborator Author

More benchmarks may be found at sublimehq/Packages#2464

@wbond
Copy link
Member

wbond commented Sep 9, 2020

I see numbers different than yours, but there is no difference removing the lines you mention. All of my runs are 65-70ms.

My hunch is that you are running a Windows machine, which is causing your runs to vary slightly, and the Windows timer resolution is causing you to get numbers that look like multiples of ~16ms. See https://carpediemsystems.co.uk/2019/07/18/windows-system-timer-granularity/ for a little more info.

@deathaxe
Copy link
Collaborator Author

deathaxe commented Sep 9, 2020

Yes, I am running a Windows machine only.

If nothing unusal can be revealed with regards to possible bugs, I guess we can close this issue. In most situations I see quite constant benchmark results as well and just wanted to make sure I didn't hit any edge case.

@deathaxe deathaxe closed this as completed Sep 9, 2020
@wbond
Copy link
Member

wbond commented Sep 10, 2020

I fixed the stray loop and am getting 55ms pretty consistently now on my Mac.

@wbond wbond added this to the Build 4086 milestone Sep 14, 2020
@wbond
Copy link
Member

wbond commented Sep 14, 2020

Build 4086 should hopefully have fixed the performance degradation you were seeing.

@wbond wbond added the R: fixed label Sep 14, 2020
@deathaxe
Copy link
Collaborator Author

All benchmarks show 25% better results than ST 4074 now. Great job.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants