Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Erlang] sublime-syntax format version 2 #2464

Merged
merged 4 commits into from
Sep 14, 2020

Conversation

deathaxe
Copy link
Collaborator

@deathaxe deathaxe commented Aug 25, 2020

This PR updates Erlang.sublime-syntax to version 2.

  • removes no longer required lookaheads
  • uses pop: 1 in favor of pop: true
  • replaces all anonymous contexts which are used in multi-push and multi-set statements.

Notes:

  1. This PR doesn't introduce any changes in behavior
  2. Unfortunatelly commit b00ace8 and a75fcc8 reduce parsing performance by about 15-30% depending on the used files. Benchmarks were made with all the files from the last rewrite. As those commits just reorganize anonymous contexts in named ones, the performance drop may be related with context name tracking?

Benchmarks

The whole design and developement process included benchmarking of different solutions to find the best possible performance for the different sets of rules. The following examples provide an impression of the performance improvements achieved with the new design.

10k binary

  Bin  = << << (X*2) >> || <<X>> <= << 1,2,3 >> >>.

10k maps

  Expr#{name=>"adam",{age,24}:=fct(),4.0=>{july,29}}

10k records

  #record{field1="val1", Field2=3, field3, 'Field-4'={}, _=atom}

10k fun types

  -type fun() :: fun(() -> int()).

10k spec

  -spec is_foo(Pid) -> boolean() when Pid :: pid() | if(); (Pid) -> ok.

ErlangOTP files

lib\wx\include\gl.hrl

lib\stdlib\test\re_testoutput1_split_test.erl

lib\xmerl\test\xmerl_sax_std_SUITE.erl

lib\crypto\test\crypto_SUITE.erl

Results

Engine Related Performance

The following table shows how parsing performance slowed down between ST4074 and ST4084 without significant changes to Erlang.sublime-syntax (only some scope name changes).

Sublime Text Version ST4074 ST4084 diff
10k binaries 117ms 165ms +41%
10k maps 128ms 213ms +66%
10k records 98ms 180ms +84%
10k fun type 66ms 111ms +68%
10k spec 147ms 216ms +47%
gl.hrl 20ms 27ms +35%
re_testoutput1_split_test.erl 251ms 362ms +44%
xmerl_sax_std_SUITE.erl 100ms 131ms +31%
crypto_SUITE.erl 22ms 27ms +23%

Performance drops by 23..84% due to recent core changes.

Syntax Definition Related Performance

The following table shows how parsing performance slows down just because of opting in to version 2 and replacing all anonymous by named contexts using ST 4084.

Syntax Definition before after diff
10k binaries 165ms 206ms +25%
10k maps 213ms 263ms +23%
10k records 180ms 175ms -5%
10k fun type 111ms 130ms +17%
10k spec 216ms 274ms +27%
gl.hrl 27ms 28ms +4%
re_testoutput1_split_test.erl 362ms 431ms +19%
xmerl_sax_std_SUITE.erl 131ms 163ms +24%
crypto_SUITE.erl 27ms 32ms +19%

Performance drops by another 25% by this commit.

DeathAxe added 4 commits August 23, 2020 09:35
This commit ...

1. sorts header contexts
2. replaces `pop: true` by `pop: 1`
3. removes obsolete lookahead patterns which have been used to maintain
   meta scope boundaries before.
This commit converts anonymous contexts into named contexts and uses
those in all multi-push and multi-set statements in order to improve
readability.

Note: This commit does not introduce functional changes.
This commit...

1. converts anonymous contexts into named contexts and uses those in
   all multi-push and multi-set statements in order to improve
   readability.
2. creates a `record-fields-declaration` context, which is used in both
   - the record declaration preprocessor statement (preproc-record) and
   - the record instantiation statement.
   It is to reduce duplicated patterns.

Note: This commit does not introduce functional changes.
@wbond
Copy link
Member

wbond commented Sep 9, 2020

My hunch is that the primary change in performance you are seeing is a result of properly handling scope names.

The old approach to scope names allowed pretty much everything to be pre-computed at syntax compile time. However, this had edge cases with a number of situations. As a result, in a handful of situations, the scope names now have to be resolved at run time during the matches.

For the version: 2 changes, there are definitely a number of those tweaks that result in run-time scope handling.

In terms of the general slowdown between 4074 and 4084, beyond the scope name mangling at run time, we now also have to sort matches when there are multiple capture groups, and handful of other small tweaks. Most of the bug fixes were things where the simple/naive/fast approach was wrong, and the correct approach was more computationally expensive. I could definitely see how the run time tweaks could cause significant changes when running over 10k samples.

@deathaxe
Copy link
Collaborator Author

deathaxe commented Sep 9, 2020

Makes sense as the performance drop is nearly the same with all syntaxes I have made benchmarks for. Only my new Java syntax which heavily makes use of branch-points to support case insensitive object types etc. runs quite a bit more slower than before.

Just wanted to be sure everything is working as expected and we don't miss something. Most files are small enough, so users might not even notice it.

@wbond
Copy link
Member

wbond commented Sep 9, 2020

Erf, or maybe a useless loop over every named context was added to a hot spot in the lexing engine… 🤦‍♂️

@deathaxe
Copy link
Collaborator Author

deathaxe commented Sep 9, 2020

:)

@wbond
Copy link
Member

wbond commented Sep 10, 2020

Funny story, but I removed that debug loop and was getting about 25% better performance…

@wbond wbond merged commit 16f1117 into sublimehq:master Sep 14, 2020
@deathaxe deathaxe deleted the pr/erlang/syntax-version-2 branch September 15, 2020 15:55
mitranim pushed a commit to mitranim/Packages that referenced this pull request Mar 25, 2022
* [Erlang] Convert to sublime-syntax version 2

This commit ...

1. sorts header contexts
2. replaces `pop: true` by `pop: 1`
3. removes obsolete lookahead patterns which have been used to maintain
   meta scope boundaries before.

* [Erlang] Remove unnamed contexts from multi-push

This commit converts anonymous contexts into named contexts and uses
those in all multi-push and multi-set statements in order to improve
readability.

Note: This commit does not introduce functional changes.

* [Erlang] Refactor record contexts

This commit...

1. converts anonymous contexts into named contexts and uses those in
   all multi-push and multi-set statements in order to improve
   readability.
2. creates a `record-fields-declaration` context, which is used in both
   - the record declaration preprocessor statement (preproc-record) and
   - the record instantiation statement.
   It is to reduce duplicated patterns.

Note: This commit does not introduce functional changes.

* [Erlang] Refactor fun contexts
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants