[Erlang] sublime-syntax format version 2 #2464

deathaxe · 2020-08-25T19:04:19Z

This PR updates Erlang.sublime-syntax to version 2.

removes no longer required lookaheads
uses pop: 1 in favor of pop: true
replaces all anonymous contexts which are used in multi-push and multi-set statements.

Notes:

This PR doesn't introduce any changes in behavior
Unfortunatelly commit b00ace8 and a75fcc8 reduce parsing performance by about 15-30% depending on the used files. Benchmarks were made with all the files from the last rewrite. As those commits just reorganize anonymous contexts in named ones, the performance drop may be related with context name tracking?

Benchmarks

The whole design and developement process included benchmarking of different solutions to find the best possible performance for the different sets of rules. The following examples provide an impression of the performance improvements achieved with the new design.

10k binary

  Bin  = << << (X*2) >> || <<X>> <= << 1,2,3 >> >>.

10k maps

  Expr#{name=>"adam",{age,24}:=fct(),4.0=>{july,29}}

10k records

  #record{field1="val1", Field2=3, field3, 'Field-4'={}, _=atom}

10k fun types

  -type fun() :: fun(() -> int()).

10k spec

  -spec is_foo(Pid) -> boolean() when Pid :: pid() | if(); (Pid) -> ok.

ErlangOTP files

lib\wx\include\gl.hrl

lib\stdlib\test\re_testoutput1_split_test.erl

lib\xmerl\test\xmerl_sax_std_SUITE.erl

lib\crypto\test\crypto_SUITE.erl

Results

Engine Related Performance

The following table shows how parsing performance slowed down between ST4074 and ST4084 without significant changes to Erlang.sublime-syntax (only some scope name changes).

Sublime Text Version	ST4074	ST4084	diff
10k binaries	117ms	165ms	+41%
10k maps	128ms	213ms	+66%
10k records	98ms	180ms	+84%
10k fun type	66ms	111ms	+68%
10k spec	147ms	216ms	+47%
gl.hrl	20ms	27ms	+35%
re_testoutput1_split_test.erl	251ms	362ms	+44%
xmerl_sax_std_SUITE.erl	100ms	131ms	+31%
crypto_SUITE.erl	22ms	27ms	+23%

Performance drops by 23..84% due to recent core changes.

Syntax Definition Related Performance

The following table shows how parsing performance slows down just because of opting in to version 2 and replacing all anonymous by named contexts using ST 4084.

Syntax Definition	before	after	diff
10k binaries	165ms	206ms	+25%
10k maps	213ms	263ms	+23%
10k records	180ms	175ms	-5%
10k fun type	111ms	130ms	+17%
10k spec	216ms	274ms	+27%
gl.hrl	27ms	28ms	+4%
re_testoutput1_split_test.erl	362ms	431ms	+19%
xmerl_sax_std_SUITE.erl	131ms	163ms	+24%
crypto_SUITE.erl	27ms	32ms	+19%

Performance drops by another 25% by this commit.

This commit ... 1. sorts header contexts 2. replaces `pop: true` by `pop: 1` 3. removes obsolete lookahead patterns which have been used to maintain meta scope boundaries before.

This commit converts anonymous contexts into named contexts and uses those in all multi-push and multi-set statements in order to improve readability. Note: This commit does not introduce functional changes.

This commit... 1. converts anonymous contexts into named contexts and uses those in all multi-push and multi-set statements in order to improve readability. 2. creates a `record-fields-declaration` context, which is used in both - the record declaration preprocessor statement (preproc-record) and - the record instantiation statement. It is to reduce duplicated patterns. Note: This commit does not introduce functional changes.

wbond · 2020-09-09T02:16:33Z

My hunch is that the primary change in performance you are seeing is a result of properly handling scope names.

The old approach to scope names allowed pretty much everything to be pre-computed at syntax compile time. However, this had edge cases with a number of situations. As a result, in a handful of situations, the scope names now have to be resolved at run time during the matches.

For the version: 2 changes, there are definitely a number of those tweaks that result in run-time scope handling.

In terms of the general slowdown between 4074 and 4084, beyond the scope name mangling at run time, we now also have to sort matches when there are multiple capture groups, and handful of other small tweaks. Most of the bug fixes were things where the simple/naive/fast approach was wrong, and the correct approach was more computationally expensive. I could definitely see how the run time tweaks could cause significant changes when running over 10k samples.

deathaxe · 2020-09-09T16:06:38Z

Makes sense as the performance drop is nearly the same with all syntaxes I have made benchmarks for. Only my new Java syntax which heavily makes use of branch-points to support case insensitive object types etc. runs quite a bit more slower than before.

Just wanted to be sure everything is working as expected and we don't miss something. Most files are small enough, so users might not even notice it.

wbond · 2020-09-09T16:07:00Z

Erf, or maybe a useless loop over every named context was added to a hot spot in the lexing engine… 🤦‍♂️

deathaxe · 2020-09-09T16:14:52Z

:)

wbond · 2020-09-10T00:47:38Z

Funny story, but I removed that debug loop and was getting about 25% better performance…

* [Erlang] Convert to sublime-syntax version 2 This commit ... 1. sorts header contexts 2. replaces `pop: true` by `pop: 1` 3. removes obsolete lookahead patterns which have been used to maintain meta scope boundaries before. * [Erlang] Remove unnamed contexts from multi-push This commit converts anonymous contexts into named contexts and uses those in all multi-push and multi-set statements in order to improve readability. Note: This commit does not introduce functional changes. * [Erlang] Refactor record contexts This commit... 1. converts anonymous contexts into named contexts and uses those in all multi-push and multi-set statements in order to improve readability. 2. creates a `record-fields-declaration` context, which is used in both - the record declaration preprocessor statement (preproc-record) and - the record instantiation statement. It is to reduce duplicated patterns. Note: This commit does not introduce functional changes. * [Erlang] Refactor fun contexts

DeathAxe added 4 commits August 23, 2020 09:35

[Erlang] Convert to sublime-syntax version 2

f15f356

This commit ... 1. sorts header contexts 2. replaces `pop: true` by `pop: 1` 3. removes obsolete lookahead patterns which have been used to maintain meta scope boundaries before.

[Erlang] Remove unnamed contexts from multi-push

9323650

This commit converts anonymous contexts into named contexts and uses those in all multi-push and multi-set statements in order to improve readability. Note: This commit does not introduce functional changes.

[Erlang] Refactor fun contexts

a75fcc8

deathaxe mentioned this pull request Aug 30, 2020

A "fail" statement might cause its branch statement to be aborted sublimehq/sublime_text#3457

Closed

wbond merged commit 16f1117 into sublimehq:master Sep 14, 2020

deathaxe deleted the pr/erlang/syntax-version-2 branch September 15, 2020 15:55

jrappen mentioned this pull request Jan 5, 2021

Curated list of languages trishume/syntect#168

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Erlang] sublime-syntax format version 2 #2464

[Erlang] sublime-syntax format version 2 #2464

deathaxe commented Aug 25, 2020 •

edited

Loading

wbond commented Sep 9, 2020

deathaxe commented Sep 9, 2020

wbond commented Sep 9, 2020

deathaxe commented Sep 9, 2020

wbond commented Sep 10, 2020

[Erlang] sublime-syntax format version 2 #2464

[Erlang] sublime-syntax format version 2 #2464

Conversation

deathaxe commented Aug 25, 2020 • edited Loading

Benchmarks

Results

Engine Related Performance

Syntax Definition Related Performance

wbond commented Sep 9, 2020

deathaxe commented Sep 9, 2020

wbond commented Sep 9, 2020

deathaxe commented Sep 9, 2020

wbond commented Sep 10, 2020

deathaxe commented Aug 25, 2020 •

edited

Loading