
javascript processing speed #1375

Closed
eiselekd opened this issue Nov 19, 2016 · 2 comments

@eiselekd

Hi, the JavaScript target in ANTLR4 seems very slow for bigger files. I parse a 130k VHDL file with an ANTLR4 grammar. I did a little test to measure:

cpp-target: 3 sec
javascript-target: 2 minutes
cpp-target converted with emscripten to javascript: 15 sec

The test is here: https://github.com/eiselekd/jshdl. Using emscripten to convert the C++ target parser is usable, however the output size is big...

Greetings, Konrad

@sharwell
Member

A huge thank you for putting this test together. I used the test to run some benchmarks on the TypeScript target which is just showing signs of life.

In a direct conversion of the test, the TypeScript target completed parsing in 20 seconds. Running the parser a second time on the same file was still only a hair shy of 20 seconds, which tells me the parser is spending most of its time in full-context prediction, which by default doesn't use a DFA (so nothing is cached between runs). So I took things a step further...

The primary approach to reducing full-context prediction time is 2-stage parsing: first attempt to parse the file with full-context prediction disabled altogether. For many grammars this succeeds on valid input, and full-context prediction is never needed. For the VHDL grammar it did not work, and based on the number of errors reported in SLL mode, it appears some rules in the grammar might need to be rewritten before 2-stage parsing can provide any benefit. tl;dr: two-stage parsing does not improve performance for the VHDL grammar used in this test.
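For readers unfamiliar with the pattern, here is a minimal sketch of 2-stage parsing. In the real ANTLR runtimes this is done by setting the interpreter to `PredictionMode.SLL` with a bail-out error strategy and retrying in `PredictionMode.LL` when the fast stage throws; `parseSLL` and `parseLL` below are hypothetical stand-ins for those two configurations of a generated parser:

```javascript
// Two-stage parsing: try the fast SLL stage first; if it bails,
// re-parse with full-context LL prediction before reporting errors.
// parseSLL/parseLL are hypothetical stand-ins for running a
// generated ANTLR parser in SLL vs. LL prediction mode.
function parseTwoStage(input, parseSLL, parseLL) {
  try {
    return { tree: parseSLL(input), mode: "SLL" };
  } catch (e) {
    // An SLL failure may be a false positive, so retry with
    // full-context prediction; only its errors are authoritative.
    return { tree: parseLL(input), mode: "LL" };
  }
}
```

The point of the comment above is that for the VHDL grammar in this test, the SLL stage fails on valid input, so the (expensive) LL fallback runs every time and the pattern buys nothing.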

The optimized fork of ANTLR 4, along with the two targets derived from that fork (Tunnel Vision Labs' C# target and now the TypeScript target) support using a DFA for full-context prediction. So the next step in evaluating this was to enable that feature for the test. I ran the sample parse operation 4 times in sequence, and the results were as follows:

| Pass | Description | Time |
| --- | --- | --- |
| 1 | First pass, no warm-up | 7781 ms |
| 2 | Parse input again, DFA is reused | 1736 ms |
| 3 | Clear DFA and parse again | 6739 ms |
| 4 | Parse input again, DFA is reused | 1689 ms |

The time difference between passes 1 and 3 is primarily because the JavaScript files are already loaded (and presumably JIT-compiled) by pass 3; that load/compilation time is included in pass 1's measurement.
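A multi-pass measurement like the one above can be reproduced with a small harness. This is a sketch, not the code used for the numbers in the table; `parse` stands in for the generated parser invocation, and the optional `between` hook is where one would clear the DFA (e.g. before pass 3):

```javascript
// Time several parse passes to separate warm-up cost (module load,
// JIT, DFA construction) from steady-state cost.
function benchmark(passDescriptions, parse, between) {
  const results = [];
  for (let i = 0; i < passDescriptions.length; i++) {
    if (between) between(i); // hypothetical hook, e.g. clear the DFA
    const t0 = Date.now();
    parse();
    results.push({ pass: i + 1, desc: passDescriptions[i], ms: Date.now() - t0 });
  }
  return results;
}
```

Comparing pass 2 against pass 1, and pass 4 against pass 3, isolates the benefit of DFA reuse from one-time warm-up effects.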

@eiselekd
Author

I'm not familiar with the parser internals, but I guess the main problem in the JavaScript target is that the parser allocates plain JavaScript objects at runtime. The emscripten-generated JavaScript (compiled from the C++ target), on the other hand, stores everything in an ArrayBuffer and allocates no objects; I guess that is why it can get close to C++ speed once JIT-compiled.
