Add incremental parsing support #2527

dberlin · 2019-04-07T21:00:55Z

This commit adds incremental parsing support to ANTLR4.
I have only updated the Java target and the out-of-tree TypeScript target (see tunnelvisionlabs/antlr4ts#414), but it should be very easy for someone who understands the other target languages to update those targets. The changes are deliberately minimal.

The Java version here is actually a backport of the TypeScript version, and took O(2 hours).
(As an aside, I have not written Java in a few years, so I fully expect there are things that could be done better.) The comments were originally written for the TypeScript version; I will go through and clean them up.

A detailed description of how it works is here (which also lists the outstanding issues), but it's a very straightforward implementation: it detects which rules could be affected by token changes. Rule contexts that can't have been affected by a set of token changes are reused, and those rules are not re-run. To account for possibly infinite lookahead/lookbehind, we keep track of how far ahead/behind the parser looked last time on each rule, and use that as the bounds within which to detect changes.
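As a rough illustration of the reuse test described above, here is a minimal sketch. It is not the code in this PR: the `ReuseCheck` and `TokenRange` names are hypothetical, and it assumes both the changes and the per-rule bounds are expressed as closed ranges of token indexes in the old token stream.

```java
import java.util.List;

/** Hypothetical sketch of the reuse check; not the PR's actual implementation. */
final class ReuseCheck {
    /** Closed range of token indexes in the old token stream (assumed representation). */
    static final class TokenRange {
        final int start;
        final int stop;

        TokenRange(int start, int stop) {
            this.start = start;
            this.stop = stop;
        }
    }

    /**
     * seen: the lowest and highest token index the rule examined last time,
     * including lookahead/lookbehind, not just the tokens it matched.
     * A context is reusable only if no changed range overlaps those bounds.
     */
    static boolean canReuse(TokenRange seen, List<TokenRange> changes) {
        for (TokenRange change : changes) {
            boolean overlaps = change.start <= seen.stop && change.stop >= seen.start;
            if (overlaps) {
                return false; // the rule may have depended on a changed token
            }
        }
        return true;
    }
}
```

Under these assumptions, a rule that matched tokens 10 to 20 but looked ahead to token 25 has bounds 10..25, so a change at token 23 forces the rule to re-run even though it lies outside the text the rule matched.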

The tests currently test on a simple grammar and the JavaLR grammar (which exercises the left recursion removal support).

The only class I've added that requires anything even mildly interesting of the runtime is the IncrementalParserData class.
Most of the work there is related to changing the start/end tokens of rule contexts to realign them with the token stream changes. If you only care about the text of the parse tree, and not the position/etc info, this is obviously unnecessary. I have not made this an option.

To track changed tokens and stream adjustments, the Java version of IncrementalParserData uses TreeMap/TreeSet. The TypeScript version uses arrays of ranges and binary search (see https://github.com/dberlin/antlr4ts/blob/incremental/src/IncrementalParserData.ts).
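To show why a sorted map suits this kind of bookkeeping, here is a minimal sketch of mapping old token indexes to new ones with a `TreeMap`. It is an assumption about the general technique, not the contents of IncrementalParserData: the `TokenOffsetMap` class and its methods are hypothetical, and it assumes shifts are recorded in increasing index order.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch of index realignment; not the PR's actual data structure. */
final class TokenOffsetMap {
    // Key: first old token index at which a cumulative shift applies.
    // Value: cumulative shift (new index minus old index) from that index onward.
    private final TreeMap<Integer, Integer> cumulativeShift = new TreeMap<>();

    /**
     * Records that delta tokens were inserted (positive) or removed (negative)
     * at oldIndex. Assumes calls arrive in increasing oldIndex order.
     */
    void recordShift(int oldIndex, int delta) {
        if (delta == 0) {
            return; // a zero shift needs no entry
        }
        Map.Entry<Integer, Integer> previous = cumulativeShift.floorEntry(oldIndex);
        int base = (previous == null) ? 0 : previous.getValue();
        cumulativeShift.put(oldIndex, base + delta);
    }

    /** Maps an index in the old token stream to its index in the new stream. */
    int toNewIndex(int oldIndex) {
        // floorEntry finds the nearest recorded shift at or before oldIndex in O(log n).
        Map.Entry<Integer, Integer> entry = cumulativeShift.floorEntry(oldIndex);
        return (entry == null) ? oldIndex : oldIndex + entry.getValue();
    }
}
```

The `floorEntry` lookup plays the same role as a binary search over a sorted array of ranges, which is the approach the TypeScript version is described as using.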

I am happy to encapsulate this into a data structure in the runtime if anyone thinks it is worth it.

As for why do this at all: Yes, ANTLR is actually pretty fast.
My use case is a bit weird - large GCode files, which are often 20+ megabytes. As such, a single parse takes 6-10 seconds (for a 20 meg file).
Users often make small edits to various pieces.
(It's part of a vscode extension).

Lexing GCode is also completely trivial to do in a contextless fashion.
The incremental parser brings the reparse time down to <50ms.

I may get around to adding incremental lexing. As I'm sure Terence knows, this is "trickier".

I have the beginnings of support (elsewhere) based on some papers, but it is incestuous (the parser tells the lexer what tokens could be valid at a given change point and the lexer tries those rules). There are ways that don't do this, but some require being able to store/rewind/replay the transition state at each token, etc.

Clashsoft · 2020-01-10T19:08:00Z

runtime/Java/src/org/antlr/v4/runtime/IncrementalParser.java

+public abstract class IncrementalParser extends Parser implements ParseTreeListener {
+	// Current parser epoch. Incremented every time a new incremental parser is
+	// created.
+	private static int _PARSER_EPOCH = 0;


This should be an AtomicInteger to avoid race conditions when multiple threads instantiate this class.
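A minimal sketch of what this suggestion could look like, shown in isolation rather than as a patch to IncrementalParser. The `EpochCounter` class and `nextEpoch` method are hypothetical names; only `java.util.concurrent.atomic.AtomicInteger` is a real API here.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for the static epoch field in IncrementalParser. */
final class EpochCounter {
    // Shared across all instances; AtomicInteger makes the increment a single
    // atomic step, unlike "++" on a plain static int.
    private static final AtomicInteger PARSER_EPOCH = new AtomicInteger(0);

    private EpochCounter() {
    }

    /** Returns a fresh epoch value; safe to call from multiple threads. */
    static int nextEpoch() {
        return PARSER_EPOCH.incrementAndGet();
    }
}
```

With a plain `int`, two threads constructing parsers at the same time could both read the same value before either writes it back, and end up sharing an epoch.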

dberlin added 2 commits April 7, 2019 13:58

Basic incremental parser working

363a23d

Fix changed rule detection during IncrementalParserData walk

691533a

dberlin force-pushed the incremental branch from 57b6761 to 691533a Compare April 22, 2019 04:15

Don't create ranges where the token index offset is 0

aa47883

Clashsoft suggested changes Jan 10, 2020

BurtHarris mentioned this pull request Apr 14, 2020

Incremental Parsing support tunnelvisionlabs/antlr4ts#414

Draft
