Replace edge representation in DFAState class with a fast hashmap (Java runtime). #2015
Conversation
This change fixes lexer performance degradation for non-ASCII inputs.

Background: Existing code uses a DFAState[] to represent edges from the DFAState class to other states, and the LexerATNSimulator and ParserATNSimulator classes were responsible for creating this array and inserting and retrieving elements. Lexer edges decide which state to go to next when a character is read: state --character--> otherState. The current implementation uses a 127-slot array to look up which state to go to. This provides fast lookup for characters < 127 and a much slower path to determine the next state for the rest of the characters.

Parser: The parser also uses the same DFAState class, but in this case edges are defined for tokens instead of characters: state --token--> nextState. So the parser uses the same kind of array, sized by maxTokenType.

Problems with the current approach:
1. The edge representation is not properly encapsulated and is exposed publicly; the lifetime of edges is controlled by the Lexer and Parser clients, resulting in a fragile and error-prone structure.
2. Because the Lexer uses a limited-size lookup table, its performance degrades considerably when it is exposed to input with non-ASCII characters. This is not an uncommon case; a lot of code contains non-English comments, names, symbols, addresses etc.
3. Lookup tables waste memory by design. Each Lexer DFAState object uses ~1KB of memory (on 64-bit systems). Parser DFAState objects use 8*maxTokenType bytes each.
4. The existing array-based solution requires positive integer keys, so it is modified to handle -1 as a key, complicating access, especially because edge access is not properly encapsulated.

New solution: To represent DFAState transitions, we now use a small and fast <int, DFAState> map and remove the DFAState[]. The proposed map is a very limited hashmap with int keys. It uses open addressing with linear probing and a 2^n capacity for fast modulo lookups. I instrumented the map in a different branch, and in most cases (>99%) it finds the object at the first lookup. Because its capacity grows only when needed, it may use memory more efficiently. This is not a big concern for the Lexer, as even for complex grammars lexers tend to have <500 DFAStates and <10K edges.

Java8 Lexer:
Total DFA states: 348
Total DFA edges: 5805
Edge count histogram:
(0..4]: 85 41.67%
(4..8]: 20 53.33%
(8..10]: 17 58.06%
(10..16]: 12 66.39%
(16..64]: 49 93.61%
(64..100]: 20 99.17%
(100..200]: 3 100.00%

Performance:
1. I downloaded the source code of some open source projects (Java8 SDK, Linux kernel etc.). For each language, I copied the content of all source files into a single file.
2. I tokenized each file and only counted tokens, to isolate pure tokenizing performance.
3. I ran the tests a few times, on two different CPU architectures (AMD Ryzen 1700x and Intel Xeon E5-1650).

I summarized the results here: https://docs.google.com/spreadsheets/d/1LqpdXfDz19d0VEItl0T_WF7i1bM9WOj1gs9-IZWCY44

In short:
- The new approach keeps performance very close to the existing approach when input is pure ASCII (same or better on AMD Ryzen, ~30% regression on Intel Xeon).
- For non-ASCII input, it completely removes the performance regression.

One interesting thing to note: even if the input contains less than ~0.5% non-ASCII characters, the hashmap version beats the lookup table in all cases.
Also keep in mind that, because of Amdahl's Law, small differences in pure tokenizing performance might not matter much. To test this, I repeated the performance benchmarks, but this time did slightly more meaningful work with the tokens read, e.g. counted individual token types and token lengths, and checked whether tokens contain only spaces. For pure ASCII inputs, the small performance difference between the old and new approaches diminished considerably. (Data is in the spreadsheet linked above.) I did not particularly focus on parsing, but intuitively, because parsing usually involves more work on each state transition, the importance of a few cycles won should be minimal (as explained above). But I welcome more benchmarks focused on parsing as well.
Slightly improves performance, but makes the get method a bit more complex. Also makes the initial map size 2; we do not care much about map resizes because the map's access pattern is write once, read many. And it is quite common for a DFA node to have only one edge.
|
@sharwell |
As far as I can see, the only changes concerning thread safety in this patch are the places where a new edge is inserted into the DFAState objects, e.g.:
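For reference, the parser-side insertion before this patch looked roughly like the following (a paraphrased sketch, not the verbatim source):

```java
// Paraphrased sketch of the pre-patch parser-side insertion: the simulator
// creates and mutates the DFAState's public edges array itself, holding the
// state's monitor only around the write.
protected DFAState addDFAEdge(DFA dfa, DFAState from, int t, DFAState to) {
    to = addDFAState(dfa, to);
    if (from == null || t < -1 || t > atn.maxTokenType) return to;
    synchronized (from) {
        if (from.edges == null) {
            // room for tokens 1..maxTokenType, plus -1 (EOF) stored at index 0
            from.edges = new DFAState[atn.maxTokenType + 1 + 1];
        }
        from.edges[t + 1] = to; // shift by one so key -1 lands at index 0
    }
    return to;
}
```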
and very similar code in LexerATNSimulator. But note that the Lexer version actually locks on the edges array, not the DFAState object. The one in DFA is a bit different:
These are changed to addEdge method calls on the DFAState class:
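The call site then reduces to something like this (a sketch of the intent; exact names may differ in the final patch):

```java
// Sketch of the encapsulated call site: the map and its synchronization now
// live inside DFAState, so simulators no longer touch edge storage directly.
protected DFAState addDFAEdge(DFA dfa, DFAState from, int t, DFAState to) {
    to = addDFAState(dfa, to);
    if (from != null && t >= -1 && t <= atn.maxTokenType) {
        from.addEdge(t, to); // DFAState owns storage and thread safety
    }
    return to;
}
```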
In the current version, the edges array in the DFAState class is created or replaced in a synchronized block in the caller classes. In the new version it is protected inside the DFAState class. Because reading edges is not protected in either case, both are susceptible to race conditions if multiple threads write and read simultaneously. However, the new approach could be more fragile on reads, because the backing arrays in the hashmap may be resized when new elements are inserted. The current approach mostly does not resize the array after initialization (the one in the DFA class does, though), so the probability of such a problem is smaller. It is possible to add extra protection, either in the callers or in the DFAState class. I will check the performance hit that it may inflict. |
@mdakin The read operation lies on the fast path (DFA already exists), so care was taken to eliminate that lock. The unprotected read will end up reading one of the following values if a race occurs: either null (the write is not yet visible), or the fully written target state.
A fully protected read would ensure the second always occurred. However, in the rare case a null is read, the worst thing that happens is it recalculates an edge with the same result it found previously. By allowing the map to resize, references in the array may move around during a write operation, eliminating the previous guarantees. About 3-4 years ago, I worked with an ANTLR user who was dealing with intense lock contention during parsing. Solving the problem for this user required all locks on the fast path be eliminated. However, I believe this particular user is working with my fork that uses fewer locks, so they would not be directly impacted if this change was merged. The users most impacted by adding a lock on the read operation will be those running concurrent parsing on machines with 32+ cores, with greatest impacts for multi-socket scenarios. |
@sharwell
|
To answer my own question: indeed, new edges are created while the lexer / parser runs. So this means the edge representation must support multi-thread writes and multi-thread reads, and a non-thread-safe hashmap is problematic. There are a few solutions I can try:
or implement / use a lock-free hashmap, which would again offer better characteristics. But implementing a lock-free hashmap, even a simple one like my version, is tricky, and I did not try existing ones to gauge the performance impact. |
@mike-lischke |
Thanks, that part is OK; I was more interested in reads, and I think I found it. Reads are also lock protected:
The C++ version then possibly has the same issues as my Java version (if I add locking to reads): slower than using a lookup array, and possible lock contention if many threads share the lexer/parser. Another question: how does SingleWriteMultipleReadLock work? It seems that when a lexer or parser is shared, multiple threads can write. |
Yes, right, this check makes the entire process slower. There has already been a discussion about slow parsing speed with multiple parsers. But I don't see another way to synchronize access to the static DFA. And indeed you also found a bug. How could I miss that? The state lock should move to the DFA class. Each ATNSimulator instance currently has its own lock, hence the access serialization doesn't work as intended. We should write a thread-safe unsorted map class (or preferably use an existing one). |
@sharwell @mike-lischke @parrt To add more context to the discussion, I made a very quick test with different locking schemes for protecting reads. Here are very rough numbers for the Java8 lexer on 80MB of Java SDK code: Current: 45.8s. I think ThreadLocal offers a good compromise: some regression on pure ASCII input, no lock contention, and the total memory footprint is possibly still smaller than using DFAState[127] edge arrays if the thread count is below a certain number. |
What does the thread-local implementation look like? What exactly is thread local? You can't make the DFA thread local. |
@mike-lischke |
Something like this (my crude attempt), I hope I am not missing anything :)
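The original snippet was not preserved; a minimal reconstruction of the idea might look like this (HashMap stands in for the patch's int-keyed map, and all names are assumptions):

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of a per-thread edge map inside DFAState: each thread
// gets its own map, so reads never race with writes from other threads; the
// cost is that every thread rediscovers edges on its own.
public class DFAState {
    // withInitial creates each thread's map lazily on first access, so no
    // eager initialization is needed in the constructor
    private final ThreadLocal<Map<Integer, DFAState>> edgeMap =
            ThreadLocal.withInitial(HashMap::new);

    public DFAState getTargetState(int symbol) {
        return edgeMap.get().get(symbol); // thread-private: no locking needed
    }

    public void addEdge(int symbol, DFAState target) {
        edgeMap.get().put(symbol, target);
    }
}
```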
|
Hmm, but then you are no longer sharing the values in the edge map, potentially duplicating a lot of memory (depending on the number of parallel threads). Also, this approach doesn't help with synchronizing access to the DFA. The state lock also costs time. |
@mike-lischke For the Java8 lexer, most of the edge maps have very few elements, so the size of the hashmap is usually (80% of the time) below 20, compared to the default case of using a DFAState[127]. So after a certain number of threads (~6) it starts using more memory than the current approach, but that should not be a problem because the number of states is small. As for synchronization, it is fine: getTargetState does not use synchronization and is contention free (threads always read from their own version), and because the map is thread local, I think I don't actually need the synchronization in addEdge either. For the parser, I guess it is more complicated, because it has more nodes and the size of the DFAState array is bounded by maxToken. Another thing: I have not used ThreadLocal for a very long time, so I might have missed a limitation and maybe all of this is moot :) I need to read up on it. (E.g. I just realized I had to initialize the thread-local variable in the first addEdge call, not in the constructor :) ) |
Sorry for spamming, but I have another question regarding the lexer. Lexers take input streams in their constructor; how exactly are they shared between threads? Tokenizing the same input with multiple threads does not look like a feasible concurrency scheme. |
You cannot share lexers. Each parser needs its own lexer. What you share is only the static data (DFA etc.) |
@mike-lischke |
We are not locking lexers but static data (as I said before), like the DFA and the ATN, which can be shared between multiple parsers, even on different threads. |
@mike-lischke
I will go ahead and change the lexer DFAState to use ThreadLocal hashmaps, and roll back the parser edges to the array version (though maybe encapsulate it in the DFAState class). Hopefully I will update in a few days. |
📝 I think I figured out a way to do this efficiently. I'll take a look tomorrow and see how it plays out. |
@sharwell |
The three slow lines are the single-threaded cases. |
@sharwell I made a small change to this patch: it only replaces the edge map when it is expanding, so getters may see the older version, but is that OK for ANTLR's case? Doesn't this kind of scheme solve the problem as well? |
@sharwell @parrt |
@sharwell I am also curious: why do you expand the array on collision, rather than only when it needs to grow (e.g. when it reaches a threshold, like my map)? Is it because of performance concerns, i.e. that a hashmap with colliding keys would be slower? |
@mdakin My implementation behaves as a perfect hash (one hash function, and only one item allowed in a bucket). There is no advantage to expanding the underlying array prior to a collision. For a hash map with single-bucketing and a perfect hash function, there is no performance penalty even for operating at 100% capacity. |
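A minimal sketch of that scheme (hypothetical class, not the PR's actual code), assuming small keys such as token types and ASCII-range characters, so repeated doubling always separates colliding keys:

```java
// Hypothetical sketch of the single-bucket "perfect hash" map: identity hash,
// at most one entry per bucket, so a lookup is one index plus one compare,
// even at 100% capacity.
final class SingleProbeIntMap<V> {
    private int[] keys = new int[2];         // power-of-two capacity
    private Object[] values = new Object[2]; // null marks an empty slot

    @SuppressWarnings("unchecked")
    V get(int key) {
        int i = key & (keys.length - 1);
        return keys[i] == key && values[i] != null ? (V) values[i] : null;
    }

    void put(int key, V value) {
        int i = key & (keys.length - 1);
        while (values[i] != null && keys[i] != key) {
            grow(); // collision: keep doubling until this key gets its own slot
            i = key & (keys.length - 1);
        }
        keys[i] = key;
        values[i] = value;
    }

    // Double the table until every existing entry lands in its own bucket
    // under the identity hash; retry with a larger table on any collision.
    private void grow() {
        int capacity = keys.length;
        retry:
        while (true) {
            capacity <<= 1;
            int[] k = new int[capacity];
            Object[] v = new Object[capacity];
            for (int j = 0; j < keys.length; j++) {
                if (values[j] == null) continue;
                int i = keys[j] & (capacity - 1);
                if (v[i] != null) continue retry; // still colliding: double again
                k[i] = keys[j];
                v[i] = values[j];
            }
            keys = k;
            values = v;
            return;
        }
    }
}
```

This sketch ignores concurrency; the real implementation also has to publish entries safely to concurrent readers, which is where the atomic key array discussed below comes in.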
@sharwell Another question: why did you need to make the key array atomic? It seems there is no reason for it. But in the end, this is a nice solution; should it be fine to add as the default in ANTLR? |
@sharwell
I will try to add a similar scheme after coming back from holidays. @ahmetaa |
@sharwell |
@mdakin In the perfect and minimal perfect hash functions I dealt with, keys were known before constructing the data structures. The main motivation was compression. I am not sure if these can be applied here. I need to understand the discussion better before making a further comment. |
Guys, this is fascinating, but I'm pretty nervous about changing the fundamental data structure we have, particularly in light of the potential for threading race conditions. Is there a way we can make this an option? I can see maybe adding an extra field to DFAState, or perhaps allowing us to swap in a different DFAState subclass? Btw, this is too big of a change for our point release 4.7.1 coming up, if we decide to do it. |
@parrt This patch changes things that certainly need consideration and improvement. I have already done similar changes in the C++ target, and Sam in his optimized fork. The change has been backed by tests and evidence of its effectiveness. It would be a shame if it had to wait another year to be included. I vote for inclusion in 4.7.1, even if that means e.g. a week's delay to write more tests and do another deep review. |
My main concern would be thread safety I guess. Any thoughts on how we could fold this in as an option in 4.7.1? |
@sharwell |
Yes, that is correct. |
@parrt Otherwise it is still the same simple hashmap with int keys. I also tried a more complex approach, starting with a perfect hashmap similar to @sharwell's (single lookup, no collisions) and falling back to the int map after a certain size. However, this did not give any benefit on my home PC (AMD Ryzen), but I will double check on an Intel machine, as the architectures seem to behave quite differently. That version is in a different branch; I opted for this simpler version. Access is cheap through a volatile reference, but the nature of the map and the added indirection still inflict a performance penalty on the lexer when input is pure ASCII (up to 30%), depending on the input and grammar. For mixed input (10% non-ASCII), it still speeds up >10x. I abstracted away edge access in a new class; if necessary, we can still add a single array-based solution to it for cases where we know the upper limit of keys beforehand (e.g. parsers). The class has reasonably good unit tests. How should we proceed? Could you test it as well? |
Any thoughts on how we could fold this in as an option in 4.7.1? |
@parrt Is it possible to make this the default and cut a release candidate, so it can be tried and we can address anything that arises? Another question: what is the mechanism for adding options to ANTLR (or individually to parser / lexer classes)? |
@parrt |
…ix to get method, some doc updates.
I updated the PR description to reflect the changes. Thanks for all the discussion and suggestions. I also tried moving the current array version into the DFAEdgeCache class, but I did not observe any significant performance benefit, so I think keeping it simple is the way to go. If you have time, I would also appreciate a code review and testing; it is not much code, ~200 lines + tests. |
Given that the DFA stuff is the secret sauce, you can imagine why I am hesitant to just give it the old smoke test. ;) It would affect hundreds of thousands of users around the world. As you say, other classes update its fields. Perhaps we come up with a patch that drops new classes in on top so it's less of an option and more of a monkey patch for people that want to try it? |
@parrt |
Ok, I get it. Sorry for the hassle, but you can imagine that I've worked on this for 30 years and I'm hesitant to completely replace something that is so critical to the correct functioning of the tool. Thank you for all of your contributions nonetheless; this could prove very useful in the future. |
Note: The PR description has been edited to reflect the latest changes.
This change fixes lexer performance degradation for non-ASCII inputs.
@parrt @sharwell @bhamiltoncx @ahmetaa
Background:
Existing code uses a DFAState[] to represent edges from the DFAState class to other states, and the LexerATNSimulator and ParserATNSimulator classes were responsible for creating this array and inserting and retrieving elements.
Lexer edges decide which state to go to next when a character is read:
state --character--> otherState
The current lexer implementation uses a 127-slot array to look up which state to go to. This provides fast lookup for characters < 127 and a much slower path to determine the next state for the rest of the characters.
Parser: The parser also uses the same DFAState class, but in this case edges are defined for tokens instead of characters: state --token--> otherState, so the parser uses the same kind of array, sized by maxTokenType.
Problems with current approach:
The edge representation is not properly encapsulated and is exposed publicly; the lifetime of edges is controlled by the Lexer and Parser clients, resulting in a fragile and error-prone structure.
Because the Lexer uses a limited-size lookup table, its performance degrades considerably when it is exposed to input with non-ASCII characters. This is not an uncommon case; a lot of code contains non-English comments, names, symbols, addresses etc. The slowdown is worse than 10x even if the input contains only 10% non-ASCII characters.
The existing lookup tables waste memory by design. Each Lexer DFAState object uses ~1KB of memory on 64-bit systems (127 slots × 8-byte references). Parser DFAState objects use 8*maxTokenType bytes each. For complex grammars, maxTokenType can be >100. Most of these tables are sparse. (See the histogram below.)
The existing array-based solution requires positive integer keys, so it is modified to handle -1 as a key, complicating access, especially because edge access is not properly encapsulated.
New Solution
To represent DFAState transitions, we now use a small and fast <int, DFAState> map and remove the DFAState[].
The proposed map is a very limited hashmap with int keys. It uses open addressing with linear probing and a 2^n capacity for fast modulo lookups. I instrumented the map in a different branch, and in most cases it finds the object at the first lookup.
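A minimal sketch of such a map (illustrative names and details; the sentinel choice assumes keys are >= -1, which holds for characters and token types including EOF):

```java
import java.util.Arrays;

// Sketch of an int-keyed, open-addressing map with linear probing. Capacity
// stays a power of two, so "key % capacity" reduces to "key & (capacity - 1)".
final class IntEdgeMap<V> {
    private static final int EMPTY = Integer.MIN_VALUE; // sentinel; real keys are >= -1
    private int[] keys;
    private Object[] values;
    private int size;

    IntEdgeMap(int capacity) { // capacity must be a power of two
        keys = new int[capacity];
        values = new Object[capacity];
        Arrays.fill(keys, EMPTY);
    }

    @SuppressWarnings("unchecked")
    V get(int key) {
        int mask = keys.length - 1;
        for (int i = key & mask; ; i = (i + 1) & mask) {
            if (keys[i] == key) return (V) values[i];
            if (keys[i] == EMPTY) return null; // reached an empty slot: absent
        }
    }

    void put(int key, V value) {
        if (size * 2 >= keys.length) expand(); // keep load factor below 1/2
        int mask = keys.length - 1;
        int i = key & mask;
        while (keys[i] != EMPTY && keys[i] != key) {
            i = (i + 1) & mask; // linear probe to the next slot
        }
        if (keys[i] == EMPTY) size++;
        keys[i] = key;
        values[i] = value;
    }

    @SuppressWarnings("unchecked")
    private void expand() {
        IntEdgeMap<V> larger = new IntEdgeMap<>(keys.length * 2);
        for (int j = 0; j < keys.length; j++) {
            if (keys[j] != EMPTY) larger.put(keys[j], (V) values[j]);
        }
        keys = larger.keys;
        values = larger.values;
    }
}
```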
Thread safety
ANTLR lexer and parser threads share the same DFAState graph, so graph access must be thread safe. During parsing and lexing, reads outnumber writes by a large margin as input is processed, so I opted for the cheap read-write lock trick described here (see item 5), which offers good read performance; some reads may see slightly stale data, but this is not a problem in ANTLR's case. (See @sharwell's comment below.)
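Roughly, the pattern looks like this (a sketch building on the IntEdgeMap above; needsExpansion() and the copy constructor are assumed helpers, not actual patch API):

```java
// Writers synchronize among themselves and re-publish the map through a
// volatile field; readers only dereference the volatile, so they never block.
public class DFAState {
    private volatile IntEdgeMap<DFAState> edges = new IntEdgeMap<>(2);

    public DFAState getTargetState(int symbol) {
        return edges.get(symbol); // lock-free read of the current snapshot
    }

    public synchronized void addEdge(int symbol, DFAState target) {
        IntEdgeMap<DFAState> current = edges;
        if (current.needsExpansion()) {
            // never resize in place: build the larger map aside, then publish
            IntEdgeMap<DFAState> larger = new IntEdgeMap<>(current);
            larger.put(symbol, target);
            edges = larger; // one volatile write makes it visible to readers
        } else {
            current.put(symbol, target);
            edges = current; // volatile re-publish of the in-place write
        }
    }
}
```

A reader that races with the write may simply not see the new edge yet; as discussed above, the worst case is recomputing an edge that was already found.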
Performance
Performance is an important aspect of this patch; I tried to address the performance degradation without regressing fast cases.
1. I downloaded the source code of some open source projects (Java8 SDK, Linux kernel etc.). For each language, I copied the content of all source files into a single file. You can download these from here.
2. I tokenized each file and only counted tokens, to isolate pure tokenizing performance. An example benchmark can be found here.
3. I ran the tests a few times, on two different CPU architectures (AMD Ryzen 1700x and Intel Xeon E5-1650).
Results:
The new approach keeps performance close to the existing approach. When input is pure ASCII, performance is similar on AMD Ryzen, with up to a ~30% regression on Intel Xeon. Newer Intel architectures and ARM may show different results.
For non-ASCII input, it completely removes the performance regression and works as fast as on pure ASCII input.
Performance numbers are not very stable and change depending on many factors, such as JVM parameters, the active CPU governor, and whether tests are run together or separately.
Also keep in mind that small differences in pure tokenizing performance might not matter much. To test this, I repeated the performance benchmarks, but this time did slightly more meaningful work with the tokens read, e.g. counted individual token types and token lengths, and checked whether tokens contain only spaces. For pure ASCII inputs, the small performance difference between old and new diminishes considerably.
I did not particularly focus on parsing. As with the lexer on pure ASCII input, I do not expect a big regression in performance, because parsing usually involves more work on each state transition, diminishing the importance of the few cycles spent finding the next state (as explained above). I welcome more benchmarks focused on parsing as well.
Memory Usage
Because its capacity grows only when needed and the starting size is quite small (2), the hashmap may use memory more efficiently. This is not a big concern for the Lexer, as even for complex grammars lexers tend to have <500 DFAStates and <10K edges.
Java8 Lexer:
Total DFA states: 348
Total DFA edges: 5805
The histogram of edge counts across all Lexer DFAState nodes:
Edge count histogram:
(0..4]: 85 41.67%
(4..8]: 20 53.33%
(8..10]: 17 58.06%
(10..16]: 12 66.39%
(16..64]: 49 93.61%
(64..100]: 20 99.17%
(100..200]: 3 100.00%
Edges on parser nodes tend to be even sparser, and the number of state objects can get much larger.
I did not specifically measure the memory usage for parser state nodes, but if the maximum number of tokens is larger than a certain amount, there could be gains there. For complex grammars it is not uncommon to have >100 tokens.