Machine readable Parser ATN simulator output #3814

parrt · 2022-08-12T18:55:35Z

parrt
Aug 12, 2022
Maintainer

Per the discussion here, #3718, it would be very helpful to compare the flow of control through the ATN simulators (parsers/lexers) across targets, including predicate evaluation and DFA state creation.

There could be ordering issues depending on how sets are traversed by the various targets, but we can worry about that later.

I suspect that we can get really close to Emmanuel difference tool between targets, but it might still require human evaluation. Ideally we would have something integrated into the standard testing mechanism.

Pinging @KvanTTT @ericvergnaud @jimidle @kaby76

parrt · 2022-08-12T18:56:15Z

parrt
Aug 12, 2022
Maintainer Author

@kaby76 has a repo, https://github.com/kaby76/issue-3718, that is very helpful and leads the way in showing the difference between the output of the various debug flags across a few targets.

0 replies

parrt · 2022-08-12T19:05:32Z

parrt
Aug 12, 2022
Maintainer Author

Looking at the output, e.g., https://github.com/kaby76/issue-3718/blob/055806acc3297769aa7f5f7336deada7781d9d74/original-grammar/csharp/out.txt#L3517, there's a huge amount that gets tracked. Seems like we should start with the critical DFA state creation and expand from there. Currently, I see output such as the following from the Java target:

adding new DFA state: 1:[(14,1,[32 $]), (18,1,[32 $]), (20,1,[32 $]), (66,1,[28 32 $]), (26,1,[32 $]), (14,2,[32 $],up=12), (18,2,[32 $],up=12), (20,2,[32 $],up=12), (66,2,[28 32 $],up=12), (26,2,[32 $],up=12)],conflictingAlts={1, 2},dipsIntoOuterContext=>1
...
EDGE 0:[(62,1,[31 $],{9>=prec}?), (64,2,[35 $],{8>=prec}?), (38,3,[$],{4>=prec}?), (41,4,[$],{3>=prec}?), (44,5,[$],{7>=prec}?), (51,6,[$],{6>=prec}?)],hasSemanticContext=True -> -1:[(63,1,[31 $],{9>=prec}?)],hasSemanticContext=True,uniqueAlt=1=>[({9>=prec}?, 1)] upon EQ<13>
adding new DFA state: 1:[(63,1,[31 $],{9>=prec}?)],hasSemanticContext=True,uniqueAlt=1=>[({9>=prec}?, 1)]

See #3718 (comment) for more comparisons.

0 replies

parrt · 2022-08-12T21:24:03Z

parrt
Aug 12, 2022
Maintainer Author

Ok, playing around with a few ideas. Output looks like:

NEW STATE: 0 in DFA for ATN.s0 18
	0:[(11,1,[$]), (14,1,[$]), (5,2,[$],up=1)],dipsIntoOuterContext
NEW STATE: 1 in DFA for ATN.s0 18
	1:[(15,1,[$])],uniqueAlt=1=>1
EDGE: 0 -> 1 upon 4 in DFA for ATN.s0 18
	0:[(11,1,[$]), (14,1,[$]), (5,2,[$],up=1)],dipsIntoOuterContext
	->
	1:[(15,1,[$])],uniqueAlt=1=>1
NEW STATE: 0 in DFA for ATN.s0 16
	0:[(11,1,[$],{2>=prec}?), (14,2,[$],{1>=prec}?)],hasSemanticContext=true
NEW STATE: 1 in DFA for ATN.s0 16
	1:[(15,2,[$],{1>=prec}?)],hasSemanticContext=true,uniqueAlt=2=>[({1>=prec}?, 2)]
EDGE: 0 -> 1 upon 4 in DFA for ATN.s0 16
	0:[(11,1,[$],{2>=prec}?), (14,2,[$],{1>=prec}?)],hasSemanticContext=true
	->
	1:[(15,2,[$],{1>=prec}?)],hasSemanticContext=true,uniqueAlt=2=>[({1>=prec}?, 2)]
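
To make output like the above usable by a diff tool, the header lines could be reduced to compact keys. The following is only a minimal sketch, assuming the `NEW STATE:` and `EDGE:` header shapes shown above stay exactly as printed; the class name `TraceHeaderParser` and the key format are hypothetical and are not part of the ANTLR runtime. The number printed after `ATN.s0` is carried through as-is, without interpreting it.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: turns trace header lines into compact, comparable keys. */
public class TraceHeaderParser {
    // Matches e.g. "NEW STATE: 0 in DFA for ATN.s0 18"
    private static final Pattern NEW_STATE =
        Pattern.compile("^NEW STATE: (\\d+) in DFA for ATN\\.s0 (\\d+)$");
    // Matches e.g. "EDGE: 0 -> 1 upon 4 in DFA for ATN.s0 18"
    private static final Pattern EDGE =
        Pattern.compile("^EDGE: (-?\\d+) -> (-?\\d+) upon (\\d+) in DFA for ATN\\.s0 (\\d+)$");

    /**
     * Returns "state:<s0 number>:<state>" or "edge:<s0 number>:<from>:<to>:<token>",
     * or null when the line is not a header (for example an indented config-set line).
     */
    public static String key(String line) {
        Matcher m = NEW_STATE.matcher(line);
        if (m.matches()) {
            return "state:" + m.group(2) + ":" + m.group(1);
        }
        m = EDGE.matcher(line);
        if (m.matches()) {
            return "edge:" + m.group(4) + ":" + m.group(1) + ":" + m.group(2) + ":" + m.group(3);
        }
        return null;
    }
}
```

The indented configuration-set lines are deliberately left unparsed here, since arbitrary semantic predicates can appear inside them.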

PR in progress: #3817

2 replies

parrt Aug 12, 2022
Maintainer Author

Will be a problem to parse for arbitrary semantic predicates as it stands :(

jimidle Aug 13, 2022

Will be a problem to parse for arbitrary semantic predicates as it stands :(

Yes - I think it is a trickier thing to get right for machine comparison than it first seems.

parrt · 2022-11-06T21:19:59Z

parrt
Nov 6, 2022
Maintainer Author

@kaby76 how about a Zoom call to discuss? Seems like a few more of these:

if ( debug || debug_list_atn_decisions )  {
	System.out.println("predictATN decision "+ dfa.decision+
								" exec LA(1)=="+ getLookaheadName(input) +
								", outerContext="+ outerContext.toString(parser));
}

in the simulators and we are close to getting enough output. Then gotta standardize output and make it kinda readable or diff'able.

0 replies

kaby76 · 2022-11-06T23:39:42Z

kaby76
Nov 6, 2022

What is going to be the plan for these specific debug statements? Are you thinking about replacing them, or getting rid of them? They are pretty useful in telling how AdaptivePredict/closure is working.

3 replies

parrt Nov 7, 2022
Maintainer Author

I think we need to add some more and then standardize what exactly the output should look like so that we can diff it against other targets. The good news is that we should be able to get a deterministic set of output to compare across targets.

kaby76 Nov 7, 2022

This sounds good. Those debug prints in AdaptivePredict and below, e.g., L336, L367, L443, L592, etc., are all very helpful. I seem to be diffing the output at least once a month between Java/CSharp and one of the other targets.

Some of the unnecessary diffs are:

"new DFA" versus "NewDFA".
"exec LA(1)==Const<34>" vs "exec LA(1)==<34>"
Only in Go "34 ttype out of range: ..."
"s0-34->:s1=>2" vs "s0-Const->:s1=>2"

I had been making private copies of the runtime that I modified to remove the diffs, but I've been moving to using a sed script to remove the diffs because I can't keep up with the changes in the runtime.
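
The sed-style cleanup described above could also be expressed as a small line normalizer. This is a sketch under stated assumptions: the class `TraceNormalizer` is hypothetical, and the patterns are derived only from the example diffs listed in this comment. The `s0-34->` versus `s0-Const->` difference is not handled, since mapping names to numbers would need the grammar's vocabulary.

```java
import java.util.regex.Pattern;

/** Hypothetical normalizer that removes known cosmetic differences between targets. */
public class TraceNormalizer {
    // "exec LA(1)==Const<34>" and "exec LA(1)==<34>" both become "exec LA(1)==<34>"
    private static final Pattern LA_NAME = Pattern.compile("LA\\(1\\)==.*?<(\\d+)>");
    // "NewDFA" is rewritten to the "new DFA" spelling
    private static final Pattern NEW_DFA = Pattern.compile("\\bNewDFA\\b");
    // Lines such as "34 ttype out of range: ..." are reported by one target only
    private static final Pattern TTYPE_RANGE = Pattern.compile("^\\d+ ttype out of range:.*$");

    /** Returns the normalized line, or null if the line should be dropped entirely. */
    public static String normalize(String line) {
        if (TTYPE_RANGE.matcher(line).matches()) {
            return null;
        }
        String result = NEW_DFA.matcher(line).replaceAll("new DFA");
        result = LA_NAME.matcher(result).replaceAll("LA(1)==<$1>");
        return result;
    }
}
```

Keeping the rules outside the runtime, as with the sed script, means they do not have to track runtime changes; the trade-off is that every new cosmetic difference needs another rule.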

parrt Nov 7, 2022
Maintainer Author

Yep, we need to standardize this.

ericvergnaud · 2022-11-07T04:39:41Z

ericvergnaud
Nov 7, 2022
Maintainer

Yes, that’s exactly the approach I’ve followed to fix behavior differences between targets. I’d output to a text file via the command line, then text diff to locate the first diff, then patiently debug…

0 replies

parrt · 2022-11-07T17:27:35Z

parrt
Nov 7, 2022
Maintainer Author

Ok guys, I've decided this needs to be my next priority: getting a consistent way to compare parsing / lexing / ATN simulation.

Following the approach you guys have taken, I think getting good, consistent output during simulation and then comparing the output is the best approach. Do we add a flag to the testing rig mechanism that says "dump the simulation output"? Do we make it specific to a particular decision state or dump everything from the state for the start rule? We've seen big issues in DFA state processing, particularly in the hashing and equality area. Go needs some attention, but @jimidle is on it. We should dump more DFA-specific stuff to help him out.

Sounds like @kaby76 and @ericvergnaud have the most experience with this so I will keep you guys in the loop and send you questions for your experience. Here are my initial questions:

1. To get really targeted unit tests, I would love to just go into the target language itself and write some critical tests, but I just can't manage that myself, learning all those targets. Rather than use parsing as a proxy to get down into the DFA state equality function, I'd rather just call it. Can you guys think of a way to do this targeted stuff in a general fashion? I suppose one way would be to add another template to the testing rig that asked it to compare some states, but it'd be hard to specify which states, etc.
2. Part of the issue is that we then need to be able to turn on this output dynamically, but that adds an IF statement that doesn't get compiled away in the critical ATN simulation software. Currently the flags look like this in Java:
   ```
   public static final boolean dfa_debug = false;
   ```
3. Should I just clean up and augment the Java target and then try to fix the languages I know, like Python, to be consistent?
4. Should this be embedded in the existing runtime test rig? Ah. It looks like we already have a flag that gets injected into the templates:
   ```
   [flags]
   showDiagnosticErrors
   ```
   That seems to do the following in the helper templates, which doesn't trigger the ATN simulation dump (it just reports ambiguities and so on):
   ```
   <if(debug)>
           parser.addErrorListener(new DiagnosticErrorListener());
   <endif>
   ```

2 replies

kaby76 Nov 7, 2022

On Question "2", I was never a fan of static final boolean/static readonly bool for the debugging flag. I could never turn on the flag in the debugger when I needed it. If you really don't want to get the perf hit of an added if-statement (or it not being a "SAFE" assembly because of uses of System.Console.WriteLine()), deliver two builds, Release and Debug.

On Question "1", I don't see any way around writing target-specific code to do these comparisons at a unit test level. But we could add a GitHub Workflow to test the "dev" branch of Antlr4 against grammars-v4, once a week for example, which would compare detailed parser functionality across targets.

parrt Nov 8, 2022
Maintainer Author

Do you turn on debug or debug_list_atn_decisions or dfa_debug?

ericvergnaud · 2022-11-07T17:28:35Z

ericvergnaud
Nov 7, 2022
Maintainer

To ease alignment and maintenance, we could add flags in the test runner that enforce logging, and check the output in a dedicated test.

0 replies

ericvergnaud · 2022-11-07T17:35:01Z

ericvergnaud
Nov 7, 2022
Maintainer

Just crossed emails…

Re 1: if the logging goes to the console, I don’t think there is a need for a new template, just one or more tests that check the output?

Re 2: if the flag is imported from a dedicated class, then it’s feasible to rewrite that class as part of testing prep and rebuild. It just needs to guarantee isolation.

2 replies

parrt Nov 7, 2022
Maintainer Author

hahah. And I just updated it again. ;)

re 1. Yep, I guess we are capturing the output, so maybe this is ON for all tests and every test then gets its simulation checked?

re 2. The problem is that the Boolean flag, wherever it is, should not result in a runtime branch (IF flag)... Trying to get the compiler to drop it out, but I think that only happens if it's final in Java. At least that's the only guarantee. The JIT might figure out that it's always false but...
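
To make the trade-off concrete, here is a minimal sketch contrasting the two flag styles under discussion. All names (`DebugFlagDemo`, `CONST_DEBUG`, `runtimeDebug`) are hypothetical and do not come from the runtime. As far as I know, javac treats an `if` guarded by a constant `false` as the conditional-compilation idiom and emits no code for its body, while the non-final flag costs a field read and a branch on every call unless the JIT removes it, which is not guaranteed.

```java
/** Hypothetical demo of a compile-time constant flag versus a runtime flag. */
public class DebugFlagDemo {
    // Compile-time constant: cannot be switched on without recompiling.
    static final boolean CONST_DEBUG = false;

    // Runtime flag: a test rig or a debugger can flip it, at the cost of a branch.
    static boolean runtimeDebug = false;

    static int closureConst(StringBuilder log, int state) {
        if (CONST_DEBUG) {
            // Dead code while CONST_DEBUG is the constant false.
            log.append("closure(").append(state).append(")\n");
        }
        return state + 1;
    }

    static int closureRuntime(StringBuilder log, int state) {
        if (runtimeDebug) {
            // Evaluated on every call; tracing happens only when the flag is set.
            log.append("closure(").append(state).append(")\n");
        }
        return state + 1;
    }
}
```

Both methods do the same work; only `closureRuntime` can be made to trace without a rebuild, which is the capability the diffing workflow needs.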

parrt Nov 7, 2022
Maintainer Author

Thoughts on 3 and 4 @ericvergnaud ?

ericvergnaud · 2022-11-07T17:53:22Z

ericvergnaud
Nov 7, 2022
Maintainer

Re 3: I fear it would slow down all tests and make them hard to understand (many tests check a simple output).

Re 4: I’d need to test it. I don’t know if the optimization is done by the compiler or the JIT.

0 replies

parrt · 2022-11-11T19:09:09Z

parrt
Nov 11, 2022
Maintainer Author

OK, I am working on this today. I wasted a few hours trying to tighten up estimates of parse times for the JVM. No luck, even with the central limit theorem working in my favor, ha ha. Was hoping to figure out the cost of converting that constant boolean into a variable boolean...

1 reply

parrt Nov 11, 2022
Maintainer Author

I think there's nothing to do but simply change it and see if we notice anything about performance, because we need to be able to turn that capability on and off, I think. Subclassing won't work because we have lots of code that doesn't know to switch to a subclass to turn on debugging, so we can't use empty override methods.

My idea right now is to make a test rig that is separate from the standard runtime testing, since it will only be used occasionally, I hope, when we are trying to track down something in the ATN simulation.

First we run the Java target on a grammar and some input and save the debugging output. Then we run one of the other targets and save the debugging output. The programmer is then free to use diff tools to compare. I can reuse some of the generic "run a program in a specified target" code from the runtime-test stuff.
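
Once two targets have written their traces to files, locating the first point of divergence is a short loop. The sketch below is an illustration only: `TraceDiff` and its file arguments are hypothetical, and it assumes both traces have already been brought into the same line format.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;

/** Hypothetical helper: reports the first line at which two saved traces differ. */
public class TraceDiff {
    /** Returns the 0-based index of the first differing line, or -1 if the traces are identical. */
    public static int firstDifference(List<String> a, List<String> b) {
        int shared = Math.min(a.size(), b.size());
        for (int i = 0; i < shared; i++) {
            if (!a.get(i).equals(b.get(i))) {
                return i;
            }
        }
        // One trace is a strict prefix of the other: they diverge where the shorter one ends.
        return a.size() == b.size() ? -1 : shared;
    }

    public static void main(String[] args) throws IOException {
        // args[0] and args[1] are paths to the two saved trace files.
        List<String> left = Files.readAllLines(Paths.get(args[0]));
        List<String> right = Files.readAllLines(Paths.get(args[1]));
        int index = firstDifference(left, right);
        System.out.println(index < 0 ? "traces are identical" : "first difference at line " + (index + 1));
    }
}
```

Stopping at the first difference matches how these traces tend to be used: everything after the first divergence is usually a consequence of it.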

parrt · 2022-11-11T19:30:58Z

parrt
Nov 11, 2022
Maintainer Author

Starting a branch #3817

0 replies

parrt · 2022-11-12T02:25:59Z

parrt
Nov 12, 2022
Maintainer Author

Damn. Can't capture stdout, as the tests are multi-threaded. Will/would have to make ParserATNSimulator (for Java only) write to an output stream passed in from the test rig. Yuck.
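
For illustration, the "output stream passed in from the test rig" idea looks roughly like the sketch below. `TracingSimulator` is a hypothetical stand-in and not the real `ParserATNSimulator` API; the point is only that each test owns its own buffer, so concurrent tests do not interleave their traces the way they would on a shared stdout.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;

/** Hypothetical demo: trace output goes to an injected stream instead of System.out. */
public class TraceSinkDemo {
    /** Stand-in for a simulator; it writes trace lines to whatever stream it is given. */
    static class TracingSimulator {
        private final PrintStream trace;

        TracingSimulator(PrintStream trace) {
            this.trace = trace;
        }

        void closure(int state, int alt) {
            // Uses "\n" explicitly so the captured text is the same on every platform.
            trace.print("closure((" + state + "," + alt + ",[$]))\n");
        }
    }

    /** Runs a tiny fake simulation and returns everything it traced. */
    static String runAndCapture() {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        PrintStream out = new PrintStream(buffer, true);
        TracingSimulator sim = new TracingSimulator(out);
        sim.closure(13, 1);
        sim.closure(24, 2);
        out.flush();
        return buffer.toString();
    }
}
```

The cost, as noted above, is that the stream has to be threaded through to every place that currently prints.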

0 replies

parrt · 2022-11-12T17:18:58Z

parrt
Nov 12, 2022
Maintainer Author

Ok, thought about it overnight and decided to remove the tests that compare ATN simulator output... The output is so big it's going to be hard for a test rig to tell you the difference, etc. I will use the infrastructure of the runtime tests but create a command-line tool to generate output that can be diff'd between targets.

0 replies

parrt · 2022-11-12T21:00:44Z

parrt
Nov 12, 2022
Maintainer Author

@kaby76 want to give this a try? It works with Java and C++ at the moment. Will update Go next.

$ cd ~/antlr/code/antlr4
$ mvn install -DskipTests=true
$ cd runtime-tests
$ mvn install -DskipTests=true  # again
$ bash ~/antlr/code/antlr4/scripts/traceatn.sh /tmp/JSON.g4 json -target Cpp /tmp/foo.json
closure((1,1,[$]))
closure((37,1,[$]))
...
$ bash ~/antlr/code/antlr4/scripts/traceatn.sh /tmp/JSON.g4 json -target Java /tmp/foo.json
adaptivePredict decision 1 exec LA(1)=='{'<1> line 1:0
predictATN decision 1 exec LA(1)=='{'<1>, outerContext=[obj value json]
closure((13,1,[$]))
closure((24,2,[$]))
adding new DFA state: 0:[(13,1,[$]), (24,2,[$])]
...

1 reply

parrt Nov 12, 2022
Maintainer Author

#3957

parrt · 2022-11-12T23:47:59Z

parrt
Nov 12, 2022
Maintainer Author

Output getting closer for Java and C++.

0 replies

ericvergnaud · 2022-11-13T00:42:07Z

ericvergnaud
Nov 13, 2022
Maintainer

That is going to be very useful. Bugs are generally reported in one target language only; now we’ll be able to check whether they’re target-language-specific or not.

1 reply

parrt Nov 13, 2022
Maintainer Author

Yep. This is painful, but I should’ve done it a long time ago.

parrt · 2022-11-19T23:58:26Z

parrt
Nov 19, 2022
Maintainer Author

Merged #3957. PHP, Swift, and Dart are left.

0 replies
