ATN serialization improvements #3556

KvanTTT · 2022-02-23T17:12:15Z

Move decode method from ATNSerializer to ATNDeserializerHelper
Get rid of excess ATN serialization (for interpreter data and generated file)

parrt · 2022-02-26T18:54:54Z

Hi. Can you remind me what the primary purpose of shuffling the code is? As usual, I don't like creating new files unless we have to. Could you help me understand the excess serialization going on? Thanks!

KvanTTT · 2022-02-26T19:00:50Z

I've moved decode out of the serializer for the following reasons:

It's used only for tests. There is no need to keep it in the ANTLR core package
Actually it's for deserialization, not for serialization

parrt · 2022-02-26T19:31:57Z

It's used only for tests. There is no need to keep it in the ANTLR core package

Actually it's for deserialization, not for serialization

haha! Yes, I see that now. It's only for debugging/testing I think. A better name is maybe summarize() or describe() as decode sounds like deserialize.

Not sure we need separate class/file for a function. Why not just put decode() into TestATNSerialization.java?

parrt · 2022-02-26T19:33:48Z

Can you remind me what the excess serialization stuff you're referring to is? Thanks. It looks like there is an internal API change that causes fairly widespread changes.

KvanTTT · 2022-02-26T19:58:05Z

A better name is maybe summarize() or describe() as decode sounds like deserialize.

Ok.

Not sure we need separate class/file for a function. Why not just put decode() into TestATNSerialization.java?

Because it's already quite a big class and the decode method is also not small. I would just like to separate tests from helper code.

Can you remind me what the excess serialization stuff you're referring to is?

I encountered it during debugging. I've noticed the ATN is being serialized (a call to ATNSerializer.serialize) both for the .interp file and for the serialized data for the lexer/parser.

parrt · 2022-02-27T00:03:33Z

Because it's already quite a big class and the decode method is also not small. I would just like to separate tests from helper code.

Yes, big is bad, but on the other hand so is making a class with one method and introducing a new file. I think I would prefer just dumping the decode method into the test class.

Hmm... looks like some of the targets have implemented decode() like C++ (ATNSerializer.h):

    virtual std::string decode(const std::wstring& data);

I've noticed the ATN is being serialized (a call to ATNSerializer.serialize) both for the .interp file and for the serialized data for the lexer/parser.

Ah. Hmm... let me pull in your branch and use a good diff tool instead of the website to see what the changes really look like.

parrt · 2022-02-27T00:15:47Z

tool/src/org/antlr/v4/Tool.java

 			}
 		}
 		content.append("\n");

-		IntegerList serializedATN = ATNSerializer.getSerialized(g.atn, g.getLanguage());
-		// Uncomment if you'd like to write out histogram info on the numbers of


seems like we should leave this in

parrt · 2022-02-27T00:17:41Z

tool/src/org/antlr/v4/codegen/BlankOutputModelFactory.java

@@ -29,7 +30,7 @@
 	public ParserFile parserFile(String fileName) { return null; }

 	@Override
-	public Parser parser(ParserFile file) { return null; }


At first glance it looks strange that the lexer() function below does not also need an ATN.

Yes, because the lexer works a bit differently and gets atnData from somewhere else. I'll check whether it's possible to unify the parameter.

It depends on this code: https://github.com/antlr/antlr4/blob/dev/tool/src/org/antlr/v4/codegen/OutputModelController.java#L140-L152 OutputModelController does not call the lexer method; it calls the constructor directly.

parrt · 2022-02-27T00:22:19Z

tool/src/org/antlr/v4/codegen/model/Lexer.java


-	public Lexer(OutputModelFactory factory, LexerFile file) {
-		super(factory);
-		this.file = file; // who contains us?


seems like that's a useful comment

Maybe rename file to containingFile instead of using such a comment? I prefer getting rid of comments if possible.

parrt · 2022-02-27T00:22:47Z

tool/src/org/antlr/v4/codegen/model/Parser.java


-	public Parser(OutputModelFactory factory, ParserFile file) {
-		super(factory);
-		this.file = file; // who contains us?


same as lexer; a useful comment

parrt · 2022-02-27T00:23:57Z

tool/src/org/antlr/v4/codegen/model/Recognizer.java

@@ -63,13 +65,14 @@ public Recognizer(OutputModelFactory factory) {

 		ruleNames = g.rules.keySet();
 		rules = g.rules.values();
-		atn = new SerializedATN(factory, g.atn);


I'm not a fan of flipping all of these IFs to ?:. I don't think they are as easy to read.

you're doing all sorts of code revisions that are not really needed but are costing my time to review.

not a problem. You are a valuable contributor and a hardcore tech guy!! I just think maybe we need to synchronize better :) your instincts seem very good but I have to fit that into my limited focus time.
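To make the IF versus ?: readability remark concrete, here is a small sketch of the same decision written both ways. The condition shape is borrowed from the Tool.java fragment in this review; the method names and the returned strings are hypothetical and exist only for illustration.

```java
public class IfVersusTernary {
    // Ternary form: compact, but the condition and both outcomes share one line.
    // The conditional operator binds more loosely than && and ==, so this reads as
    // (gencode && numErrors == 0) ? "serialize" : "skip".
    static String pickTernary(boolean gencode, int numErrors) {
        return gencode && numErrors == 0 ? "serialize" : "skip";
    }

    // if form: longer, but each outcome sits on its own line.
    static String pickIfElse(boolean gencode, int numErrors) {
        if (gencode && numErrors == 0) {
            return "serialize";
        }
        return "skip";
    }

    public static void main(String[] args) {
        boolean[] flags = { true, false };
        int[] errorCounts = { 0, 3 };
        for (boolean gencode : flags) {
            for (int numErrors : errorCounts) {
                // Both forms always agree; only readability differs.
                System.out.println(gencode + ", " + numErrors + " -> "
                        + pickTernary(gencode, numErrors) + " / "
                        + pickIfElse(gencode, numErrors));
            }
        }
    }
}
```

The two methods are behaviorally identical, so the choice between them is purely about how easy the result is to review.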

parrt · 2022-02-27T00:27:56Z

tool/src/org/antlr/v4/Tool.java

-				errMgr.toolError(ErrorType.CANNOT_WRITE_FILE, ioe);
-			}
-		}
+		IntegerList atnData = gencode && g.tool.getNumErrors()==0


Not sure I like passing this ATN everywhere. It's a widespread change for a worthy goal, but I'm not sure it's worth this particular solution. Why not simply move the code that writes out the interp file into the "if gencode" section? Then we are reusing it and not recomputing it, right?

Unfortunately, it's not so simple. The code generator uses atnData deep inside generation. The StringTemplate library reads the serialized field from SerializedATN.

Stack trace:

atnData for the interp file is calculated in Tool.java, so it has to be passed to the code generator somehow.
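The idea under discussion (serialize once, then hand the same data to both consumers) can be sketched roughly as follows. This is only an illustration: SerializeOnce, serializeAtn, writeInterpFile, generateCode and process are hypothetical stand-ins, not ANTLR's API, and a plain List<Integer> replaces the IntegerList that the real tool gets from ATNSerializer.

```java
import java.util.ArrayList;
import java.util.List;

public class SerializeOnce {
    // Counts how often the (potentially expensive) serialization runs.
    static int serializeCalls = 0;

    // Hypothetical stand-in for ATN serialization: encodes a grammar name as ints.
    static List<Integer> serializeAtn(String grammarName) {
        serializeCalls++;
        List<Integer> data = new ArrayList<>();
        for (char c : grammarName.toCharArray()) {
            data.add((int) c);
        }
        return data;
    }

    // Hypothetical consumer 1: builds the text that would go into the .interp file.
    static String writeInterpFile(List<Integer> atnData) {
        return "atn:\n" + atnData;
    }

    // Hypothetical consumer 2: the code generator embeds the same data in its output.
    static String generateCode(List<Integer> atnData) {
        String body = atnData.toString().replace('[', '{').replace(']', '}');
        return "static final int[] _serializedATN = " + body + ";";
    }

    // Serializes exactly once and passes the result to both consumers.
    static String[] process(String grammarName, boolean gencode) {
        List<Integer> atnData = serializeAtn(grammarName);
        String interp = writeInterpFile(atnData);
        String code = gencode ? generateCode(atnData) : null;
        return new String[] { interp, code };
    }

    public static void main(String[] args) {
        String[] out = process("T", true);
        System.out.println(out[0]);
        System.out.println(out[1]);
        System.out.println("serialize calls: " + serializeCalls);
    }
}
```

The trade-off raised in the review is visible even in this toy: avoiding the second serialization means the data has to appear as a parameter on every path between the place it is computed and the place it is consumed.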

parrt · 2022-03-26T21:35:58Z

I will close this as it is superseded by the big PR I just pulled in.

KvanTTT added 2 commits February 23, 2022 19:28

Move decode method from ATNSerializer to ATNDeserializerHelper (to te…

acf0c22

…sts)

Get rid of excess ATN serialization

e0fca8e

KvanTTT mentioned this pull request Feb 24, 2022

Set version to 4.10 #3460

Closed

parrt reviewed Feb 27, 2022

KvanTTT mentioned this pull request Mar 21, 2022

Use signed ints for ATN serialization not uint16, except for java #3591

Merged

parrt closed this Mar 26, 2022

