Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ATN serialization improvements #3556

Closed
wants to merge 2 commits into from

Conversation

KvanTTT
Copy link
Member

@KvanTTT KvanTTT commented Feb 23, 2022

  • Move decode method from ATNSerializer to ATNDeserializerHelper
  • Get rid of excess ATN serialization (for interpreter data and generated file)

@KvanTTT KvanTTT mentioned this pull request Feb 24, 2022
@parrt
Copy link
Member

parrt commented Feb 26, 2022

Hi. can you remind me what the primary purposes of shuffling code? as usual I don't like creating new files unless we have to. could you help me understand the excess serialization going on? thanks!

@KvanTTT
Copy link
Member Author

KvanTTT commented Feb 26, 2022

I've moved decode out of the serializer because of the following reasons:

  • It's used only for tests. There is no need to keep it in the ANTLR core package
  • Actually it's for deserialization, not for serialization

@parrt
Copy link
Member

parrt commented Feb 26, 2022

  • It's used only for tests. There is no need to keep it in the ANTLR core package
  • Actually it's for deserialization, not for serialization

haha! Yes, I see that now. It's only for debugging/testing I think. A better name is maybe summarize() or describe() as decode sounds like deserialize.

Not sure we need separate class/file for a function. Why not just put decode() into TestATNSerialization.java?

@parrt
Copy link
Member

parrt commented Feb 26, 2022

can you remind me what the excess serialization stuff you're referring to? thanks. looks like there is an API change internally that causes fairly widespread changes.

@KvanTTT
Copy link
Member Author

KvanTTT commented Feb 26, 2022

A better name is maybe summarize() or describe() as decode sounds like deserialize.

Ok.

Not sure we need separate class/file for a function. Why not just put decode() into TestATNSerialization.java?

Because it's already quite a big class and decode method is also not small. I just would like to split tests and helper code.

can you remind me what the excess serialization stuff you're referring to?

I've encountered it during debugging. I've noticed the ATN is being serialized (call of ATNSerializer.serialize) both for .interp file and for serialized data for lexer/parser.

@parrt
Copy link
Member

parrt commented Feb 27, 2022

Because it's already quite a big class and decode method is also not small. I just would like to split tests and helper code.

Yes, big is bad but on the other hand so it is making a class with one method and introducing a new file. I think I would prefer just dumping the decode method into the test class.

Hmm... looks like some of the targets have implemented decode() like C++ (ATNSerializer.h):

    virtual std::string decode(const std::wstring& data);

I've noticed the ATN is being serialized (call of ATNSerializer.serialize) both for .interp file and for serialized data for lexer/parser.

Ah. Hmm...let me pull in your branch and use a good difference tool instead of the website see what the changes really look like.

}
}
content.append("\n");

IntegerList serializedATN = ATNSerializer.getSerialized(g.atn, g.getLanguage());
// Uncomment if you'd like to write out histogram info on the numbers of
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seems like we should leave this in

@@ -29,7 +30,7 @@
public ParserFile parserFile(String fileName) { return null; }

@Override
public Parser parser(ParserFile file) { return null; }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

at first glance it looks strange that lexer() function below does not also need an ATN.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, because lexer works a bit differently and gets atnData from somewhere else. I'll take a look if it's possible to unify the parameter.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It depends on this code: https://github.com/antlr/antlr4/blob/dev/tool/src/org/antlr/v4/codegen/OutputModelController.java#L140-L152 There is no call to lexer method from OutputModelController but call to constructor directly.


public Lexer(OutputModelFactory factory, LexerFile file) {
super(factory);
this.file = file; // who contains us?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seems like that's a useful comment

Copy link
Member Author

@KvanTTT KvanTTT Mar 1, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe renamed file to containingFile instead of using such a comment? I prefer getting rid of comments if possible.


public Parser(OutputModelFactory factory, ParserFile file) {
super(factory);
this.file = file; // who contains us?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same as lexer; a useful comment

@@ -63,13 +65,14 @@ public Recognizer(OutputModelFactory factory) {

ruleNames = g.rules.keySet();
rules = g.rules.values();
atn = new SerializedATN(factory, g.atn);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not a fan of flipping all of these IF to ?:. I don't think they are is easy to read.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you're doing all sorts of code revisions that are not really needed but are costing my time to review.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not a problem. You are a valuable contributor and a hardcore tech guy!! I just think maybe we need to synchronize better :) your instincts seem very good but I have to fit that into my limited focus time.

errMgr.toolError(ErrorType.CANNOT_WRITE_FILE, ioe);
}
}
IntegerList atnData = gencode && g.tool.getNumErrors()==0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure I like passing this ATN everywhere. A widespread change for a worthy goal but not sure it's worth this particular solution. Why not simply move this code that writes out the interp file to the "if gencode" section? then we are reusing and not recomputing it right?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately, it's not so simple. Code generator uses atnData deeply internally during generation. StringTemplate library uses the serialized field from SerializedATN.

image

Stack trace:

image

atnData for interp file is being calculated in Tool.java. It should be passed to the code generator somehow.

@parrt
Copy link
Member

parrt commented Mar 26, 2022

I will close this as it is superseded by the big PR I just pulled in.

@parrt parrt closed this Mar 26, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants