Construct a Lark grammar from ABNF format (RFC 5234) #318

Open
Zac-HD opened this issue Feb 4, 2019 · 5 comments

Comments

@Zac-HD
Contributor

Zac-HD commented Feb 4, 2019

RFC 5234 describes the standard grammar format for internet standards, such as the notoriously-hard-to-validate email addresses.

Because this standard is a dialect of EBNF and does not allow for embedded code, it should be relatively easy to construct a Lark object for a given ABNF grammar - at least easier than converting from Nearley! Hopefully it's easy enough that runtime conversion in a new Lark.from_abnf method (or group of methods) would be practical.

This feature request is based on HypothesisWorks/hypothesis#170, where I eventually realized that parsing ABNF was going to be easier as well as more widely useful upstream. I'd be happy to work on this with some guidance about where to start, and have already translated the grammar of ABNF from ABNF to Lark's format.
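To make the idea concrete, here is a minimal sketch of how a small subset of ABNF could be rewritten into Lark syntax as plain text. It is not the translation mentioned above and not part of Lark. The function names are hypothetical, and it assumes single-line rules, space-separated elements, no groups or optional brackets, two-digit `%x` values, and only the `*` and `1*` repetition forms. It relies on two facts: ABNF string literals are case-insensitive (hence Lark's `i` flag), and Lark rule names are lowercase.

```python
import re

# A token is a quoted literal (with an optional repetition prefix glued on)
# or any run of non-space characters (rule names, "/", %x values).
_TOKEN = re.compile(r'(?:\d*\*\d*)?"[^"]*"|\S+')
_REPEAT = re.compile(r'(\d*)\*(\d*)(.+)')
_HEX = re.compile(r'%x([0-9A-Fa-f]{2})(?:-([0-9A-Fa-f]{2}))?')


def _name(abnf_name: str) -> str:
    # ABNF rule names are case-insensitive and may contain hyphens;
    # this sketch maps them all to lowercase Lark rule names.
    return abnf_name.lower().replace("-", "_")


def _element(tok: str) -> str:
    if tok == "/":
        return "|"                      # ABNF alternation
    if tok.startswith('"'):
        return tok + "i"                # case-insensitive literal
    rep = _REPEAT.fullmatch(tok)
    if rep:
        low, high, inner = rep.groups()
        if high or low not in ("", "0", "1"):
            raise ValueError(f"repetition not supported in this sketch: {tok}")
        return _element(inner) + ("+" if low == "1" else "*")
    hexval = _HEX.fullmatch(tok)
    if hexval:
        lo, hi = hexval.groups()
        # Emit a regex terminal for a single value or a value range.
        return f"/[\\x{lo}-\\x{hi}]/" if hi else f"/\\x{lo}/"
    return _name(tok)


def abnf_rule_to_lark(line: str) -> str:
    """Translate one single-line ABNF rule (small subset) to Lark syntax."""
    name, sep, body = line.partition("=")
    if not sep:
        raise ValueError("expected 'name = elements'")
    elements = [_element(tok) for tok in _TOKEN.findall(body)]
    return f"{_name(name.strip())}: {' '.join(elements)}"


print(abnf_rule_to_lark('hex-digit = DIGIT / "a" / %x42-46'))
```

A real converter would parse ABNF with a proper grammar rather than with regular expressions, and would still need to handle groups, comments, multi-line rules, and incremental alternatives (`=/`).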

@erezsh
Member

erezsh commented Feb 4, 2019

I think it's a nice idea. Your best bet is probably to create a Lark parser that reproduces the output of the GrammarLoader parser: https://github.com/lark-parser/lark/blob/master/lark/load_grammar.py#L663

You might have to do some post-processing (for example, in a Transformer) to make them a perfect fit.

Once you have that working, I'll add an interface to plug it in there when called with ABNF grammar.

While using Lark.from_abnf isn't bad (and can be complemented with Lark.open_abnf), how about doing this instead?

parser = Lark("... grammar ... ", syntax='abnf')

That opens the door to adding other formats in the future.
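One way to picture how a `syntax` keyword could leave room for more formats is a small registry that maps a syntax name to a loader function. The sketch below is an assumption for illustration only: `register_syntax`, `load_grammar`, and the placeholder ABNF loader are hypothetical names and do not exist in Lark, and the "loader" here only rewrites two operators.

```python
from typing import Callable, Dict

# Hypothetical registry: syntax name -> function turning source text
# into Lark grammar text. Not part of Lark's API.
_LOADERS: Dict[str, Callable[[str], str]] = {}


def register_syntax(name: str):
    """Decorator that registers a loader under a syntax name."""
    def decorator(fn: Callable[[str], str]) -> Callable[[str], str]:
        _LOADERS[name] = fn
        return fn
    return decorator


@register_syntax("lark")
def _load_lark(text: str) -> str:
    # Native grammars pass through unchanged.
    return text


@register_syntax("abnf")
def _load_abnf(text: str) -> str:
    # Placeholder only: a real loader would parse the ABNF properly.
    return text.replace(" = ", ": ").replace(" / ", " | ")


def load_grammar(text: str, syntax: str = "lark") -> str:
    """Dispatch on the syntax keyword, failing loudly on unknown names."""
    try:
        loader = _LOADERS[syntax]
    except KeyError:
        raise ValueError(f"unknown grammar syntax: {syntax!r}") from None
    return loader(text)


print(load_grammar('a = "x" / "y"', syntax="abnf"))
```

With this shape, supporting another format means registering one more loader; the call site keeps the same signature.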

@davaya
Contributor

davaya commented May 22, 2019

The output of GrammarLoader is an instance of the Grammar class with rule_defs, term_defs and ignore variables. Is there a "GrammarSave" function that takes a Grammar instance and produces a Lark file or returns a string?

It will be necessary to convert an ABNF file to a Lark file, because there are inevitably features of Lark that are not supported in ABNF. The workflow would be 1) convert ABNF to a Lark file, then 2) tweak the Lark file to achieve the desired results. The GrammarSave function is needed both to develop an ABNF loader in the first place and to do grammar optimization once it is available.
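As a rough illustration of what such a "GrammarSave" step might look like, the sketch below serializes a deliberately simplified grammar object to Lark text. `SimpleGrammar` and `save_grammar` are hypothetical. Only the three field names are borrowed from the description above; the field types are an assumption, and Lark's real `Grammar` class stores its definitions in a different, richer form.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class SimpleGrammar:
    """Hypothetical stand-in for a loaded grammar (not Lark's Grammar class)."""
    rule_defs: Dict[str, List[str]]            # rule name -> alternatives
    term_defs: Dict[str, str]                  # terminal name -> pattern text
    ignore: List[str] = field(default_factory=list)


def save_grammar(grammar: SimpleGrammar) -> str:
    """Render the simplified grammar as Lark grammar text."""
    lines = []
    for name, alternatives in grammar.rule_defs.items():
        lines.append(f"{name}: " + " | ".join(alternatives))
    for name, pattern in grammar.term_defs.items():
        lines.append(f"{name}: {pattern}")
    for name in grammar.ignore:
        lines.append(f"%ignore {name}")
    return "\n".join(lines) + "\n"


g = SimpleGrammar(
    rule_defs={"start": ["NUMBER", "start \"+\" NUMBER"]},
    term_defs={"NUMBER": "/[0-9]+/", "WS": "/ +/"},
    ignore=["WS"],
)
print(save_grammar(g))
```

The saved text could then be edited by hand, which is the second step of the workflow described above.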

@erezsh
Member

erezsh commented May 22, 2019

@davaya

I don't understand what the difficulty is, and what you're trying to do to solve it.

> convert an ABNF file to a Lark file

Why does it have to go through files?

> there are inevitably features of Lark that are not supported in ABNF

So give Lark a default (like an empty list, or whatever is appropriate).

@davaya
Contributor

davaya commented May 23, 2019

If there is not a two-way lossless conversion ABNF <-> Lark, then something is lost in translation. Some feature of Lark, e.g., tree shaping, simply cannot be expressed at all in ABNF. If a developer wishes to use that feature, then supporting ABNF in GrammarLoader is not a complete solution.

Instead, the developer will take ABNF as a starting point. Reading an ABNF grammar with GrammarLoader, then saving it to a Lark file, allows the developer to edit that file and add features to the grammar. If the updated grammar is saved in ABNF format, those features will be lost.

Conformance to an ABNF specification is validated based solely on the ABNF. But there can be multiple implementations of an ABNF specification, some cleaner than others. Any feature that changes the AST to make it easier to use, but does not change the data on the wire, is a reason to be able to save an ABNF grammar in Lark format.

@erezsh
Member

erezsh commented May 23, 2019

> simply cannot be expressed at all in ABNF

In that situation, it's common to add new language features that don't break the old ones. For example, a new operator that works in Lark but not in ABNF, and that you don't have to use.

> saving the grammar to a Lark file, allows the developer edit that file to add features to the grammar

That seems a bit cumbersome. Why not just have a translator from Extended-ABNF into ABNF? It should be fairly easy: just remove and canonicalize some nodes, then write the result back. It's simple enough that Lark's reconstructor might even be able to handle it.
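The thread does not define what Extended-ABNF would contain, so the following is a purely hypothetical sketch. It assumes the only extension is a Lark-style `?` prefix on a rule name at the start of a definition line, and it shows the "remove the extra nodes, then write it back" idea as a text-level pass. `strip_extensions` is an invented name, and a real translator would work on a parse tree rather than on raw text.

```python
import re

# Hypothetical extension marker: a "?" directly before a rule name at the
# start of a line. Plain ABNF rule names always begin with a letter.
_MARKER = re.compile(r"^\?(?=[A-Za-z])", re.MULTILINE)


def strip_extensions(extended_abnf: str) -> str:
    """Drop the hypothetical "?" markers, leaving plain ABNF text."""
    return _MARKER.sub("", extended_abnf)


source = '?sum = product *("+" product)\nproduct = 1*DIGIT\n'
print(strip_extensions(source))
```

Because the marker only affects tree shape, removing it leaves the accepted language unchanged, which is the property the plain-ABNF output needs to preserve.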
