Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ABNF grammar support #1017

Closed
wants to merge 7 commits into from
Closed

Conversation

t-higuchi
Copy link

This PR implements feature request #318

As proposed in #318, 'syntax' option need to be set to load ABNF grammar:
parser = Lark("... grammar ... ", syntax='abnf')

It accepts ABNF grammar described in RFC5234 and RFC7405.
Non-standard extension to ABNF is kept minimal. Only '%import' directive is implemented.

Please see an example (url_parser_abnf.py ) for ABNF-specific decorator usage and post-processing to parse tree.

Some implementation notes:

  • ABNFGrammar.compile() is implemented so that it outputs terminals and compiled_rules
    just as Grammar.compile() does for lark's EBNF grammar. This approach makes minimal modifications to existing code base.
  • A few modifications to existing code is needed since python complains if hyphens are used in ABNF rule names.

This fix is needed for ABNF grammar support.
Syntax:
 %import module
 %import module (rule1, rule2, ...)

Example:
 %import core-rules                ; import rules from lark/grammars/core-rules.abnf
 %import core-rules (CRLF, DIGITS) ; import specified rules (CRLF and DIGITS) only
@MegaIng
Copy link
Member

MegaIng commented Oct 12, 2021

I would personally prefer that if we add this stuff, we do it in a way that allows easy adding of new syntax variations in the future via registering a dialect with the grammar loader.

I think we don't need to copy over the Grammar object and the GrammarBuilder. Instead we should try to make an interface where a syntax dialect takes in a source files and outputs a EBNF tree that is compatible with what the default lark syntax would output

I have started an implementation of that, but I think I stopped because I had problems with detecting what is and what is not a terminal name (similar to what you are describing).

@erezsh
Copy link
Member

erezsh commented Oct 12, 2021

I agree, it's much better if non-lark grammars were written as a plugin that's implemented with a .lark file + transformer (e.g. abnf.lark) and returned the same tree that the load_grammar._parse_grammar returns.

Of course, we can refactor / modify whatever is necessary in load_grammar in order to make that possible.

@MegaIng What do you mean by "problems with detecting what is and what is not a terminal name" ?

@t-higuchi Good start! Don't worry, we'll help you get this PR in shape.

@MegaIng
Copy link
Member

MegaIng commented Oct 12, 2021

What do you mean by "problems with detecting what is and what is not a terminal name" ?

The rule uppercase is terminal might not be that applicable to all dialects, and some dialects might want to use other rules. That would mean that the original Transformers needs to tell the GrammarLoaders what is and what is no a Terminal. However, call to .isupper or similar are scattered across the code base (ok, it's like three places). However, I didn't find a good way of changing this situation. Maybe you can think of something.

@erezsh
Copy link
Member

erezsh commented Oct 12, 2021

@MegaIng We already replace terminal/rule names with Terminal/NonTerminal objects. We should just do it earlier, so that all isupper() happen inside _parse_grammar. That should work, no?

@MegaIng
Copy link
Member

MegaIng commented Oct 12, 2021

@erezsh Should. But I had problem because different places complained about getting Symbol objects instead of strings.

@erezsh
Copy link
Member

erezsh commented Oct 12, 2021

@MegaIng I don't mind doing that refactor myself. I'll try to get to it this week.

@t-higuchi
Copy link
Author

Okay, I will give a try in ABNF-to-EBNF converter approach.
I think most part of ABNFGrammarLoader could be reused to implement such a converter.

@t-higuchi
Copy link
Author

@erezsh @MegaIng Thank you for your support in refactoring. #1018 helped very much to rewrite and cleanup ABNF support.
I close this PR and move to #1022.

@t-higuchi t-higuchi closed this Oct 19, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants