Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Fix] Add support for GBNF inline comment #125

Merged
merged 1 commit into from
Dec 15, 2024
Merged

Conversation

observerw
Copy link
Contributor

Current grammar_parser.cc can't handle inline comment correctly:

root ::= a # inline comment
a ::= "a"

Raise following error:

EBNF parse error at line 2, column 3: Expect element

Fixed by preventing inline comments from consuming line breaks.

P.S. How about integrating llama.cpp's GBNF parsing implementation? Current implementation doesn't seems to be robust enough.

Copy link
Collaborator

@Ubospica Ubospica left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks @observerw!

Do you have any other specific examples where the current parser lacks robustness or fails to perform well? While we currently use GBNF grammar, we may consider extending it in the future to support additional features (e.g., inline regex, special lexer tokens, etc.). Therefore, at the moment we still want to continuously enhance this parser and add new features to it.

@observerw
Copy link
Contributor Author

LGTM, thanks @observerw!

Do you have any other specific examples where the current parser lacks robustness or fails to perform well? While we currently use GBNF grammar, we may consider extending it in the future to support additional features (e.g., inline regex, special lexer tokens, etc.). Therefore, at the moment we still want to continuously enhance this parser and add new features to it.

I haven't fully tested it yet, and as far as I've observed, there are some minor issues such as the inability to include escape symbols in character groups:

root ::= [\[\]]

Will raise Invalid escape sequence; and left recursion problem (#126).

These issues can be fixed with a few changes and the addition of unit tests (I can help with that!), but I think extra effort is needed to find and fix all of these issue, as well as compatible with GBNF standard (considering that the GBNF syntax was created by the llama.cpp, any changes they make will be the gold reference). Customizability seems to be an important consideration, maybe can be addressed by improving on a fork version?

BTW thank you for your great project! I'm looking forward to doing some research about formal language generation using xgrammar 😄

@Ubospica
Copy link
Collaborator

@observerw, thanks for the detailed feedback and suggestions! The issues you mentioned will be prioritized for fixes along with added unit tests.

Regarding adopting llama.cpp's GBNF parser, we recognize its robustness, but we’ve chosen to maintain our own parser to allow greater flexibility and extensibility. Maintaining our own parser also allows us to better tailor it to xgrammar's goals, such as supporting more diverse structures (beyond context-free grammars), and fosters community collaboration to address evolving needs.

That said, we’ll continue tracking updates to the GBNF standard and ensure compatibility. Your insights are greatly appreciated, and if you’d like to contribute unit tests or fixes, we’d be more than happy to collaborate. Thank you for your support, and we’re excited to see your research on formal language generation! 😊

@Ubospica Ubospica merged commit 1509eac into mlc-ai:main Dec 15, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants