Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

greedy lexer definition #57

Open
mmoskal opened this issue Nov 19, 2024 · 2 comments
Open

greedy lexer definition #57

mmoskal opened this issue Nov 19, 2024 · 2 comments

Comments

@mmoskal
Copy link
Collaborator

mmoskal commented Nov 19, 2024

For greedy lexer /aaa/, /a+b+z/, /b+x/ we will not be able to sample string aaabbx even though it would be in some sense "valid".

We start with all three lexemes as possible, after first a the /b+x/ is impossible, and after the first b we also reject /aaa/, and the remaining doesn't match /a+b+z/.

We require the lexer to be able to determine lexeme boundaries based on a single byte lookahead.

Now, it seems it wouldn't be a problem for a typical programming language lexer with /if/, /while/, etc and an identifier rule /[a-z]+/, so it's probably OK.

Also, for a lexeme like /"[^"]+"/ it doesn't matter if it's greedy or lazy, since it's "self-limiting"; this probably applies to lots of lexers.

The issue is to document all this, and possibly (unlikely) change the behaviour.

Thank you @v-jkegler for finding the issue

cc @hudson-ai

@mmoskal
Copy link
Collaborator Author

mmoskal commented Nov 21, 2024

@v-jkegler says that in Marpa parser he keeps the most recent lexeme match; thus when we get to x which would normally fail, we can just fall back to the cached match, scan it, and then re-run the lexer on bbx; if there was no /b+x/ lexeme, this would be an error (correctly placed at x, since if x was z instead, we would be OK) and otherwise we're good.

@v-jkegler
Copy link
Contributor

And there won't be an error, since you use the Ruby Slippers to constrain your generated input to valid inputs.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants