Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Breaking change between 2.1 and 2.4/2.5 #762

Closed
vincent-herlemont opened this issue Jan 6, 2023 · 5 comments · Fixed by #766
Closed

Breaking change between 2.1 and 2.4/2.5 #762

vincent-herlemont opened this issue Jan 6, 2023 · 5 comments · Fixed by #766

Comments

@vincent-herlemont
Copy link

Describe the bug

During this discussion Implicit expression · #761 I have discovered that there is a difference of behavior for the following syntax between pest 2.1 and pest 2.4/2.5, for now I don't know if it's the expected behavior or not. I open an issue to know how to decide this.

To Reproduce
For the following grammar:

program = _{ SOI ~ implicit ~ EOI  }
implicit= ${ or ~ (WHITESPACE+ ~ or )* }

or  = !{ and ~ (or_op ~ and)+ | and }
and = { comp ~ (and_op ~ comp)+ | comp }
comp = { array ~ eq_op ~ array | array }

array = ${ term }

term = _{ ASCII_ALPHANUMERIC+ }
or_op = { "||" }
and_op = { "&&" }
eq_op = { "=" }
WHITESPACE = _{ " " | "\t" | NEWLINE }

The input a a return this error for pest 2.4/2.5 ❌ :

 --> 1:3
  |
1 | a a
  |   ^---
  |
  = expected EOI, or_op, and_op, or eq_op

Success for 2.1 ✔️

Expected behavior

Run without error and return two array.

@tomtau
Copy link
Contributor

tomtau commented Jan 6, 2023

It may be a side-effect of some optimization work that has been done since 2.1, e.g. #473 ?

@tomtau
Copy link
Contributor

tomtau commented Jan 9, 2023

@vincent-herlemont I managed to reproduce it. I tried git bisect to find the commit where it stopped parsing and it turned out to be this one b720edc from that optimization PR #473.
If my understanding is correct, that optimization will try to rewrite rules of the form a ~ b | a to a ~ b? which seems reasonable on its own, but I guess it may be problematic in some contexts due to the greedy backtracking nature of parsing here.

@SkiFire13 @dragostis @CAD97 do you have any suggestions on how to proceed here -- should that optimization be reverted, or should it stay as it is and @vincent-herlemont could rewrite that grammar in some way (I guess one dummy workaround is to introduce extra rules, so that the optimization fails to be applied), or can you see an easy way to restrict that optimization?

@CAD97
Copy link
Contributor

CAD97 commented Jan 9, 2023

Transforming a ~ b | a to a ~ b? is incorrect when implicit whitespace is on, yeah. Generally the consistency/predicability of whitespace behavior is really not great.

With the implied whitespace made explicit, it's transforming

@{ a ~ WHITESPACE* ~ b | a }
// to
@{ a ~ WHITESPACE* ~ b? }
// which is incorrect, it should be
@{ a ~ (WHITESPACE* ~ b)? }

Probably the "correct" fix is to "desugar" all of the implicit whitespace rules before applying any optimization transformations.

tomtau pushed a commit to tomtau/pest that referenced this issue Jan 10, 2023
…ules

closes pest-parser#762

the full fix would require a bigger overhaul / refactoring of the optimizer
@tomtau
Copy link
Contributor

tomtau commented Jan 10, 2023

made a quick fix in #766
There are two follow ups:

  1. (if possible) writing up a fuzzer that will try to fuzz grammar+input, and check the parser results of the optimized and non-optimized rules match;
  2. fixing the optimizer: the way forward is probably what @dragostis mentioned before with egg and ruler, but that probably only works for a cleaned up / simpler meta grammar, i.e. pest3

tomtau added a commit that referenced this issue Jan 11, 2023
…ules (#766)

* fix: restrict the factorizer case to only atomic and compoundatomic rules

closes #762

the full fix would require a bigger overhaul / refactoring of the optimizer

* add alloc

Co-authored-by: Tomas Tauber <me@tomtau.be>
@vincent-herlemont
Copy link
Author

@tomtau Thanks for the fix :) !

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants