Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Switch to Chumsky for parsing #2

Closed
wants to merge 3 commits into from
Closed

Conversation

clarkmcc
Copy link
Owner

@clarkmcc clarkmcc commented May 3, 2023

Working on a chumsky parser. Has some benefits:

  • Easier to read and modify than LALRPOP grammar
  • Much better error reporting and syntax assistance

@clarkmcc clarkmcc changed the title Started on chumsky parser Switch to Chumsky for parsing May 3, 2023
@hardbyte
Copy link
Contributor

👀 I've been doing something similar while learning Rust in https://github.com/hardbyte/common-expression-language if you want to collaborate/steal.

@clarkmcc
Copy link
Owner Author

@hardbyte Ooo very nice! I hit a wall on the chumsky parser and never got around to it again. I will definitely be doing a bit of stealing, or if you're interested, push a PR!

@hardbyte
Copy link
Contributor

Cool I'm very keen to combine efforts. Keep in mind I'm new at rust though!

First I had a go at getting your branch going - I wired up your chumsky parser to your CLI but evaluating an expression seemed to quickly take up all my ram.

Then I noticed that you were using the pre-release of Chumsky 1.0.0 so I thought a good first step would be porting my parser to the same version so I can hopefully copy some of the implementation across. There were a few api changes between 0.9.2 and 1.0.0-alpha.4 and I'd implemented my parser on char rather than &str.

I've updated my code with the glaring exception that my .collect() calls are no longer compiling... I've pushed what I have to hardbyte/common-expression-language#5 if you felt like taking a look.

Another option might be to use the latest stable version of Chumsky with my parser (adapted for your AST).

@clarkmcc
Copy link
Owner Author

clarkmcc commented Jul 29, 2023

That was the wall I hit, I think there's a left fold somewhere in the code for recursive expressions that shouldn't be used. I briefly discussed it with the chumsky author but I didn't get too far.

Yeah that final option seems like a good route to take. How much of the parsing is currently supported right now? I think I noticed at least one to-do item last I checked?

One other thing, were you able to correctly handle order of operations in chumsky? That was one other thing I couldn't get quite right.

@hardbyte
Copy link
Contributor

hardbyte commented Jul 31, 2023

Hmm hopefully together we can solve it - I think I've run into the same issue with an infinity recursive definition now. I've started porting across to your AST in hardbyte/common-expression-language#6 and think I've gotten over the infinite bug.

Most of the grammar is supported in my parser - haven't done in relation, FieldInits, conditionals or ternaries. I'm keen to try out the pratt parser that will be in the next official release of chumsky but figured a basic version that follows the grammar would be a good starting point.

@clarkmcc
Copy link
Owner Author

clarkmcc commented Aug 1, 2023

@hardbyte fantastic! I want to take a look at this but am a little preoccupied with #11. Will get back to this as soon as I can.

@hardbyte
Copy link
Contributor

hardbyte commented Aug 4, 2023

Fair enough!

I'll open an issue in this repo to track the next steps as it might not make sense to work from this branch to integrate.

@hardbyte
Copy link
Contributor

hardbyte commented Aug 4, 2023

Oh, did you intend to disable issues in this repo?

In any case, I've merged in those changes to my main branch and finished everything in the parser except FieldInits. I think the next steps are:

  • a small refactor PR in this repository to hide larlpop internals and expose a parser function and parser errors from your cel-parser crate?
  • a bigger PR to replace this one with the work I've done in my repo

How does that sound?

@clarkmcc
Copy link
Owner Author

clarkmcc commented Aug 4, 2023

Huh, weird not sure how issues got disabled. Sorry about that! I've re-enabled them. Yes, this makes good sense to me. At some point I'd also like to benchmark LALRPOP vs Chumsky. If the performance differences are notable enough, we might keep both parsers available behind feature flags where the advantage to Chumsky is better parsing error messages. But we can cross that bridge once we have parsing results. I can work up a benchmark for the parser in the mean time.

@clarkmcc
Copy link
Owner Author

clarkmcc commented Aug 4, 2023

On a totally unrelated note, I read through your readme and found the reference to the cel-go unit tests. I love the idea of testing against those same tests. I will definitely be looking at that in the near future!

@clarkmcc
Copy link
Owner Author

Alright, I've just released the latest project I was working on and now have some free cycles to look at this again. Where are we at on this, and what makes the most sense for me to work on?

@hardbyte
Copy link
Contributor

Awesome, I've written up everything in this issue #14

The short version is I think my chumsky based parser is now slightly ahead feature-wise to the existing lalrpop one, although I'm sure there will be corner cases to hunt down. I'll open a PR to replace this one - adding support for the chumsky parser rather than switching it out entirely. Once I can run the tests I'll post an update with any differences found.

Do you want to take on the benchmarking and/or testing with the cel-go data?

@clarkmcc
Copy link
Owner Author

Looking forward to reviewing! Yes, I can definitely take benchmark and tests.

@hardbyte hardbyte mentioned this pull request Aug 28, 2023
@clarkmcc clarkmcc closed this Jul 8, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants