Parsing a(ba)* in 1.0.0-alpha.4 #478

Open
01mf02 opened this issue Jul 14, 2023 · 5 comments

01mf02 commented Jul 14, 2023

I am converting a parser from chumsky 0.9 to 1.0.0-alpha.4.
During this, I noticed a pattern that I found quite nontrivial to translate.
Specifically, the pattern is a parser for a(ba)*, where a and b are parsers with the same output type T, and we want to obtain all parsed elements as a Vec<T>. (It's like separated_by, but keeping the separators.)
In chumsky 0.9, this looked something like:

a.chain(b.chain(a).repeated().flatten()).collect()

In the new chumsky, however, there is no more chain.
After some struggling, I came up with the following equivalent in chumsky 1.0.0-alpha.4:

let head = a.map(|x| Vec::from([x]));
head.foldl(b.then(a).repeated(), |mut acc, (x, y)| {
    acc.push(x);
    acc.push(y);
    acc
})

This is significantly longer and, IMHO, much harder to understand. (My actual example even involves nested foldl.)
Is there some more canonical way to do this?
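
To pin down the behavior being asked for, here is a minimal sketch that uses only the standard library, not chumsky. The functions `a`, `b`, and `a_ba_star` are hypothetical stand-ins: `a` accepts one ASCII digit and `b` accepts '+' or '-', choices made purely for illustration. The sketch assumes that a trailing `b` with no `a` after it is left unconsumed.

```rust
/// Hypothetical stand-in for parser `a`: accepts one ASCII digit.
fn a(input: &[char], pos: usize) -> Option<(char, usize)> {
    input
        .get(pos)
        .copied()
        .filter(|c| c.is_ascii_digit())
        .map(|c| (c, pos + 1))
}

/// Hypothetical stand-in for parser `b`: accepts '+' or '-'.
fn b(input: &[char], pos: usize) -> Option<(char, usize)> {
    input
        .get(pos)
        .copied()
        .filter(|c| *c == '+' || *c == '-')
        .map(|c| (c, pos + 1))
}

/// Parses a(ba)* from the start of `input`.
/// Returns every matched element (separators included) and the end position.
fn a_ba_star(input: &[char]) -> Option<(Vec<char>, usize)> {
    let (first, mut pos) = a(input, 0)?;
    let mut acc = vec![first];
    // Each round must match `b` and then `a`; if `a` fails, the `b` is not kept.
    while let Some((sep, after_b)) = b(input, pos) {
        let Some((item, after_a)) = a(input, after_b) else {
            break;
        };
        acc.push(sep);
        acc.push(item);
        pos = after_a;
    }
    Some((acc, pos))
}
```

On the input "1+2-3" this yields all five elements in order, which is the Vec<T> the foldl version above builds up step by step.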

01mf02 commented Jul 14, 2023

More generally, I think that the general approach to parser iterators in 1.0.0-alpha.4 can be improved:

  • It would be awesome if we could use IterParser more like Iterator: chaining iterators together, and perhaps offering other methods such as map and cloned. [1] The fact that IterParser implements enumerate and count, both of which also exist in Iterator, further suggests such a direction.
    A method like chumsky::iter::once could be used to lift a parser for a single value to a parser for multiple values (returning a sequence consisting of a single value).

  • It is currently hard to discover in the documentation which objects implement IterParser (the docs for the trait do not show it). Frankly, right now I have no idea which objects I can call IterParser::collect() on.

  • The documentation for repeated currently says:

    The output type of this parser can be any Container.

    Here, I would have expected that the output is IterParser (instead of Container). That would make things more like in Iterator in the sense that we know intuitively that we can collect() the result. (Whereas a user new to chumsky reading this first has to learn what a Container is.)

Here's how I would love to do things:

use chumsky::iter::once;
once(a).chain(once(b).chain(once(a)).repeated().flatten()).collect()

Footnotes

  1. Rayon does something similar with the ParallelIterator trait.
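
For comparison, this is the std::iter shape that the proposal mirrors, applied to values that have already been parsed. It is only an analogy: `interleave` is a hypothetical helper, and nothing here uses or implies an existing chumsky API.

```rust
use std::iter::once;

/// Flattens a head element and a list of (separator, element) pairs into one Vec,
/// mirroring `once(a).chain(once(b).chain(once(a)).repeated().flatten())`.
fn interleave<T>(head: T, tail: Vec<(T, T)>) -> Vec<T> {
    once(head)
        .chain(
            tail.into_iter()
                // Each pair becomes a two-element iterator, then gets flattened.
                .flat_map(|(sep, item)| once(sep).chain(once(item))),
        )
        .collect()
}
```

Calling `interleave(1, vec![(10, 2), (20, 3)])` gives the elements in source order, with the separators kept between the items.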

01mf02 commented Jul 14, 2023

Yet another thought: or_not returns an Option. It would be nice if we could somehow translate a Parser<_, Option<T>, _> into an IterParser<..., T, ...>. Like Option::into_iter.

That would help parsing a?b*, where both a and b are parsers with the output T:

a.or_not().into_iter().chain(b.repeated())
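
The standard-library behavior this idea borrows from can be sketched as follows. `optional_then_many` is a hypothetical name, and the function operates on already-parsed values rather than on chumsky parsers.

```rust
/// Combines an optional leading value with repeated values, as in a?b*.
/// `Option::into_iter` yields zero or one element, so it chains like any iterator.
fn optional_then_many<T>(first: Option<T>, rest: Vec<T>) -> Vec<T> {
    first.into_iter().chain(rest).collect()
}
```

With `Some('a')` the result starts with that element; with `None` the result is just the repeated part.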

01mf02 commented Jul 14, 2023

Yet another idea: it could be useful to convert Iterators into IterParsers, such that yielding the next element performs no parsing. This would further strengthen the relationship between the two traits.

zesterer (Owner) commented

Hello!

> In the new chumsky, however, there is no more chain.

In 0.9, chain was usually used to collect up characters/bytes into a string, but we now have the slice/map_slice combinators that are much more efficient and ergonomic for things like this. Of course, your use-case is not this, and it's unfortunate that this case has become a bit more difficult. As you say though, there are still solutions - although they are unfortunately a bit more noisy.

> More generally, I think that the general approach to parser iterators in 1.0.0-alpha.4 can be improved

I definitely agree! There's an open issue for this that you might find interesting. It would allow you to do something like a.then(b) and have the resulting parser implement IterParser<_, O> if both a and b also implement IterParser<_, O>. The same would also generalise to .or_not() and other such parsers that might be interpreted as IterParsers.

Obviously, this represents a pretty substantial reworking of the crate internals, and it is something I've not yet had time to make progress on, sadly.

If you have other ideas that you don't think are mentioned on the above issue, I'd definitely love to see suggestions added there!

zesterer (Owner) commented

Going to close this since #425 pretty much covers the requested features in this issue.
