How to `take_until` another parser matches? #39

mainrs · 2021-12-03T16:57:05Z

My input looks like this: `{ some free-form text that might contain hashes #tag1,#tag2 }`. The first element is free-form text that can contain basically anything except the closing brace. The last part is a list of tags. I was wondering whether this is possible to parse with chumsky. I'd say yes, since it should still be LL(k), with k being the number of characters in the tag list. But I can't come up with a parser that handles it.

Making `#` a reserved character would make this easier, since I could then use `filter`.

This is what I have right now (the parser for the input is the `attributed_string` function):

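use chumsky::prelude::*;

// Tag identifiers consist of lowercase ASCII letters, digits, and hyphens.
fn is_identifier(c: &char) -> bool {
    c.is_ascii_lowercase() || c.is_ascii_digit() || *c == '-'
}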

fn inline_whitespace() -> impl Parser<char, (), Error = Simple<char>> + Clone {
    filter(|c: &char| *c == ' ' || *c == '\t')
        .repeated()
        .ignored()
}

fn tag_list() -> impl Parser<char, Vec<String>, Error = Simple<char>> {
    let tag = just('#').ignore_then(
        filter(is_identifier)
            .repeated()
            .collect::<String>()
            .labelled("metadata tag"),
    );

    tag.clone()
        .chain(just(',').ignore_then(tag.clone()).repeated())
        .or_not()
        .flatten()
        .labelled("metadata tags")
}

// `TextNode` is my AST node type (definition omitted).
fn attributed_string() -> impl Parser<char, TextNode, Error = Simple<char>> {
    just('{')
        .ignore_then(
            // The free-form text stops at the first '#'.
            filter(|c: &char| *c != '#')
                .repeated()
                .collect::<String>()
                .then_ignore(inline_whitespace())
                .then(tag_list())
                .padded_by(inline_whitespace()),
        )
        .then_ignore(just('}'))
        .map(|(text, tags)| TextNode::WithMetadata(text, tags))
}

However, this does not allow hashtags to be part of the text.

zesterer (Maintainer) · 2021-12-03T18:09:00Z

This is a tough one!

It looks like your syntax is straying surprisingly close to ambiguity. Part of me wonders whether it might be easier to have the section between the braces be parsed from right to left, although I don't think this is quite necessary. That said, Chumsky is inherently a left-to-right parser, so potential solutions might not be efficient in pessimistic cases (a single extra character after the tags would suddenly require that the parser backtracks through all of the tags again!).

You might be interested in this PR that added a .rewind() combinator. I haven't looked into it enough to fully understand its potential, but it seems like this might be a case that can be handled with it.

I'm interested to hear how you get on with it: if nothing else, this might make for a good doc example that I can add for the next release!
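
To make the left-to-right cost concrete, here is a minimal sketch that uses only the Rust standard library, not chumsky. The function names `tags_then_close` and `take_until_tags` are hypothetical, and the sketch assumes that a tag name has at least one character and that the input is exactly one `{ ... }` group. It consumes text one character at a time and, at every position, retries the "tag list followed by `}`" matcher. That retry is the backtracking described above: every `#` in the text triggers a fresh attempt.

```rust
fn is_identifier(c: char) -> bool {
    c.is_ascii_lowercase() || c.is_ascii_digit() || c == '-'
}

/// Hypothetical helper: succeeds only if `s` starts with `#tag(,#tag)*`,
/// followed by optional blanks and a closing `}`.
fn tags_then_close(s: &str) -> Option<Vec<String>> {
    let mut rest = s;
    let mut tags = Vec::new();
    loop {
        rest = rest.strip_prefix('#')?;
        let len = rest.find(|c: char| !is_identifier(c)).unwrap_or(rest.len());
        if len == 0 {
            return None; // assumption: empty tag names are rejected
        }
        tags.push(rest[..len].to_string());
        rest = &rest[len..];
        match rest.strip_prefix(',') {
            Some(after_comma) => rest = after_comma,
            None => break,
        }
    }
    let rest = rest.trim_start_matches(|c: char| c == ' ' || c == '\t');
    rest.starts_with('}').then(|| tags)
}

/// Hypothetical helper: takes text until the tag-list matcher succeeds.
fn take_until_tags(input: &str) -> Option<(String, Vec<String>)> {
    let body = input.strip_prefix('{')?;
    for (i, _) in body.char_indices() {
        // Retry the terminator at every position; a failed attempt is discarded.
        if let Some(tags) = tags_then_close(&body[i..]) {
            return Some((body[..i].trim().to_string(), tags));
        }
    }
    // No tag list found: everything up to the closing brace is text.
    let end = body.find('}')?;
    Some((body[..end].trim().to_string(), Vec::new()))
}
```

With this sketch, `{ issue #12 is fixed #bug }` first tries `#12` as a tag list, fails because no `}` follows, and only succeeds at `#bug`.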

2 replies

mainrs Dec 3, 2021
Author

The rewind method does indeed look like it might help me out! I'll try it out with the latest main branch.

> That said, Chumsky is inherently a left-to-right parser, so potential solutions might not be efficient in pessimistic cases (a single extra character after the tags would suddenly require that the parser backtracks through all of the tags again!).

In theory, a custom parser that greedily consumes anything besides the closing character `}` could be used. The matched string then gets reversed and used as input for another parser. I would then match the tags first, and the rest would be the free-form text. The results would have to be reversed again. This only works in my case because the syntax doesn't include keywords and the like.
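
A rough sketch of that right-to-left idea, written with the standard library only rather than as a chumsky parser. The name `parse_reversed` is hypothetical, the function expects the text between the braces (already extracted), and it assumes tag names are non-empty. It reverses the characters, reads tags from the front of the reversed input (where a tag looks like identifier characters followed by `#`), and un-reverses each piece afterwards.

```rust
/// Hypothetical helper: parses the body between the braces from right to left.
fn parse_reversed(body: &str) -> (String, Vec<String>) {
    let is_ident = |c: char| c.is_ascii_lowercase() || c.is_ascii_digit() || c == '-';
    let rev: Vec<char> = body.trim().chars().rev().collect();

    let mut tags: Vec<String> = Vec::new();
    let mut committed = 0; // reversed input before this index belongs to tags
    let mut pos = 0;
    loop {
        // In reversed order a tag is: identifier characters, then '#'.
        let mut end = pos;
        while end < rev.len() && is_ident(rev[end]) {
            end += 1;
        }
        if end == pos || end == rev.len() || rev[end] != '#' {
            break;
        }
        // Un-reverse the tag name.
        tags.push(rev[pos..end].iter().rev().collect());
        committed = end + 1;
        // A ',' means another tag may precede this one; it is only
        // consumed if that tag actually matches on the next iteration.
        if rev.get(committed) == Some(&',') {
            pos = committed + 1;
        } else {
            break;
        }
    }

    // Tags were collected last-to-first; restore the source order.
    tags.reverse();
    let text: String = rev[committed..].iter().rev().collect();
    (text.trim_end().to_string(), tags)
}
```

Because the tag list is matched first, a `#` inside the free-form text never needs to be reconsidered.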

zesterer Dec 3, 2021
Maintainer

Yes. Another alternative is that you consume until }, then split by whitespace and take all sub-strings from the end that fit the pattern of a tag.
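
A minimal sketch of that alternative, again with the standard library only. The helper names `parse_tag_list` and `split_attributed` are hypothetical. It is adapted to the comma-separated list from the original question, so it assumes the tag list contains no whitespace and therefore only inspects the last whitespace-separated token; it also assumes tag names are non-empty.

```rust
/// Hypothetical helper: returns the tag names if `token` is a comma-separated
/// list such as `#tag1,#tag2`, otherwise `None`.
fn parse_tag_list(token: &str) -> Option<Vec<String>> {
    token
        .split(',')
        .map(|part| {
            let name = part.strip_prefix('#')?;
            let valid = !name.is_empty()
                && name
                    .chars()
                    .all(|c| c.is_ascii_lowercase() || c.is_ascii_digit() || c == '-');
            valid.then(|| name.to_string())
        })
        .collect()
}

/// Hypothetical helper: splits `{ text #tag1,#tag2 }` into (text, tags).
fn split_attributed(input: &str) -> Option<(String, Vec<String>)> {
    // Consume everything between the braces.
    let body = input.strip_prefix('{')?.strip_suffix('}')?.trim();
    // Only the last whitespace-separated token can be the tag list.
    let (text, last) = body
        .rsplit_once(char::is_whitespace)
        .unwrap_or(("", body));
    match parse_tag_list(last) {
        Some(tags) => Some((text.trim_end().to_string(), tags)),
        // The last token is not a tag list, so the whole body is text.
        None => Some((body.to_string(), Vec::new())),
    }
}
```

For example, `split_attributed("{ issue #12 is fixed #bug,#done }")` keeps `#12` in the text because only the trailing token is treated as tags.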
