or combinator prefers output of second parser, even if first parser succeeds #567

dexterdy · 2023-11-15T11:36:03Z

In the code snippet below, I am trying to parse a tree structure, where something can be either a node (an html string, a piece of template code and then another html string) or a leaf (just an html string). I try to parse a node first and if that fails try parsing a leaf, but the or combinator seems to prefer the output of the leaf parser, even if the node parser succeeds. I tested this by removing the or combinator and testing the node parser on the exact same input, which does produce the desired result:

fn html<T: Parser<Token, AST, Error = Simple<Token>> + Clone>(
    ast_parser: T,
) -> impl Parser<Token, Node, Error = Simple<Token>> {
    let html_frag = select! { Token::HtmlFragment(x) => x };
    let node = html_frag
        .clone()
        .then(ast_parser)
        .then(html_frag.clone())
        .map(|((before, middle), after)| Html::Node(before, middle, after));
    let leaf = html_frag.map(|str| Html::Leaf(str));
    node.or(leaf).map(|html| Node::Html(html))
}

Test code and result

#[test]
fn ast_simple() {
    let simple_html = "\
<html>
<body>
<span>
{{ x }}
</span>
</body>
</html>\
";
    let first_part = "\
<html>
<body>
<span>
"
    .to_string();
    let last_part = "
</span>
</body>
</html>\
"
    .to_string();
    let tokenized = token_parser().parse(simple_html).unwrap();
    let ast = ast_parser().parse(tokenized);
    assert_eq!(
        ast,
        Ok(vec![Node::Html(Html::Node(
            first_part,
            vec![Node::Expression(vec![StackToken::Ident(vec![
                IdentSpecifiers::Ident('x'.to_string())
            ])])],
            last_part
        ))])
    )
}

assertion `left == right` failed
  left: Ok([Html(Leaf("<html>\n<body>\n<span>\n")), Expression([Ident([Ident("x")])]), Html(Leaf("\n</span>\n</body>\n</html>"))])
 right: Ok([Html(Node("<html>\n<body>\n<span>\n", [Expression([Ident([Ident("x")])])], "\n</span>\n</body>\n</html>"))])

dexterdy · 2023-11-15T11:37:10Z

for clarity: removing .or(leaf) makes the test succeed. Expected is that the test succeeds with the code unchanged

zesterer · 2023-11-15T14:18:46Z

Do you have a more complete example that I can run locally? I don't see anything wrong here, but the behaviour of ast_parser could be thing that's producing the output you see.

dexterdy · 2023-11-15T14:53:04Z

ast.txt

This is the src file for tokenizer and ast.

zesterer · 2023-11-16T12:19:14Z

I've not had time to properly look at this yet, but have you considered moving to 1.0? If you don't have all that much work done using the 0.9 API, you'll find that 1.0 is far more expressive, performant, and more likely free of such odd issues.

dexterdy · 2023-11-16T12:30:19Z

That depends on how stable the new API is currently. If it's unlikely to change significantly, I'm willing to try it, but otherwise id rather stay with 0.9

dexterdy · 2023-11-16T12:31:20Z

Would you recommend the alpha release or the GitHub main branch?

zesterer · 2023-11-16T22:58:21Z

That depends on how stable the new API is currently.

1.0 is almost ready. There is more work to be done, but it's largely around docs & minor cleanups (see #543). In all likelihood, the worst breaking change you might expect from here on out might be something like a method being renamed: the API as it sits today will remain pretty much unchanged come 1.0 stable, all of the core concepts & features are in place and aren't going anywhere.

Would you recommend the alpha release or the GitHub main branch?

1.0.0-alpha.6 is probably the best one to go for since it's a non-moving target (although it's effectively no different to main in practice).

dexterdy · 2023-11-18T15:50:56Z

is there an easy way to recreate the take_until function from 0.9.3 in 1.0.0 alpha 6?

zesterer · 2023-11-19T14:09:04Z

Yes, check out and_is and not. You can use them in combination like this:

any().and_is(ending.not()).repeated()

Where ending is the terminating pattern.

dexterdy · 2023-12-10T18:53:16Z

The bug persists after I upgraded to chumsky alpha 6

dexterdy · 2023-12-10T18:56:47Z

Could It have something to do with the fact that ast_parser is a recursive parser that in its body calls the html parser again?

Zij-IT · 2023-12-11T08:25:50Z

It's not a bug in Chumsky, but with your parser. If you put a map(|x| dbg!(x)) immediately after the then_ast line in your node parser, you will see the following output:

[src/main.rs:267] x = (
    "<html>\n<body>\n<span>\n",
    [
        Expression(
            [
                Ident(
                    [
                        Ident(
                            "x",
                        ),
                    ],
                ),
            ],
        ),
        Html(
            Leaf(
                "\n</span>\n</body>\n</html>",
            ),
        ),
    ],
)

From this we can determine that your ast_paser is consuming the final leaf, meaning that the final html_frag will fail to parse. Because the final html_frag fails to parse, the node fragment fails. This in turn leads your parser to parse the singular html_frag as a leaf.

dexterdy · 2023-12-12T13:42:24Z

damn you're right. Thank you!

dexterdy · 2023-12-12T13:48:28Z

instead of ast_parser, I can use the following to make it work for anyone who has the same issue as me in the future (unlikely)

let ast_non_html = choice((block(ast_parser), flat()))
        .repeated()
        .collect::<AST>();

dexterdy closed this as completed Dec 12, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

or combinator prefers output of second parser, even if first parser succeeds #567

or combinator prefers output of second parser, even if first parser succeeds #567

dexterdy commented Nov 15, 2023

dexterdy commented Nov 15, 2023

zesterer commented Nov 15, 2023

dexterdy commented Nov 15, 2023

zesterer commented Nov 16, 2023

dexterdy commented Nov 16, 2023

dexterdy commented Nov 16, 2023 •

edited

Loading

zesterer commented Nov 16, 2023

dexterdy commented Nov 18, 2023

zesterer commented Nov 19, 2023

dexterdy commented Dec 10, 2023 •

edited

Loading

dexterdy commented Dec 10, 2023

Zij-IT commented Dec 11, 2023 •

edited

Loading

dexterdy commented Dec 12, 2023

dexterdy commented Dec 12, 2023 •

edited

Loading

or combinator prefers output of second parser, even if first parser succeeds #567

or combinator prefers output of second parser, even if first parser succeeds #567

Comments

dexterdy commented Nov 15, 2023

dexterdy commented Nov 15, 2023

zesterer commented Nov 15, 2023

dexterdy commented Nov 15, 2023

zesterer commented Nov 16, 2023

dexterdy commented Nov 16, 2023

dexterdy commented Nov 16, 2023 • edited Loading

zesterer commented Nov 16, 2023

dexterdy commented Nov 18, 2023

zesterer commented Nov 19, 2023

dexterdy commented Dec 10, 2023 • edited Loading

dexterdy commented Dec 10, 2023

Zij-IT commented Dec 11, 2023 • edited Loading

dexterdy commented Dec 12, 2023

dexterdy commented Dec 12, 2023 • edited Loading

dexterdy commented Nov 16, 2023 •

edited

Loading

dexterdy commented Dec 10, 2023 •

edited

Loading

Zij-IT commented Dec 11, 2023 •

edited

Loading

dexterdy commented Dec 12, 2023 •

edited

Loading