Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

or combinator prefers output of second parser, even if first parser succeeds #567

Closed
dexterdy opened this issue Nov 15, 2023 · 14 comments
Closed

Comments

@dexterdy
Copy link

In the code snippet below, I am trying to parse a tree structure, where something can be either a node (an html string, a piece of template code and then another html string) or a leaf (just an html string). I try to parse a node first and if that fails try parsing a leaf, but the or combinator seems to prefer the output of the leaf parser, even if the node parser succeeds. I tested this by removing the or combinator and testing the node parser on the exact same input, which does produce the desired result:

fn html<T: Parser<Token, AST, Error = Simple<Token>> + Clone>(
    ast_parser: T,
) -> impl Parser<Token, Node, Error = Simple<Token>> {
    let html_frag = select! { Token::HtmlFragment(x) => x };
    let node = html_frag
        .clone()
        .then(ast_parser)
        .then(html_frag.clone())
        .map(|((before, middle), after)| Html::Node(before, middle, after));
    let leaf = html_frag.map(|str| Html::Leaf(str));
    node.or(leaf).map(|html| Node::Html(html))
}
Test code and result
#[test]
fn ast_simple() {
    let simple_html = "\
<html>
<body>
<span>
{{ x }}
</span>
</body>
</html>\
";
    let first_part = "\
<html>
<body>
<span>
"
    .to_string();
    let last_part = "
</span>
</body>
</html>\
"
    .to_string();
    let tokenized = token_parser().parse(simple_html).unwrap();
    let ast = ast_parser().parse(tokenized);
    assert_eq!(
        ast,
        Ok(vec![Node::Html(Html::Node(
            first_part,
            vec![Node::Expression(vec![StackToken::Ident(vec![
                IdentSpecifiers::Ident('x'.to_string())
            ])])],
            last_part
        ))])
    )
}

assertion `left == right` failed
  left: Ok([Html(Leaf("<html>\n<body>\n<span>\n")), Expression([Ident([Ident("x")])]), Html(Leaf("\n</span>\n</body>\n</html>"))])
 right: Ok([Html(Node("<html>\n<body>\n<span>\n", [Expression([Ident([Ident("x")])])], "\n</span>\n</body>\n</html>"))])
@dexterdy
Copy link
Author

for clarity: removing .or(leaf) makes the test succeed. Expected is that the test succeeds with the code unchanged

@zesterer
Copy link
Owner

Do you have a more complete example that I can run locally? I don't see anything wrong here, but the behaviour of ast_parser could be thing that's producing the output you see.

@dexterdy
Copy link
Author

ast.txt

This is the src file for tokenizer and ast.

@zesterer
Copy link
Owner

I've not had time to properly look at this yet, but have you considered moving to 1.0? If you don't have all that much work done using the 0.9 API, you'll find that 1.0 is far more expressive, performant, and more likely free of such odd issues.

@dexterdy
Copy link
Author

That depends on how stable the new API is currently. If it's unlikely to change significantly, I'm willing to try it, but otherwise id rather stay with 0.9

@dexterdy
Copy link
Author

dexterdy commented Nov 16, 2023

Would you recommend the alpha release or the GitHub main branch?

@zesterer
Copy link
Owner

That depends on how stable the new API is currently.

1.0 is almost ready. There is more work to be done, but it's largely around docs & minor cleanups (see #543). In all likelihood, the worst breaking change you might expect from here on out might be something like a method being renamed: the API as it sits today will remain pretty much unchanged come 1.0 stable, all of the core concepts & features are in place and aren't going anywhere.

Would you recommend the alpha release or the GitHub main branch?

1.0.0-alpha.6 is probably the best one to go for since it's a non-moving target (although it's effectively no different to main in practice).

@dexterdy
Copy link
Author

is there an easy way to recreate the take_until function from 0.9.3 in 1.0.0 alpha 6?

@zesterer
Copy link
Owner

Yes, check out and_is and not. You can use them in combination like this:

any().and_is(ending.not()).repeated()

Where ending is the terminating pattern.

@dexterdy
Copy link
Author

dexterdy commented Dec 10, 2023

The bug persists after I upgraded to chumsky alpha 6

@dexterdy
Copy link
Author

Could It have something to do with the fact that ast_parser is a recursive parser that in its body calls the html parser again?

@Zij-IT
Copy link
Contributor

Zij-IT commented Dec 11, 2023

It's not a bug in Chumsky, but with your parser. If you put a map(|x| dbg!(x)) immediately after the then_ast line in your node parser, you will see the following output:

[src/main.rs:267] x = (
    "<html>\n<body>\n<span>\n",
    [
        Expression(
            [
                Ident(
                    [
                        Ident(
                            "x",
                        ),
                    ],
                ),
            ],
        ),
        Html(
            Leaf(
                "\n</span>\n</body>\n</html>",
            ),
        ),
    ],
)

From this we can determine that your ast_paser is consuming the final leaf, meaning that the final html_frag will fail to parse. Because the final html_frag fails to parse, the node fragment fails. This in turn leads your parser to parse the singular html_frag as a leaf.

@dexterdy
Copy link
Author

damn you're right. Thank you!

@dexterdy
Copy link
Author

dexterdy commented Dec 12, 2023

instead of ast_parser, I can use the following to make it work for anyone who has the same issue as me in the future (unlikely)

let ast_non_html = choice((block(ast_parser), flat()))
        .repeated()
        .collect::<AST>();

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants