Skip to content

This issue was moved to a discussion.

You can continue the conversation there. Go to discussion →

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Very slow recursive pattern #463

Closed
Spu7Nix opened this issue Jun 22, 2020 · 3 comments
Closed

Very slow recursive pattern #463

Spu7Nix opened this issue Jun 22, 2020 · 3 comments

Comments

@Spu7Nix
Copy link

Spu7Nix commented Jun 22, 2020

Parsing this expr(ession)
a(b,{a(b,{a(b,{a(b,{a(b,{a(b,{a(b,{a(b,{})})})})})})})})
with this grammar

WHITESPACE = _{ " " | "\t"}
cmp_expr = {"{" ~  expr* ~ "}"}
value = _{cmp_expr | symbol}
call = {value ~ (arguments)*}
arguments = { "(" ~ (expr ~ ",")* ~ expr? ~ ")" }
expr = {(call ~ operator)* ~ call}
operator = {"+"}
symbol = ${ASCII_ALPHA+}

is extremely slow (takes multiple minutes, and gets exponentially slower as you add more recursion).

@SkiFire13
Copy link
Contributor

SkiFire13 commented Aug 20, 2020

When parsing expr your rule must parse call then operator and if it fails it has to parse call again. This means that for each level the number of parsing of call rules doubles, hence the exponential time. I'm not sure if this can be optimized in the parser, I guess it could I opened a PR for optimizing expr-like rules, but it can't handle arguments-like rules. For now you may want to rewrite your rules to move the optional rules at the end. The following rules should be equivalent to yours and run in linear time.

WHITESPACE = _{ " " | "\t"}
cmp_expr = {"{" ~  expr* ~ "}"}
value = _{cmp_expr | symbol}
call = {value ~ (arguments)*}
arguments = { "(" ~ (expr ~ ("," ~ expr)* ~ ","?)? ~ ")" }
expr = {call ~ (operator ~ call)*}
operator = {"+"}
symbol = ${ASCII_ALPHA+}

@sbeckeriv
Copy link

Thank you for this issue. it was a very helpful example of a recursive pattern. You have helped me get unstuck!

bors bot added a commit that referenced this issue Sep 23, 2020
473: Add more optimizations r=dragostis a=SkiFire13

Adds a couple more optimizations:

1. Converts `(rule ~ rest) | rule` to `rule ~ rest?`, avoiding trying to match `rule` twice. Same as the already existing `factor` but in the case the second expression is not `Expr::Seq`.

2. Converts `rule | (rule ~ rest)` to `rule` since `(rule ~ rest)` will never match if `rule` didn't. Not sure if this should go in `factorizer.rs` but the pattern looked really similar.

3. Converts `(rule ~ rest)* ~ rule` to `rule ~ (rest ~ rule)*`, avoiding matching the last `rule` twice, should have a big impact on issues like #453 and #463. Goes to a new file names `lister.rs`, not sure if the name is descriptive enough.

Actually there's a 4th optimization I thought of, converting `(rule ~ rest)* ~ rule?` to `rule ~ (rest ~ rule)* ~ rest?`, assuming `rule` is more expensive than `rest` but I don't think this can be determined if one of them contain an `Expr::Ident`. This would have the same effect of 3. on the relevant cases.

Co-authored-by: Giacomo Stevanato <giaco.stevanato@gmail.com>
bors bot added a commit that referenced this issue Nov 1, 2020
473: Add more optimizations r=dragostis a=SkiFire13

Adds a couple more optimizations:

1. Converts `(rule ~ rest) | rule` to `rule ~ rest?`, avoiding trying to match `rule` twice. Same as the already existing `factor` but in the case the second expression is not `Expr::Seq`.

2. Converts `rule | (rule ~ rest)` to `rule` since `(rule ~ rest)` will never match if `rule` didn't. Not sure if this should go in `factorizer.rs` but the pattern looked really similar.

3. Converts `(rule ~ rest)* ~ rule` to `rule ~ (rest ~ rule)*`, avoiding matching the last `rule` twice, should have a big impact on issues like #453 and #463. Goes to a new file names `lister.rs`, not sure if the name is descriptive enough.

Actually there's a 4th optimization I thought of, converting `(rule ~ rest)* ~ rule?` to `rule ~ (rest ~ rule)* ~ rest?`, assuming `rule` is more expensive than `rest` but I don't think this can be determined if one of them contain an `Expr::Ident`. This would have the same effect of 3. on the relevant cases.

Co-authored-by: Giacomo Stevanato <giaco.stevanato@gmail.com>
bors bot added a commit that referenced this issue Apr 12, 2021
473: Add more optimizations r=MarinPostma a=SkiFire13

Adds a couple more optimizations:

1. Converts `(rule ~ rest) | rule` to `rule ~ rest?`, avoiding trying to match `rule` twice. Same as the already existing `factor` but in the case the second expression is not `Expr::Seq`.

2. Converts `rule | (rule ~ rest)` to `rule` since `(rule ~ rest)` will never match if `rule` didn't. Not sure if this should go in `factorizer.rs` but the pattern looked really similar.

3. Converts `(rule ~ rest)* ~ rule` to `rule ~ (rest ~ rule)*`, avoiding matching the last `rule` twice, should have a big impact on issues like #453 and #463. Goes to a new file names `lister.rs`, not sure if the name is descriptive enough.

Actually there's a 4th optimization I thought of, converting `(rule ~ rest)* ~ rule?` to `rule ~ (rest ~ rule)* ~ rest?`, assuming `rule` is more expensive than `rest` but I don't think this can be determined if one of them contain an `Expr::Ident`. This would have the same effect of 3. on the relevant cases.

Co-authored-by: Giacomo Stevanato <giaco.stevanato@gmail.com>
@f0i
Copy link

f0i commented Jun 21, 2022

I had a similar performance issue with rules of the form x | (rule ~ rest1) | (rule ~ rest2) | rule (source) that was fixed by changing to x | rule ~ (rest1 | rest2)?.
I guess #473 only optimizes the last two.

One more issue that caused exponential parsing time was in the form of a? ~ b? ~ rule ~ rest | rule (source) that I converted to (a ~ b? | b) ~ rule ~ rest | rule ~ rest?.

It is quite hard to find those when there is some nesting and indirection.
So I was also wondering if the results could be cached, so that it doesn't have to re-evaluate a rule when it's at the same position inside the input.

@tomtau tomtau converted this issue into discussion #636 Jul 10, 2022

This issue was moved to a discussion.

You can continue the conversation there. Go to discussion →

Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants