Very slow recursive pattern #463

Spu7Nix · 2020-06-22T12:45:13Z

Parsing this expression
a(b,{a(b,{a(b,{a(b,{a(b,{a(b,{a(b,{a(b,{})})})})})})})})
with this grammar

WHITESPACE = _{ " " | "\t"}
cmp_expr = {"{" ~  expr* ~ "}"}
value = _{cmp_expr | symbol}
call = {value ~ (arguments)*}
arguments = { "(" ~ (expr ~ ",")* ~ expr? ~ ")" }
expr = {(call ~ operator)* ~ call}
operator = {"+"}
symbol = ${ASCII_ALPHA+}

is extremely slow (takes multiple minutes, and gets exponentially slower as you add more recursion).

SkiFire13 · 2020-08-20T10:13:18Z

When parsing `expr`, your rule must first parse `call` and then `operator`; if `operator` fails, it has to parse `call` again. This means that at each nesting level the number of times `call` is parsed doubles, hence the exponential time. ~~I'm not sure if this can be optimized in the parser, I guess it could~~ I opened a PR that optimizes `expr`-like rules, but it can't handle `arguments`-like rules. For now, you may want to rewrite your rules so that the optional parts come at the end. The following rules should be equivalent to yours and run in linear time:

WHITESPACE = _{ " " | "\t"}
cmp_expr = {"{" ~  expr* ~ "}"}
value = _{cmp_expr | symbol}
call = {value ~ (arguments)*}
arguments = { "(" ~ (expr ~ ("," ~ expr)* ~ ","?)? ~ ")" }
expr = {call ~ (operator ~ call)*}
operator = {"+"}
symbol = ${ASCII_ALPHA+}
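To make the difference between the two grammars concrete, here is a minimal sketch that counts how often `call` is entered under each formulation. It is a hand-rolled PEG recognizer in Python, not pest itself: the combinators (`lit`, `seq`, `star`, `opt`, `alt`, `ref`) and the helpers `build_grammar`, `nested_input` and `count_calls` are names made up for this illustration, and the `WHITESPACE` rule is ignored because the test input contains none.

```python
def build_grammar(fast):
    """Return (expr_rule, counter) for the original or the rewritten grammar."""
    counter = {"call": 0}
    rules = {}

    def ref(name):  # late-bound reference so rules can be recursive
        return lambda text, pos: rules[name](text, pos)

    def lit(s):
        return lambda text, pos: pos + len(s) if text.startswith(s, pos) else None

    def seq(*parts):
        def run(text, pos):
            for part in parts:
                pos = part(text, pos)
                if pos is None:
                    return None
            return pos
        return run

    def star(part):  # greedy PEG repetition; stops at the first failure
        def run(text, pos):
            while True:
                nxt = part(text, pos)
                if nxt is None or nxt == pos:
                    return pos
                pos = nxt
        return run

    def opt(part):
        def run(text, pos):
            nxt = part(text, pos)
            return pos if nxt is None else nxt
        return run

    def alt(*parts):  # ordered choice: first alternative that matches wins
        def run(text, pos):
            for part in parts:
                nxt = part(text, pos)
                if nxt is not None:
                    return nxt
            return None
        return run

    def symbol(text, pos):
        end = pos
        while end < len(text) and text[end].isascii() and text[end].isalpha():
            end += 1
        return end if end > pos else None

    call_body = seq(ref("value"), star(ref("arguments")))

    def call(text, pos):
        counter["call"] += 1  # count every attempt, successful or not
        return call_body(text, pos)

    rules["cmp_expr"] = seq(lit("{"), star(ref("expr")), lit("}"))
    rules["value"] = alt(ref("cmp_expr"), symbol)
    rules["call"] = call
    if fast:
        # arguments = "(" ~ (expr ~ ("," ~ expr)* ~ ","?)? ~ ")"
        rules["arguments"] = seq(
            lit("("),
            opt(seq(ref("expr"), star(seq(lit(","), ref("expr"))), opt(lit(",")))),
            lit(")"),
        )
        # expr = call ~ (operator ~ call)*
        rules["expr"] = seq(ref("call"), star(seq(lit("+"), ref("call"))))
    else:
        # arguments = "(" ~ (expr ~ ",")* ~ expr? ~ ")"
        rules["arguments"] = seq(
            lit("("), star(seq(ref("expr"), lit(","))), opt(ref("expr")), lit(")")
        )
        # expr = (call ~ operator)* ~ call
        rules["expr"] = seq(star(seq(ref("call"), lit("+"))), ref("call"))
    return rules["expr"], counter


def nested_input(depth):
    """Build a(b,{a(b,{ ... a(b,{}) ... })}) with `depth` levels."""
    text = "a(b,{})"
    for _ in range(depth - 1):
        text = "a(b,{" + text + "})"
    return text


def count_calls(depth, fast):
    expr, counter = build_grammar(fast)
    text = nested_input(depth)
    assert expr(text, 0) == len(text)  # both grammars accept the whole input
    return counter["call"]


for depth in range(1, 5):
    print(depth, count_calls(depth, fast=False), count_calls(depth, fast=True))
```

In this model the count for the original rules is multiplied by a constant factor with every extra nesting level, while the count for the rewritten rules only increases by a constant amount per level. The absolute numbers say nothing about pest's real behaviour, which also depends on its optimizer.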

sbeckeriv · 2020-08-26T06:27:12Z

Thank you for this issue. It was a very helpful example of a recursive pattern. You have helped me get unstuck!

473: Add more optimizations r=dragostis a=SkiFire13

Adds a couple more optimizations:

1. Converts `(rule ~ rest) | rule` to `rule ~ rest?`, which avoids trying to match `rule` twice. This is the same as the already existing `factor` optimization, but covers the case where the second expression is not an `Expr::Seq`.
2. Converts `rule | (rule ~ rest)` to `rule`, since `(rule ~ rest)` will never match if `rule` didn't. Not sure if this should go in `factorizer.rs`, but the pattern looked really similar.
3. Converts `(rule ~ rest)* ~ rule` to `rule ~ (rest ~ rule)*`, which avoids matching the last `rule` twice and should have a big impact on issues like #453 and #463. This goes in a new file named `lister.rs`; not sure if the name is descriptive enough.

Actually, there's a 4th optimization I thought of: converting `(rule ~ rest)* ~ rule?` to `rule ~ (rest ~ rule)* ~ rest?`, assuming `rule` is more expensive than `rest`, but I don't think this can be determined if one of them contains an `Expr::Ident`. This would have the same effect as 3. on the relevant cases.

Co-authored-by: Giacomo Stevanato <giaco.stevanato@gmail.com>
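The third rewrite from the PR description, `(rule ~ rest)* ~ rule` to `rule ~ (rest ~ rule)*`, can be sketched as a pattern match on a small expression tree. The `Ident`, `Seq` and `Rep` classes below are simplified stand-ins invented for this example, not pest's actual `Expr` type, and `rewrite_list` only performs the syntactic transformation at the top of the tree; it does not check the conditions under which the rewrite is safe.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ident:
    name: str


@dataclass(frozen=True)
class Seq:
    lhs: object
    rhs: object


@dataclass(frozen=True)
class Rep:
    inner: object


def rewrite_list(expr):
    """Rewrite `(rule ~ rest)* ~ rule` into `rule ~ (rest ~ rule)*`."""
    if (
        isinstance(expr, Seq)
        and isinstance(expr.lhs, Rep)
        and isinstance(expr.lhs.inner, Seq)
        and expr.lhs.inner.lhs == expr.rhs  # the repeated `rule` is also the tail
    ):
        rule = expr.rhs
        rest = expr.lhs.inner.rhs
        return Seq(rule, Rep(Seq(rest, rule)))
    return expr  # anything else is left untouched


def show(expr):
    """Render a tree in pest-like notation (for display only)."""
    if isinstance(expr, Ident):
        return expr.name
    if isinstance(expr, Seq):
        return f"{show(expr.lhs)} ~ {show(expr.rhs)}"
    if isinstance(expr, Rep):
        return f"({show(expr.inner)})*"
    raise TypeError(f"unknown node: {expr!r}")


# expr = (call ~ operator)* ~ call, the rule from this issue
slow_expr = Seq(Rep(Seq(Ident("call"), Ident("operator"))), Ident("call"))
print(show(slow_expr))                # (call ~ operator)* ~ call
print(show(rewrite_list(slow_expr)))  # call ~ (operator ~ call)*
```

Applied to the `expr` rule from this issue, the sketch produces the same shape as the hand-rewritten rule suggested earlier in the thread.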

f0i · 2022-06-21T17:56:33Z

One more pattern that caused exponential parsing time was of the form `a? ~ b? ~ rule ~ rest | rule` (source), which I converted to `(a ~ b? | b) ~ rule ~ rest | rule ~ rest?`.

It is quite hard to find those when there is some nesting and indirection.
So I was also wondering if the results could be cached, so that the parser doesn't have to re-evaluate a rule at a position in the input where it has already been tried.
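The caching idea above is what is usually called packrat parsing: remember the result of every (rule, position) pair and reuse it. Below is a minimal sketch of that idea applied to the original, unmodified grammar from this issue. It is a hand-written recognizer in Python, not a pest feature; the class `SlowGrammar`, its `apply` method and the helper `call_evaluations` are names made up for the example, and `WHITESPACE` is ignored.

```python
class SlowGrammar:
    """Recognizer for the original grammar, optionally with a (rule, pos) cache."""

    def __init__(self, text, memoize):
        self.text = text
        self.memoize = memoize
        self.cache = {}
        self.call_evals = 0  # how often the body of `call` actually runs

    def apply(self, name, pos):
        """Run rule `name` at `pos`, reusing a cached result when memoizing."""
        key = (name, pos)
        if self.memoize and key in self.cache:
            return self.cache[key]
        result = getattr(self, name)(pos)
        if self.memoize:
            self.cache[key] = result
        return result

    def lit(self, s, pos):
        return pos + len(s) if self.text.startswith(s, pos) else None

    def symbol(self, pos):
        end = pos
        while end < len(self.text) and self.text[end].isalpha():
            end += 1
        return end if end > pos else None

    def value(self, pos):
        # value = cmp_expr | symbol, with cmp_expr = "{" ~ expr* ~ "}"
        p = self.lit("{", pos)
        if p is None:
            return self.symbol(pos)
        while True:
            nxt = self.apply("expr", p)
            if nxt is None:
                break
            p = nxt
        return self.lit("}", p)

    def call(self, pos):
        # call = value ~ arguments*
        self.call_evals += 1
        p = self.value(pos)
        while p is not None:
            nxt = self.arguments(p)
            if nxt is None:
                break
            p = nxt
        return p

    def arguments(self, pos):
        # arguments = "(" ~ (expr ~ ",")* ~ expr? ~ ")"
        p = self.lit("(", pos)
        if p is None:
            return None
        while True:
            e = self.apply("expr", p)
            c = self.lit(",", e) if e is not None else None
            if c is None:
                break
            p = c
        e = self.apply("expr", p)  # expr? re-tries the expression that had no ","
        if e is not None:
            p = e
        return self.lit(")", p)

    def expr(self, pos):
        # expr = (call ~ operator)* ~ call
        p = pos
        while True:
            c = self.apply("call", p)
            o = self.lit("+", c) if c is not None else None
            if o is None:
                break
            p = o
        return self.apply("call", p)  # same position as the failed iteration


def call_evaluations(text, memoize):
    grammar = SlowGrammar(text, memoize)
    assert grammar.apply("expr", 0) == len(text)
    return grammar.call_evals


text = "a(b,{a(b,{a(b,{})})})"
print(call_evaluations(text, memoize=False), call_evaluations(text, memoize=True))
```

With the cache enabled, `call` is evaluated at most once per input position, so the repeated attempts in `expr` and `arguments` become dictionary lookups. The price is memory proportional to the number of (rule, position) pairs that are tried.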

SkiFire13 mentioned this issue Aug 20, 2020

Add more optimizations #473

Merged

tomtau converted this issue into discussion #636 Jul 10, 2022

This issue was moved to a discussion.
