Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Inline rules expand ambiguities unnecessarily #1221

Open
yurymann opened this issue Nov 28, 2022 · 4 comments
Open

Inline rules expand ambiguities unnecessarily #1221

yurymann opened this issue Nov 28, 2022 · 4 comments

Comments

@yurymann
Copy link

yurymann commented Nov 28, 2022

Thanks for fixing #1214 earlier. It now seems to return all ambiguities, but ambiguities inside inline rules are expanded within the parser rather than leaving it to CollapseAmbiguities transformer.

I have some input strings that explode very quickly when expanding ambiguities. I parse them with ambiguity=explicit, then count the expected number of expanded ambiguities. If it's within a reasonable threshold, I put the tree through CollapseAmbituities. Otherwise, the processing stops (until I find a better way to deal with such strings). As the result, I still can't use inline rules because parsing of some inputs explodes before I can even count the number of ambiguities.

To reproduce, parse this string:

t = parser.parse('1 2 3')
print(t.pretty())

with this grammar

start: field+
field: f1 | f2

f1: INT
f2: INT

%ignore WS
%import common.WS
%import common.INT

Result:

start
  _ambig
    field
      f1	1
    field
      f2	1
  _ambig
    field
      f1	2
    field
      f2	2
  _ambig
    field
      f1	3
    field
      f2	3

Now, inline field:

start: _field+
_field: f1 | f2

f1: INT
f2: INT

%ignore WS
%import common.WS
%import common.INT

The parser now expands the nested ambiguities prematurely:

_ambig
  start
    f1	1
    f1	2
    f1	3
  start
    f1	1
    f1	2
    f2	3
  start
    f1	1
    f2	2
    f1	3
  start
    f1	1
    f2	2
    f2	3
  start
    f2	1
    f1	2
    f1	3
  start
    f2	1
    f1	2
    f2	3
  start
    f2	1
    f2	2
    f1	3
  start
    f2	1
    f2	2
    f2	3
@erezsh
Copy link
Member

erezsh commented Nov 28, 2022

@chanicpanic The saga continues!

@yurymann
Copy link
Author

Sorry, messed up the original examples, corrected now.

@chanicpanic
Copy link
Contributor

The example provided is actually a special case that may be open to optimization. Consider the general case in which expansions of field may contain more than one rule:

start: field+
field: f1 ws | f2 ws2

f1: INT
f2: INT
ws: WS
ws2: WS

%import common.WS
%import common.INT

Parsing the text "1 2 3 " produces:

start
  _ambig
    field
      f1        1
      ws
    field
      f2        1
      ws2
  _ambig
    field
      f1        2
      ws
    field
      f2        2
      ws2
  _ambig
    field
      f1        3
      ws
    field
      f2        3
      ws2

If we inline field in this tree, we get:

start
  _ambig
    f1        1
    ws
    f2        1
    ws2
  _ambig
    f1        2
    ws
    f2        2
    ws2
  _ambig
    f1        3
    ws
    f2        3
    ws2

which is not correct. In this case, I see no more compact way to represent the ambiguities than to expand them all.

We may be able to optimize the special case in which all instances of _field have a single child. However, it is a tradeoff. It would add complexity during tree building for a smaller tree size. On the other hand, expanding the tree is work that may be done later anyways when collapsing ambiguities.

@yurymann Conditional inlining may be your friend for highly ambiguous rules:

start: field+
?field: f1 | f2

f1: INT
f2: INT

%ignore WS
%import common.WS
%import common.INT

gives:

start
  _ambig
    f1  1
    f2  1
  _ambig
    f1  2
    f2  2
  _ambig
    f1  3
    f2  3

You may also want to look at lark-ambig-tools which can help you process highly ambiguous trees more efficiently.

@yurymann
Copy link
Author

@chanicpanic, wow, conditional inlining looks indeed like THE solution for my case.
And thanks for the lark-ambi-tools link! Disambiguator is particularly a thing I was looking for and was almost ready to write myself.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants