Earley: share nodes created by the scanner with the completer #1451
Thanks for looking into it, @chanicpanic! In terms of API, it definitely looks like the right approach. In terms of implementation, it looks like something is a bit off. In `test_consistent_derivation_order1`, there should be 4 derivations. When I run it with explicit ambiguity on this PR, I get this result, which is incorrect and also repetitive:
@erezsh I believe that result is correct, although I can certainly understand why it may not appear so. The first Using

    from lark import Lark
    from lark.visitors import CollapseAmbiguities

    parser = Lark('''
    start: a a
    a: "." | b
    b: "."
    ''', ambiguity='explicit')
    tree = parser.parse('..')
    # Expand the ambiguous tree into one unambiguous tree per derivation.
    for t in CollapseAmbiguities().transform(tree):
        print(t.pretty())

Output:
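As a rough sanity check on how many derivations to expect for the grammar above, the alternatives can be expanded by hand. This is only a sketch: the bracketed tree strings and the `derivations` helper are hypothetical and do not use Lark. It relies on the observation that each `.` matched by `a` can be derived either directly or through `b`.

```python
from itertools import product

# Hand-written expansion (an assumption, not Lark output): the two ways
# that `a` can derive a single "." in the grammar above.
A_DERIVATIONS = ["(a .)", "(a (b .))"]

def derivations(n_symbols=2):
    """Enumerate derivations of `start: a a` as bracketed strings."""
    return [
        "(start " + " ".join(combo) + ")"
        for combo in product(A_DERIVATIONS, repeat=n_symbols)
    ]

trees = derivations()
for t in trees:
    print(t)
```

Two choices for each of the two `a` symbols give 2 × 2 = 4 distinct derivations, which matches the count expected in the discussion.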
@chanicpanic Yes, you're right. I apologize for my confusion. There is just one more thing I'm unsure about. It feels like the ordering is a bit arbitrary. In I'm worried about it because changing the default order might cause errors for users who have accidental ambiguities in their grammar. So I want to make sure we're now changing to the "right order", so we won't have to break it again in the future.
This is a very valid concern. I believe that the new order is more consistent with the way we typically resolve ambiguities. That is, in lieu of priorities, if there is a rule with multiple possible derivations, we choose the derivation based on its alternative's rule order. Concretely, if we have In the case of For Overall, I think the new behavior is more correct (no duplicate symbol nodes in the SPPF), aligns with rule order expectations even when an alternative ends in a terminal (as in For reference, grammars affected by the change contain a rule:
Okay, I'm sufficiently convinced. Thank you!
There should only be one distinct SPPF start node now.

When a start rule alternative ends in a terminal, the scanner creates the start `SymbolNode`, and when a start rule alternative ends in a nonterminal, the completer creates the start `SymbolNode`. The issue was that if both cases occurred, the scanner would create a start `SymbolNode`, and then the completer would also create one instead of reusing the existing node, because the completer did not know the node existed. Hence, we ended up with two different start `SymbolNode`s. The issue is resolved by sharing the `node_cache` from `scan` with `predict_and_complete`.

`test_multiple_start_solutions` and `test_consistent_derivation_order1` were adjusted because the change affected the order in which derivations were produced for some grammars. The order is still consistent across executions, though. `test_cycles2` is back to having only one derivation, which I believe is the correct behavior after the change.

Before the change, the SPPF was:
Notice that the traversal that produces the "triple v" derivation contains two symbol nodes labeled `(v, 1, 2, -inf)`. The existence of two symbol nodes with the same label violates the "Shared" property of SPPFs.

After the change, the SPPF is:

Now, there is only one symbol node labeled `(v, 1, 2, -inf)`. A traversal that produces the "triple v" derivation still exists, but it requires traversing the cycle through `(v, 1, 2, -inf)`, which we don't do.
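The effect of sharing the cache can be illustrated with a minimal, self-contained sketch. It is not Lark's actual implementation: the `SymbolNode` class, the `get_or_create` helper, and the signatures of `scan` and `predict_and_complete` below are simplified stand-ins assumed for illustration, and only the names `SymbolNode`, `node_cache`, `scan`, and `predict_and_complete` come from the description above.

```python
class SymbolNode:
    """Simplified stand-in for an SPPF symbol node, identified by its label."""
    def __init__(self, symbol, start, end):
        self.label = (symbol, start, end)
        self.children = []  # one entry per alternative derivation

def get_or_create(node_cache, symbol, start, end):
    """Hypothetical helper: return the cached node for a label, creating it once."""
    label = (symbol, start, end)
    if label not in node_cache:
        node_cache[label] = SymbolNode(symbol, start, end)
    return node_cache[label]

def scan(node_cache):
    # The scanner builds the start node for an alternative ending in a terminal.
    node = get_or_create(node_cache, "start", 0, 2)
    node.children.append("alternative ending in a terminal")
    return node

def predict_and_complete(node_cache):
    # The completer builds the start node for an alternative ending in a nonterminal.
    node = get_or_create(node_cache, "start", 0, 2)
    node.children.append("alternative ending in a nonterminal")
    return node

# Separate caches: each phase creates its own node with the same label.
scanned_separate = scan({})
completed_separate = predict_and_complete({})

# Shared cache: the completer finds and reuses the scanner's node.
shared_cache = {}
scanned_shared = scan(shared_cache)
completed_shared = predict_and_complete(shared_cache)

print(scanned_separate is completed_separate)
print(scanned_shared is completed_shared)
```

With separate caches, two nodes carry the same label, which is the duplication described above. With one shared dictionary, both alternatives hang off a single node.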