-
Notifications
You must be signed in to change notification settings - Fork 57
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Prefer element reference over method invocation #156
Conversation
whoops :D |
@maxbrunsfeld I'm having trouble getting tree-sitter to attempt to resolve this:
as a I'd love your insight if you have some moments to check this out. Here's what |
I think that, for that to parse as a command call, you'd need a comma after the array literal: argf [x], y |
0313cef
to
3ec2c53
Compare
@maxbrunsfeld I was able to resolve the corpus test failures. Unfortunately, I had to fall back to GLR parsing. I think the work here both fixes the linked issues and parses the test corpus properly. However, I'm confident that the conflict here should be resolvable with an |
Yup, it turns out the issue I actually needed to investigate was argf [f] do
...
end |
Ruby presents two ~syntactic~ sugarings that can not be distinguished unambiguously from syntax alone: First, array elements can be referenced using a bracketed argument after any amount of white space, so: x.[](0) x[0] x [0] are all equivalent. Second, methods may be invoked with omitted parends, so: f(y) f y are equivalent. The ambiguity can be seen when the function argument is a literal array: f [0] At this point, there is no syntactic information that can distinguish between element reference and procedural invocation. This can be seen by running this program in irb: irb(main):001:0> x = [0, 1, 2] => [0, 1, 2] irb(main):002:0> x.[](0) => 0 irb(main):003:0> x [0] => 0 irb(main):004:0> def y(z) irb(main):005:1> z irb(main):006:1> end => :y irb(main):007:0> y([0]) => [0] irb(main):008:0> y [0] => [0] Previously, tree-sitter-ruby handled this ambiguity by presenting both `x [0]` and `y [0]` as procedural invocation. However, this causes a parse error as described in tree-sitter#146, when parsing d.find { |x| a(x) } [b] == c Here I add an optional, lower-precedence interpretation of `x [0]` as an element reference. Due to the construction of the grammar in this project, this unfortunately causes problems when attempting to parse constructs like: fun [0] do something end as the parser will eagerly consume `fun [0]` as the left-hand-side of a binary expression. To deal with this case, I explicitly add this construct to the `call` production rule. Unfortunately I had to resort to the GLR parser in order to resolve the ambiguity between these two rules. Finally, note that the tree obtained from the construct z [0] == 0 is context-sensitive in Ruby. If `z` is an array type, it is interpreted as `binary ( reference ( identifier, integer ), integer`. If `z` is a method, it is interpreted as `call ( identifier, binary ( array (integer), integer)`. Since tree-sitter assumes the parsed language is context-free, there's no good way for us to resolve this ambiguity. This commit prefers the second, method-invocation, interpretation, which appears to be more common within the test corpus.
3ec2c53
to
41720e3
Compare
This one might take a bit longer for me to review because it's more complex. I will try to look at it next week. |
Hi @maxbrunsfeld ... just checking in to see if you'd had an opportunity to look at this yet. 🙏 |
Hi @maxbrunsfeld ... Just checking in again to see if you (or possibly any other maintainers) had a chance to look over this change yet. Much thanks. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the great explanation of your debugging process.
This is actually a pretty tricky change, surprisingly. There are some downsides to this change that I'd like to avoid if possible, but I admit that wasn't able to come up with something better on my first attempt.
The way that the different precedences interact is really tricky in this part of the grammar. I would like to add some more precise APIs for resolving conflicts like these.
I am going to spend a little bit more time investigating this one myself, to see if there's any way to fix the parse errors that you've found (and written good tests for) without adding conflicts
, and without changing the structure of call
.
Thanks all of the debugging and analysis that you've done so far. We'll fix this soon one way or the other.
grammar.js
Outdated
@@ -535,8 +547,10 @@ module.exports = grammar({ | |||
seq(receiver, arguments), | |||
seq(receiver, prec(PREC.CURLY_BLOCK, seq(arguments, block))), | |||
seq(receiver, prec(PREC.DO_BLOCK, seq(arguments, doBlock))), | |||
prec(PREC.CURLY_BLOCK, seq(receiver, $.array, block)), | |||
prec(PREC.DO_BLOCK, seq(receiver, $.array, doBlock)), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will cause call
nodes to have a new type of child (array
), instead of just receiver, arguments
and block
/do_block
. It seems a little bit counterintuitive to me, so I'd like to try and avoid this.
grammar.js
Outdated
conflicts: $ => [ | ||
[$._lhs, $.call], | ||
[$._lhs, $.call, $.command_call, $.command_call_with_block], | ||
], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm a little worried about how this conflict will behave, since it seems like it may come up in more cases than just an identifier followed by a [
. It may be necessary, but I'd like to look into this some, to see if it can be avoided.
When an opening square bracket appears immediately after a callable expression like "a" or "a.b", we must decide between two possible interpretations of the bracket: 1. It could be part of an element reference, as in `a[0] = true`. 2. Or it could be an array literal, passed as an argumet, as in `puts [1, 2, 3]` If there is no preceding whitespace, the bracket should *always* be treated as part of an element reference. This matches MRI's behavior. If there *is* preceding whitespace, MRI makes its decision in a context-sensitive way, based on whether the preceding expression is a local variable or a method name. This parser is not context-sensitive, so we instead will interpret the bracket as part of an array literal whenever that is syntactically valid, and interpret it as part of element reference otherwise. The external scanner can use the validity of other expression tokens like `string` to infer whether an array literal would be valid.
@nbrahms I think I have found a solution that I prefer, because there are no |
Awesome thanks @maxbrunsfeld |
Use external scanner logic to distinguish between arrays & subscripts
Yup, that implementation looks good to me. I've merged it into the source branch of this PR. |
Just to double-check @maxbrunsfeld ... I think this change still suffers from your point in #156 (comment)... do we want to try to add aliasing to avoid this? |
Ah, I just saw that you were able to undo the changes to the |
Thanks! |
Ruby presents two
syntacticsugarings that can not be distinguishedunambiguously from syntax alone:
First, array elements can be referenced using a bracketed argument after
any amount of white space, so:
are all equivalent.
Second, methods may be invoked with omitted parends, so:
are equivalent.
The ambiguity can be seen when the function argument is a literal array:
At this point, there is no syntactic information that can distinguish
between element reference and procedural invocation.
This can be seen by running this program in irb:
Previously, tree-sitter-ruby handled this ambiguity by presenting both
x [0]
andy [0]
as procedural invocation.However, this causes a parse error as described in
#146, when parsing
Here I add an optional, lower-precedence interpretation of
x [0]
as anelement reference.
Due to the construction of the grammar in this project, this
unfortunately causes problems when attempting to parse constructs like:
as the parser will eagerly consume
fun [0]
as the left-hand-side ofa binary expression. To deal with this case, I explicitly add this
construct to the
call
production rule. Unfortunately I had to resortto the GLR parser in order to resolve the ambiguity between these two
rules.
Finally, note that the tree obtained from the construct
is context-sensitive in Ruby. If
z
is an array type, it is interpretedas
binary ( reference ( identifier, integer ), integer
. Ifz
isa method, it is interpreted as
call ( identifier, binary ( array (integer), integer)
. Since tree-sitter assumes the parsed language iscontext-free, there's no good way for us to resolve this ambiguity. This
commit prefers the second, method-invocation, interpretation, which
appears to be more common within the test corpus.
Fixes #146 and #155