Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

use the parser to disambiguate valid function parameters, fixing several miscompiles and ICEs #5474

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

cosmicexplorer
Copy link
Contributor

Problem

The CoffeeScript compiler currently accepts a wide variety of invalid parameter lists, successfully generating js code which then errors upon evaluation:

; coffee -c -b -s --no-header <<EOF
({a: (x)}) -> x
EOF
# (function(arg) {
#   var arg, x;
#   x = arg.undefined;
# });

; coffee -c -b -s --no-header <<EOF
({["x"]}) -> x
EOF
# (function({["x"]: "x"}) {
#   return x;
# });

; coffee -c -b -s --no-header <<EOF
([x[1]]) -> x
EOF
# (function([x[1]]) {
#   return x;
# });

For example, see how the second input is processed by node:

; (coffee -c -b -s --no-header | node) <<EOF
({["x"]}) -> x
EOF
[stdin]:1
(function({["x"]: "x"}) {
                  ^^^

SyntaxError: Invalid destructuring assignment target

It even produces an internal compiler error upon certain inputs:

# This is an ICE!
; coffee -c -b -s --no-header <<EOF
({x: x[1]}) -> x
EOF
# TypeError: Cannot read properties of undefined (reading 'value')
#     at atParam (.../lib/coffeescript/nodes.js:6543:54)
# ...

Current Approach

Right now, we address this piecemeal, by identifying post hoc which types of param names and values are valid.

For param names:

coffeescript/src/nodes.coffee

Lines 4340 to 4341 in 817c39a

message = isUnassignable @name.unwrapAll().value
@name.error message if message

For (literal) param values:

coffeescript/src/nodes.coffee

Lines 4377 to 4378 in 817c39a

checkAssignabilityOfLiteral = (literal) ->
message = isUnassignable literal.value

This approach is actually better than my proposal in several ways (see below), but it's a bit of a whack-a-mole, as this approach can only identify negative matches (invalid inputs), as opposed to defining a set of valid inputs (which is the job of the parser). This leads to my proposal.

Solution

It turns out that js function parameters (and their capability for destructuring assignment) are highly constrained in comparison to destructuring on the left-hand side of a "normal" = assignment expression (one which does not occur within a function parameter binding context). This propagates to CoffeeScript, and we can actually identify valid parameter assignments at parse time, by adding a parallel set of grammar rules specific to function parameters.

Comparison to Alternatives

This change strictly modifies the Jison grammar declaration, and does not otherwise modify any semantics: it produces equivalent output nodes, but from a more restricted range of inputs. This means it only errors on inputs that (I claim) it should already be rejecting, but it does it earlier instead of later.

There are further comparisons we can make:

Negative Matching

Compared to negative matching against unassignable nodes (the current approach):

  • pro: Parsing describes the structure of correct input, not just invalidating incorrect inputs one at a time. This should hopefully lead to fewer bug reports later.
  • con: However, it means we have to work harder to construct useful error messages, since LALR parsers like Jison have difficulty reconstructing the grammatical context for an incorrect token.

For some examples of this tradeoff, take the following test cases in error_messages.coffee modified in this PR (abbreviated below):

On main:

; coffee -c -b -s --no-header <<EOF
({@param: null}) ->
EOF
# [stdin]:1:11: error: keyword 'null' can't be assigned
# ({@param: null}) ->
#           ^^^^

; coffee -c -b -s --no-header <<EOF
({a: null}) ->
EOF
# [stdin]:1:6: error: keyword 'null' can't be assigned
# ({a: null}) ->
#      ^^^^

After this change:

; coffee -c -b -s --no-header <<EOF
({@param: null}) ->
EOF
# [stdin]:1:9: error: unexpected :
# ({@param: null}) ->
#         ^

; coffee -c -b -s --no-header <<EOF
({a: null}) ->
EOF
# [stdin]:1:6: error: unexpected null
# ({a: null}) ->
#      ^^^^

As you can see above, there is a bigger issue with ({@param: null}) -> than the use of null: it's also incorrectly using a shorthand @param as an object property name! This is actually valid as a computed property name with ({[@param]: x}) ->, but the negative matching approach currently doesn't have the logic to disambiguate those two.

However, unexpected : is much less informative to the user, so this approach would give up some of our ability to craft useful explanations of why something isn't a valid function parameter. For example, all it says for ({a: null}) -> is that null was "unexpected", as opposed to explaining that this is because null is a keyword.

It's not clear to me how to improve error messages with this parsing-based approach. We avoid miscompiles, but we also lose some explanatory capacity. One way we could mitigate this would be to add some documentation about the restrictions on function parameter destructuring to the documentation.

I still believe this should be considered backwards-compatible, as it only rejects code that would have silently miscompiled. However, it may cause a build pipeline to break, for code which would otherwise not cause errors if it was never evaluated. I'm not sure if CoffeeScript's stability contract covers compile-time errors for code that would definitely have errored if ever executed, but if so, we could consider putting this behavior behind a flag or environment variable. It will not reject any previously-valid code.

Hygienic Code Generation

One final alternative I'll mention to miscompiles is instead restructuring our code generation to avoid miscompiles. This might vaguely take the best of both prior approaches, as it lets us craft custom error messages explaining how to fix an issue, but allows us to avoid whack-a-mole negative matching logic.

For example, for the case of ({["x"]}) -> x, we could ensure that our destructuring expressions can only bind to expressions of a certain type. In the parsing-based approach from this PR, we currently achieve this via the grammar rule:

  # These are the only names we're allowed to bind in a destructuring expression from
  # a function parameter.
  ParamBindingTarget: [
    o 'Identifier'
    o 'ThisProperty'
  ]

We could accomplish something similar at the point of code generation, possibly by editing Param#asReference() from nodes.coffee. However, this has one downside. Refer to our ICE from before:

; coffee -c -b -s --no-header <<EOF
({x: x[1]}) -> x
EOF
# TypeError: Cannot read properties of undefined (reading 'value')
#     at atParam (.../lib/coffeescript/nodes.js:6543:54)
# ...

While it's true that we could fix this individual ICE and still fix this larger issue in codegen, the reason why the ICE occurs here is because the compiler is expecting a specific type of input for function parameters, has a lot of assumptions about it, and essentially has to redo the work of the parser in order to fully validate the input. It seems to make sense to use the parser to solve this problem.

@cosmicexplorer
Copy link
Contributor Author

I identified the listed miscompiles and ICE in the process of investigating extending the CoffeeScript grammar for #5471 and #5307 (comment), so I do have an ulterior motive in proposing this (backwards-compatible) change to the grammar. I believe the parsing-based approach here is also the most robust way to rule out invalid code, so I think this is still the best way to go about it regardless.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant