use the parser to disambiguate valid function parameters, fixing several miscompiles and ICEs #5474

cosmicexplorer · 2024-11-16T02:28:35Z

Problem

The CoffeeScript compiler currently accepts a wide variety of invalid parameter lists, successfully generating js code which then errors upon evaluation:

; coffee -c -b -s --no-header <<EOF
({a: (x)}) -> x
EOF
# (function(arg) {
#   var arg, x;
#   x = arg.undefined;
# });

; coffee -c -b -s --no-header <<EOF
({["x"]}) -> x
EOF
# (function({["x"]: "x"}) {
#   return x;
# });

; coffee -c -b -s --no-header <<EOF
([x[1]]) -> x
EOF
# (function([x[1]]) {
#   return x;
# });

For example, see how the second input is processed by node:

; (coffee -c -b -s --no-header | node) <<EOF
({["x"]}) -> x
EOF
[stdin]:1
(function({["x"]: "x"}) {
                  ^^^

SyntaxError: Invalid destructuring assignment target

It even produces an internal compiler error upon certain inputs:

# This is an ICE!
; coffee -c -b -s --no-header <<EOF
({x: x[1]}) -> x
EOF
# TypeError: Cannot read properties of undefined (reading 'value')
#     at atParam (.../lib/coffeescript/nodes.js:6543:54)
# ...

Current Approach

Right now, we address this piecemeal, by identifying post hoc which types of param names and values are valid.

For param names:

coffeescript/src/nodes.coffee

Lines 4340 to 4341 in 817c39a

    
           message = isUnassignable @name.unwrapAll().value 
        
           @name.error message if message

For (literal) param values:

coffeescript/src/nodes.coffee

Lines 4377 to 4378 in 817c39a

    
           checkAssignabilityOfLiteral = (literal) -> 
        
             message = isUnassignable literal.value

This approach is actually better than my proposal in several ways (see below), but it's a bit of a whack-a-mole, as this approach can only identify negative matches (invalid inputs), as opposed to defining a set of valid inputs (which is the job of the parser). This leads to my proposal.

Solution

It turns out that js function parameters (and their capability for destructuring assignment) are highly constrained in comparison to destructuring on the left-hand side of a "normal" = assignment expression (one which does not occur within a function parameter binding context). This propagates to CoffeeScript, and we can actually identify valid parameter assignments at parse time, by adding a parallel set of grammar rules specific to function parameters.

Comparison to Alternatives

This change strictly modifies the Jison grammar declaration, and does not otherwise modify any semantics: it produces equivalent output nodes, but from a more restricted range of inputs. This means it only errors on inputs that (I claim) it should already be rejecting, but it does it earlier instead of later.

There are further comparisons we can make:

Negative Matching

Compared to negative matching against unassignable nodes (the current approach):

pro: Parsing describes the structure of correct input, not just invalidating incorrect inputs one at a time. This should hopefully lead to fewer bug reports later.
con: However, it means we have to work harder to construct useful error messages, since LALR parsers like Jison have difficulty reconstructing the grammatical context for an incorrect token.

For some examples of this tradeoff, take the following test cases in error_messages.coffee modified in this PR (abbreviated below):

On main:

; coffee -c -b -s --no-header <<EOF
({@param: null}) ->
EOF
# [stdin]:1:11: error: keyword 'null' can't be assigned
# ({@param: null}) ->
#           ^^^^

; coffee -c -b -s --no-header <<EOF
({a: null}) ->
EOF
# [stdin]:1:6: error: keyword 'null' can't be assigned
# ({a: null}) ->
#      ^^^^

After this change:

; coffee -c -b -s --no-header <<EOF
({@param: null}) ->
EOF
# [stdin]:1:9: error: unexpected :
# ({@param: null}) ->
#         ^

; coffee -c -b -s --no-header <<EOF
({a: null}) ->
EOF
# [stdin]:1:6: error: unexpected null
# ({a: null}) ->
#      ^^^^

As you can see above, there is a bigger issue with ({@param: null}) -> than the use of null: it's also incorrectly using a shorthand @param as an object property name! This is actually valid as a computed property name with ({[@param]: x}) ->, but the negative matching approach currently doesn't have the logic to disambiguate those two.

However, unexpected : is much less informative to the user, so this approach would give up some of our ability to craft useful explanations of why something isn't a valid function parameter. For example, all it says for ({a: null}) -> is that null was "unexpected", as opposed to explaining that this is because null is a keyword.

It's not clear to me how to improve error messages with this parsing-based approach. We avoid miscompiles, but we also lose some explanatory capacity. One way we could mitigate this would be to add some documentation about the restrictions on function parameter destructuring to the documentation.

I still believe this should be considered backwards-compatible, as it only rejects code that would have silently miscompiled. However, it may cause a build pipeline to break, for code which would otherwise not cause errors if it was never evaluated. I'm not sure if CoffeeScript's stability contract covers compile-time errors for code that would definitely have errored if ever executed, but if so, we could consider putting this behavior behind a flag or environment variable. It will not reject any previously-valid code.

Hygienic Code Generation

One final alternative I'll mention to miscompiles is instead restructuring our code generation to avoid miscompiles. This might vaguely take the best of both prior approaches, as it lets us craft custom error messages explaining how to fix an issue, but allows us to avoid whack-a-mole negative matching logic.

For example, for the case of ({["x"]}) -> x, we could ensure that our destructuring expressions can only bind to expressions of a certain type. In the parsing-based approach from this PR, we currently achieve this via the grammar rule:

  # These are the only names we're allowed to bind in a destructuring expression from
  # a function parameter.
  ParamBindingTarget: [
    o 'Identifier'
    o 'ThisProperty'
  ]

We could accomplish something similar at the point of code generation, possibly by editing Param#asReference() from nodes.coffee. However, this has one downside. Refer to our ICE from before:

; coffee -c -b -s --no-header <<EOF
({x: x[1]}) -> x
EOF
# TypeError: Cannot read properties of undefined (reading 'value')
#     at atParam (.../lib/coffeescript/nodes.js:6543:54)
# ...

While it's true that we could fix this individual ICE and still fix this larger issue in codegen, the reason why the ICE occurs here is because the compiler is expecting a specific type of input for function parameters, has a lot of assumptions about it, and essentially has to redo the work of the parser in order to fully validate the input. It seems to make sense to use the parser to solve this problem.

move tests to error message

cosmicexplorer · 2024-11-16T02:34:36Z

I identified the listed miscompiles and ICE in the process of investigating extending the CoffeeScript grammar for #5471 and #5307 (comment), so I do have an ulterior motive in proposing this (backwards-compatible) change to the grammar. I believe the parsing-based approach here is also the most robust way to rule out invalid code, so I think this is still the best way to go about it regardless.

cosmicexplorer added 2 commits November 15, 2024 19:09

add test expressions for computed destructured property params

ea4ca2a

rewrite the grammar for param binding to fix several miscompiles

522578e

move tests to error message

cosmicexplorer mentioned this pull request Nov 16, 2024

Proposal: bind a name to a destructurable place with @ #5471

Closed

cosmicexplorer mentioned this pull request Nov 24, 2024

Discussion: TypeScript Output #5307

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

use the parser to disambiguate valid function parameters, fixing several miscompiles and ICEs #5474

use the parser to disambiguate valid function parameters, fixing several miscompiles and ICEs #5474

cosmicexplorer commented Nov 16, 2024

cosmicexplorer commented Nov 16, 2024

	message = isUnassignable @name.unwrapAll().value
	@name.error message if message

	checkAssignabilityOfLiteral = (literal) ->
	message = isUnassignable literal.value

use the parser to disambiguate valid function parameters, fixing several miscompiles and ICEs #5474

Are you sure you want to change the base?

use the parser to disambiguate valid function parameters, fixing several miscompiles and ICEs #5474

Conversation

cosmicexplorer commented Nov 16, 2024

Problem

Current Approach

Solution

Comparison to Alternatives

Negative Matching

Hygienic Code Generation

cosmicexplorer commented Nov 16, 2024