Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[JavaScript] Division incorrectly highlighted as regex. A general fix is known, but nontrivial and could use feedback. #885

Closed
Thom1729 opened this issue Apr 7, 2017 · 3 comments

Comments

@Thom1729
Copy link
Collaborator

Thom1729 commented Apr 7, 2017

Examples (note that GitHub gets the second example wrong):

1 /**/ / 2 / /**/ 3;

1
/ 2 /
3;

+function foo () {} /x/i;

The root cause seems to be the architecture of expressions. The after-operator and after-identifier contexts work to differentiate uses of the / characters in common cases, but they aren't really a general solution. The same architectural issue also holds back #324; the suggested solution there suffers from exactly the same bug.

The third example also suffers from a lack of distinction between statements and expressions.

The general solution I've worked out for this sort of issue is to have separate expression-head and expression-tail contexts. I've included a minimal example:

%YAML 1.2
---
name: Example
scope: source.example

contexts:
  main:
    - match: (?=\S)
      push: [ expression-tail, expression-head ]

  prototype:
    - match: '/\*'
      scope: punctuation.comment.block.begin.example
      push:
        - meta_scope: comment.block.example
        - meta_include_prototype: false
        - match: '\*/'
          scope: punctuation.comment.block.end.example
          pop: true

  expression-tail:
    - match: '/'
      scope: keyword.operator.arithmetic.division.example
      push: expression-head

  expression-head:
    - match: '/'
      scope: punctuation.definition.regex.begin.example
      set:
        - meta_scope: string.regex.example
        - meta_include_prototype: false
        - match: '/'
          scope: punctuation.definition.regex.end.example
          pop: true

    - match: (?=\S)
      pop: true

The following are scoped correctly:

/abc/ /**/ / /ijk/ / /**/ /xyz/
/abc/
/ /ijk/ /
/xyz/

The expression-head context should:

  • Match unary prefix operators and continue (do not push, pop, or set).
  • Match complete expressions (such as literals or variables) and pop.
  • Match opening punctuation (such as ( or /) and set.
  • Otherwise, pop.

The expression-tail context should:

  • Match unary postfix operators and continue.
  • Match binary operators and push expression-head.
  • Otherwise, pop.

As written, this will match one expression and pop everything when the next character is not part of an expression. Slight modification will yield contexts that can parse a list of expressions or gracefully skip invalid input.


This might seem like an odd architecture, but I've used it quite a bit (including in an unpublished JavaScript syntax). It should gracefully solve this bug and any similar bugs and allow for an easy solution to #324. I welcome feedback on the approach; if there are no objections, I'll write up a patch.

@Thom1729
Copy link
Collaborator Author

Thom1729 commented Apr 9, 2017

This looks like it might be a bigger project than I'd realized. A roadmap might look like this:

  1. Remove comments and mustache templates from expressions and put them into a prototype. Add meta_include_prototype: false where needed.
  2. Remove most keywords (return, var, etc) from expressions and put them into a new statements scope that includes expressions but correctly handles statement termination.
  3. Refactor expressions into singular expression. Modify statements to properly handle expression statements.
  4. Implement the class and function declaration statements.1
  5. Refactor expression into expression-head and expression-tail. This will require changing most expression-related rules to use the stack differently. after-operator and after-identifier will probably go away. This will fix [JavaScript] Division incorrectly highlighted as regex. A general fix is known, but nontrivial and could use feedback. #885.
  6. Reimplement the semicolon insertion algorithm.2
  7. Implement [JavaScript] possible enhancement: distinguish uses of square brackets #324 (should be easy).

Questions:

  • Do we really need mustache templates built into the JavaScript syntax? They cause incorrect scoping for standard JavaScript (e.g. {{ 42; }}). Shouldn't this be a separate syntax that uses with_prototype? The same applies to HTML comments; surely, the HTML syntax will handle these.
  • Do we mind if some invalid JavaScript will change scoping? Step 2 will affect highlighting of nonsensical constructs like (throw x);.
  • Are we okay with the radical change that fixing these issues will entail? I know that a lot of work has already been put into the JavaScript syntax, and this will change a lot of lines of code.
  • Given the complexity and scope of the necessary changes, is it worth exploring a ground-up rewrite to the same tests (plus new ones)? If, hypothetically, someone had already mostly done it?3 I realize that the answer to this is probably “no”, but I think it's worth bringing up.

Also, an additional note. Rather than keeping both expression-tail and expression-head on the stack, it might seem logical to set from one to the other. But this has a substantial drawback: both contexts must be hard-coded to refer to each other. In the implementation I propose, you can easily extend expression-tail with additional behavior without code duplication; for instance, semicolon insertion can be handled by an expression-statement-tail context that simply includes expression-tail and by an expression-statement-restricted-tail context with slightly different logic. If expression-head were hard-coded to set expression-tail, then this would not be possible without a great deal of code duplication (as seen in the massive Ecmascript Sublime).


  1. Class and function declaration statements are not semicolon-terminated. The following two lines should be highlighted differently:

    function Foo() {} /a/g; // Function declaration, then expression statement starting with a regex
    +function Foo() {} /a/g; // Expression statement containing division operators.
  2. The current logic for restricted productions will fail on some cases:

    return 1
    /2/g;

    However, it currently works by coincidence, and step (5) will break it. With the new architecture, there's a straightforward fix.

  3. I have an unpublished implementation using the architecture I describe. The experience of creating that syntax provided the solutions to these bugs. That syntax would require minor revision to strip out unwanted features (such as JSX and Flow) and to use exactly the same scopes as the default syntax.

@Thom1729
Copy link
Collaborator Author

In working on this, I have noticed that a number of the tests are incorrect. Example:

'aabbcc'.replace(/b+/, 'd')
//               ^^^^ string.regexp
//                 ^ keyword.operator.quantifier.regexp

/a+(?:bc)/
// <- string.regexp
//  ^^ punctuation.definition.group.no-capture.regexp

The second line of code is not a regular expression; the slash is a division operator. To correct the test, a semicolon should be added to the first line of code. There are a handful of similar examples where a missing comma or semicolon results in a faulty test.

@Thom1729
Copy link
Collaborator Author

This has been completely resolved.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant