Autosemi after comments #134

flbulgarelli · 2022-03-12T02:07:39Z

Hi! Currently, there is a subtle difference in the way language-javascript deals with auto semicolons when comments are present.

Let's have a look at this code sample:

function f1() {
  return // hello
    4
}

function f2() {
  return /* hello */
    4
}


function f3() {
  return /* hello
    */ 4
}

function f4() {
  return
    4
}

And how esprima parses it:

{
  "type": "Program",
  "body": [
    {
      "type": "FunctionDeclaration",
      "id": {
        "type": "Identifier",
        "name": "f1"
      },
      "params": [],
      "body": {
        "type": "BlockStatement",
        "body": [
          {
            "type": "ReturnStatement",
            "argument": null
          },
          {
            "type": "ExpressionStatement",
            "expression": {
              "type": "Literal",
              "value": 4,
              "raw": "4"
            }
          }
        ]
      },
      "generator": false,
      "expression": false,
      "async": false
    },
    {
      "type": "FunctionDeclaration",
      "id": {
        "type": "Identifier",
        "name": "f2"
      },
      "params": [],
      "body": {
        "type": "BlockStatement",
        "body": [
          {
            "type": "ReturnStatement",
            "argument": null
          },
          {
            "type": "ExpressionStatement",
            "expression": {
              "type": "Literal",
              "value": 4,
              "raw": "4"
            }
          }
        ]
      },
      "generator": false,
      "expression": false,
      "async": false
    },
    {
      "type": "FunctionDeclaration",
      "id": {
        "type": "Identifier",
        "name": "f3"
      },
      "params": [],
      "body": {
        "type": "BlockStatement",
        "body": [
          {
            "type": "ReturnStatement",
            "argument": null
          },
          {
            "type": "ExpressionStatement",
            "expression": {
              "type": "Literal",
              "value": 4,
              "raw": "4"
            }
          }
        ]
      },
      "generator": false,
      "expression": false,
      "async": false
    },
    {
      "type": "FunctionDeclaration",
      "id": {
        "type": "Identifier",
        "name": "f4"
      },
      "params": [],
      "body": {
        "type": "BlockStatement",
        "body": [
          {
            "type": "ReturnStatement",
            "argument": null
          },
          {
            "type": "ExpressionStatement",
            "expression": {
              "type": "Literal",
              "value": 4,
              "raw": "4"
            }
          }
        ]
      },
      "generator": false,
      "expression": false,
      "async": false
    }
  ],
  "sourceType": "script"
}

As you can see, the parser is introducing an automatic semicolon in all the four scenarios. This behavior is consistent with modern V8 and SpiderMonkey engines.

However, let's have a look at how language-javascript handles these scenarios. If you add the following tests to ProgramParser.hs...

describe "function with autosemi on return" $ do
  it "return with inline comment" $ do
      testProg "function f1(){ return //hello\n 4 }"  `shouldBe` "Right (JSAstProgram [JSFunction 'f1' () (JSBlock [JSReturn ,JSDecimal '4'])])"

  it "return with multiline comment" $ do
      testProg "function f2(){ return /* hello */\n  4 }"  `shouldBe` "Right (JSAstProgram [JSFunction 'f2' () (JSBlock [JSReturn ,JSDecimal '4'])])"

  it "return with multiline comment with newline" $ do
      testProg "function f3(){ return /* hello\n  */ 4 }"  `shouldBe` "Right (JSAstProgram [JSFunction 'f3' () (JSBlock [JSReturn ,JSDecimal '4'])])"

  it "return with newline" $ do
      testProg "function f4(){ return \n 4 }"  `shouldBe` "Right (JSAstProgram [JSFunction 'f4' () (JSBlock [JSReturn ,JSDecimal '4'])])"

...only the fourth and last test will actually produce the expected results. The other three will fail with the same error:

  test/Test/Language/Javascript/ProgramParser.hs:29:13:
  1) Program parser:, function with autosemi on return, return with inline comment
      expected: "Right (JSAstProgram [JSFunction 'f1' () (JSBlock [JSReturn ,JSDecimal '4'])])"
        but got: "Right (JSAstProgram [JSFunction 'f1' () (JSBlock [JSReturn JSDecimal '4' ])])"

  To rerun use: --match "/Program parser:/function with autosemi on return/return with inline comment/"

  test/Test/Language/Javascript/ProgramParser.hs:32:13:
  2) Program parser:, function with autosemi on return, return with multiline comment
      expected: "Right (JSAstProgram [JSFunction 'f2' () (JSBlock [JSReturn ,JSDecimal '4'])])"
        but got: "Right (JSAstProgram [JSFunction 'f2' () (JSBlock [JSReturn JSDecimal '4' ])])"

  To rerun use: --match "/Program parser:/function with autosemi on return/return with multiline comment/"

  test/Test/Language/Javascript/ProgramParser.hs:35:13:
  3) Program parser:, function with autosemi on return, return with multiline comment with newline
      expected: "Right (JSAstProgram [JSFunction 'f3' () (JSBlock [JSReturn ,JSDecimal '4'])])"
        but got: "Right (JSAstProgram [JSFunction 'f3' () (JSBlock [JSReturn JSDecimal '4' ])])"

  To rerun use: --match "/Program parser:/function with autosemi on return/return with multiline comment with newline/"

In other words, this parser is only inserting an automatic semicolon after a newline, but not after comments. This PR fixes this issue.

I have also refactored the Lexer tests, so that they don't just inspect the tokens produced by alex, but also those sent to happy.

flbulgarelli added 5 commits November 30, 2021 20:13

Add tests

7a3ca8b

Also consider comments as semi producers

63f9a9c

Generalize function

f328daf

Generalize lexing step and test happy lexer rules

018a983

Fixed typo

71fcdf5

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Autosemi after comments #134

Autosemi after comments #134

Uh oh!

flbulgarelli commented Mar 12, 2022 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Autosemi after comments #134

Are you sure you want to change the base?

Autosemi after comments #134

Uh oh!

Conversation

flbulgarelli commented Mar 12, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

flbulgarelli commented Mar 12, 2022 •

edited

Loading