Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

jsx_element nodes on different line being parsed as start on the same line. #329

Closed
lucario387 opened this issue Jul 7, 2024 · 6 comments · Fixed by #335
Closed

jsx_element nodes on different line being parsed as start on the same line. #329

lucario387 opened this issue Jul 7, 2024 · 6 comments · Fixed by #335
Labels

Comments

@lucario387
Copy link

lucario387 commented Jul 7, 2024

From v0.21.4, I experienced the jsx elements to be captured too widely. Here's an example of it

export default function Home() {
  return (
    <>
      <Button
        style={{ 
          color: 'blue', 
        }}
        disabled
      >
      </Button>
    </>
  )
}

The output of tree-sitter parse is the following:

(program [0, 0] - [13, 0]
  (export_statement [0, 0] - [12, 1]
    declaration: (function_declaration [0, 15] - [12, 1]
      name: (identifier [0, 24] - [0, 28])
      parameters: (formal_parameters [0, 28] - [0, 30])
      body: (statement_block [0, 31] - [12, 1]
        (return_statement [1, 2] - [11, 3]
          (parenthesized_expression [1, 9] - [11, 3]
            (jsx_element [2, 4] - [10, 7]
              open_tag: (jsx_opening_element [2, 4] - [2, 6])
              (jsx_element [2, 6] - [9, 15]
                open_tag: (jsx_opening_element [2, 6] - [8, 7]
                  name: (identifier [3, 7] - [3, 13])
                  attribute: (jsx_attribute [4, 8] - [6, 10]
                    (property_identifier [4, 8] - [4, 13])
                    (jsx_expression [4, 14] - [6, 10]
                      (object [4, 15] - [6, 9]
                        (pair [5, 10] - [5, 23]
                          key: (property_identifier [5, 10] - [5, 15])
                          value: (string [5, 17] - [5, 23]
                            (string_fragment [5, 18] - [5, 22]))))))
                  attribute: (jsx_attribute [7, 8] - [7, 16]
                    (property_identifier [7, 8] - [7, 16])))
                close_tag: (jsx_closing_element [8, 7] - [9, 15]
                  name: (identifier [9, 8] - [9, 14])))
              close_tag: (jsx_closing_element [9, 15] - [10, 7]))))))))

While this is the parsed result from commit c4739fe

(program [0, 0] - [13, 0]
  (export_statement [0, 0] - [12, 1]
    declaration: (function_declaration [0, 15] - [12, 1]
      name: (identifier [0, 24] - [0, 28])
      parameters: (formal_parameters [0, 28] - [0, 30])
      body: (statement_block [0, 31] - [12, 1]
        (return_statement [1, 2] - [11, 3]
          (parenthesized_expression [1, 9] - [11, 3]
            (jsx_element [2, 4] - [10, 7]
              open_tag: (jsx_opening_element [2, 4] - [2, 6])
              (jsx_element [3, 6] - [9, 15]
                open_tag: (jsx_opening_element [3, 6] - [8, 7]
                  name: (identifier [3, 7] - [3, 13])
                  attribute: (jsx_attribute [4, 8] - [6, 10]
                    (property_identifier [4, 8] - [4, 13])
                    (jsx_expression [4, 14] - [6, 10]
                      (object [4, 15] - [6, 9]
                        (pair [5, 10] - [5, 23]
                          key: (property_identifier [5, 10] - [5, 15])
                          value: (string [5, 17] - [5, 23]
                            (string_fragment [5, 18] - [5, 22]))))))
                  attribute: (jsx_attribute [7, 8] - [7, 16]
                    (property_identifier [7, 8] - [7, 16])))
                close_tag: (jsx_closing_element [9, 6] - [9, 15]
                  name: (identifier [9, 8] - [9, 14])))
              close_tag: (jsx_closing_element [10, 4] - [10, 7]))))))))

And the diff of the two parsed trees:

diff --git a/old_tree.txt b/new_tree.txt
index 31e1bb606db0..f0bb1eb3ac42 100644
--- a/old_tree.txt
+++ b/new_tree.txt
@@ -8,8 +8,8 @@
           (parenthesized_expression [1, 9] - [11, 3]
             (jsx_element [2, 4] - [10, 7]
               open_tag: (jsx_opening_element [2, 4] - [2, 6])
-              (jsx_element [3, 6] - [9, 15]
-                open_tag: (jsx_opening_element [3, 6] - [8, 7]
+              (jsx_element [2, 6] - [9, 15]
+                open_tag: (jsx_opening_element [2, 6] - [8, 7]
                   name: (identifier [3, 7] - [3, 13])
                   attribute: (jsx_attribute [4, 8] - [6, 10]
                     (property_identifier [4, 8] - [4, 13])
@@ -21,6 +21,6 @@
                             (string_fragment [5, 18] - [5, 22]))))))
                   attribute: (jsx_attribute [7, 8] - [7, 16]
                     (property_identifier [7, 8] - [7, 16])))
-                close_tag: (jsx_closing_element [9, 6] - [9, 15]
+                close_tag: (jsx_closing_element [8, 7] - [9, 15]
                   name: (identifier [9, 8] - [9, 14])))
-              close_tag: (jsx_closing_element [10, 4] - [10, 7]))))))))
+              close_tag: (jsx_closing_element [9, 15] - [10, 7]))))))))

As can be seen from the file above. despite <></> and <Button></Button> be on different lines, the <Button> element starts at [2, 6] instead of [3, 6] like in v0.21.3

I think the changes in 99be62f may have lead to this being affected.

@lucario387 lucario387 added the bug label Jul 7, 2024
@lucario387 lucario387 changed the title jsx_element nodes on different line shown as start on the same line. jsx_element nodes on different line being parsed as start on the same line. Jul 7, 2024
@jackschu
Copy link
Contributor

jackschu commented Jul 7, 2024

It seems like adding token.immediate to had an effect on the outer _jsx_child: choice?

We could explicitly allow whitespace with something like

    _jsx_child: $ => choice(
      $.jsx_text,
      /[\s\p{Zs}\uFEFF\u2028\u2029\u2060\u200B]+/,
      $.html_character_reference,
      $._jsx_element,
      $.jsx_expression,
    ),

But TBH, this feels downstream of #328 upholding #221 . This is starting to feel like the wrong call, its not what estree-compatible parsers do but theres no spec requiring it either way.

what i mean is having

<div>
    <div/>
</div>

result in

(program [0, 0] - [3, 0]
  (expression_statement [0, 0] - [2, 6]
    (jsx_element [0, 0] - [2, 6]
      open_tag: (jsx_opening_element [0, 0] - [0, 5]
        name: (identifier [0, 1] - [0, 4]))
      (jsx_text [0, 5] - [1, 4])
      (jsx_self_closing_element [1, 4] - [1, 10]
        name: (identifier [1, 5] - [1, 8]))
      (jsx_text [1, 10] - [2, 0])
      close_tag: (jsx_closing_element [2, 0] - [2, 6]
        name: (identifier [2, 2] - [2, 5])))))

would also solve this problem (and be helpful for babel/acorn compatibility) but would go back on #221

what do you think @amaanq

@amaanq
Copy link
Member

amaanq commented Jul 20, 2024

What if we just forbid newlines in the first regex group? The problem is (and this is a somewhat known issue) the spaces get consumed when parsing the author's original example, and then get lumped in with the parsing of the </ token because no prior token was produced (ideally a jsx_text one).

Log output showing what I mean:

image

The </ token shouldn't include spaces before it, so I think an easy stopgap here to ensure that we get a jsx_text node is to disallow newlines in the first group like I mentioned, so we get jsx_text nodes when there's no newline, and if there is, it's skipped as long as it's all whitespace only.

@jackschu a question I have is, are these spaces after a newline rendered in React as literal spaces? I'm no React/JSX expert, but if they are, we should consider going back on 221 then imo, since real syntactic info about the code is lost. If it is not rendered as spaces, then we can either go with what I proposed.

@jackschu
Copy link
Contributor

are these spaces after a newline rendered in React as literal spaces?

Kind of. If newlines are separating text nodes then React will render these as literal spaces. My best understanding is that this is the current ruleset for react facebook/react#480 (comment)

All lines having leading or trailing whitespace are trimmed, all newlines are
removed, adjacent text separated by newlines become separated by a single
space. Any whitespace tabs are replaced with spaces. Strings inside
expressions are unaffected.

This means that these examples will include whitespace in what react emits

let x = (<div> <Button/> </div>)
// OR
let y = (<div>
    hi
    bye
</div>)

But this example will not

<div> 
    <Button/> 
</div>

(for more detail see the JS emitted by this playground link)

So babel, acorn, etc decide to just always emit jsx_text nodes containing newlines and let downstream tooling figure it out. This is the direction id lean especially given that the ruleset that react seems to use is not really codified by any spec. But I'm not sure if this would be bad for @lucario387 's use-case.

@lucario387
Copy link
Author

lucario387 commented Jul 22, 2024

From what I can see of both of the examples, and from the playground link

let x = (<div> <Button/> </div>)
// OR
let y = (<div>
    hi
    bye
</div>)

They both do contain whitespaces, but they do not consider it part of the child element, but rather as separate, actual " " string nodes.
The latter let y = ... follows normal HTML spec, as

<p>
    Foo
    Bar
</p>

creates foo bar on browsers. This means \s* inbetween text nodes are stripped and trimmed to a singular whitespace, while whitespaces around an element are not captured at all. I believe the correct behavior should be mirroring that of HTML, considering jsx element are glorified HTML elements.

@jackschu
Copy link
Contributor

Agreed, I think this is what @amaanq and I are talking about w.r.t. 'going back on 221'

@amaanq
Copy link
Member

amaanq commented Aug 17, 2024

I'm fine with that, it makes more sense imo, though might be slightly annoying for some consumers

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
3 participants