Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tree-sitter rolling fixes, 1.119 edition #1028

Merged

Conversation

savetheclocktower
Copy link
Contributor

These PRs contain fewer and fewer fixes each month. This is a good sign.

…meaning the `<?php` and `?>` tokens.

Instead, ignore the redundant tokens and try to proceed.

Fixes pulsar-edit#1032.
@savetheclocktower savetheclocktower marked this pull request as ready for review July 12, 2024 17:04
@savetheclocktower
Copy link
Contributor Author

This one's ready for review! Only the PHP commit involves any JavaScript changes; the rest of the changes are in query files.

Copy link
Member

@DeeDeeG DeeDeeG left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To the extent I know what's going on (see individual comments), looks good to me! Small is good! 🎆

Copy link
Member

@DeeDeeG DeeDeeG Jul 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regarding 9a1c19a: Glad to see a fix! (Now that I've read the corresponding issue/comments.)

Seems legit!

Just to add something more substantive as commentary: Does often seem that throwing less often is the answer, in JS. Seems also that a suitable alternative was identified and taken. Nice.

Copy link
Member

@DeeDeeG DeeDeeG Jul 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regarding d7faa84: Not sure I understand the distinction nesting these differently in this file and adding the trailing @fold marker, but the comments/commit message together make it sound like a good idea... (That's as much as I can say!)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Imagine this code in JS:

if (foo) {
  bar();
}

Here's the corresponding Tree-sitter tree:

Screenshot 2024-07-15 at 5 04 52 PM

We want to fold everything between the { and }. Most Tree-sitter parsers are structured so that the thing you want to fold is a node equivalent to the statement_block node here: it groups a bunch of things, but the first and last children are nearly always the anonymous nodes that act as delimiters. You'll find the same thing with arrays, literal object notation, and so on.

This is so common in Tree-sitter that we treat it as the default for folding purposes. If you do

(statement_block) @fold

in a folds.scm, we interpret it to mean “fold from the end of the line on which statement_block starts… up to the position at the start of the last child.” Because the last child is going to be a delimiter 99% of the time.

Python is a special case because it eschews so many delimiters in favor of indentation. So its folds.scm makes a broad assumption that, when we fold blocks, we'll always want to fold up to the very end. Hence (#set! fold.endAt endPosition) — tweaking the end position of the fold for everything listed in the first group.

But the assumption was a bit too broad. Tuples, lists, sets, and dictionaries all have delimiters, so they shouldn't have been included in the first group. The default behavior works just fine for these node types, so we just mark them with @fold without customizing their end position.

(I try to explain these things even if they don't stick — just in case it lowers the bus factor even a little bit, and because PR comments are part of the public record.)

Copy link
Member

@DeeDeeG DeeDeeG Jul 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regarding 0070e88:

I hadn't heard of this auto keyword in C, so I got curious and looked it up. I have found this astoundingly funny StackOverflow answer to explain what the auto keyword is about, at least in C. To any reading these review comments: Enjoy.

https://stackoverflow.com/a/2192761

(Alright, and some arguably more helpful links about how it can be somewhat actually useful in C++, apparently. https://stackoverflow.com/questions/7576953/what-is-the-meaning-of-the-auto-keyword ... https://learn.microsoft.com/en-us/cpp/cpp/auto-cpp?view=msvc-170)

@@ -3,6 +3,7 @@
"then"
"do"
"("
"else"
Copy link
Member

@DeeDeeG DeeDeeG Jul 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems legit and popular to indent after else in bash/sh, depending on where in particular you put your newlines. Lotta flexibility in bash/sh. I can't think of many times when I wouldn't want to newline+indent after an else even if it might be valid to with the right (or debatably/subjectively: "wrong" per style preferences) semicolon usage.

Just commentary, I guess. I think we probably indent for stuff like this when it's mostly expected even if not strictly required by the language to do so. So this is probabl right as-is? Rambling a bit.

(Welcome to sh, where the newlines sometimes and do and sometimes don't matter, and everything seems to be made up, but is no doubt documented in some obscure manual somewhere. /standup)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As far as I can tell, failing to include else originally was just an oversight of mine.

@@ -169,6 +169,7 @@
] @keyword.operator.redirect.shell)

(test_operator) @keyword.operator.test.shell
(unary_expression "!" @keyword.operator.unary.shell)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this have any relation to the entry on line 156?:

(negated_command "!" @keyword.operator.unary.shell)

Are these redundant, or it's something else?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's something else.

if [[ ! -z "foo" ]]; then
  echo "aha"
fi

!$(foo)

First ! is (unary_expression "!"); second ! is (negated_command "!").

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regarding b5f6635: I recall someone wanted these, great to see them implemented! 👍

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This I'm out of my element looking at. Hopefully this is correct, I don't know how to read it. Given this was a round two after realizing round 1 wasn't working as intended, that hopefully means round 2 is working good.

Rubber-stamping this file on faith, heh.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There were a couple issues here:

  1. It's really common to write something like (foo "bar") @baz in Tree-sitter query files when you really meant (foo "bar" @baz). The first one means “match any foo that has an anonymous "bar" node as a child.” The second one means “match the anonymous "bar" node that is a direct child of a foo.” That's why I had to come back for round 2.

  2. Round 1 was about another bad habit I have: referring to anonymous nodes without clarifying context. It's fine to refer to "case" casually in JS because there's only one scenario in which you'd find an anonymous "case" node: a switch statement. But "default" is found in more places! It should've occurred to me that it'd match the "default" node in stuff like export default function foo() {.

    So I rewrite these queries to match only the "case" and "default" we want (being super paranoid with "case" just to be thorough).

The @match capture itself is more complicated, especially with the#set! predicates and the config flag. Luckily, I don't have to bore you with the details in this case. :)

Copy link
Member

@DeeDeeG DeeDeeG Jul 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the explanation! I hope that these trails of conversations in these Tree-sitter rolling fixes PRs will lead anyone to what they need to know if they're hoping to pick up this kind of knowledge, and I consider it a long-term goal to be reasonably versed in this well enough to start properly self-teaching it or asking questions of the tree-sitter dev community.

Copy link
Member

@confused-Techie confused-Techie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Providing my rubber stamp as it were. This is not my area of expertise, but with really clear comments about the change in behavior seems valid to me.

Basically resulting in us just accepting some invalid syntax and continuing on as if we hadn't seen it, which seems like a valid strategy to me. As really the syntax highlighting is meant to help the user as much as possible, while not ensuring everything is valid.

Now personally I do appreciate syntax highlighting failing, as it alerts me of a mistake I've made, but in this case, as far as I'm aware and recall, this was requested due to usage in some type of templating environment, so in situations like that, this makes perfect sense to me

@savetheclocktower savetheclocktower merged commit 9aeb99a into pulsar-edit:master Jul 16, 2024
103 checks passed
@savetheclocktower savetheclocktower deleted the tree-sitter-july branch July 17, 2024 01:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants