Tree-sitter rolling fixes (February) #906

Merged
1 change: 1 addition & 0 deletions .eslintignore
@@ -1 +1,2 @@
*.ts
vendor
86 changes: 69 additions & 17 deletions packages/language-c/grammars/tree-sitter-c/highlights.scm
Member
👍 I like the amount of comments in this; I can almost follow along. (One of these days I need to read what I believe you've written about how to write .scm files, so I know specifically what's going on in these. I can tell the comments will come in very handy then.)

Contributor Author
It's probably because of how not-at-home I feel inside C/C++ as opposed to most of the other languages for which I've written highlights.scm files.

Member
I was gonna say, at risk of being mildly off-topic or tangenty: C seems really tricky based on reading through this (plus some stuff I'm dealing with outside of Pulsar lately that's in C...)

@@ -1,3 +1,4 @@

; PREPROCESSOR
; ============

@@ -16,21 +17,48 @@
(["#if" "#ifdef" "#ifndef" "#endif" "#elif" "#else" "#define" "#include"] @punctuation.definition.directive.c
(#set! adjust.endAfterFirstMatchOf "^#"))


; This will match if the more specific rules above haven't matched. The
; anonymous nodes will match under ideal conditions, but might not be present
; if the parser is flummoxed.
; `preproc_directive` will be used when the parser doesn't recognize the
; directive as one of the above. It's permissive; `#afdfafsdfdfad` would be
; parsed as a `preproc_directive`.
;
; Hence this rule will match if the more specific rules above haven't matched.
; The anonymous nodes will match under ideal conditions, but might not be
; present even when they ought to be _if_ the parser is flummoxed; so this'll
; sometimes catch `#ifdef` and others.
((preproc_directive) @keyword.control.directive.c
(#set! capture.shy true))

((preproc_ifdef
(identifier) @entity.name.function.preprocessor.c
(#match? @entity.name.function.preprocessor.c "[a-zA-Z_$][\\w$]*")))
((preproc_directive) @punctuation.definition.directive.c
(#set! capture.shy true)
(#set! adjust.endAfterFirstMatchOf "^#"))

; Macro functions are definitely entities.
(preproc_function_def
(identifier) @entity.name.function.preprocessor.c
(#set! capture.final true))

; Identifiers in macro definitions are definitely constants.
((preproc_def
name: (identifier) @constant.preprocessor.c))

; We can also safely treat identifiers as constants in `#ifdef`…
((preproc_ifdef
(identifier) @constant.preprocessor.c))

; …and `#if` and `#elif`…
(preproc_if
(binary_expression
(identifier) @constant.preprocessor.c))
(preproc_elif
(binary_expression
(identifier) @constant.preprocessor.c))

; …and `#undef`.
((preproc_call
directive: (preproc_directive) @_IGNORE_
argument: (preproc_arg) @constant.preprocessor.c)
(#eq? @_IGNORE_ "#undef"))

(system_lib_string) @string.quoted.other.lt-gt.include.c
((system_lib_string) @punctuation.definition.string.begin.c
(#set! adjust.endAfterFirstMatchOf "^<"))
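The `adjust.endAfterFirstMatchOf` settings in this hunk narrow a capture to just its leading `#` or `<`. A rough model of that narrowing, as a TypeScript sketch: the helper name `endAfterFirstMatchOf` and the offset-pair return value are hypothetical, and returning `null` when nothing matches is an assumption rather than documented Pulsar behavior.

```typescript
// Hypothetical model of `adjust.endAfterFirstMatchOf`: keep the capture's
// start, but end it right after the first regex match in the node's text.
// Offsets are relative to the start of the node.
function endAfterFirstMatchOf(nodeText: string, pattern: string): [number, number] | null {
  const match = new RegExp(pattern).exec(nodeText);
  // Assumption: without a match there is no adjusted range to report.
  if (match === null) return null;
  return [0, match.index + match[0].length];
}

// "#ifdef" narrowed with "^#" covers only the "#".
const hashRange = endAfterFirstMatchOf("#ifdef", "^#");
// "<stdio.h>" narrowed with "^<" covers only the opening "<".
const angleRange = endAfterFirstMatchOf("<stdio.h>", "^<");
console.log(hashRange, angleRange);
```

In both cases the narrowed range is a single character, which is what lets the `punctuation.definition.*` scopes apply to the delimiter alone while the broader scope still covers the whole node.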
@@ -48,6 +76,15 @@
(#set! capture.final true))

(primitive_type) @support.storage.type.builtin.c

; When the user has typed `#define FOO`, the macro injection thinks that `FOO`
; is a type declaration (for some reason). This node structure seems to exist
; only in that unusual and incorrect scenario, so we'll stop it from happening
; so that it doesn't override the underlying `constant.other.c` scope.
(translation_unit
(type_identifier) @_IGNORE_
(#set! capture.final))

(type_identifier) @support.other.storage.type.c
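The `@_IGNORE_` plus `capture.final` pairing above works because an earlier rule can claim a range without scoping it. The following TypeScript sketch is a simplified model of that idea, not Pulsar's actual resolver: `Capture`, `resolveScopes`, and the offsets are hypothetical, and the exact rules for `capture.final` and `capture.shy` are assumptions based on how the comments in this file describe them.

```typescript
// Simplified, hypothetical model of how captures for the same range interact.
interface Capture {
  name: string;     // scope name, or a private name starting with "_"
  start: number;
  end: number;
  final?: boolean;  // models (#set! capture.final)
  shy?: boolean;    // models (#set! capture.shy)
}

function resolveScopes(captures: Capture[]): Record<string, string[]> {
  const applied: Record<string, string[]> = {};
  const claimed: Record<string, boolean> = {}; // ranges seen by an earlier capture
  const sealed: Record<string, boolean> = {};  // ranges closed by a final capture

  for (const capture of captures) {
    const key = capture.start + "-" + capture.end;
    if (sealed[key]) continue;                 // an earlier final capture wins
    if (capture.shy && claimed[key]) continue; // shy captures yield to earlier ones
    claimed[key] = true;
    if (capture.final) sealed[key] = true;
    if (capture.name.charAt(0) === "_") continue; // private names add no scope
    if (!applied[key]) applied[key] = [];
    applied[key].push(capture.name);
  }
  return applied;
}

// `#define FOO`: the `translation_unit` rule claims FOO's range first, so the
// later `type_identifier` rule adds nothing for it.
const defineFoo = resolveScopes([
  { name: "_IGNORE_", start: 8, end: 11, final: true },
  { name: "support.other.storage.type.c", start: 8, end: 11 },
]);
console.log(defineFoo);
```

Under this model, rule order matters: the ignoring rule has to appear before the general `type_identifier` rule, which matches the order used in this file.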

; These types are all reserved words; if we see an identifier with this name,
@@ -133,27 +170,31 @@

; The "x" in `int x;`
(declaration
declarator: (identifier) @variable.declaration.c)
declarator: (identifier) @variable.other.declaration.c)
Comment on lines -136 to +173
Member
@DeeDeeG, Feb 15, 2024

Why are we marking so many of these as .other, in addition to the various other CSS classes being added?

I can guess it helps make sure rules in stylesheets actually apply, by adding this class on to the rule's selector to increase its specificity as needed, without resorting to !important... Is this basically the reason, or anything else I'm missing?


; The "x" in `int x = y;`
(init_declarator
declarator: (identifier) @variable.declaration.c)
declarator: (identifier) @variable.other.declaration.c)

; The "x" in `SomeType *x;`
; (Should work no matter how many pointers deep we are.)
(pointer_declarator
declarator: [(identifier) (field_identifier)] @variable.declaration.pointer.c
declarator: [(identifier) (field_identifier)] @variable.other.declaration.pointer.c
(#is? test.descendantOfType "declaration field_declaration"))

; An array declarator: the "table" in `int table[4];`
(array_declarator
declarator: (identifier) @variable.other.declaration.c)

; A member of a struct.
(field_declaration
(field_identifier) @variable.declaration.member.c)
(field_identifier) @variable.other.declaration.member.c)

; An attribute in a C99 struct designated initializer:
; the "foo" in `MY_TYPE a = { .foo = true };`
(initializer_pair
(field_designator
(field_identifier) @variable.declaration.member.c))
(field_identifier) @variable.other.declaration.member.c))

; (and the associated ".")
(initializer_pair
@@ -162,15 +203,15 @@

(field_declaration
(pointer_declarator
(field_identifier) @variable.declaration.member.c))
(field_identifier) @variable.other.declaration.member.c))

(field_declaration
(array_declarator
(field_identifier) @variable.declaration.member.c))
(field_identifier) @variable.other.declaration.member.c))

(init_declarator
(pointer_declarator
(identifier) @variable.declaration.member.c))
(identifier) @variable.other.declaration.member.c))

; The "x" in `x = y;`
(assignment_expression
@@ -253,8 +294,19 @@
(false)
] @constant.language._TYPE_.c

((identifier) @constant.c
(#match? @constant.c "[_A-Z][_A-Z0-9]*$"))
; Don't try to scope (e.g.) `int FOO = 1` as a constant when the user types `=`
; but has not typed the value yet.
(ERROR
(identifier) @_IGNORE_
(#set! capture.final))

; In most languages we wouldn't be making the assumption that an all-caps
; identifier should be treated as a constant. But those languages don't have
; macro preprocessors. The convention is decently strong in C/C++ that all-caps
; identifiers will refer to `#define`d things.
((identifier) @constant.other.c
(#match? @constant.other.c "^[_A-Z][_A-Z0-9]*$")
(#set! capture.shy))
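The `^` added to the `#match?` pattern matters because an unanchored pattern only has to match the tail of the identifier. A small TypeScript check illustrates the difference, assuming `#match?` behaves like a JavaScript `RegExp` test against the node text; the helper `looksLikeConstant` and the sample identifiers are made up for illustration.

```typescript
const unanchoredPattern = new RegExp("[_A-Z][_A-Z0-9]*$"); // pattern before this change
const anchoredPattern = new RegExp("^[_A-Z][_A-Z0-9]*$");  // pattern after this change

// True when the identifier would be scoped as a constant under the pattern.
function looksLikeConstant(identifier: string, pattern: RegExp): boolean {
  return pattern.test(identifier);
}

for (const identifier of ["MAX_LEN", "_T", "fooBAR", "myVarX"]) {
  console.log(
    identifier,
    looksLikeConstant(identifier, unanchoredPattern),
    looksLikeConstant(identifier, anchoredPattern)
  );
}
```

With the unanchored pattern, `fooBAR` and `myVarX` qualify because they end in capitals; with the anchor, only `MAX_LEN` and `_T` do.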


; COMMENTS
82 changes: 61 additions & 21 deletions packages/language-c/grammars/tree-sitter-cpp/highlights.scm
@@ -13,33 +13,55 @@
"#define" @keyword.control.directive.define.cpp
"#include" @keyword.control.directive.include.cpp

(["#if" "#ifdef" "#ifndef" "#endif" "#elif" "#else" "#define" "#include"] @punctuation.definition.directive.c
(["#if" "#ifdef" "#ifndef" "#endif" "#elif" "#else" "#define" "#include"] @punctuation.definition.directive.cpp
(#set! adjust.endAfterFirstMatchOf "^#"))


; This will match if the more specific rules above haven't matched. The
; anonymous nodes will match under ideal conditions, but might not be present
; if the parser is flummoxed.
((preproc_directive) @keyword.control.directive.c
; `preproc_directive` will be used when the parser doesn't recognize the
; directive as one of the above. It's permissive; `#afdfafsdfdfad` would be
; parsed as a `preproc_directive`.
;
; Hence this rule will match if the more specific rules above haven't matched.
; The anonymous nodes will match under ideal conditions, but might not be
; present even when they ought to be _if_ the parser is flummoxed; so this'll
; sometimes catch `#ifdef` and others.
((preproc_directive) @keyword.control.directive.cpp
(#set! capture.shy true))

((preproc_ifdef
(identifier) @entity.name.function.preprocessor.c
(#match? @entity.name.function.preprocessor.c "[a-zA-Z_$][\\w$]*")))
((preproc_directive) @punctuation.definition.directive.cpp
(#set! capture.shy true)
(#set! adjust.endAfterFirstMatchOf "^#"))

; Macro functions are definitely entities.
(preproc_function_def
(identifier) @entity.name.function.preprocessor.c
(identifier) @entity.name.function.preprocessor.cpp
(#set! capture.final true))

(preproc_function_def
(identifier) @entity.name.function.preprocessor.cpp
(#set! capture.final true)
)
; Identifiers in macro definitions are definitely constants.
((preproc_def
name: (identifier) @constant.preprocessor.cpp))

(system_lib_string) @string.quoted.other.lt-gt.include.c
((system_lib_string) @punctuation.definition.string.begin.c
; We can also safely treat identifiers as constants in `#ifdef`…
((preproc_ifdef
(identifier) @constant.preprocessor.cpp))

; …and `#if` and `#elif`…
(preproc_if
(binary_expression
(identifier) @constant.preprocessor.cpp))
(preproc_elif
(binary_expression
(identifier) @constant.preprocessor.cpp))

; …and `#undef`.
((preproc_call
directive: (preproc_directive) @_IGNORE_
argument: (preproc_arg) @constant.preprocessor.cpp)
(#eq? @_IGNORE_ "#undef"))

(system_lib_string) @string.quoted.other.lt-gt.include.cpp
((system_lib_string) @punctuation.definition.string.begin.cpp
(#set! adjust.endAfterFirstMatchOf "^<"))
((system_lib_string) @punctuation.definition.string.end.c
((system_lib_string) @punctuation.definition.string.end.cpp
(#set! adjust.startBeforeFirstMatchOf ">$"))


@@ -52,6 +74,13 @@
(type_identifier) @_IGNORE_
(#set! capture.final true))

; When the user has typed `#define FOO`, the macro injection thinks that `FOO`
; is a type declaration (for some reason). This node structure seems to exist
; only in that unusual and incorrect scenario, so we'll stop it from happening
; so that it doesn't override the underlying `constant.other.cpp` scope.
(translation_unit
(type_identifier) @_IGNORE_
(#set! capture.final))

(primitive_type) @support.type.builtin.cpp

@@ -232,7 +261,7 @@
; The "x" in `SomeType *x;`
; (Should work no matter how many pointers deep we are.)
(pointer_declarator
declarator: [(identifier) (field_identifier)] @variable.declaration.pointer.c
declarator: [(identifier) (field_identifier)] @variable.declaration.pointer.cpp
(#is? test.descendantOfType "declaration field_declaration"))

; A member of a struct.
@@ -289,7 +318,7 @@
; The "foo" in `const char *foo` within a parameter list.
; (Should work no matter how many pointers deep we are.)
(pointer_declarator
declarator: [(identifier) (field_identifier)] @variable.parameter.pointer.c
declarator: [(identifier) (field_identifier)] @variable.parameter.pointer.cpp
(#is? test.descendantOfType "parameter_declaration"))

(parameter_declaration
@@ -332,8 +361,19 @@
(false)
] @constant.language._TYPE_.cpp

((identifier) @constant.cpp
(#match? @constant.cpp "[_A-Z][_A-Z0-9]*$"))
; Don't try to scope (e.g.) `int FOO = 1` as a constant when the user types `=`
; but has not typed the value yet.
(ERROR
(identifier) @_IGNORE_
(#set! capture.final))

; In most languages we wouldn't be making the assumption that an all-caps
; identifier should be treated as a constant. But those languages don't have
; macro preprocessors. The convention is decently strong in C/C++ that all-caps
; identifiers will refer to `#define`d things.
((identifier) @constant.other.cpp
(#match? @constant.other.cpp "^[_A-Z][_A-Z0-9]*$")
(#set! capture.shy))


; COMMENTS
70 changes: 69 additions & 1 deletion packages/language-html/grammars/tree-sitter-html/folds.scm
Member
Knowing the issue this is addressing, these comments are great and very readable, IMO.

Even though I can't read .scm files that well right now, I know what purpose this is accomplishing. 👍. Thanks.

@@ -1,6 +1,74 @@
; When dealing with a self-closing element that spans multiple lines, this lets
; us fold the attribute list.
;
; This query captures elements that happen to be self-closing but don't end
; with an XHTML-style ` />`. Because `tree-sitter-html` doesn't distinguish
; these from elements that can have content, we have to check the tag name to
; know how to treat these.

((element
(start_tag
(tag_name) @_IGNORE_) @fold)
(#match? @_IGNORE_ "^(area|base|br|col|embed|hr|img|input|keygen|link|meta|param|source|track|wbr)$")
)

; This one captures the XHTML-style nodes.
(self_closing_tag) @fold


; TODO: Right now, the fold cache doesn't work properly when a given range
; satisfies more than one fold. We should employ `ScopeResolver` to fix this.

; Fold up all of
;
; <div
; foo="bar"
; baz="thud">
;
; </div>
;
; with the fold indicator appearing on whichever line has the `>` that closes
; the opening tag.
;
; Usually this'll be the same line on which the tag opened; but when it isn't,
; this allows for the attribute list of the opening element to be folded
; separately from the element's contents.
;

(element
(start_tag
(tag_name) @_IGNORE_
">" @fold)
(#not-match? @_IGNORE_ "^(area|base|br|col|embed|hr|img|input|keygen|link|meta|param|source|track|wbr)$")
(#set! fold.endAt parent.parent.lastNamedChild.startPosition)
(#set! fold.adjustToEndOfPreviousRow true)
)


; When we have…
;
; <div
; foo="bar"
; baz="thud"
; >
;
; </div>
;
; …we can put a fold indicator on the line with `<div` and use it to fold up
; all of a start tag's attributes.
;
; We keep the end of the fold on a separate line because otherwise we lose the
; ability to independently toggle the folding of the element's contents.
;
(element
(start_tag
(tag_name) @_IGNORE_) @fold
(#not-match? @_IGNORE_ "^(area|base|br|col|embed|hr|img|input|keygen|link|meta|param|source|track|wbr)$")
(#set! fold.endAt lastChild.startPosition)
(#set! fold.adjustToEndOfPreviousRow true))


[
(element)
(script_element)
(style_element)
] @fold
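The three `element` queries in this file rely on the same tag-name test to separate void elements from elements that can have content. As a TypeScript sketch of that test, assuming the predicate is evaluated like a JavaScript `RegExp` against the `tag_name` text (the names `VOID_TAG_PATTERN` and `isVoidTag` are hypothetical):

```typescript
// Same pattern as the #match? / #not-match? predicates in folds.scm.
const VOID_TAG_PATTERN = new RegExp(
  "^(area|base|br|col|embed|hr|img|input|keygen|link|meta|param|source|track|wbr)$"
);

// True for tags that never have content, so only their attribute list can fold.
function isVoidTag(tagName: string): boolean {
  return VOID_TAG_PATTERN.test(tagName);
}

console.log(isVoidTag("img"), isVoidTag("div"), isVoidTag("colgroup"));
```

The `^` and `$` anchors keep `colgroup` from being mistaken for `col`, so it still gets the folds meant for elements with content.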
@@ -1,7 +1,7 @@

((start_tag) @indent
; Only indent if this isn't a self-closing tag.
(#not-match? @indent "^<(?:area|base|br|col|embed|hr|img|input|keygen|link|meta|param|source|track|wbr)\\s"))
(#not-match? @indent "^<(?:area|base|br|col|embed|hr|img|input|keygen|link|meta|param|source|track|wbr)(?=\\s|>)"))

; `end_tag` will still match when only `</div` is present. Without enforcing
; the presence of `>`, the dedent happens too soon.
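The `#not-match?` change in this hunk swaps a required whitespace character for a lookahead that also accepts `>`. The TypeScript sketch below compares the two patterns on a few start tags, assuming the predicate is tested as a JavaScript `RegExp` against the `start_tag` text; `shouldIndent` and the sample tags are hypothetical.

```typescript
const voidTagNames = "area|base|br|col|embed|hr|img|input|keygen|link|meta|param|source|track|wbr";
const oldVoidStart = new RegExp("^<(?:" + voidTagNames + ")\\s");
const newVoidStart = new RegExp("^<(?:" + voidTagNames + ")(?=\\s|>)");

// Mirrors (#not-match? @indent ...): indent only when the tag is not void.
function shouldIndent(startTagText: string, voidStart: RegExp): boolean {
  return !voidStart.test(startTagText);
}

for (const tag of ["<br>", "<img src=\"a.png\">", "<div>", "<colgroup>"]) {
  console.log(tag, shouldIndent(tag, oldVoidStart), shouldIndent(tag, newVoidStart));
}
```

Under the old pattern a bare `<br>` counts as an indenting tag because no whitespace follows the name. The lookahead fixes that while still leaving `<colgroup>` alone.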
@@ -6,6 +6,6 @@ parser: 'tree-sitter-phpdoc'
injectionRegex: '^(phpdoc|PHPDoc)$'

treeSitter:
parserSource: 'github:claytonrcarter/tree-sitter-phpdoc#915a527d5aafa81b31acf67fab31b0ac6b6319c0'
parserSource: 'github:claytonrcarter/tree-sitter-phpdoc#f285e338d328a03920a9bfd8dda78585c7ddcca3'
grammar: 'tree-sitter/tree-sitter-phpdoc.wasm'
highlightsQuery: 'tree-sitter/queries/phpdoc/highlights.scm'