Declarative treesitter-based indentation rules #208985

lionel- · 2024-03-28T08:08:41Z

As the VS Code team is considering integrating tree-sitter in the UI thread (#161479 and more recently #207416) to potentially improve tokenization (#77140) and syntax highlighting (#50140) I'd like to suggest that indentation is another area that could be drastically improved by tree-sitter.

Currently language extensions have these possibilities when it comes to defining indentation, all with their drawbacks:

Define indentationRules and onEnterRules regexps. Because of the limitations of regexps to match programming language constructs this produces surprising indentation in many edge cases that are often not possible to fix in full generality. The python extension for instance has struggled with these quite a bit: Improve auto-indentation behaviour vscode-python#481
Override the Enter key to bypass the regexp-based indentation, e.g.:
https://marketplace.visualstudio.com/items?itemName=KevinRose.vsc-python-indent

The extension is then able to maintain knowledge of the code structure and implement an ad hoc engine.

This works but only when indenting after Enter is pressed, the indentation command is not called when moving lines up and down or pasting with autoindent. Furthermore, if the extension host is swamped there might be a noticeable delay between pressing enter and inserting the newline and indent in the editor.
Correct the indentation with a format-on-type provider for \n. This is the approach taken for python, with pylance now providing indentation corrections via format-on-type: Implement format on type for proper indentation pylance-release#1613.

I call these "corrections" because the indentation rules of the renderer thread are applied first and are immediately visible. The format-on-type edits are applied second and noticeable to the user if the extension host is slow to respond. That's not too bad but it's also possible for the format request to be cancelled/ignored if the user types too fast and invalidates the document version sent to the provider.

Indenting in both the renderer process and in edit corrections via format-on-type is a good compromise overall but it requires non-trivial analysis and behaviour implemented in extension code. Allowing extensions to provide smarter indentation rules aware of the tree structure of code (and with provisions for vertical alignment, see #66235) would make the initial indentation in the UI thread much more accurate in general.

That's where tree-sitter comes in. Both Emacs and nvim provide an indentation engine specified by declarative tree-sitter queries:

Similarly, a tree-sitter declarative language for indentation in VS Code could solve the issue of complex tree-based indentation running entirely in the main process without running extension code.

The text was updated successfully, but these errors were encountered:

alexr00 · 2024-04-02T08:34:19Z

@aiday-mar were you just working on something like this?

aiday-mar · 2024-04-02T12:24:28Z

Hi @lionel- @alexr00, thank you for posting this issue. Indeed, we are currently discussing using tree-sitter to perform the indentation. For the moment, we are trying to improve the indentation regex rules and explore the limits of these. As you mentioned, there are cases where regex indentation rules perform poorly mostly because the indentation code is not aware of the context. This is why tree-sitter could be a good alternative. One issue with using tree-sitter is that it would be computationally more expensive to do the indentation, and the indentation would be computed asynchronously (whereas currently the computation is synchronous). This implies a delay in the indentation just like with format-on-type. We will continue in-team discussion about this feature request.

lionel- · 2024-04-09T09:02:31Z

One issue with using tree-sitter is that it would be computationally more expensive to do the indentation, and the indentation would be computed asynchronously (whereas currently the computation is synchronous). This implies a delay in the indentation just like with format-on-type. We will continue in-team discussion about this feature request.

Interesting. I was hoping the incremental reparsing of TS would be fast enough to do everything synchronously. IIUC the asynchrony would be contained to the renderer process and would not be impacted by a slowed down extension host? Hopefully the delay will not be noticeable most of the time.

we are trying to improve the indentation regex rules and explore the limits of these. As you mentioned, there are cases where regex indentation rules perform poorly mostly because the indentation code is not aware of the context. This is why tree-sitter could be a good alternative.

I'd like to mention here an important use case for us (the same discussed in #136593 (comment)): chains of operator-separated expressions like a && b && c. In the language R these chains (often called pipelines) are very common:

# ggplot2 pipeline
mtcars |>
 ggplot(aes(
   disp,
   drat
 )) +
 geom_point(
   aes(colour = cyl)
 )

# dplyr pipeline
starwars |>
 group_by(species) |>
 summarise(
   n = n(),
   mass = mean(mass, na.rm = TRUE)
 ) |>
 filter(
   n > 1,
   mass > 50
 )

We are currently using OnEnter rules designed to avoid a staircase indentation effect like:

# Unwanted staircase
foo() +
  bar() +
    baz()

# Expected indentation
foo() +
  bar() +
  baz()

However these rules can only really handle one-liners, they start to fail when the pipeline elements are laid out on multiple lines:

foo() +
  bar(
    x
  ) +
    baz(
      y
    )

Even if we could match across more lines, I don't think we could really express what we need in full generality with regular expressions.

With nvim tree-sitter queries, we can express this alignment rule simply with:

[
  (binary_operator)
] @indent.begin

This should work with homogeneous left-associative operators because in that case, since we indent from the right-hand side, the parent of the current node will be the top most node in the chain of operators.

For more complex cases such as righ-associative operators, we can use the Emacs approach of expressing indentation with a matcher and an anchor. First here is how the same nvim rule as above is expressed:

((parent-is "binary_operator") parent-bol r-ts-mode-indent-offset)

This matches a child of a binary operator and indents from the parent node. To make sure that we always indent from the topmost node of a chain of binary operations, we can modify as follows:

((parent-is "binary_operator") (ancestor-while "binary_operator") r-ts-mode-indent-offset)

This uses this anchor which climbs a tree as long as the type matches and selects the start position of the last node that matched:

(defun ancestor-while (type)
  (lambda (node parent _bol &rest _)
    (while (and parent (string-match-p type (treesit-node-type parent)))
      (setq node parent)
      (setq parent (treesit-node-parent parent)))
    (treesit-node-start node)))

This example illustrates how TS rules can express indentation behaviour that approaches based on regular expressions struggle with. We will need a sufficiently flexible API to express more complex indent rules but it should be easy to extend the API with additional matchers and anchors as needed.

aiday-mar · 2024-04-09T09:40:58Z

Hi @lionel- thank you for sharing these interesting thoughts. Unfortunately, as of now I am not aware if/when we will consider changing the indentation API. I will let you know if/when this happens, and take your ideas into account.

As concerning the asynchronicity, I believe the tree parsing would be done in a separate worker thread. It could be the delay is small and therefore not noticeable as you mentioned - we would have to test to check this.

vscodenpa added the triage-needed label Mar 28, 2024

vscodenpa assigned connor4312 Mar 28, 2024

connor4312 assigned alexr00 and unassigned connor4312 Mar 28, 2024

vscodenpa removed the triage-needed label Mar 28, 2024

aiday-mar assigned aiday-mar and unassigned alexr00 Apr 2, 2024

lionel- mentioned this issue Apr 3, 2024

Check for onEnterRules that decrease indentation when determining new line indent #136593

Open

lionel- mentioned this issue Apr 9, 2024

R: Indentation of pipes with multi-line components produces staircase layout posit-dev/positron#2707

Closed

aiday-mar added editor-autoindent Editor auto indentation issues editor-indent-detection Issues around the guessing of indentation for files labels Nov 4, 2024

aiday-mar mentioned this issue Nov 12, 2024

Unexpected indentation #233700

Open

connor4312 added feature-request Request for new features or functionality under-discussion Issue is under discussion for relevance, priority, approach labels Dec 19, 2024

connor4312 added this to the Backlog milestone Dec 19, 2024

kieferrm mentioned this issue Jan 6, 2025

Iteration Plan for January 2025 #237297

Open

62 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Declarative treesitter-based indentation rules #208985

Declarative treesitter-based indentation rules #208985

lionel- commented Mar 28, 2024

alexr00 commented Apr 2, 2024

aiday-mar commented Apr 2, 2024

lionel- commented Apr 9, 2024

aiday-mar commented Apr 9, 2024

Declarative treesitter-based indentation rules #208985

Declarative treesitter-based indentation rules #208985

Comments

lionel- commented Mar 28, 2024

alexr00 commented Apr 2, 2024

aiday-mar commented Apr 2, 2024

lionel- commented Apr 9, 2024

aiday-mar commented Apr 9, 2024