Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tree sitter breaks code folding in certain cases #868

Closed
5 tasks done
gazhay opened this issue Jan 17, 2024 · 10 comments · Fixed by #1062
Closed
5 tasks done

Tree sitter breaks code folding in certain cases #868

gazhay opened this issue Jan 17, 2024 · 10 comments · Fixed by #1062
Assignees
Labels
bug Something isn't working

Comments

@gazhay
Copy link

gazhay commented Jan 17, 2024

Thanks in advance for your bug report!

  • Have you reproduced issue in safe mode?
  • Have you used the debugging guide to try to resolve the issue?
  • Have you checked our FAQs to make sure your question isn't answered there?
  • Have you checked to make sure your issue does not already exist?
  • Have you checked you are on the latest release of Pulsar?

What happened?

After installing the latest 1.13 version of Pulsar code folding in large javascript files "stops" at ~3500 lines.

On previous versions "ctrl-k ctrl-1" would fold the entire file at level 1 - it no longer does this. It folds the first 3500 or so and then does nothing after. You can still manually fold if you wish.

When using the menu "Fold All" it does fold the entire file (at every level).

toggling legacy tree-sitter in settings immediately fixes this.

Pulsar version

1.13

Which OS does this happen on?

🐧 Debian based (Linux Mint, Ubuntu, etc.)

OS details

22.04.3 LTS

Which CPU architecture are you running this on?

x86_64/AMD64

What steps are needed to reproduce this?

  1. Any large javascript file open in the editor pane.
    2.Ctrl-k ctrl-1
    3.Scroll down to find the point it stops being folded.
  2. Unfold all
  3. Go to settings and toggle "legacy tree-sitter"
  4. Return to edit pane
  5. Ctrl-k ctrl-1 now folds the entire file.

Additional Information:

I appreciate this may be an "edge" case with such a large file and currently I can enable legacy to work around.

@gazhay gazhay added the bug Something isn't working label Jan 17, 2024
@savetheclocktower savetheclocktower self-assigned this Jan 17, 2024
@savetheclocktower
Copy link
Contributor

We have test coverage for this, but certainly not on a file that's 3500 lines long. So I'll run these instructions against some of the larger files in our codebase to see if I can reproduce this issue.

Thanks for the report!

@savetheclocktower
Copy link
Contributor

Hmm, initial experiments aren't giving me much to go on. I tried it on a 6,000-line file and it seemed to work just fine.

A few questions:

  • What happens if you isolate the part that's failing? You say it stops being folded at around line 3500? Copy the text of the buffer from that point until the end (or as close as you can get while still being valid code), then paste it into a new buffer and try to fold that. Let me know what happens. I'm wondering if there's something in the file that it's getting confused by.
  • On that note, does the syntax highlighting still work over the entire file?
  • Is this happening in one particular file, or have you observed it over multiple files?

My main theory is that Tree-sitter is getting confused by something in the file, but that would also probably affect the syntax highlighting. If this is a file you can post publicly somewhere, please do so and I can test on the exact input. If it isn't, then maybe see if you can get it to happen with some other JavaScript that's open-source or otherwise publicly available.

@savetheclocktower savetheclocktower changed the title Tree sitter breaks cold folding in certain cases Tree sitter breaks code folding in certain cases Jan 19, 2024
@gazhay
Copy link
Author

gazhay commented Feb 9, 2024

I can confirm it is multiple files.

Also, although the code folding "stops" you can proceed to fold it manually where you would usually expect (and the old tree sitter does fine)

Syntax highlighting is also fine for the remainder of the file.

Also, my file in question has changed over time and the point at which it stops does not vary by much, but the code around the point at which it terminates has changed so I don't think it is related to the code, more some kind of folding loop being terminated?

@savetheclocktower
Copy link
Contributor

OK, I can rule out most of my initial theories based on the fact that manual folding still works fine.

Here's roughly how I've implemented this:

  • Get all folds in the document (harder than it sounds because we have to visit several “layers” if the buffer has any embedded languages — JS in HTML, code blocks in Markdown, etc.)
  • Add all fold boundaries to a red-black tree keyed on buffer position
  • Iterate through the tree and visit each boundary in order. When we hit a fold opening boundary, increase the “indent level” variable; when we hit a fold closing boundary, decrease the “indent level” variable. As we do this, we can add each new fold we encounter to a group based on indentation level
  • When we're done, get all the folds for the requested indent level and hand them off to TextEditor, which does the actual folding

(I call it “indent level” because that's literally how it used to work, but that makes no sense now that folds are semantic. So now it's levels of nesting, and the “indent level” is actually how many other fold ranges completely surround a given fold range.)

So there are a few possible points of failure here:

  1. We're failing to retrieve all the possible folds at the beginning of the operation
  2. The red-black tree iterator is terminating early
  3. The iterator has a subtle bug that, when encountered, leads it to misclassify the indent level of all subsequent folds
  4. We're handing off all the folds correctly, but TextEditor is failing partway through its attempt to fold all the ranges

For now I'll rule out options 1 and 4. Offhand I can't see how 2 is possible. So that leaves Option 3.

Try to fold at a different indent level — first level 2, then level 3 — and let me know what happens. If Option 3 is the culprit, then some of the folds you expected to occur on level 1 will actually happen on level 2, and so on.

@gazhay
Copy link
Author

gazhay commented Apr 17, 2024

Ok, finally got a chance to look at this.

Updated to latest (1.116)

Folding still fails at level 1, beyond that things get weird as you suggested in your 3rd option.
Folding at 2, 3, 4, works for the first portion of the file 4473 lines, then it folds the rest at level 1.
Folding to level 1 again after a 2,3,4 - folds up to 4473 at level 1 and unfolds everything else.

@savetheclocktower
Copy link
Contributor

OK. I'll keep this ticket open until we can solve this mystery. If you can reproduce these symptoms on a long file whose source code you are allowed to show me, please put it into a Gist or something so that I can try to reproduce this.

If you have other source files that are just as long, please confirm whether it's happening in multiple files or just one specific file.

@gazhay
Copy link
Author

gazhay commented May 17, 2024

Just updated to 117 and confirming the bug still exists - legacy tree sitter still working.

Happy yo share the one huge file privately to devs, but can't post it publicly.

@gazhay
Copy link
Author

gazhay commented Jun 21, 2024

Just dropping by to say 118 still has the tree-sitter bug I experienced.

@savetheclocktower
Copy link
Contributor

Happy yo share the one huge file privately to devs, but can't post it publicly.

You can find me on Discord if you like and send it to me in a private message. Haven't noticed this bug on any file of my own, so it'd be a coup if I could reproduce it on my machine with the same input.

@savetheclocktower
Copy link
Contributor

savetheclocktower commented Jul 17, 2024

@gazhay and I figured it out; he was kind enough to provide me the offending file on Discord.

The length of the file was a red herring! You can reproduce this bug with the following code:

function foo () {
  let results = [];
  for (let thing of things) {
    if (thing.bar) {results.push({
      lorem: 'ipsum'
    })} else {
      // TODO
    }
  }
}

function bar () {
  if (foo) {
    baz();
  }
}

Invoke the Editor: Fold At Indent Level 1 command and you'll notice that foo folds itself, but bar doesn't.

Yet if you go to this line

    })} else {

and insert a newline here

    })
} else {

then it works as expected.

This is annoying because our Tree-sitter future was supposed to save us from using indentation level as a proxy for code folding depth (as is the default behavior in TextMate grammars).

At first, I figured this was some sort of edge case that tree-sitter-javascript didn't handle right. But it's much simpler than that.

An if statement has its own implicit folding range:

if (foo) {
  // something
}

In this example, the fold starts at the end of line 1, no matter how long the line is — so (1, Infinity) — and ends just before the closing delimiter, meaning (3, 0). We have logic that tweaks it a bit in if/else scenarios, but it's not relevant here.

An object literal has its own implicit folding range, too:

let obj = {
  foo: 'foo'
};

In this example, the fold starts at the end of line 1, no matter how long the line is — so (1, Infinity) — and ends just before the closing delimiter, meaning (3, 0). In this case, the fold ranges for these two snippets are identical.

Now let's combine them by having a statement block and an object literal start on the same line!

if (bar) { obj = {
  foo: 'foo'
}}

Now we've got code that wants to place two different fold ranges that start on line 1. Only one of them can be toggled at once, so the editor has to pick a winner. That's the part that wasn't happening.

Running Editor: Fold At Indent Level 1 is tricky for us because we have to figure out fold containment on the fly. We ask each language layer for its folds, combine them into one list via a red-black tree, then iterate through them in position order. When we find the starting point of a fold, we increment a depth counter; when we find an ending point, we decrement the counter.

But that process doesn't seem to like when two folds report ranges that are basically identical! As a result, we seemed to be incrementing depth twice (when encountering the starting points of both fold ranges) but decrementing only once. This is probably the root cause of the bug! And I haven't quite solved that part yet. (Note that, as evidenced by the workaround, we seem to tolerate when the folds start at the same point as long as they end at different points.)

But the headline is that we should be more aware of the constraint that only one fold can start on a given line, and only one fold can end on a given line. If we ask the grammars to mark folds for us in a buffer, then find out that two of the folds are supposed to start at the end of line 4, then we need to be consistent about how we pick the “correct” one.

My suspicion is that we've been acting on the first match of a folds.scm query. We might want to keep doing that; we might instead want to weigh the various options for a line and pick the one that covers the greatest buffer range. Or the smallest buffer range. What's important is to be consistent.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants