Fix completion race conditions #6173

pascalkuthe · 2023-03-03T14:56:51Z

Supercedes #1819

This PR resolves multiple problems with the completions.
The root cause of all of these issues were shortcomings in the way savepoints are handled.
Savepoints are set when a completion is triggered so that the changes sent by the server
can be applied correctly (refer to the same document).

The exact explanation is a bit tricky. I will provide an overview here but looking at the commits individually also helps. Every commit has a separate explanation/justification and usually just contains a couple of LOC of changes and therefore makes the explanation more digestible.

Overview

The savepoint is currently created when the completion request response was received. However, a (spec compliant) language server computes the completion response for the document state when it receives the completion request. That means that any changes between sending the completion requests to the server and receiving a response can cause all kinds of issues. For example if completion is requested at foo:: and you keep typing ba before the response arrives and then select the bar completion the result is foo::babar instead of foo::bar.

The savepoint was moved to the completion request in 5545aad. However, this actually introduces new issues as there is only a single savepoint tracked on the document at a given time. The completion popup restores the savepoint whenever a new completion option is selected. However, if a new completion request is created while the completion popup is open (idle timeout) and a completion is selected before a response is received, then the completion popup would restore the wrong savepoint (as the savepoint was overwritten during the request). To alleviate that problem the savepoint mechanism was changed in 7e42f25 so that we can track an arbitrary number of (reference counted) snapshots on the document.

I noticed that multiple completion requests could be active at the same time (either because of low idle timeout values or because the user manually invoked <C-x>). Because LSP servers can arbitrarily reorder non-dependent requests (and it's random which response helix processes first if both finish at the same time) these completions are essentially racing. This is not as much of an issue anymore with the previous fixes, but it can cause an outdated completion request to replace the current completion request (which is not good). It can also cause some flickering. Most importantly it makes the association of savepoints to the completion popup essentially non-deterministic and therefore impossible to repeat correctly (see below). Therefore, I introduced a mechanism to automatically discard any running completion requests when a new one is created in 0a22fd2. Completion requests are also discarded if view/document changes.

Repeating completions also operated under the assumption that completion trigger creates savepoints. As this is not the case anymore I had to add an InsertEvent for requesting completions. When this event is replayed a new savepoint is created and saved. When the TriggerCompletion event is replayed the savepoint from the last request is set as activated (simply moved to a different variable). From that point forward any CompletionAction events will use that savepoint. For this to work it's critical that the TriggerCompletion event always belongs to the last completion request (otherwise the wrong savepoint is used). That is ensured by dropping stale completion requests (explained in the previous paragraph).

Another small fix included here: Savepoints now also store the selection. This is important since we apply multicursor completions relative to the selections. Usually this wasn't necessary because cursors tend to map back to the same position. However, this isn't always the case and can cause weird behavior in edgecase. In particular with snippet support (#5864) some completion snippets can cause more complex changes/arbitrary selection changes, and it's important that the selection is fully reset.

pascalkuthe · 2023-03-03T15:06:34Z

I put together some test infrastructure locally (which includes an in-process LSP server skeleton and a way to replace servers with a local server). This infrastructure has allowed me to test this much more thoroughly and identify edgecases missed in #1819. However that infrastructure is quite a bit larger than the actual changes here. It's also still quite messy and therefore I have opted not to include it yet. Instead I will add that as a followup PR in the future.

Even with that testing infrastructure it was quite hard for me to reliably test the various issue that we have attributed to #1819 in the past as none of them contain a reproduction case and even with a finegrained testing infrastructure I need to understand the nature of the bug to create a test.

I am fairly certain that this should fix the reported issues, nonetheless it would be good if people could try this out locally to see if it fixes their problems as I can not reproduce most issues (likely because I have a very fast cpu and therfore am not able to type quicker than the server responds)

kirawi · 2023-03-03T16:06:42Z

Would introducing an artificial delay to the LSP skeleton provide a testcase?

pascalkuthe · 2023-03-03T16:14:05Z

Would introducing an artificial delay to the LSP skeleton provide a testcase?

I have done this and used it for developing this PR. Just an artifical delay is not enough as the typing and completion selection needs to be timed carefully. If you just send <tab><ret> immidietly the delayed completion hasn't arrived yet and you just windup inserting a tab. The problem is that this testecase is quite complicated and even just having any in process LS at all requires ~500loc of changes. I need to cleanup that code a lot more before that is ready for merging. The problem is also that that case only covers the most basic case: The user types before the completion response is processed. But this PR handles many more cases (described in the overview). Creating clean testcases for all of is quite a bit of work and I didn't feel that the couple messy tests I have right now justified pulling in all that extra testing code and would have turned this reasonably sized PR is something quite large.

I found that the tests also don't really help while reviewing as they are complex themselves. Therefore I choose to delay adding itnegration tests to separate PR . Th3ere is lots of other tests we likely want to add for LSP support and I want to make sure that the in-process LS sceleton is flexible enough to allow all of them (without requiring huge amounts of boilerplate)

helix-view/src/editor.rs

dead10ck

On a first pass, this looks reasonable, though I'd like to take another look. I'm not quite sure I've grokked the save points being a list now.

dead10ck · 2023-03-04T04:40:33Z

helix-term/src/ui/editor.rs

                    // then replay the inputs
                    for key in self.last_insert.1.clone() {
                        match key {
                            InsertEvent::Key(key) => self.insert_mode(cxt, key),
                            InsertEvent::CompletionApply(compl) => {
                                let (view, doc) = current!(cxt.editor);

-                                doc.restore(view);
+                                if let Some(last_savepoint) = last_savepoint.as_deref() {
+                                    doc.restore(view, last_savepoint);


Since this is just getting whatever the currently focused view is, that may not be the same as the view in the last save point, right? E.g. would this panic if we sent a completion request and then changed views?

Your general idea is correct, that's why I added a check to drop requests if the view or document has changed in 56bf37a so this edgecase is already covered by that

I think I'm not quite understanding how this works. It looks like those additional checks would prevent a new pop-up menu of completion items if we received a response after moving views. But this code looks like it's for handling repetition of a completion application, which in my understanding, does not trigger a new pop-up, just inserts the same text again. How does that extra check come into play here?

The repetition code only replays events recorded while in insertmode and the appropriate event only gets generated if the completion menu is actually opened (prevent by said check). Specifically there are three events:

Request completion: Generated immidietly when a completion is requested, generates a new savepoint when repeated

trigger completion: Generated only if the completion menu is actually opened, active the savepoiny described above (move it to a second variable)

completion even: triggered when selecting something in the menu, restores the active savepoint

The important part is that there are always two savepoints during repeating: an active savepoint actually used for completions and the savepoint of the current request.

So in the case you mentioned a completion request (and completion request event) would occur bit if the view/doc is switched then the request gets dropped after it finishes and no trigger completion event is ever generated. Therefore the savepoint belonging to the request is never activated and would be replaced by the next completion request event

Got it, this makes a lot more sense now, thanks for the write-up!

pascalkuthe · 2023-03-04T09:44:28Z

On a first pass, this looks reasonable, though I'd like to take another look. I'm not quite sure I've grokked the save points being a list now.

Simply think about save points being an Arc so the save point ris essentially a snapshot of the document that remains valid as long as you hold a refence to it. The list on the document is only an implemententation detail that API consumers dont need to think about. It's only needed to keep updating the savepoints as long as anybody holds a reference to them (hence weak references)

HealsCodes · 2023-03-04T15:14:08Z

It's unlikely that the issue is more frequent on 'slower CPUs' and feels a lot more related to what LSP and completion scenario you apply for testing.

Using the rust-analyser LSP I don't experience letter doubling or any other issues with completions; using pylsp together with pylsp-mypy and the rather massive PySide6 type hints I can reproduce doubling on pretty much any system ranging from a 9th gen i7 going all the way up to a Ryzen 9 7600x as well as on an M1 and M2 max with a reasonable probability of issues happening.

poliorcetics · 2023-03-04T16:52:23Z

Using the rust-analyser LSP I don't experience letter doubling or any other issues with completions

As another datapoint, with helix-master (not this PR) I experience it a lot even with rust-analyzer on bigger projects with more than 100k lines of Rust with hundreds of dependencies

pascalkuthe · 2023-03-04T19:45:59Z

The bugs this PR are fixed are observable if user input (or completion re-request) is faster than the response to completion requests. So a fast LSP server (RA) has the problem less often (usually only on large projects as @poliorcetics pointed out). CPU speed is very much a factor here. I don't really work on massive python codebases either (which has very slow LS) and I don't tend to type super fast (and have my idle timeout set to the default so pretty high) so that also contributes.

I am decently certain this fixes these issues regardless, the call for testing was just to make sure we get some more widespread testing across many different useage scenarios.

HealsCodes · 2023-03-04T19:52:05Z

Hence our suggestion to try something like PyLSP / a large project and not the very optimised rust LSP.

Trust me I am a very fast typer as well and the hardware I listed is beefy as well but using pylsp the problem is very much testable if you want to try.

(this isn't meant as criticism it's meant to help you reproduce the issue, beefy PC is not a reason for it not to happen as multiple people pointed out)

dead10ck · 2023-03-05T03:34:18Z

helix-view/src/document.rs

+    ///
+    /// An `id` that can uniquely identify this savepoint. A document
+    /// can only track a single savepoint at a time, to check if your savepoint still
+    /// applies simply


Looks like the sentence was never finished

Ah this documentation is actually outdated and refers to an old version of this function from a previous slightly different approach that I moved awaybfrom (refcount stored on the document, and savepoints tracked using ids, was very error prone because it was easy to forgot to drop a savepoint or use a dropped savepoint for a revert, the recounting is much more robust). Will fix that comment in a bit

dead10ck · 2023-03-05T03:57:08Z

The list on the document is only an implemententation detail that API consumers dont need to think about. It's only needed to keep updating the savepoints as long as anybody holds a reference to them (hence weak references)

Ok, I think I'm starting to understand. I was thinking about what happens if we have multiple save points in flight at a time, and at some point or another, we end up trying to restore them in succession? If I'm understanding this right, that would be fine since on each restore, we apply the invert transaction, and now this will compose the changes into each save point that's still valid.

pascalkuthe · 2023-03-05T07:16:19Z

The list on the document is only an implemententation detail that API consumers dont need to think about. It's only needed to keep updating the savepoints as long as anybody holds a reference to them (hence weak references)

Ok, I think I'm starting to understand. I was thinking about what happens if we have multiple save points in flight at a time, and at some point or another, we end up trying to restore them in succession? If I'm understanding this right, that would be fine since on each restore, we apply the invert transaction, and now this will compose the changes into each save point that's still valid.

Ah I see yeah that can be a bit confusing. Your idea is exactly right the revert transaction is simply added on top so the revert is essentially undone too.

Basically savepoints are only kept on the document as a performance optimization. A simpler/slower implantation would simply store a snapshot of the document (so a clone of the Rope) and compute a diff of that with the document contents to generate a transaction. The list on the document only exists so that we don't have to compute a full diff and instead "remember" what changed

rcorre · 2023-03-05T12:27:49Z

This problem is really easy to repro using the Godot LSP with helix main. I wasn't able to repro at all with this patch. Here's a comparison with idle_timeout=100ms (main first, this patch second):
https://asciinema.org/a/IJWMTBQiBFMYSwfvi9p7KQ2a2

pascalkuthe · 2023-03-06T00:01:55Z

This problem is really easy to repro using the Godot LSP with helix main. I wasn't able to repro at all with this patch. Here's a comparison with idle_timeout=100ms (main first, this patch second): asciinema.org/a/IJWMTBQiBFMYSwfvi9p7KQ2a2

thanks for testing :)

Currently, the selection is not saved/restored when completion checkpoints are applied. This is usually fine because undoing changes usually restores maps selections back in insert mode. But this is not always the case and especially problematic in the presence of multi-cursor completions (since completions are applied relative to the selection/cursor) and snippets (which can change the selection)

Fixing autocomplete required moving the document savepoint before the asynchronous completion request. However, this in turn causes new bugs: If the completion popup is open, the savepoint is restored when the popup closes (or another entry is selected). However, at that point a new completion request might already have been created which would have replaced the new savepoint (therefore leading to incorrectly applied complies). This commit fixes that bug by allowing in arbitrary number of savepoints to be tracked on the document. The savepoints are reference counted and therefore remain valid as long as any reference to them remains. Weak reference are stored on the document and any reference that can not be upgraded anymore (hence no strong reference remain) are automatically discarded.

Completion requests are computed asynchronously to avoid common micro freezes while editing. This means that once a completion request completes, the state of the editor might have changed. Currently, there is a check to ensure we are still in insert mode. However, we also need to ensure that the view and document hasn't changed to avoid accidentally using a savepoint with the wrong view/document. Furthermore, the editor might request a new completion while the previous completion request hasn't complemented yet. This can lead to weird flickering or an outdated completion request replacing a newer completion that has already completed (the LSP server is not required to process completion requests in order). This change also needed to ensure determinism/linear ordering so that completion popup always correspond to the last completion request.

Repeating completions currently crates a savepoint when a completion popup was triggered (so after the request completed). Just like for normal completions the savepoint must be created at the request. The occurrence of the completion request was previously not saved in `last_insert`. To that end a new `InsertEvent::RequestCompletion` variant has been added. When replayed this event creates a snapshot that is "actived" by the `TriggerCompletion` event and subsequently used during any `InsertEvent::CompletiuonApply` events.

dead10ck

Ok, I've been using it for a couple days now, and the code looks good to me too. Thanks @pascalkuthe!

StratilJakub · 2023-03-09T10:23:38Z

Just want to say thanks a lot for fixing this. This was quite a big issue with languages like JavaScript.

pascalkuthe added A-helix-term Area: Helix term improvements E-medium Call for participation: Experience needed to fix: Medium / intermediate S-waiting-on-review Status: Awaiting review from a maintainer. labels Mar 3, 2023

pascalkuthe force-pushed the completion_sync branch from 7985d5f to ffdd40d Compare March 3, 2023 15:02

pascalkuthe added the E-testing-wanted Call for participation: Experimental features suitable for testing label Mar 3, 2023

pascalkuthe force-pushed the completion_sync branch from ffdd40d to 45ac25f Compare March 3, 2023 15:10

pascalkuthe added this to the next milestone Mar 3, 2023

pascalkuthe mentioned this pull request Mar 3, 2023

completion fix #1819

Closed

rcorre reviewed Mar 3, 2023

View reviewed changes

helix-view/src/editor.rs Outdated Show resolved Hide resolved

pascalkuthe mentioned this pull request Mar 3, 2023

Auto completing deletes character cursor is currently on. #5761

Closed

pascalkuthe force-pushed the completion_sync branch from 45ac25f to e12b3be Compare March 4, 2023 02:06

dead10ck reviewed Mar 4, 2023

View reviewed changes

dead10ck reviewed Mar 5, 2023

View reviewed changes

pascalkuthe mentioned this pull request Mar 7, 2023

Fix multiline snippets for completions without offsets Urgau/helix#7

Closed

pascalkuthe added 2 commits March 7, 2023 18:37

create savepoint before requesting completion

989f189

pascalkuthe force-pushed the completion_sync branch from e12b3be to 226d77f Compare March 7, 2023 17:38

pascalkuthe added 2 commits March 7, 2023 19:12

pascalkuthe force-pushed the completion_sync branch from 226d77f to 0144c92 Compare March 7, 2023 18:12

dead10ck approved these changes Mar 8, 2023

View reviewed changes

archseer merged commit aabc8af into helix-editor:master Mar 9, 2023

This was referenced Mar 9, 2023

Suggestions often inserted in incorrect place #4851

Closed

Auto-complete corrupts when trying to wrap code in function call #5163

Closed

Pyright LSP completion lowercase to uppercase leaves first few characters #5469

Closed

pascalkuthe deleted the completion_sync branch March 9, 2023 11:48

pascalkuthe mentioned this pull request Mar 12, 2023

Discard LSP requests/notifcation that target outdated versions #6282

Merged

pascalkuthe mentioned this pull request Apr 4, 2023

correctly handle completion rerequest #6594

Merged

pascalkuthe mentioned this pull request Jul 16, 2023

Auto import doesn't work #2142

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix completion race conditions #6173

Fix completion race conditions #6173

pascalkuthe commented Mar 3, 2023

pascalkuthe commented Mar 3, 2023 •

edited

Loading

kirawi commented Mar 3, 2023

pascalkuthe commented Mar 3, 2023 •

edited

Loading

dead10ck left a comment

dead10ck Mar 4, 2023

pascalkuthe Mar 4, 2023 •

edited

Loading

dead10ck Mar 5, 2023

pascalkuthe Mar 5, 2023

dead10ck Mar 5, 2023

pascalkuthe commented Mar 4, 2023 •

edited

Loading

HealsCodes commented Mar 4, 2023

poliorcetics commented Mar 4, 2023

pascalkuthe commented Mar 4, 2023 •

edited

Loading

HealsCodes commented Mar 4, 2023 •

edited

Loading

dead10ck Mar 5, 2023

pascalkuthe Mar 5, 2023

dead10ck commented Mar 5, 2023

pascalkuthe commented Mar 5, 2023

rcorre commented Mar 5, 2023

pascalkuthe commented Mar 6, 2023

dead10ck left a comment

StratilJakub commented Mar 9, 2023

Fix completion race conditions #6173

Fix completion race conditions #6173

Conversation

pascalkuthe commented Mar 3, 2023

Overview

pascalkuthe commented Mar 3, 2023 • edited Loading

kirawi commented Mar 3, 2023

pascalkuthe commented Mar 3, 2023 • edited Loading

dead10ck left a comment

Choose a reason for hiding this comment

dead10ck Mar 4, 2023

Choose a reason for hiding this comment

pascalkuthe Mar 4, 2023 • edited Loading

Choose a reason for hiding this comment

dead10ck Mar 5, 2023

Choose a reason for hiding this comment

pascalkuthe Mar 5, 2023

Choose a reason for hiding this comment

dead10ck Mar 5, 2023

Choose a reason for hiding this comment

pascalkuthe commented Mar 4, 2023 • edited Loading

HealsCodes commented Mar 4, 2023

poliorcetics commented Mar 4, 2023

pascalkuthe commented Mar 4, 2023 • edited Loading

HealsCodes commented Mar 4, 2023 • edited Loading

dead10ck Mar 5, 2023

Choose a reason for hiding this comment

pascalkuthe Mar 5, 2023

Choose a reason for hiding this comment

dead10ck commented Mar 5, 2023

pascalkuthe commented Mar 5, 2023

rcorre commented Mar 5, 2023

pascalkuthe commented Mar 6, 2023

dead10ck left a comment

Choose a reason for hiding this comment

StratilJakub commented Mar 9, 2023

pascalkuthe commented Mar 3, 2023 •

edited

Loading

pascalkuthe commented Mar 3, 2023 •

edited

Loading

pascalkuthe Mar 4, 2023 •

edited

Loading

pascalkuthe commented Mar 4, 2023 •

edited

Loading

pascalkuthe commented Mar 4, 2023 •

edited

Loading

HealsCodes commented Mar 4, 2023 •

edited

Loading