-
-
Notifications
You must be signed in to change notification settings - Fork 2.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix completion race conditions #6173
Conversation
7985d5f
to
ffdd40d
Compare
I put together some test infrastructure locally (which includes an in-process LSP server skeleton and a way to replace servers with a local server). This infrastructure has allowed me to test this much more thoroughly and identify edgecases missed in #1819. However that infrastructure is quite a bit larger than the actual changes here. It's also still quite messy and therefore I have opted not to include it yet. Instead I will add that as a followup PR in the future. Even with that testing infrastructure it was quite hard for me to reliably test the various issue that we have attributed to #1819 in the past as none of them contain a reproduction case and even with a finegrained testing infrastructure I need to understand the nature of the bug to create a test. I am fairly certain that this should fix the reported issues, nonetheless it would be good if people could try this out locally to see if it fixes their problems as I can not reproduce most issues (likely because I have a very fast cpu and therfore am not able to type quicker than the server responds) |
ffdd40d
to
45ac25f
Compare
Would introducing an artificial delay to the LSP skeleton provide a testcase? |
I have done this and used it for developing this PR. Just an artifical delay is not enough as the typing and completion selection needs to be timed carefully. If you just send I found that the tests also don't really help while reviewing as they are complex themselves. Therefore I choose to delay adding itnegration tests to separate PR . Th3ere is lots of other tests we likely want to add for LSP support and I want to make sure that the in-process LS sceleton is flexible enough to allow all of them (without requiring huge amounts of boilerplate) |
45ac25f
to
e12b3be
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On a first pass, this looks reasonable, though I'd like to take another look. I'm not quite sure I've grokked the save points being a list now.
// then replay the inputs | ||
for key in self.last_insert.1.clone() { | ||
match key { | ||
InsertEvent::Key(key) => self.insert_mode(cxt, key), | ||
InsertEvent::CompletionApply(compl) => { | ||
let (view, doc) = current!(cxt.editor); | ||
|
||
doc.restore(view); | ||
if let Some(last_savepoint) = last_savepoint.as_deref() { | ||
doc.restore(view, last_savepoint); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since this is just getting whatever the currently focused view is, that may not be the same as the view in the last save point, right? E.g. would this panic if we sent a completion request and then changed views?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Your general idea is correct, that's why I added a check to drop requests if the view or document has changed in 56bf37a so this edgecase is already covered by that
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I'm not quite understanding how this works. It looks like those additional checks would prevent a new pop-up menu of completion items if we received a response after moving views. But this code looks like it's for handling repetition of a completion application, which in my understanding, does not trigger a new pop-up, just inserts the same text again. How does that extra check come into play here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The repetition code only replays events recorded while in insertmode and the appropriate event only gets generated if the completion menu is actually opened (prevent by said check). Specifically there are three events:
- Request completion: Generated immidietly when a completion is requested, generates a new savepoint when repeated
- trigger completion: Generated only if the completion menu is actually opened, active the savepoiny described above (move it to a second variable)
- completion even: triggered when selecting something in the menu, restores the active savepoint
The important part is that there are always two savepoints during repeating: an active savepoint actually used for completions and the savepoint of the current request.
So in the case you mentioned a completion request (and completion request event) would occur bit if the view/doc is switched then the request gets dropped after it finishes and no trigger completion event is ever generated. Therefore the savepoint belonging to the request is never activated and would be replaced by the next completion request event
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it, this makes a lot more sense now, thanks for the write-up!
Simply think about save points being an |
It's unlikely that the issue is more frequent on 'slower CPUs' and feels a lot more related to what LSP and completion scenario you apply for testing. Using the rust-analyser LSP I don't experience letter doubling or any other issues with completions; using pylsp together with pylsp-mypy and the rather massive PySide6 type hints I can reproduce doubling on pretty much any system ranging from a 9th gen i7 going all the way up to a Ryzen 9 7600x as well as on an M1 and M2 max with a reasonable probability of issues happening. |
As another datapoint, with helix-master (not this PR) I experience it a lot even with |
The bugs this PR are fixed are observable if user input (or completion re-request) is faster than the response to completion requests. So a fast LSP server (RA) has the problem less often (usually only on large projects as @poliorcetics pointed out). CPU speed is very much a factor here. I don't really work on massive python codebases either (which has very slow LS) and I don't tend to type super fast (and have my idle timeout set to the default so pretty high) so that also contributes. I am decently certain this fixes these issues regardless, the call for testing was just to make sure we get some more widespread testing across many different useage scenarios. |
Hence our suggestion to try something like PyLSP / a large project and not the very optimised rust LSP. Trust me I am a very fast typer as well and the hardware I listed is beefy as well but using pylsp the problem is very much testable if you want to try. (this isn't meant as criticism it's meant to help you reproduce the issue, beefy PC is not a reason for it not to happen as multiple people pointed out) |
helix-view/src/document.rs
Outdated
/// | ||
/// An `id` that can uniquely identify this savepoint. A document | ||
/// can only track a single savepoint at a time, to check if your savepoint still | ||
/// applies simply |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like the sentence was never finished
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah this documentation is actually outdated and refers to an old version of this function from a previous slightly different approach that I moved awaybfrom (refcount stored on the document, and savepoints tracked using ids, was very error prone because it was easy to forgot to drop a savepoint or use a dropped savepoint for a revert, the recounting is much more robust). Will fix that comment in a bit
Ok, I think I'm starting to understand. I was thinking about what happens if we have multiple save points in flight at a time, and at some point or another, we end up trying to restore them in succession? If I'm understanding this right, that would be fine since on each restore, we apply the invert transaction, and now this will compose the changes into each save point that's still valid. |
Ah I see yeah that can be a bit confusing. Your idea is exactly right the revert transaction is simply added on top so the revert is essentially undone too. Basically savepoints are only kept on the document as a performance optimization. A simpler/slower implantation would simply store a snapshot of the document (so a clone of the |
This problem is really easy to repro using the Godot LSP with helix |
thanks for testing :) |
Currently, the selection is not saved/restored when completion checkpoints are applied. This is usually fine because undoing changes usually restores maps selections back in insert mode. But this is not always the case and especially problematic in the presence of multi-cursor completions (since completions are applied relative to the selection/cursor) and snippets (which can change the selection)
e12b3be
to
226d77f
Compare
Fixing autocomplete required moving the document savepoint before the asynchronous completion request. However, this in turn causes new bugs: If the completion popup is open, the savepoint is restored when the popup closes (or another entry is selected). However, at that point a new completion request might already have been created which would have replaced the new savepoint (therefore leading to incorrectly applied complies). This commit fixes that bug by allowing in arbitrary number of savepoints to be tracked on the document. The savepoints are reference counted and therefore remain valid as long as any reference to them remains. Weak reference are stored on the document and any reference that can not be upgraded anymore (hence no strong reference remain) are automatically discarded.
Completion requests are computed asynchronously to avoid common micro freezes while editing. This means that once a completion request completes, the state of the editor might have changed. Currently, there is a check to ensure we are still in insert mode. However, we also need to ensure that the view and document hasn't changed to avoid accidentally using a savepoint with the wrong view/document. Furthermore, the editor might request a new completion while the previous completion request hasn't complemented yet. This can lead to weird flickering or an outdated completion request replacing a newer completion that has already completed (the LSP server is not required to process completion requests in order). This change also needed to ensure determinism/linear ordering so that completion popup always correspond to the last completion request.
Repeating completions currently crates a savepoint when a completion popup was triggered (so after the request completed). Just like for normal completions the savepoint must be created at the request. The occurrence of the completion request was previously not saved in `last_insert`. To that end a new `InsertEvent::RequestCompletion` variant has been added. When replayed this event creates a snapshot that is "actived" by the `TriggerCompletion` event and subsequently used during any `InsertEvent::CompletiuonApply` events.
226d77f
to
0144c92
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I've been using it for a couple days now, and the code looks good to me too. Thanks @pascalkuthe!
Just want to say thanks a lot for fixing this. This was quite a big issue with languages like JavaScript. |
Supercedes #1819
This PR resolves multiple problems with the completions.
The root cause of all of these issues were shortcomings in the way savepoints are handled.
Savepoints are set when a completion is triggered so that the changes sent by the server
can be applied correctly (refer to the same document).
The exact explanation is a bit tricky. I will provide an overview here but looking at the commits individually also helps. Every commit has a separate explanation/justification and usually just contains a couple of LOC of changes and therefore makes the explanation more digestible.
Overview
The savepoint is currently created when the completion request response was received. However, a (spec compliant) language server computes the completion response for the document state when it receives the completion request. That means that any changes between sending the completion requests to the server and receiving a response can cause all kinds of issues. For example if completion is requested at
foo::
and you keep typingba
before the response arrives and then select thebar
completion the result isfoo::babar
instead offoo::bar
.The savepoint was moved to the completion request in 5545aad. However, this actually introduces new issues as there is only a single savepoint tracked on the document at a given time. The completion popup restores the savepoint whenever a new completion option is selected. However, if a new completion request is created while the completion popup is open (idle timeout) and a completion is selected before a response is received, then the completion popup would restore the wrong savepoint (as the savepoint was overwritten during the request). To alleviate that problem the savepoint mechanism was changed in 7e42f25 so that we can track an arbitrary number of (reference counted) snapshots on the document.
I noticed that multiple completion requests could be active at the same time (either because of low idle timeout values or because the user manually invoked
<C-x>
). Because LSP servers can arbitrarily reorder non-dependent requests (and it's random which response helix processes first if both finish at the same time) these completions are essentially racing. This is not as much of an issue anymore with the previous fixes, but it can cause an outdated completion request to replace the current completion request (which is not good). It can also cause some flickering. Most importantly it makes the association of savepoints to the completion popup essentially non-deterministic and therefore impossible to repeat correctly (see below). Therefore, I introduced a mechanism to automatically discard any running completion requests when a new one is created in 0a22fd2. Completion requests are also discarded if view/document changes.Repeating completions also operated under the assumption that completion trigger creates savepoints. As this is not the case anymore I had to add an
InsertEvent
for requesting completions. When this event is replayed a new savepoint is created and saved. When theTriggerCompletion
event is replayed the savepoint from the last request is set as activated (simply moved to a different variable). From that point forward anyCompletionAction
events will use that savepoint. For this to work it's critical that theTriggerCompletion
event always belongs to the last completion request (otherwise the wrong savepoint is used). That is ensured by dropping stale completion requests (explained in the previous paragraph).Another small fix included here: Savepoints now also store the selection. This is important since we apply multicursor completions relative to the selections. Usually this wasn't necessary because cursors tend to map back to the same position. However, this isn't always the case and can cause weird behavior in edgecase. In particular with snippet support (#5864) some completion snippets can cause more complex changes/arbitrary selection changes, and it's important that the selection is fully reset.