Automatic classification: more aggressive debounce and perf considerations #129607
💯 We have to be very careful about running an expensive operation on the same thread the user is typing in (see #27378 for some inspiration). At a minimum we should debounce in a way that continuous typing further delays the computation until the user stops typing for a moment. This becomes even more important if we decide to enable language detection by default. Further thoughts:
Thanks @bpasero for bringing this to my attention. I don't think running it on the main thread scales in terms of performance, especially now that we are also adding bracket matching on typing. Adding a few ms doesn't seem to hurt typing on its own, but if every feature does so, a single keystroke can easily exceed 16ms and we can't finish rendering within a single frame. Note that we already have tokenization and multiple event listeners (e.g. the auto save trigger), which take a couple of ms each.
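As a rough illustration of the trailing debounce mentioned above: continuous typing keeps pushing the expensive computation out until the user pauses. This is a sketch only; the `TrailingDebouncer` class is made up for illustration and is not an actual VS Code utility.

```ts
// Sketch only: every call to schedule() cancels the pending run, so the
// expensive work only fires once typing has paused for `delayMs`.
class TrailingDebouncer {
	private handle: ReturnType<typeof setTimeout> | undefined;

	constructor(private readonly delayMs: number) { }

	schedule(run: () => void): void {
		if (this.handle !== undefined) {
			clearTimeout(this.handle); // typing continued: push the work out again
		}
		this.handle = setTimeout(() => {
			this.handle = undefined;
			run();
		}, this.delayMs);
	}

	cancel(): void {
		if (this.handle !== undefined) {
			clearTimeout(this.handle);
			this.handle = undefined;
		}
	}
}

// Hypothetical usage: reschedule on every content change, so detection only
// runs after the user stops typing for a moment.
// const debouncer = new TrailingDebouncer(600);
// model.onDidChangeContent(() => debouncer.schedule(() => detectLanguage(model)));
```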
All good points, and I think we should explore all our options here.
I can chat with @TylerLeonhardt about how to move it to a worker; we already have workers for word and link computation. Btw, some data on the perf of typing now with the latest Insiders, which we should improve.
@TylerLeonhardt note that the text model supports fetching the string within a range, which is more efficient than getting the whole string.
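For reference, a minimal sketch of fetching only part of the buffer via the model API. `getValueInRange`, `getLineCount`, and `getLineMaxColumn` are existing text model methods, while the helper and the import paths here are approximate:

```ts
import { Range } from 'vs/editor/common/core/range';
import { ITextModel } from 'vs/editor/common/model';

// Read only the first `lineCount` lines instead of materializing the whole
// document with model.getValue().
function getFirstLines(model: ITextModel, lineCount: number): string {
	const lastLine = Math.min(lineCount, model.getLineCount());
	const range = new Range(1, 1, lastLine, model.getLineMaxColumn(lastLine));
	return model.getValueInRange(range);
}
```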
Yeah, this is an unfortunate consequence specific to untitled files, where we use the content of the untitled buffer as the title for the tab. If you want to file an issue for it where we track offenders, we can look into that individually 👍 I think it is limited to changes to the first line only though, not any line after that, so it might not be an issue for typical usage.
Updated what's done and what's not.
Nice. I think from my end the biggest ask is to support streaming, to avoid producing a single string of the entire editor contents, and maybe cancellation support, though I am not entirely sure that is even possible in the library itself...
I'm trying to understand how streaming would work if the work is done in a Worker. Wouldn't the data need to be serialized to be transferred to the worker, losing the benefit of streaming? Maybe I'm misunderstanding.
Yeah, you are right, that suggestion is from the time before we even thought about workers... Still, even when IPC is involved you can avoid creating a single string of the entire contents up front.
The encoding library we use works with that model: vscode/src/vs/workbench/services/textfile/common/encoding.ts, lines 181 to 182 at 83f4dfd.
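For context, that encoding code consumes the document as a sequence of chunks rather than as one big string. Below is a minimal sketch of the same idea, assuming a snapshot-style `read(): string | null` contract similar to VS Code's `ITextSnapshot`; the `collectUpTo` consumer is made up for illustration:

```ts
// Snapshot-style reading: the producer hands out chunks until it returns null,
// so no single string of the full document ever has to be allocated.
interface ITextSnapshot {
	read(): string | null;
}

// Hypothetical consumer: gather chunks until we have seen enough characters.
function collectUpTo(snapshot: ITextSnapshot, maxChars: number): string {
	const chunks: string[] = [];
	let total = 0;
	let chunk: string | null;
	while (total < maxChars && (chunk = snapshot.read()) !== null) {
		const remaining = maxChars - total;
		const piece = chunk.length > remaining ? chunk.substring(0, remaining) : chunk;
		chunks.push(piece);
		total += piece.length;
	}
	return chunks.join('');
}
```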
I'm using the same code that the EditorWorker uses, which sends over change events that look like this:

```ts
export interface IModelContentChange {
	/**
	 * The range that got replaced.
	 */
	readonly range: IRange;
	/**
	 * The offset of the range that got replaced.
	 */
	readonly rangeOffset: number;
	/**
	 * The length of the range that got replaced.
	 */
	readonly rangeLength: number;
	/**
	 * The new text for the range.
	 */
	readonly text: string;
}
```

And the mirror model code on the worker side applies those changes to keep its copy in sync. So we're already only sending chunks over to the worker when the content changes... though not the chunks you're referring to, I think. If we're talking about optimizing that further, it should probably be for all of our editor worker code.
The model doesn't have any internal concept of what the contents of a file look like (in other words, it doesn't keep state), so we need to send as much content as we can to give it the best chance of guessing correctly. I worry that running the model on anything smaller than the whole contents of the file will come with a steep decrease in confidence.
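To make the delta protocol above a bit more concrete, here is a simplified, hypothetical worker-side mirror that applies those changes to a plain string. The real editor worker keeps a line-based mirror model instead, and this sketch assumes the offsets in a single event refer to the document state before the event, which is why it applies changes from the end of the document backwards:

```ts
interface IModelContentChange {
	readonly rangeOffset: number; // start offset of the replaced range
	readonly rangeLength: number; // length of the text that was replaced
	readonly text: string;        // the new text for the range
}

// Hypothetical mirror: keeps a string copy of the document up to date from deltas.
class MirrorBuffer {
	constructor(private value: string) { }

	applyChanges(changes: readonly IModelContentChange[]): void {
		// Assumption: offsets refer to the pre-event state, so applying from the
		// end towards the start keeps earlier offsets valid.
		const ordered = [...changes].sort((a, b) => b.rangeOffset - a.rangeOffset);
		for (const change of ordered) {
			this.value =
				this.value.substring(0, change.rangeOffset) +
				change.text +
				this.value.substring(change.rangeOffset + change.rangeLength);
		}
	}

	getValue(): string {
		return this.value;
	}
}
```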
Oh yeah, that is good, I wasn't aware you also benefit from the mirror model code. And yeah, I agree the library needs to run detection on the entire contents for a confident decision, but it should probably have an upper limit on how large these contents can be. Given that, you could avoid allocating a string of the entire file (which can be GBs) and only use up to the maximum supported chunk size.
It does :) 100000 characters, because after that tfjs throws some error: #129597. Ideally I'd only create a 100000-character string instead of one for the entire file... but I believe the file is represented as lines, so I'd have to roll my own "get the first 100000 characters of this string" function... which I haven't done yet.
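A sketch of one way to build that capped string against the model without allocating the full contents first, assuming the `getPositionAt` / `getValueInRange` model APIs; the helper name and the exact limit handling are made up:

```ts
import { Position } from 'vs/editor/common/core/position';
import { Range } from 'vs/editor/common/core/range';
import { ITextModel } from 'vs/editor/common/model';

const MAX_DETECTION_CHARS = 100000; // the tfjs-related limit discussed above

// Return at most the first MAX_DETECTION_CHARS characters of the document,
// without first building one string for the entire file.
function getDetectionSample(model: ITextModel, maxChars = MAX_DETECTION_CHARS): string {
	const end = model.getPositionAt(maxChars); // clamps for documents shorter than maxChars
	return model.getValueInRange(Range.fromPositions(new Position(1, 1), end));
}
```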
Yeah, the closest I see is ... I guess one solution could be to ...
Moving the other items to the backlog, as I don't think we need to do them right this second. I'm going to have a TPI for this so testers can try out the perf.
Testing #129436
On my machine it takes around 300ms for the language classification to compute. However, some of our users have slower machines, so here are some ideas on how to make sure we do not slow down users that are just scribbling in an untitled editor.
... times, and every time you are unsuccessful maybe stop trying and only run again on larger changes.

Both 3 and 4 require some knowledge about what counts as a bigger change to a file. We can figure something out. Maybe @bpasero has ideas.
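As a sketch of what "a bigger change" could mean, one hypothetical heuristic is to sum the inserted and removed text across a change event and compare it to a threshold. The interface shape matches the change events quoted earlier, while the threshold and helper are made up:

```ts
interface IModelContentChange {
	readonly rangeLength: number; // length of the replaced (removed) text
	readonly text: string;        // newly inserted text
}

const BIG_CHANGE_THRESHOLD = 1000; // characters; arbitrary, would need tuning

// Hypothetical heuristic: an edit counts as "big" when the total amount of
// inserted plus removed text across the event crosses the threshold.
function isBigChange(changes: readonly IModelContentChange[]): boolean {
	let touched = 0;
	for (const change of changes) {
		touched += change.rangeLength + change.text.length;
		if (touched >= BIG_CHANGE_THRESHOLD) {
			return true; // e.g. a paste or multi-cursor edit; worth re-running detection
		}
	}
	return false; // ordinary keystrokes; keep debouncing / skip re-detection
}
```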