Enhancements to handling large files in remote scenarios #169433
This is a more general thing, relating to the server and file I/O. It's just as relevant for SSH and other remotes. The same cancellation behavior could also be useful when reading 'local' files that happen to be on a network share.
Off the top of my head, I do not recall any queueing or serialisation, unless there is some on the connection layer (Alex would probably know). In local scenarios I have seen very large files (GBs) block operations in the renderer simply because the renderer is flooded with data and busy catching up. I recently got a PR (that I would not accept without further work) and started a discussion on this in #169288. However, I somewhat doubt it would help for remote scenarios, because the latency alone probably causes enough yielding to free up the renderer. I wonder if this is a different issue specific to remote. Can you reproduce it reliably?
Yes, covered in #57585. We currently do not have cancellation propagating from closing a tab all the way down to the core file layer.
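A minimal sketch of what that propagation could look like, using VS Code's `CancellationTokenSource`; everything else here (the resolver class, the `readFile` callback) is hypothetical and much simpler than the real model resolution code:

```ts
import { CancellationToken, CancellationTokenSource } from 'vs/base/common/cancellation';

// Hypothetical resolver: creates a token per resolve and cancels it when the
// corresponding tab closes, so the read can be abandoned at the file layer.
class EditorModelResolver {
	private readonly pending = new Map<string, CancellationTokenSource>();

	async resolve(uri: string, readFile: (uri: string, token: CancellationToken) => Promise<Uint8Array>): Promise<Uint8Array> {
		const cts = new CancellationTokenSource();
		this.pending.set(uri, cts);
		try {
			// For cancellation to be effective, the token must be threaded
			// through every layer down to the (remote) file system provider.
			return await readFile(uri, cts.token);
		} finally {
			this.pending.delete(uri);
		}
	}

	onTabClosed(uri: string): void {
		this.pending.get(uri)?.cancel(); // abandons the in-flight read, if any
	}
}
```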
On desktop we have this: [embedded code snippet]. And a related setting: [embedded settings snippet]. I would have assumed that the setting is applied on remote too, but would have to check. It is likely not applied in web, only desktop.
Btw I cannot reproduce the issue about queueing. I uploaded a 100 MB file to my repo https://github.com/bpasero/test-ts.
@bpasero What do you think about adding a new capability to FS providers where a FS provider could indicate that it is using the network for FS operations? I could add it to the resolvers API, or we could try guessing it based on the latency we observe on the management connection. We could use that as a hint to prompt for confirmation when opening large files (>50MB?). Adding such a prompt for over-the-network remotes like remote tunnels, ssh or codespaces, together with fixing #57585, could improve the experience when using a remote over the network.
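A sketch of what such a capability bit could look like. The `Network` flag and its value are hypothetical; the real `FileSystemProviderCapabilities` bitmask lives in `src/vs/platform/files/common/files.ts`, and the existing values shown here may not match it exactly:

```ts
// Hypothetical extension of the capabilities bitmask; existing flag values
// are illustrative and should be checked against files.ts.
const enum FileSystemProviderCapabilities {
	FileReadWrite = 1 << 1,
	FileReadStream = 1 << 4,
	// ...existing capabilities elided...
	Network = 1 << 20 // hypothetical: provider performs I/O over the network
}

// Possible consumer: only prompt when the provider is network-backed and the
// file exceeds the threshold floated above.
function shouldConfirmLargeOpen(capabilities: FileSystemProviderCapabilities, sizeInBytes: number): boolean {
	const LIMIT = 50 * 1024 * 1024; // the >50MB threshold from the comment above
	return Boolean(capabilities & FileSystemProviderCapabilities.Network) && sizeInBytes > LIMIT;
}
```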
This is what I am seeing (the large file is ~100MB): vscode-large-files.mov

That's where these two observations from the original post came from.
Talked with @joaomoreno and distilled 3 things: a confirmation before opening large files in remote scenarios, cancellation when closing an editor (#57585), and improving the IPC communication itself.
Is the file reading done by the backend pushing the bytes to the frontend, or by the frontend pulling the bytes? If it's done by the backend eagerly pushing, then it is possible that on the server side the entire 100MB file is read into memory within 2s, and those 100MB are then queued on the stream, so no other server->client message can be sent until all 100MB are transmitted.
The server (or main) process pushes to the client via events and there is no throttling. A community PR that was filed a few days before this issue (#169288) has more details on how it works, and suggests changing it to let the client read large files in chunks instead. I however suggested to implement throttling where the file is read, to reduce the complexity of the change. This would mean large files are read more slowly, but the client would have a chance to do other things meanwhile. Ideally, if we were to do that, some kind of discrete progress could be shown in the editor to tell the user how much is left to load.
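A minimal sketch of that throttling idea, assuming plain Node.js file APIs on the server side; the chunk size and delay are illustrative, not values from VS Code:

```ts
import { open } from 'node:fs/promises';

// Read a file in fixed-size chunks, pausing between emits so the event
// stream to the renderer is not flooded and other messages can interleave.
async function readFileThrottled(
	path: string,
	emit: (chunk: Buffer) => void,
	chunkSize = 256 * 1024, // illustrative
	delayMs = 10            // illustrative
): Promise<void> {
	const handle = await open(path, 'r');
	try {
		const buffer = Buffer.alloc(chunkSize);
		let bytesRead: number;
		do {
			({ bytesRead } = await handle.read(buffer, 0, chunkSize, null));
			if (bytesRead > 0) {
				emit(Buffer.from(buffer.subarray(0, bytesRead))); // copy, since buffer is reused
				await new Promise(resolve => setTimeout(resolve, delayMs)); // yield between chunks
			}
		} while (bytesRead > 0);
	} finally {
		await handle.close();
	}
}
```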
Thanks for the clarification! You're correct, implementing this in a way which can utilise the available bandwidth is quite tricky 🤔. For example, with a client-server latency of 200ms, if the client reads 64KB chunks at a time, then the maximum theoretical bandwidth is 192KB/s.
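A rough sketch of that bound, assuming a strict request/response pull model with a single chunk in flight at a time; an effective cycle of roughly 333ms per 64KB chunk is what reproduces the 192KB/s figure:

```ts
// With one chunk in flight, throughput is capped at chunkSize / cycleTime,
// where the cycle covers the request, the response, and any processing.
function pullThroughputKBps(chunkKB: number, cycleMs: number): number {
	return chunkKB / (cycleMs / 1000);
}

console.log(pullThroughputKBps(64, 333)); // ~192 KB/s, the figure above
```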
In order to utilise the available bandwidth, I think the IPC layer needs to either expose the underlying socket or implement some form of flow control.
Actually, in the beginning we had a model where the client would pull the data from the server, but then as a speed optimization I changed to our current model to reduce the communication overhead for large files. This means the throughput is higher, but the pressure on the client is also larger.
Having recently dealt extensively with this when implementing Basis' host relay (which leverages SSH under the hood), you could also look at a traffic management mechanism similar to what SSH does with its channels. SSH multiplexes multiple channels over a single connection. For each channel, each party has a receive window, whose initial size and subsequent adjustments are announced to the other side. A sender must not send more bytes than it knows the receiver's window can handle, and must wait for the receiver to announce a larger window before sending any more data to the channel. Getting the sizing and adjustments right can be tricky (e.g. the receiver usually wants to grow the window once some proportion of it has been used up, and needs to do so with appropriate prioritization), but something like that could accomplish what you're after. Further reading: https://www.rfc-editor.org/rfc/rfc4254#section-5.2
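A minimal sketch of that windowing scheme, sender side only; all names are hypothetical, and real SSH additionally splits chunks that do not fit into the remaining window:

```ts
// SSH-style per-channel flow control (RFC 4254 §5.2), sender side: track the
// receiver's advertised window and only transmit what fits into it.
class ChannelSender {
	private window: number;
	private readonly queue: Uint8Array[] = [];

	constructor(initialWindow: number, private readonly transmit: (data: Uint8Array) => void) {
		this.window = initialWindow;
	}

	send(data: Uint8Array): void {
		this.queue.push(data);
		this.flush();
	}

	// Called when the receiver announces more room (CHANNEL_WINDOW_ADJUST).
	onWindowAdjust(bytes: number): void {
		this.window += bytes;
		this.flush();
	}

	private flush(): void {
		while (this.queue.length > 0 && this.queue[0].length <= this.window) {
			const chunk = this.queue.shift()!;
			this.window -= chunk.length; // never exceed the advertised window
			this.transmit(chunk);
		}
	}
}
```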
I have extracted the large file confirmation to #170090; maybe @bamurtaugh and @alexdima you could try it out and provide some feedback, also on the limits I have picked for web, remote and local. I am not sure we want to turn this issue into improving the IPC communication, given we have #57585 for cancelling when closing an editor. I feel that #57585 is a more complex, risky change, so I am not pushing for it right now unless you really want me to: adding cancellation to model resolution can have side effects and regressions, because the change will be complex and needs careful handling of multiple clients resolving asynchronously and waiting on the result.
Thanks for adding the large file confirmation, @bpasero! I tested desktop, remote (WSL), and web, and it works well for me. One thought: would it make sense to amend the web remote (i.e. connected via Remote Tunnels in vscode.dev) limit to 50 MB as well?
Now that we have the confirmation in place, I think it makes sense for this issue to track improving the IPC communication in the long term, given the risk associated with the PR you linked. I also say "long term" since I think at least providing this initial warning + confirmation flow will help users, and we can see if we get further relevant feedback here. We could now have two tracking issues for the remaining discussion: one for amending the web remote limit, and this one for improving the IPC communication.
To clarify, you mean if someone is using the web interface and the backend is a remote tunnel? If there is a way for me to figure this out, we can easily change the default; I am just not sure who would know how to detect that.
Sounds good to me, but maybe we create a new issue to make tracking easier and have just that one problem in the issue description.
Yes, that's what I was thinking. @connor4312 @aeschli I think this would be possible, but you'd both know best.
Makes sense, I'll open one now!
I just opened #170645.
I just wonder why the default would be any different if I am using web connected to a tunnel? If the default depends on the remote I am connected to, then maybe this needs to be metadata a resolver can give back, and not hardcoded?
Maybe it doesn't have to be different 🤔. I was considering whether being connected to compute would make that web + tunnel scenario better able to handle larger files than just a serverless web instance. But maybe being in the web, regardless of connected compute, is enough to say that the limit should be the lowest.
It is actually funny that in web+serverless, opening very large files is no problemo, because no IPC is involved and there is no roundtrip to a serverful remote. It is with web+remote that things get ugly, slow and potentially expensive.
I think this is reasonable: `src/vs/platform/files/common/files.ts`, lines 1357 to 1377 at 1ad2cb5.
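Roughly what those lines express; the thresholds below are assumptions for illustration and should be checked against the linked source:

```ts
// Illustrative per-target confirmation limits: web is the most constrained,
// desktop local the most permissive. Values are assumptions, not the source's.
const enum ByteSize {
	KB = 1024,
	MB = KB * 1024,
	GB = MB * 1024
}

function getLargeFileConfirmationLimit(isWeb: boolean, isRemote: boolean): number {
	if (isWeb) {
		return 10 * ByteSize.MB; // web, with or without a remote
	}
	return isRemote ? 50 * ByteSize.MB : 1 * ByteSize.GB; // desktop
}
```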
In a discussion with @derekbekoe, @curib, @connor4312, @aeschli, @alexdima, and @chrisdias a few weeks ago, we covered enhancements to how VS Code handles large files in remote scenarios.
Some observations from @derekbekoe:

- "The file is not displayed in the editor because it is either binary or uses an unsupported text encoding"

Should a prompt be added for opening large files in remote scenarios?

Tentatively assigning @connor4312 and @aeschli as we discussed this in the context of Remote Tunnels, but let me know if I should move this to the core repo or update the assignment.