Enhancements to handling large files in remote scenarios #169433

bamurtaugh · 2022-12-16T19:25:20Z

In discussion with @derekbekoe, @curib, @connor4312, @aeschli, @alexdima, and @chrisdias a few weeks ago, we discussed enhancements to how VS Code handles large files for remote scenarios.

Some observations from @derekbekoe:

Looks like VS Code blocks/queues file downloads so if I try to open a large file then a small file afterwards, the small file doesn’t open until the large file has finished?
Related to the above, it appears it’s not possible to cancel a large file download? Closing the file tab doesn’t appear to stop the download.
There is a warning for some file types (like The file is not displayed in the editor because it is either binary or uses an unsupported text encoding) but should one be added for opening large files in remote scenarios?

Tentatively assigning @connor4312 and @aeschli as we discussed this in the context of Remote Tunnels, but let me know if I should move to the core repo or update assignment.

connor4312 · 2022-12-16T19:27:07Z

This is a more general thing, relating to the server and file I/O. It's just as relevant for SSH and other remotes. The same cancellation behavior could also be useful when reading 'local' files that happen to be on a network share.

bpasero · 2022-12-17T07:23:42Z

Looks like VS Code blocks/queues file downloads so if I try to open a large file then a small file afterwards, the small file doesn’t open until the large file has finished?

Off the top of my head, I do not recall any queueing or serialisation, unless there is some on the connection layer (Alex would probably know). In local scenarios I have seen very large files (GBs) block operations in the renderer simply because the renderer is flooded with data and busy catching up. I recently got a PR (that I would not accept without further work) and started a discussion on this in #169288. However, I somewhat doubt it would help for remote scenarios, because the latency alone probably causes enough yielding to free up the renderer. I wonder whether this is a different issue that only affects remote.

Can you reproduce reliably?

Related to the above, it appears it’s not possible to cancel a large file download? Closing the file tab doesn’t appear to stop the download.

Yes, covered in: #57585. We currently do not have cancellation propagating from closing a tab all the way down to the core file layer.
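As a rough illustration of what "cancellation propagating down to the file layer" could mean, here is a minimal sketch. It is not how VS Code implements file reading; `readInChunks` and `openThenClose` are hypothetical names, and the only real API used is the standard `AbortController`/`AbortSignal` pair. The idea is simply that the read loop checks a signal between chunks, and the signal is triggered when the editor goes away.

```typescript
// Hypothetical sketch; these names are illustrative, not VS Code APIs.

// Reads `data` in chunks and hands each chunk to `onChunk`.
// Stops early once `signal` is aborted. Returns the number of bytes delivered.
function readInChunks(
    data: Uint8Array,
    chunkSize: number,
    signal: AbortSignal,
    onChunk: (chunk: Uint8Array) => void
): number {
    let delivered = 0;
    for (let offset = 0; offset < data.length; offset += chunkSize) {
        if (signal.aborted) {
            break; // the editor went away: stop reading
        }
        const chunk = data.subarray(offset, offset + chunkSize);
        delivered += chunk.length;
        onChunk(chunk);
    }
    return delivered;
}

// Simulates opening an editor and closing its tab after `closeAfterChunks` chunks.
function openThenClose(data: Uint8Array, chunkSize: number, closeAfterChunks: number): number {
    const controller = new AbortController();
    let chunksSeen = 0;
    return readInChunks(data, chunkSize, controller.signal, () => {
        chunksSeen++;
        if (chunksSeen === closeAfterChunks) {
            controller.abort(); // stands in for "the tab was closed"
        }
    });
}

const sampleFile = new Uint8Array(10 * 1024);
console.log(openThenClose(sampleFile, 1024, 3)); // bytes delivered before the abort took effect
```

In a real implementation the hard part is not the loop but sharing one read between several clients that await the same model, which is the risk discussed in #57585.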

There is a warning for some file types (like The file is not displayed in the editor because it is either binary or uses an unsupported text encoding) but should one be added for opening large files in remote scenarios?

On desktop we have this:

And a related setting:

I would have assumed that the setting is applied on remote too, but I would have to check. It is likely not applied in web, only on desktop.

bpasero · 2022-12-17T16:44:08Z

Btw I cannot reproduce the issue about queueing. I uploaded a 100 MB file to my repo https://github.com/bpasero/test-ts with the name xaa.txt. On github.dev, other files load quickly even while the large one loads. With a Codespace, it is noticeably slower, but files still open for me while the large one loads.

alexdima · 2022-12-19T15:24:22Z

@bpasero What do you think about adding a new capability to FS providers where a FS provider could indicate that it is using the network for FS operations? I could add it to the resolvers API or we could try guessing it based on the latency we observe on the management connection. We could use that as a hint to prompt for confirmation when opening large files (>50MB?). Adding such a prompt for over-the-network remotes like remote tunnels, ssh or codespaces, together with fixing #57585 could improve the experience when using a remote over the network.

derekbekoe · 2022-12-19T19:32:35Z

Btw I cannot reproduce the issue about queueing.

This is what I am seeing. The large file is ~100MB.

vscode-large-files.mov

That's where these two observations from the original post came from:

Looks like VS Code blocks/queues file downloads so if I try to open a large file then a small file afterwards, the small file doesn’t open until the large file has finished?

Related to the above, it appears it’s not possible to cancel a large file download? Closing the file tab doesn’t appear to stop the download.

bpasero · 2022-12-20T10:32:18Z

Talked with @joaomoreno , distilled 3 things:

we do not cancel when you close a tab but we cover that in Cancel file operations from opening an editor when closing it #57585
we can show a warning when opening large files (maybe configurable), with custom defaults depending on whether the vscode-remote scheme is in use; I think we do not need a new capability for this (see [1] for an example)
the remote connection seems to be busy dealing exclusively with the response from the remote not allowing other IPC messages, so maybe we need some kind of quality of service in our IPC layer to allow for other IPC messages to get through (I am not an expert in our remote management connection implementation)

[1] Warning in Editor

alexdima · 2022-12-21T22:21:24Z

the remote connection seems to be busy dealing exclusively with the response from the remote not allowing other IPC messages, so maybe we need some kind of quality of service in our IPC layer to allow for other IPC messages to get through

Is the file reading done by the backend pushing the bytes to the frontend or by the frontend pulling for the bytes? If it's done by the backend eagerly pushing the bytes, then it is possible that on the server side the entire 100MB file is read into memory within 2s and then those 100MB are basically queued on the stream, so no other server->client message can be sent until those 100MB are transmitted.
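The blocking effect described here can be sketched with a toy model of a single ordered connection. This is an assumption-laden simulation, not the VS Code IPC layer: `WireFrame`, `enqueueEager`, `enqueueRoundRobin` and `positionOnWire` are hypothetical names. It compares where a small RPC reply lands on the wire when the file is pushed eagerly versus when channels take turns.

```typescript
// Hypothetical simulation; none of these names exist in VS Code.
interface WireFrame {
    channel: string; // e.g. "file" or "rpc"
    seq: number;     // frame index within its channel
}

// Eager push: every file chunk is queued before anything else can be sent.
function enqueueEager(fileChunks: number): WireFrame[] {
    const wire: WireFrame[] = [];
    for (let i = 0; i < fileChunks; i++) {
        wire.push({ channel: "file", seq: i });
    }
    wire.push({ channel: "rpc", seq: 0 }); // the small message waits behind the file
    return wire;
}

// Round-robin: each channel with pending frames gets one frame per turn.
function enqueueRoundRobin(pending: WireFrame[][]): WireFrame[] {
    const wire: WireFrame[] = [];
    const cursors = pending.map(() => 0);
    let remaining = pending.reduce((sum, queue) => sum + queue.length, 0);
    while (remaining > 0) {
        for (let c = 0; c < pending.length; c++) {
            if (cursors[c] < pending[c].length) {
                wire.push(pending[c][cursors[c]]);
                cursors[c]++;
                remaining--;
            }
        }
    }
    return wire;
}

// Index of the first frame of `channel` on the wire, or -1 if absent.
function positionOnWire(wire: WireFrame[], channel: string): number {
    for (let i = 0; i < wire.length; i++) {
        if (wire[i].channel === channel) {
            return i;
        }
    }
    return -1;
}

// 100MB split into 64KB chunks is 1600 frames.
const fileFrames: WireFrame[] = [];
for (let i = 0; i < 1600; i++) {
    fileFrames.push({ channel: "file", seq: i });
}
const rpcFrames: WireFrame[] = [{ channel: "rpc", seq: 0 }];

console.log(positionOnWire(enqueueEager(1600), "rpc"));
console.log(positionOnWire(enqueueRoundRobin([fileFrames, rpcFrames]), "rpc"));
```

With eager pushing the RPC frame sits behind all 1600 file frames; with turn-taking it goes out right after the first file frame.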

bpasero · 2022-12-22T06:10:23Z

The server (or main) process pushes to the client via events and there is no throttling. A community PR that was filed a few days before this issue (#169288) has more details on how it works and on the suggested change to let the client read large files in chunks instead. I suggested, however, implementing throttling where the file is read, to reduce the complexity of the change. This would mean large files are read more slowly, but the client would have a chance to do other things in the meantime. Ideally, if we were to do that, some kind of discrete progress could be shown in the editor to tell the user how much is left to load.

alexdima · 2022-12-22T08:18:46Z

Thanks for the clarification! You're correct, implementing this in a way which can utilise the available bandwidth is quite tricky 🤔 . For example, with a one-way client-server latency of 200ms, if the client reads 64KB chunks one at a time, then the maximum theoretical bandwidth is about 160KB/s (one 64KB chunk per 400ms round trip, so 192KB have arrived after 1200ms):

0ms: the client asks for the first 64KB chunk
200ms: the server receives the request and sends the first chunk
400ms: the client receives the first 64KB chunk and asks for the next chunk
600ms: the server receives the request and sends the second chunk
800ms: the client receives the second 64KB chunk and asks for the next chunk
1000ms: the server receives the request and sends the third chunk
1200ms: the client receives the third 64KB chunk
...
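The arithmetic behind this timeline can be written down as a small sketch. The function names are hypothetical, and the model assumes exactly one outstanding request at a time and ignores transfer time on the wire. Each 64KB chunk costs one full round trip (400ms at 200ms one-way latency), which gives 160KB/s in steady state; 192KB have arrived by the 1200ms mark.

```typescript
// Hypothetical helper names; a back-of-the-envelope model only.

// Upper bound on throughput (KB/s) when the client requests one chunk,
// waits for it to arrive, and only then requests the next one.
function pullThroughputKBps(chunkKB: number, oneWayLatencyMs: number): number {
    const roundTripMs = 2 * oneWayLatencyMs;
    return (chunkKB * 1000) / roundTripMs;
}

// Seconds needed to pull a whole file at that rate.
function pullDurationSeconds(fileKB: number, chunkKB: number, oneWayLatencyMs: number): number {
    return fileKB / pullThroughputKBps(chunkKB, oneWayLatencyMs);
}

console.log(pullThroughputKBps(64, 200));               // KB/s for the timeline above
console.log(pullDurationSeconds(100 * 1024, 64, 200));  // seconds for a 100MB file
```

Under these assumptions a 100MB file would take 640 seconds, which is why a strict request-per-chunk pull model leaves most of the bandwidth unused on high-latency links.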

In order to utilise the available bandwidth, I think the IPC layer needs to either expose the underlying socket's drain events, write buffer size, etc. or maybe we can support returning a Stream or Promise<Stream> natively in the IPC layer and do the right thing internally to align the writing speed with the TCP/IP socket writing speed, all while allowing other IPC messages to go through. cc @joaomoreno

bpasero · 2022-12-22T09:13:40Z

Actually in the beginning we had a model where the client would pull the data from the server but then for speed optimization I changed to our current model to reduce the overhead of communication for large files. This means the throughput is higher but the pressure on the client is also larger.

bpasero · 2022-12-23T15:41:12Z

Current progress for blocking large files:

This would be driven by a new workbench.editorLargeFileConfirmation setting that is set conditionally:

web: 10mb
remote: 50mb
otherwise: 1024mb

connor4312 · 2022-12-26T18:22:27Z

In order to utilise the available bandwidth, I think the IPC layer needs to either expose the underlying socket's drain events, write buffer size, etc. or maybe we can support returning a Stream or Promise<Stream> natively in the IPC layer and do the right thing internally to align the writing speed with the TCP/IP socket writing speed, all while allowing other IPC messages to go through.

Having recently dealt extensively with this when implementing Basis' host relay (which leverages SSH under the hood), you could also look at a traffic management mechanism similar to what SSH does with its channels.

SSH multiplexes multiple channels over a single connection. For each channel, each party has a receive window, whose initial size and subsequent size updates are announced to the other side. A sender must not send more bytes than it knows the receiver's window can handle, and must wait for the receiver to announce a larger window before sending any more data on the channel.

Getting the sizing and adjustments right can be tricky--e.g. the receiver usually wants to adjust the receive window when some proportion of it has been used up, and would need to do so in an appropriately prioritizing way--but something like that could accomplish what you're after.

Further reading: https://www.rfc-editor.org/rfc/rfc4254#section-5.2
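A minimal sketch of the windowing idea described above, for one direction of one channel. This is not an SSH implementation and not a proposal for the actual IPC layer; `WindowedSender` and `WindowedReceiver` are hypothetical names, and the "adjust once half the window has been consumed" policy is an assumption chosen for illustration. Byte counts stand in for real payloads.

```typescript
// Hypothetical sketch of per-channel receive windows; names are illustrative.

class WindowedSender {
    private remoteWindow: number; // bytes the receiver can still accept
    private queued = 0;           // bytes waiting for window space
    public sent = 0;              // bytes put on the wire so far

    constructor(initialRemoteWindow: number) {
        this.remoteWindow = initialRemoteWindow;
    }

    // Queue `bytes` and send as much as the window allows. Returns bytes sent now.
    write(bytes: number): number {
        this.queued += bytes;
        return this.flush();
    }

    // The receiver announced more space (compare SSH_MSG_CHANNEL_WINDOW_ADJUST).
    onWindowAdjust(bytesToAdd: number): number {
        this.remoteWindow += bytesToAdd;
        return this.flush();
    }

    private flush(): number {
        const n = Math.min(this.queued, this.remoteWindow);
        this.queued -= n;
        this.remoteWindow -= n;
        this.sent += n;
        return n;
    }
}

class WindowedReceiver {
    private readonly windowSize: number;
    private consumedSinceAdjust = 0;

    constructor(windowSize: number) {
        this.windowSize = windowSize;
    }

    // Called when the application has processed `bytes`. Returns the window
    // adjustment to announce to the sender, or 0 if none is due yet.
    consume(bytes: number): number {
        this.consumedSinceAdjust += bytes;
        if (this.consumedSinceAdjust * 2 >= this.windowSize) {
            const adjust = this.consumedSinceAdjust;
            this.consumedSinceAdjust = 0;
            return adjust;
        }
        return 0;
    }
}

const sender = new WindowedSender(64);
const receiver = new WindowedReceiver(64);
console.log(sender.write(100));                           // only the first window's worth goes out
console.log(sender.onWindowAdjust(receiver.consume(32))); // more goes out once space is announced
```

Because each channel has its own window, a large file transfer on one channel can only occupy a bounded amount of the connection before it has to wait, which leaves room for other channels' messages.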

bpasero · 2022-12-27T10:15:27Z

I have extracted the large file confirmation to #170090, maybe @bamurtaugh and @alexdima you could try it out and provide some feedback, also on the limits I have picked based on web, remote or local.

I am not sure if we want to turn this issue into improving the IPC communication, given we have #57585 for cancelling when closing an editor? I feel that #57585 is a more complex and riskier change, so I am not pushing for it right now unless you really want me to. Adding cancellation to model resolution can have side effects and regressions, because the change will be complex and needs careful handling of multiple clients resolving asynchronously and waiting on the result.

bamurtaugh · 2023-01-03T18:16:02Z

Thanks for adding the large file confirmation, @bpasero! I tested desktop, remote (WSL), and web, and it works well for me.

One thought - would it make sense to amend the web remote (i.e. connected via Remote Tunnels in vscode.dev) limit to 50 as well?

I am not sure if we want to turn this issue into improving the IPC communication given we have #57585 for cancelling when closing an editor?

Now that we have the confirmation in place, I think it makes sense for this issue to track improving the IPC communication in the long term, given the risk associated with the change you link. I also say "long term" since I think at least providing this initial warning + confirmation flow will help users, and we can see if we get further relevant feedback here. We could now have two tracking issues for the remaining discussion:

This issue for IPC communication
Cancel file operations from opening an editor when closing it #57585 for cancelling a file open

bpasero · 2023-01-03T19:20:35Z

One thought - would it make sense to amend the web remote (i.e. connected via Remote Tunnels in vscode.dev) limit to 50 as well?

To clarify, you mean if someone is using the web interface and the backend is a remote tunnel? If there is a way for me to figure this out, we can easily change the default, not sure who would know though how to figure that out.

This issue for IPC communication

Sounds good to me, but maybe we create a new issue to make tracking easier and have just that one problem in the issue description.

bamurtaugh · 2023-01-05T18:04:05Z

To clarify, you mean if someone is using the web interface and the backend is a remote tunnel? If there is a way for me to figure this out, we can easily change the default, not sure who would know though how to figure that out.

Yes, that's what I was thinking. @connor4312 @aeschli I think this would be possible, but you'd both know best.

Sounds good to me, but maybe we create a new issue to make tracking easier and have just that one problem in the issue description.

Makes sense, I'll open one now!

bamurtaugh · 2023-01-05T18:08:23Z

I just opened #170645.

bpasero · 2023-01-05T18:20:28Z

I just wonder why the default would be any different if I am using web connected to a tunnel? If the default depends on the remote I am connected to then maybe this needs to be metadata a resolver can give back and not hardcoded?

bamurtaugh · 2023-01-05T18:23:33Z

Maybe it doesn't have to be different 🤔 I was considering if being connected to compute would make that web + tunnel scenario better able to handle larger files than just a serverless web instance. But maybe being in the web, regardless of connected compute, is enough to say that the limit should be the lowest.

bpasero · 2023-01-05T18:24:43Z

It is actually funny that when you are in web+serverless opening very large files is no problemo, because no IPC is involved and no roundtrip to a serverful remote. It is with web+remote where things get ugly and slow and potentially expensive.

bpasero · 2023-01-06T13:57:09Z

I think this is reasonable:

vscode/src/vs/platform/files/common/files.ts

Lines 1357 to 1377 in 1ad2cb5

    
    export function getLargeFileConfirmationLimit(remoteAuthority?: string): number {

    	// These numbers are picked somewhat randomly but with the intent to:
    	// - avoid performance issues (in web)
    	// - avoid network cost (in remote)
    	// - have a good default experience in local desktop

    	if (isWeb) {
    		if (remoteAuthority) {
    			return 10 * ByteSize.MB;
    		}

    		return 50 * ByteSize.MB;
    	}

    	if (remoteAuthority) {
    		return 100 * ByteSize.MB;
    	}

    	return 1024 * ByteSize.MB;
    }
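To show how such a limit might be consumed, here is a self-contained sketch. It mirrors the thresholds in the excerpt above, but everything else is an assumption: `OpenContext`, `largeFileConfirmationLimit` and `needsConfirmation` are hypothetical names, `isWeb` is passed in as a parameter instead of being a platform constant, 1 MB is taken to be 1024 * 1024 bytes, and the strict `>` comparison is a guess at how the check is applied.

```typescript
// Hypothetical, self-contained sketch; not the VS Code implementation.
const MB = 1024 * 1024; // assumption: 1 MB = 1024 * 1024 bytes

interface OpenContext {
    isWeb: boolean;           // running in a browser
    remoteAuthority?: string; // set when connected to a remote (SSH, tunnel, ...)
}

// Same thresholds as the excerpt above, with the environment made explicit.
function largeFileConfirmationLimit(ctx: OpenContext): number {
    if (ctx.isWeb) {
        return ctx.remoteAuthority ? 10 * MB : 50 * MB;
    }
    return ctx.remoteAuthority ? 100 * MB : 1024 * MB;
}

// Whether the user should be asked before the file is opened.
// Assumption: only sizes strictly above the limit trigger the prompt.
function needsConfirmation(fileSizeBytes: number, ctx: OpenContext): boolean {
    return fileSizeBytes > largeFileConfirmationLimit(ctx);
}

// A 60MB file: prompt in web, no prompt on a desktop remote.
console.log(needsConfirmation(60 * MB, { isWeb: true }));
console.log(needsConfirmation(60 * MB, { isWeb: false, remoteAuthority: "ssh-remote+example" }));
```

The `"ssh-remote+example"` authority string is only a placeholder; any non-empty value selects the remote branch in this sketch.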

bamurtaugh added the feature-request and remote labels Dec 16, 2022

bamurtaugh added this to the Backlog milestone Dec 16, 2022

bamurtaugh assigned connor4312 and aeschli Dec 16, 2022

connor4312 assigned bpasero and alexdima and unassigned connor4312 and aeschli Dec 16, 2022

connor4312 transferred this issue from microsoft/vscode-remote-release Dec 16, 2022

derekbekoe mentioned this issue Dec 16, 2022

[Code Server] Better connectivity for users in other regions microsoft/vscode-remote-release#7242

Closed

bpasero added the file-io label Dec 17, 2022

bpasero added the workbench-editors label Dec 21, 2022

kieferrm mentioned this issue Jan 2, 2023

Iteration Plan for January 2023 #170413

Closed

bamurtaugh mentioned this issue Jan 5, 2023

Remote IPC layer improvements #170645

Open

bpasero added a commit that referenced this issue Jan 6, 2023

Enhancements to handling large files in remote scenarios (fix #169433)

1ad2cb5

bpasero modified the milestones: Backlog, January 2023 Jan 6, 2023

bpasero closed this as completed in 91687c4 Jan 6, 2023

vscodenpa added the unreleased and insiders-released labels and removed the unreleased label Jan 6, 2023

bpasero added the on-testplan label Jan 24, 2023

gjsjohnmurray mentioned this issue Jan 26, 2023

【Big bug】Every time you copy a large file, it will open! ! ! #172490

Closed

github-actions bot locked and limited conversation to collaborators Feb 20, 2023
