Network operations for packages should be done in parallel #467

metajack · 2014-08-28T19:05:04Z

Currently this is serialized and is quite slow when you have many dependencies.

Note that I expect initial fetches to be slow, but this is the case where the git repos are all unchanged. This is happening to me all the time because I'm constantly failing builds while getting Servo ported over.

alexcrichton · 2014-09-07T19:02:03Z

Now that a Cargo.lock is generated as soon as resolve is completed, I don't think that this is as much of an issue any more. I'm worried about updating git repos in parallel because they're almost always I/O bound and I'm not sure that you're going to get much higher throughput by doing it all in parallel.

I'm going to close this for now, but if it comes back as a pressing issue, then we can definitely reopen!

metajack · 2014-09-08T00:07:30Z

I think you make a few assumptions there about it being I/O bound that may not hold.

1. It is likely network bound, not disk bound.
2. The network bottleneck is for a single source location, but the git repos may be synced from several sources.

The fact that browsers make N requests in parallel from the same domain leads me to believe that being network bound on a single socket != being network bound.

#2 won't help servo, since everything is pointed at github. But parallel sockets may still have higher throughput.

That said, we did this in serial before, so this is not really a pressing issue.

alexcrichton · 2014-10-16T19:05:47Z

Reopening, I'd like to track this into the future.

SimonSapin · 2014-12-10T14:05:54Z

I believe that browsers making N requests in parallel is more about latency (for opening a new connection or making a new request) than about throughput. However, git-clone does more than one thing: after fetching packfiles (bound by network throughput), it extracts and resolves them (bound by CPU or disk). So having multiple git-clones at different phases at the same time could help. Maybe.
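To make the overlap argument concrete, here is a minimal sketch, not Cargo's actual code. It assumes each operation is dominated by waiting, which is simulated with `thread::sleep`. The names `fetch`, `fetch_serial`, and `fetch_parallel`, and the repository names, are hypothetical and exist only for illustration.

```rust
use std::thread;
use std::time::{Duration, Instant};

/// Stand-in for one latency-bound operation (hypothetical; performs no real I/O).
fn fetch(name: &str, latency: Duration) -> String {
    thread::sleep(latency);
    format!("{name}: up to date")
}

/// Run the operations one after another; total time is the sum of the waits.
fn fetch_serial(sources: &[&str], latency: Duration) -> Vec<String> {
    sources.iter().map(|&s| fetch(s, latency)).collect()
}

/// Run each operation on its own scoped thread so the waits overlap.
/// Results are collected in the same order as the input.
fn fetch_parallel(sources: &[&str], latency: Duration) -> Vec<String> {
    thread::scope(|scope| {
        let handles: Vec<_> = sources
            .iter()
            .map(|&s| scope.spawn(move || fetch(s, latency)))
            .collect();
        handles
            .into_iter()
            .map(|h| h.join().expect("fetch thread panicked"))
            .collect()
    })
}

fn main() {
    // Hypothetical dependency names.
    let sources = ["repo-a", "repo-b", "repo-c", "repo-d"];
    let latency = Duration::from_millis(50);

    let start = Instant::now();
    let serial = fetch_serial(&sources, latency);
    println!("serial:   {:?}", start.elapsed());

    let start = Instant::now();
    let parallel = fetch_parallel(&sources, latency);
    println!("parallel: {:?}", start.elapsed());

    assert_eq!(serial, parallel);
}
```

In this simulation the serial run takes at least four times the per-operation wait, while the parallel run takes roughly one wait. Real fetches share bandwidth, CPU, and disk, so the gain there would likely be smaller.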

alexcrichton · 2015-01-14T20:53:16Z

Clarifying that I'd like to download packages in parallel as well. I'd basically like to have the ability to perform all of our network operations in parallel, even if we don't necessarily take advantage of it by default.

almereyda · 2016-01-20T00:59:39Z

If multiple dependencies share the same origin and are served via HTTP/2, a parallelized network IO stack should be able to make use of multiplexing.

ishitatsuyuki · 2018-03-05T07:13:13Z

I'm planning to take this. Some insights:

The point of parallelizing (or multiplexing) is to reduce the impact of network latency. Using multiple connections has its own pros and cons, but I'm planning to maintain only one connection per server, since both servers below support that.

https://crates.io/ (Heroku, used for redirects) probably has a correct HTTP 1.1 pipelining implementation.
https://static.crates.io/ (S3 CloudFront, used for tarballs) supports H2.

Another thing I'd like to do is to add file sizes to crates.io-index, so we can know the total download size beforehand and display a united progress bar. I'm not sure if the migration can be done smoothly though.
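As a rough illustration of the united progress bar idea, here is a sketch that assumes per-crate sizes are already known before downloading starts. The `Progress` type and the sizes in `main` are hypothetical. Neither reflects the actual crates.io-index format or Cargo's implementation.

```rust
/// Aggregate progress over several downloads whose sizes are known up front.
struct Progress {
    total_bytes: u64,
    received_bytes: u64,
}

impl Progress {
    /// Sum the individual sizes to get the total before any download starts.
    fn new(sizes: &[u64]) -> Self {
        Progress {
            total_bytes: sizes.iter().sum(),
            received_bytes: 0,
        }
    }

    /// Record `n` newly received bytes from any download, capped at the total.
    fn advance(&mut self, n: u64) {
        self.received_bytes = (self.received_bytes + n).min(self.total_bytes);
    }

    /// Whole-number percentage completed; an empty download set counts as done.
    fn percent(&self) -> u64 {
        if self.total_bytes == 0 {
            100
        } else {
            self.received_bytes * 100 / self.total_bytes
        }
    }
}

fn main() {
    // Hypothetical tarball sizes in bytes.
    let mut progress = Progress::new(&[40_000, 10_000, 50_000]);
    progress.advance(25_000);
    progress.advance(25_000);
    println!("{}% of {} bytes", progress.percent(), progress.total_bytes);
}
```

Because the counter only tracks bytes, chunks from downloads running in parallel can be reported in any order.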

ishitatsuyuki · 2018-09-26T07:56:32Z

Close?

SimonSapin · 2018-09-26T10:22:53Z

Why?

dwijnand · 2018-09-26T10:34:28Z

Because of #6005? (That's my guess at the question; I'm not sure myself.)

SimonSapin · 2018-09-26T12:26:47Z

Oh, nice! I haven’t tried it yet but yes, it seems this can be closed as fixed by #6005. (The back link shows up just above @ishitatsuyuki’s comment in the web view, but not in email notifications.)

alexcrichton · 2018-09-26T14:06:25Z

Ah yes, we can indeed close! There's also a call for testing on internals.

alexcrichton closed this as completed Sep 7, 2014

alexcrichton reopened this Oct 16, 2014

alexcrichton added the A-git (Area: anything dealing with git) and E-hard (Experience: Hard) labels Oct 20, 2014

alexcrichton changed the title ~~updating git repos should be done in parallel~~ Network operations for packages should be done in parallel Jan 14, 2015

alexcrichton mentioned this issue Sep 11, 2018

Download crates in parallel with HTTP/2 #6005

Merged

alexcrichton closed this as completed Sep 26, 2018
