
Downloading large files using range requests is slow #5123

Closed
jordan-thoms opened this issue Aug 2, 2014 · 6 comments
Comments

@jordan-thoms
Contributor

When loading files from a server which supports range requests, PDF.js seems to load the entire file in ~64 KB chunks, using parallel connections to the server.

However, this is often very slow. I think this is because the small 64 KB requests each incur overhead, and making many of them at once overloads the connection.

For example loading http://mozilla.github.io/pdf.js/web/viewer.html?file=https://d2tkmshiozsr4v.cloudfront.net/documents/files/000/041/350/original/ad41798557504b40d26e11b4f059c0945bb5ca14/750614main_NASA_FY_2014_Budget_Estimates-508.pdf?1401953041 is quite slow - and downloading that file in a single thread with wget is faster.

I would propose that the chunk size be scaled depending on the size of the file: for a 32 MB file like the one above, 1-2 MB chunks would probably be much more reasonable. I'm considering submitting a PR for this; would it be of interest?
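To make the proposal concrete, here is a hypothetical sketch of what size scaling could look like. This is not PDF.js code; the function name, the target of ~32 requests per file, and the 2 MB ceiling are all illustrative assumptions:

```javascript
// Hypothetical sketch (not actual PDF.js code): scale the range-request
// chunk size with the total file size, clamped to sane bounds.
const MIN_CHUNK_SIZE = 64 * 1024;        // PDF.js's current default
const MAX_CHUNK_SIZE = 2 * 1024 * 1024;  // assumed upper bound

function chunkSizeFor(fileSize) {
  // Aim for roughly 32 requests per file, then clamp.
  const target = Math.ceil(fileSize / 32);
  return Math.min(MAX_CHUNK_SIZE, Math.max(MIN_CHUNK_SIZE, target));
}

// A 32 MB file would be fetched in ~1 MB chunks instead of
// 512 requests of 64 KB each.
console.log(chunkSizeFor(32 * 1024 * 1024)); // 1048576
```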

@timvandermeij
Contributor

We have had some discussion about this before and #4739 was agreed to be a good solution. However, feel free to propose a patch for this so the developers can try both options.

@jordan-thoms
Contributor Author

I've done some more investigation of this, and it looks like requestChunks in ChunkedStreamManager is being called only for single chunks (by requestRange, which in turn is called by walk()). That means the grouping code in there is inoperative: it can only group chunk requests together if they arrive at the same time. Is that the expected behavior?
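To illustrate why the grouping is defeated, here is a hypothetical sketch of that kind of grouping logic (not the actual ChunkedStreamManager code; the function and field names are made up for illustration). Contiguous chunk indices requested in one batch collapse into a single range, but chunks requested one at a time never do:

```javascript
// Illustrative sketch: group a batch of requested chunk indices into
// contiguous [beginChunk, endChunk) ranges, so each range can become a
// single HTTP range request.
function groupChunks(chunkIndices) {
  const sorted = [...new Set(chunkIndices)].sort((a, b) => a - b);
  const groups = [];
  for (const chunk of sorted) {
    const last = groups[groups.length - 1];
    if (last && chunk === last.endChunk) {
      last.endChunk = chunk + 1; // extend the current contiguous run
    } else {
      groups.push({ beginChunk: chunk, endChunk: chunk + 1 });
    }
  }
  return groups;
}

// Requested together, chunks 3, 4, 5 collapse into one range...
console.log(groupChunks([3, 4, 5])); // [{ beginChunk: 3, endChunk: 6 }]
// ...but requested one at a time (as via walk()), each batch is a
// single chunk, so every "group" is just that one chunk.
console.log(groupChunks([3])); // [{ beginChunk: 3, endChunk: 4 }]
```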

I've played around with this, and increasing the chunk size certainly does decrease total loading time. The trade-off is increased time to render the first page, and other pages when scrolling during the load. The reduced number of requests also seems to improve general browser performance; I guess the browser doesn't really like dealing with hundreds of simultaneous range requests.

So I guess this is probably not the best solution. I've increased RANGE_CHUNK_SIZE to 512 KB for our build at the moment, but if it's possible to keep the chunk size small while removing the performance issue, that would be better.
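For reference, the request-count arithmetic behind the trade-off can be sketched like this (a hypothetical helper, not PDF.js code; HTTP byte ranges are inclusive at both ends):

```javascript
// Hypothetical helper: the Range header a client would send for chunk n
// at a given chunk size, for a file of the given total size.
function rangeHeaderFor(chunkIndex, chunkSize, fileSize) {
  const begin = chunkIndex * chunkSize;
  const end = Math.min(begin + chunkSize, fileSize) - 1; // inclusive
  return `bytes=${begin}-${end}`;
}

// A 32 MB file at the default 64 KB chunk size needs 512 such requests;
// at 512 KB it needs only 64.
console.log(rangeHeaderFor(0, 64 * 1024, 32 * 1024 * 1024)); // "bytes=0-65535"
```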

@yurydelendik
Contributor

Is it still a problem?

@jordan-thoms
Contributor Author

Yeah, this is still an issue as far as I know. I'll retest with the latest code. 64 KB chunks, each with a new connection, are really slow when loading e.g. a 100 MB file.

@Snuffleupagus
Collaborator

@jordan-thoms I think that the question was related to the recently landed PR #5263, which should make PDF.js less reliant on range requests.

@timvandermeij
Contributor

Closing as fixed for now. If the problem remains, please let us know or open a new issue.
