
Downloading large files using range requests is slow #5123

Closed
jordan-thoms opened this issue Aug 2, 2014 · 6 comments
Comments

@jordan-thoms
Contributor

When loading files from a server which supports range requests, PDF.js seems to load the entire file in ~64 KB chunks, using parallel connections to the server.

However, this is often very slow. I think this is because the small 64 KB requests each incur overhead, and making many of them at once overloads the connection.

For example loading http://mozilla.github.io/pdf.js/web/viewer.html?file=https://d2tkmshiozsr4v.cloudfront.net/documents/files/000/041/350/original/ad41798557504b40d26e11b4f059c0945bb5ca14/750614main_NASA_FY_2014_Budget_Estimates-508.pdf?1401953041 is quite slow - and downloading that file in a single thread with wget is faster.

I would propose that the chunk size be scaled depending on the size of the file: for a 32 MB file like the one above, 1-2 MB chunks would probably be much more reasonable. I'm considering submitting a PR for this; would it be of interest?
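To make the proposal concrete, here is a hypothetical sketch of what size scaling could look like. This is not PDF.js code; the function name, the target of ~32 requests per file, and the 2 MB ceiling are all illustrative assumptions:

```javascript
// Hypothetical sketch (not actual PDF.js code): scale the range-request
// chunk size with the total file size, clamped to sane bounds.
const MIN_CHUNK_SIZE = 64 * 1024;        // PDF.js's current default
const MAX_CHUNK_SIZE = 2 * 1024 * 1024;  // assumed upper bound

function chunkSizeFor(fileSize) {
  // Aim for roughly 32 requests per file, then clamp.
  const target = Math.ceil(fileSize / 32);
  return Math.min(MAX_CHUNK_SIZE, Math.max(MIN_CHUNK_SIZE, target));
}

// A 32 MB file would be fetched in ~1 MB chunks instead of
// 512 requests of 64 KB each.
console.log(chunkSizeFor(32 * 1024 * 1024)); // 1048576
```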

@timvandermeij
Contributor

We have had some discussion about this before and #4739 was agreed to be a good solution. However, feel free to propose a patch for this so the developers can try both options.

@jordan-thoms
Contributor Author

I've done some more investigation of this, and it looks like requestChunks in ChunkedStreamManager is being called only for single chunks (by requestRange, which in turn is called by walk()). That means the grouping code in there is inoperative: it can only group chunk requests together if they arrive at the same time. Is that the expected behavior?
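To illustrate why the grouping is defeated, here is a hypothetical sketch of that kind of grouping logic (not the actual ChunkedStreamManager code; the function and field names are made up for illustration). Contiguous chunk indices requested in one batch collapse into a single range, but chunks requested one at a time never do:

```javascript
// Illustrative sketch: group a batch of requested chunk indices into
// contiguous [beginChunk, endChunk) ranges, so each range can become a
// single HTTP range request.
function groupChunks(chunkIndices) {
  const sorted = [...new Set(chunkIndices)].sort((a, b) => a - b);
  const groups = [];
  for (const chunk of sorted) {
    const last = groups[groups.length - 1];
    if (last && chunk === last.endChunk) {
      last.endChunk = chunk + 1; // extend the current contiguous run
    } else {
      groups.push({ beginChunk: chunk, endChunk: chunk + 1 });
    }
  }
  return groups;
}

// Requested together, chunks 3, 4, 5 collapse into one range...
console.log(groupChunks([3, 4, 5])); // [{ beginChunk: 3, endChunk: 6 }]
// ...but requested one at a time (as via walk()), each batch is a
// single chunk, so every "group" is just that one chunk.
console.log(groupChunks([3])); // [{ beginChunk: 3, endChunk: 4 }]
```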

I've played around with this, and increasing the chunk size certainly does decrease total loading time. The trade-off is increased time to render the first page, and other pages when scrolling during the load. The reduced number of requests also seems to improve general browser performance; I guess the browser doesn't really like dealing with hundreds of simultaneous range requests.

So I guess this is probably not the best solution. I've increased RANGE_CHUNK_SIZE to 512 KB for our build at the moment, but if it's possible to keep the chunk size small while removing the performance issue, that would be better.
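For reference, the request-count arithmetic behind the trade-off can be sketched like this (a hypothetical helper, not PDF.js code; HTTP byte ranges are inclusive at both ends):

```javascript
// Hypothetical helper: the Range header a client would send for chunk n
// at a given chunk size, for a file of the given total size.
function rangeHeaderFor(chunkIndex, chunkSize, fileSize) {
  const begin = chunkIndex * chunkSize;
  const end = Math.min(begin + chunkSize, fileSize) - 1; // inclusive
  return `bytes=${begin}-${end}`;
}

// A 32 MB file at the default 64 KB chunk size needs 512 such requests;
// at 512 KB it needs only 64.
console.log(rangeHeaderFor(0, 64 * 1024, 32 * 1024 * 1024)); // "bytes=0-65535"
```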

@yurydelendik
Contributor

Is it still a problem?

@jordan-thoms
Contributor Author

Yeah, this is still an issue as far as I know. I'll retest with the latest code. 64 KB chunks, each with a new connection, are really slow when loading e.g. a 100 MB file.

@Snuffleupagus
Collaborator

@jordan-thoms I think that the question was related to the recently landed PR #5263, which should make PDF.js less reliant on range requests.

@timvandermeij
Contributor

Closing as fixed for now. If the problem remains, please let us know or open a new issue.
