Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement progressive loading of PDFs #2719

Merged
merged 5 commits into from
Apr 18, 2013
Merged

Conversation

mduan
Copy link
Contributor

@mduan mduan commented Feb 13, 2013

This implements progressive loading of the PDF using an implementation similar to that described in #1108.

Note: This is still a work in progress, so it's incomplete and a bit messy. It is NOT ready to be merged.

Some of the things that still have to be done are:

  • Integrate w/ extension
  • For addon: fallback to fetching entire PDF if range requests are not supported
  • For viewer: fallback to fetching entire PDF if range requests are not supported
  • When there are no PDF chunks that are actively being requested, continue fetching PDF chunks sequentially
  • Fetching the current page first when the user changes pages
  • Optimization of parsing XRef tables/streams
  • Optimization of fetching of page tree
  • Possibly more optimizations specifically for linearized PDFs
  • Code cleanup
  • Unit tests, regression tests, etc...

@waddlesplash
Copy link
Contributor

Does this supersede PR #1923 (HTTP range request support)?

@mduan
Copy link
Contributor Author

mduan commented Feb 14, 2013

That's the goal. We cannot go forward w/ #1923 because it makes synchronous xhr requests, which will not work with the extension. This implementation uses asynchronous xhr requests.

@waddlesplash
Copy link
Contributor

How does this support search? If someone opens the find bar, does it notify them & then download the rest of the PDF?

@mduan
Copy link
Contributor Author

mduan commented Feb 20, 2013

@waddlesplash I suppose it would search the portions of the pdf that have been loaded so far.

@mduan
Copy link
Contributor Author

mduan commented Feb 26, 2013

/botio-windows preview

@pdfjsbot
Copy link

From: Bot.io (Windows)


Received

Command cmd_preview from @mduan received. Current queue size: 0

Live output at: http://107.22.172.223:8877/0d6791a73c05149/output.txt

@pdfjsbot
Copy link

From: Bot.io (Windows)


Failed

Full output at http://107.22.172.223:8877/0d6791a73c05149/output.txt

Total script time: 0.14 mins

@mduan
Copy link
Contributor Author

mduan commented Mar 2, 2013

This PR is fairly stable now, but is still in need of some real-world testing.

I've uploaded a packaged XPI of this PR at: http://dl.dropbox.com/u/5961585/pdf.js.xpi.

If anyone has some time, it'd be appreciated if you could try it out and let me know if you encounter any issues or have any feedback.

@waddlesplash
Copy link
Contributor

Just installed. Will use it normally and I"ll tell you what happens :)

@mduan
Copy link
Contributor Author

mduan commented Mar 2, 2013

@waddlesplash: sweet, thanks :)

@Snuffleupagus
Copy link
Collaborator

I just tried it, and it doesn't work for me. I tried both your XPI file and checking out your repo and building the extension myself, but I just get the following error:
[11:03:54.460] TypeError: request.setUserData is not a function @ resource://pdf.js/web/viewer.js:175

Edit: My configuration: Windows 7 Professional SP1 (64-bit), Nightly 22 (Mozilla/5.0 (Windows NT 6.1; WOW64; rv:22.0) Gecko/20130301 Firefox/22.0 ID:20130301030909 CSet: 993d7aff3109)

@mduan
Copy link
Contributor Author

mduan commented Mar 3, 2013

@Snuffleupagus I get the same issue on Nightly, but it works for me on Aurora and earlier. Could you try with an earlier version of FF while I try to figure out the issue wtih Nightly?

@Snuffleupagus
Copy link
Collaborator

@mduan It works in Aurora 21 for me too, and I must say that I'm very impressed by the performance improvement!
For smaller files it seems to subjectively be on par with the standard version of pdf.js. But for larger files, there is a very noticeable difference, in some instances it seems to load the first pages in half the time compared to the standard pdf.js version.

}

messageHandler.on('RequestDataRange', function transportDataRange(args) {
FirefoxCom.request('requestDataRange', args);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you move FirefoxCom and window.addEventListener('message') out of api.js? Provide service/facade for this functionality, e.g. PdfDataTransportService with requestDataRange and onDataRange members. That will be useful if we decide implement one for web viewer.

@yurydelendik
Copy link
Contributor

setUserData is fixed by 07491f5, I think can cherry-pick this commit

@waddlesplash
Copy link
Contributor

Just encountered a wierd bug. When opening a PDF from a file:/// URL (which I've done on plenty of other PDFs with no problems), I got the "displayed incorrectly" message (which I had previously gotten on that PDF, see #2637) but this time, only the first and second pages (p2 is the one with the problem) were rendered. I could click the next/prev page buttons, use the scrollbar, etc. - just blank pages. Not even the spinning progress thingy was there.

After about ~5s, the spinning progress GIF appeared and all pages and text selection were rendered normally. The only evidence that it ever happened was that the page previews in the sidebar of the pages I had scrolled past remained blank (white, not black with dashed outline). The other pages in the preview that I had not scrolled past appeared OK.

@mduan
Copy link
Contributor Author

mduan commented Mar 4, 2013

@waddlesplash: Do you have a link to the PDF? I don't see it in the issue you referenced.

@waddlesplash
Copy link
Contributor

Here's a similar one with the same problem as mentioned in #2637: http://www.rosettastone.com/us_assets/documentation/RSV2_UG_Level_1_2_English_%28US%29.pdf

It has a different problem as well: on all pages except page 1, there are just two graphics objects (a "+" and a line). No spinning icon indicating loading.

@mduan
Copy link
Contributor Author

mduan commented Mar 4, 2013

@waddlesplash: I've fixed the problem with the "+" & line showing. In regards to there being a blank page with no spinning icon while it's rendering, the same thing happens without my progressive loading changes.

@waddlesplash
Copy link
Contributor

Hm. Then that PDF has different behavior than the other PDF I have on my hard drive, because it works fine with the stable version of pdf.js.

@bit
Copy link
Contributor

bit commented Mar 5, 2013

while Firefox makes range requests, Chrome makes one request for the full pdf and while that downloads the progress does not update. once the full pdf has been loaded another pending request for Range:bytes=0-0 is made and comes from the cache.

@Snuffleupagus
Copy link
Collaborator

Another related issue #1375?

@mduan
Copy link
Contributor Author

mduan commented Mar 5, 2013

@bit: The behaviour I saw from Chrome's PDF viewer is that it first requests the full PDF but cancels that request if it finds the server supports range requests. It then enough of the PDF to determine if it is linearized.

If the PDF is linearized, it will fetch the PDF in sequential chunks. Otherwise, it will issue a full request to fetch the PDF.

Linearized PDF: http://www.adobe.com/content/dam/Adobe/en/devnet/acrobat/pdfs/PDF32000_2008.pdf
Non-linearized PDF: http://www.ecma-international.org/publications/files/ECMA-ST/Ecma-262.pdf

@mduan
Copy link
Contributor Author

mduan commented Mar 5, 2013

@Snuffleupagus: This PR will currently continue to download the rest of the PDF in sequential chunks while it's rendering.

@bit
Copy link
Contributor

bit commented Mar 6, 2013

@mduan , sorry my last comment might not have been clear, I was talking about pdf.js in Firefox and pdf.js in Chrome. Both with your patch. Was testing with pdf.js and a larger pdf on the same server, server supports range request. But for some reason pdf.js does not use range requests in Chrome while Firefox works as expected.

@mduan
Copy link
Contributor Author

mduan commented Mar 6, 2013

@bit: Is the server public? If so could you share a link?

The way that range requests work right now is that a request is made for the full pdf and once the headers for the full request are returned, we issue a range request. If the range request completes successfully, we cancel the full request and start issuing range requests unless the full request is already complete.

What I think might be happening is that the server is not returning the headers separately from the body. So when headers are received, the full request is already complete and so range requests are not issued.

@bit
Copy link
Contributor

bit commented Mar 7, 2013

@mduan
Copy link
Contributor Author

mduan commented Mar 8, 2013

@bit: Can you try with the latest changes? I think I may have fixed the issue.

@pdfjsbot
Copy link

From: Bot.io (Windows)


Success

Full output at http://107.22.172.223:8877/0a8868c275ad385/output.txt

Total script time: 25.08 mins

  • Font tests: Passed
  • Unit tests: Passed
  • Regression tests: Passed

brendandahl added a commit that referenced this pull request Apr 18, 2013
Implement progressive loading of PDFs
@brendandahl brendandahl merged commit 49ff029 into mozilla:master Apr 18, 2013
@timvandermeij
Copy link
Contributor

I just wanted to say that I am very impressed by this great patch, @mduan! This looks like a fantastic feature to me.

@Snuffleupagus Snuffleupagus mentioned this pull request Apr 27, 2013
@aamironline
Copy link

What's the status of this ticket? It looks like all the sub tasks have been finished (check boxes are checked) still ticket says it is NOT complete!

@timvandermeij
Copy link
Contributor

This PR has the status 'merged', so it is indeed done. This has already been implemented for a very long time in PDF.js. Every other issue related to this has been closed, so I see no problems here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.