Implement progressive loading of PDFs #2719

mduan · 2013-02-13T18:54:39Z

This implements progressive loading of the PDF using an implementation similar to that described in #1108.

Note: This is still a work in progress, so it's incomplete and a bit messy. It is NOT ready to be merged.

Some of the things that still have to be done are:

waddlesplash · 2013-02-14T18:45:13Z

Does this supersede PR #1923 (HTTP range request support)?

mduan · 2013-02-14T19:24:45Z

That's the goal. We cannot go forward w/ #1923 because it makes synchronous xhr requests, which will not work with the extension. This implementation uses asynchronous xhr requests.

waddlesplash · 2013-02-20T17:25:00Z

How does this support search? If someone opens the find bar, does it notify them & then download the rest of the PDF?

mduan · 2013-02-20T22:58:15Z

@waddlesplash I suppose it would search the portions of the pdf that have been loaded so far.

mduan · 2013-02-26T20:56:12Z

/botio-windows preview

pdfjsbot · 2013-02-26T20:56:13Z

From: Bot.io (Windows)

Received

Command cmd_preview from @mduan received. Current queue size: 0

Live output at: http://107.22.172.223:8877/0d6791a73c05149/output.txt

pdfjsbot · 2013-02-26T20:56:22Z

From: Bot.io (Windows)

Failed

Full output at http://107.22.172.223:8877/0d6791a73c05149/output.txt

Total script time: 0.14 mins

mduan · 2013-03-02T01:18:57Z

This PR is fairly stable now, but is still in need of some real-world testing.

I've uploaded a packaged XPI of this PR at: http://dl.dropbox.com/u/5961585/pdf.js.xpi.

If anyone has some time, it'd be appreciated if you could try it out and let me know if you encounter any issues or have any feedback.

waddlesplash · 2013-03-02T01:21:38Z

Just installed. Will use it normally and I'll tell you what happens :)

mduan · 2013-03-02T01:22:27Z

@waddlesplash: sweet, thanks :)

Snuffleupagus · 2013-03-02T10:08:45Z

I just tried it, and it doesn't work for me. I tried both your XPI file and checking out your repo and building the extension myself, but I just get the following error:
[11:03:54.460] TypeError: request.setUserData is not a function @ resource://pdf.js/web/viewer.js:175

Edit: My configuration: Windows 7 Professional SP1 (64-bit), Nightly 22 (Mozilla/5.0 (Windows NT 6.1; WOW64; rv:22.0) Gecko/20130301 Firefox/22.0 ID:20130301030909 CSet: 993d7aff3109)

mduan · 2013-03-03T20:55:14Z

@Snuffleupagus I get the same issue on Nightly, but it works for me on Aurora and earlier. Could you try with an earlier version of FF while I try to figure out the issue with Nightly?

Snuffleupagus · 2013-03-03T22:11:42Z

@mduan It works in Aurora 21 for me too, and I must say that I'm very impressed by the performance improvement!
For smaller files it subjectively seems to be on par with the standard version of pdf.js. But for larger files there is a very noticeable difference; in some instances it seems to load the first pages in half the time compared to the standard pdf.js version.

yurydelendik · 2013-03-04T01:37:21Z

src/api.js

+      }
+
+      messageHandler.on('RequestDataRange', function transportDataRange(args) {
+        FirefoxCom.request('requestDataRange', args);


Could you move FirefoxCom and window.addEventListener('message') out of api.js? Provide a service/facade for this functionality, e.g. PdfDataTransportService with requestDataRange and onDataRange members. That will be useful if we decide to implement one for the web viewer.
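
A rough sketch of what such a facade could look like. Only the names PdfDataTransportService, requestDataRange and onDataRange come from the comment above. The constructor argument, the `deliver` helper and the listener signature are assumptions made for illustration, not the API that ended up in pdf.js.

```javascript
// Hypothetical facade: api.js would talk only to this object and never touch
// FirefoxCom or window 'message' events directly.
function PdfDataTransportService(sendRequest) {
  // sendRequest(begin, end) is supplied by the host (an assumption of this
  // sketch); the Firefox extension could forward it to FirefoxCom, while a
  // web viewer could issue an XHR instead.
  this._sendRequest = sendRequest;
  this._listeners = [];
}

PdfDataTransportService.prototype = {
  // Ask the host for the bytes in [begin, end).
  requestDataRange: function (begin, end) {
    this._sendRequest(begin, end);
  },

  // Register a callback that is invoked as listener(begin, chunk).
  onDataRange: function (listener) {
    this._listeners.push(listener);
  },

  // Called by the host-specific glue code once a chunk has arrived.
  deliver: function (begin, chunk) {
    this._listeners.forEach(function (listener) {
      listener(begin, chunk);
    });
  }
};
```

With this shape, the host-specific code lives entirely in the function passed to the constructor and in whoever calls `deliver`, so api.js stays free of extension details.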

yurydelendik · 2013-03-04T01:38:48Z

setUserData is fixed by 07491f5; I think you can cherry-pick this commit.

waddlesplash · 2013-03-04T22:40:47Z

Just encountered a weird bug. When opening a PDF from a file:/// URL (which I've done on plenty of other PDFs with no problems), I got the "displayed incorrectly" message (which I had previously gotten on that PDF, see #2637), but this time only the first and second pages (p2 is the one with the problem) were rendered. I could click the next/prev page buttons, use the scrollbar, etc. - just blank pages. Not even the spinning progress thingy was there.

After about 5s, the spinning progress GIF appeared and all pages and text selection were rendered normally. The only evidence that it ever happened was that the page previews in the sidebar of the pages I had scrolled past remained blank (white, not black with dashed outline). The other pages in the preview that I had not scrolled past appeared OK.

mduan · 2013-03-04T22:44:08Z

@waddlesplash: Do you have a link to the PDF? I don't see it in the issue you referenced.

waddlesplash · 2013-03-04T22:52:42Z

Here's a similar one with the same problem as mentioned in #2637: http://www.rosettastone.com/us_assets/documentation/RSV2_UG_Level_1_2_English_%28US%29.pdf

It has a different problem as well: on all pages except page 1, there are just two graphics objects (a "+" and a line). No spinning icon indicating loading.

mduan · 2013-03-04T23:59:10Z

@waddlesplash: I've fixed the problem with the "+" & line showing. In regards to there being a blank page with no spinning icon while it's rendering, the same thing happens without my progressive loading changes.

waddlesplash · 2013-03-05T02:26:08Z

Hm. Then that PDF has different behavior than the other PDF I have on my hard drive, because it works fine with the stable version of pdf.js.

bit · 2013-03-05T09:55:55Z

While Firefox makes range requests, Chrome makes one request for the full PDF, and the progress does not update while that downloads. Once the full PDF has been loaded, another pending request for Range:bytes=0-0 is made and comes from the cache.

Snuffleupagus · 2013-03-05T14:38:43Z

Another related issue #1375?

mduan · 2013-03-05T18:43:07Z

@bit: The behaviour I saw from Chrome's PDF viewer is that it first requests the full PDF but cancels that request if it finds the server supports range requests. It then fetches enough of the PDF to determine if it is linearized.

If the PDF is linearized, it will fetch the PDF in sequential chunks. Otherwise, it will issue a full request to fetch the PDF.

Linearized PDF: http://www.adobe.com/content/dam/Adobe/en/devnet/acrobat/pdfs/PDF32000_2008.pdf
Non-linearized PDF: http://www.ecma-international.org/publications/files/ECMA-ST/Ecma-262.pdf
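
For reference, a linearized file carries a linearization parameter dictionary with a /Linearized key near the start of the file (the PDF specification places it within the first 1024 bytes). The sketch below is a simple heuristic built on that fact. The function name is made up for illustration, and this is not a claim about how Chrome's viewer or pdf.js performs the check.

```javascript
// Heuristic check: does the head of the file contain a /Linearized key?
// `bytes` is a Uint8Array holding the beginning of the PDF.
function looksLinearized(bytes) {
  // Only the first 1024 bytes are inspected.
  var limit = Math.min(bytes.length, 1024);
  var head = '';
  for (var i = 0; i < limit; i++) {
    head += String.fromCharCode(bytes[i]);
  }
  return head.indexOf('/Linearized') !== -1;
}
```

A real parser would read the first object and inspect its dictionary instead of doing a substring search, so treat this only as a quick pre-check.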

mduan · 2013-03-05T18:45:32Z

@Snuffleupagus: This PR will currently continue to download the rest of the PDF in sequential chunks while it's rendering.
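
To make "sequential chunks" concrete, here is a minimal sketch of how the remaining byte ranges could be enumerated in file order. The function name, the chunk size and the shape of the `loaded` map are all assumptions for illustration and are not taken from the patch.

```javascript
// Return the byte ranges [begin, end) that still need to be fetched, in file
// order. `loaded` maps chunk index -> true for chunks that already arrived,
// for example because a page being rendered requested them out of order.
function missingChunkRanges(length, chunkSize, loaded) {
  var ranges = [];
  var numChunks = Math.ceil(length / chunkSize);
  for (var chunk = 0; chunk < numChunks; chunk++) {
    if (loaded[chunk]) {
      continue; // Already have this chunk, skip it.
    }
    var begin = chunk * chunkSize;
    // The last chunk may be shorter than chunkSize.
    ranges.push({ begin: begin, end: Math.min(begin + chunkSize, length) });
  }
  return ranges;
}
```

For a 250-byte file with 100-byte chunks where chunk 1 has already been loaded, this yields the ranges 0-100 and 200-250.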

bit · 2013-03-06T10:07:32Z

@mduan, sorry, my last comment might not have been clear. I was talking about pdf.js in Firefox and pdf.js in Chrome, both with your patch. I was testing with pdf.js and a larger PDF on the same server, and the server supports range requests. But for some reason pdf.js does not use range requests in Chrome, while Firefox works as expected.

mduan · 2013-03-06T17:38:01Z

@bit: Is the server public? If so could you share a link?

The way that range requests work right now is that a request is made for the full pdf and once the headers for the full request are returned, we issue a range request. If the range request completes successfully, we cancel the full request and start issuing range requests unless the full request is already complete.

What I think might be happening is that the server is not returning the headers separately from the body. So when headers are received, the full request is already complete and so range requests are not issued.
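
The flow described above can be sketched as a small state machine. Everything here is hypothetical: the `RangeNegotiator` name, the injected `transport` object with `requestRange` and `abortFull`, and the one-byte probe range are assumptions made so the logic can be shown without a browser, and none of it is the code from this PR.

```javascript
// Sketch of the negotiation: start with a full request, probe with one range
// request once headers arrive, and switch to ranges only if the probe
// succeeds before the full request has finished.
function RangeNegotiator(transport) {
  // transport: { requestRange(begin, end, callback(ok)), abortFull() }
  this.transport = transport;
  this.fullRequestDone = false;
  this.useRanges = false;
}

RangeNegotiator.prototype = {
  // Call when the headers of the full request arrive.
  onFullHeaders: function () {
    if (this.fullRequestDone) {
      // The body arrived together with the headers (the suspected nginx
      // case), so there is nothing to gain from range requests.
      return;
    }
    var self = this;
    this.transport.requestRange(0, 1, function (ok) {
      if (ok && !self.fullRequestDone) {
        self.transport.abortFull();
        self.useRanges = true;
      }
    });
  },

  // Call when the full request has finished downloading.
  onFullDone: function () {
    this.fullRequestDone = true;
  }
};
```

If `onFullDone` fires before `onFullHeaders`, no probe is ever sent, which matches the behaviour suspected in the last paragraph.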

bit · 2013-03-07T05:42:41Z

The server is nginx; here is an example:
https://indiancine.ma/texts/j:Indian%20Cinema%20in%20the%20Time%20of%20Celluloid/text.pdf.html#page=150

or the version loading above url into an iframe (https://indiancine.ma/texts/j:Indian%20Cinema%20in%20the%20Time%20of%20Celluloid/150)

mduan · 2013-03-08T18:44:45Z

@bit: Can you try with the latest changes? I think I may have fixed the issue.

pdfjsbot · 2013-04-18T18:07:29Z

From: Bot.io (Windows)

Success

Full output at http://107.22.172.223:8877/0a8868c275ad385/output.txt

Total script time: 25.08 mins

Font tests: Passed
Unit tests: Passed
Regression tests: Passed

Implement progressive loading of PDFs

timvandermeij · 2013-04-18T21:21:39Z

I just wanted to say that I am very impressed by this great patch, @mduan! This looks like a fantastic feature to me.

aamironline · 2015-04-26T02:37:28Z

What's the status of this ticket? It looks like all the subtasks have been finished (the check boxes are checked), but the ticket still says it is NOT complete!

timvandermeij · 2015-04-26T13:12:02Z

This PR has the status 'merged', so it is indeed done. This has already been implemented for a very long time in PDF.js. Every other issue related to this has been closed, so I see no problems here.

yurydelendik reviewed Mar 4, 2013

mduan mentioned this pull request Mar 9, 2013

Changes to viewer to support progressive loading #2914

Merged

brendandahl added a commit that referenced this pull request Apr 18, 2013

Merge pull request #2719 from mduan/chunked

49ff029

Implement progressive loading of PDFs

brendandahl merged commit 49ff029 into mozilla:master Apr 18, 2013

This was referenced Apr 18, 2013

Regression from #2719 #3115

Closed

initialBookmark doesn't work as expected, possible regression from #2914 #3068

Closed

Error: Bad end offset #3122

Closed

loftux mentioned this pull request Apr 19, 2013

Add Pdf.js support for progressive loading of pdf. share-extras/media-viewers#1

Open

This was referenced Apr 20, 2013

Some Chinese characters not displayed from PDF generated by Inkscape #3117

Closed

Regression from #2719 (3) #3133

Closed

Fallback UI is no longer kicked on error after #2719 #3138

Closed

This was referenced Apr 22, 2013

Use same obj/font id counter for all partial evaluators on page #3142

Merged

PDF not loading #3156

Closed

[PROTOTYPE] Implements chunked rendering #3164

Closed

Snuffleupagus mentioned this pull request Apr 27, 2013

Cannot open PDF #3168

Closed

gigaherz mentioned this pull request Apr 27, 2013

Chrome extension is broken #3167

Closed

vyv03354 mentioned this pull request May 1, 2013

Cannot load PDF: this.charProcOperatorList is undefined @ resource://pdf.js/build/pdf.js:21433 #3188

Closed

yurydelendik mentioned this pull request May 7, 2013

Android(2.2) stock browser and iPhone Safari fail to render the demo page since version 0.8.51 #3218

Closed

timvandermeij mentioned this pull request May 7, 2013

Begin drawing pages while still downloading data? #1375

Closed

Snuffleupagus mentioned this pull request May 8, 2013

Fix document loading with zoom set to 'page-fit', regression from #2719 #3224

Closed

Snuffleupagus mentioned this pull request Jun 30, 2013

CTRL+A breaks scrolling of the page by dragging the scroll bar #3220

Closed

Snuffleupagus mentioned this pull request Jan 10, 2014

Open parameter "page" doesn't work with large document #3949

Closed

benbro mentioned this pull request Feb 9, 2014

Incremental loading and non-blocking parsing and rendering mrdoob/three.js#4397

Closed

laurawu mentioned this pull request Oct 20, 2014

transfer-encoding: chunked vs content-length #5424

Closed

mustafa0x mentioned this pull request Mar 6, 2018

Full PDF doc loaded before single page could be rendered #9537

Closed

Snuffleupagus mentioned this pull request Jun 7, 2020

Remove unused methods from NetworkManager, in src/display/network.js #11976

Merged

Snuffleupagus mentioned this pull request Feb 12, 2021

Remove the unused "loading_error_indicator" l10n string (PR 2719 follow-up) #12986

Merged

Snuffleupagus mentioned this pull request Feb 13, 2023

A little clean-up of the getDocument call in test/driver.js #16054

Merged
