Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Makes PDF data reading Streams API friendly. #6879

Merged
merged 1 commit into from
Mar 1, 2016

Conversation

yurydelendik
Copy link
Contributor

Adds PDFWorkerStream and PDFNetworkStream with classes that can provide readers (that are compatible with ReadableByteStream, see Streams API).

In the future PDFNetworkStream can be re-written to support Fetch API. The PDFWorkerStream can be refactored to be more lightweight and move most of I/O logic to the main thread.

Review on Reviewable

@yurydelendik
Copy link
Contributor Author

/cc @Rob--W shall help with #5319

}
};

function PDFNetworkStreamRangeRequest(manager, begin, end) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: for consistency with PDFNetworkStreamFullRequestReader, shouldn't this be named PDFNetworkStreamRangeRequestReader instead?

@Rob--W
Copy link
Member

Rob--W commented Jan 15, 2016

Which such a refactor, I'd like to see more unit tests that verifies that the network layer works as intended. That makes it much easier to build upon this interface (e.g. adding support for the streams API).

@yurydelendik
Copy link
Contributor Author

/botio test

@pdfjsbot
Copy link

pdfjsbot commented Feb 9, 2016

From: Bot.io (Linux)


Received

Command cmd_test from @yurydelendik received. Current queue size: 0

Live output at: http://107.21.233.14:8877/e8254fd4aa4c7cc/output.txt

@pdfjsbot
Copy link

pdfjsbot commented Feb 9, 2016

From: Bot.io (Windows)


Received

Command cmd_test from @yurydelendik received. Current queue size: 0

Live output at: http://107.22.172.223:8877/b9ed08ef04529cc/output.txt

@pdfjsbot
Copy link

pdfjsbot commented Feb 9, 2016

From: Bot.io (Windows)


Success

Full output at http://107.22.172.223:8877/b9ed08ef04529cc/output.txt

Total script time: 20.61 mins

  • Font tests: Passed
  • Unit tests: Passed
  • Regression tests: Passed

@pdfjsbot
Copy link

pdfjsbot commented Feb 9, 2016

From: Bot.io (Linux)


Success

Full output at http://107.21.233.14:8877/e8254fd4aa4c7cc/output.txt

Total script time: 21.75 mins

  • Font tests: Passed
  • Unit tests: Passed
  • Regression tests: Passed


function arraysToBytes(arr) {
if (arr.length === 1 && (arr instanceof Uint8Array)) {
return arr[0]; // shortcut
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are you returning a 8-bit integer here? Or should arr instanceof be arr[0] instanceof ?

@yurydelendik
Copy link
Contributor Author

/botio test

@pdfjsbot
Copy link

From: Bot.io (Linux)


Received

Command cmd_test from @yurydelendik received. Current queue size: 0

Live output at: http://107.21.233.14:8877/4db634964845804/output.txt

@pdfjsbot
Copy link

From: Bot.io (Windows)


Received

Command cmd_test from @yurydelendik received. Current queue size: 0

Live output at: http://107.22.172.223:8877/b9d5ad22a94637b/output.txt

@pdfjsbot
Copy link

From: Bot.io (Windows)


Success

Full output at http://107.22.172.223:8877/b9d5ad22a94637b/output.txt

Total script time: 20.82 mins

  • Font tests: Passed
  • Unit tests: Passed
  • Regression tests: Passed

@pdfjsbot
Copy link

From: Bot.io (Linux)


Success

Full output at http://107.21.233.14:8877/4db634964845804/output.txt

Total script time: 21.57 mins

  • Font tests: Passed
  • Unit tests: Passed
  • Regression tests: Passed

@pdfjsbot
Copy link

From: Bot.io (Linux)


Failed

Full output at http://107.21.233.14:8877/4cd3f72a2fbce1b/output.txt

Total script time: 14.56 mins

  • Font tests: Passed
  • Unit tests: Passed
  • Regression tests: FAILED

Image differences available at: http://107.21.233.14:8877/4cd3f72a2fbce1b/reftest-analyzer.html#web=eq.log

@yurydelendik
Copy link
Contributor Author

/botio-linux test

@pdfjsbot
Copy link

From: Bot.io (Linux)


Received

Command cmd_test from @yurydelendik received. Current queue size: 0

Live output at: http://107.21.233.14:8877/354031c9f11c02f/output.txt

@pdfjsbot
Copy link

From: Bot.io (Windows)


Success

Full output at http://107.22.172.223:8877/57a3ffa353c2399/output.txt

Total script time: 20.34 mins

  • Font tests: Passed
  • Unit tests: Passed
  • Regression tests: Passed

@pdfjsbot
Copy link

From: Bot.io (Linux)


Success

Full output at http://107.21.233.14:8877/354031c9f11c02f/output.txt

Total script time: 21.80 mins

  • Font tests: Passed
  • Unit tests: Passed
  • Regression tests: Passed

@yurydelendik
Copy link
Contributor Author

And in the interface all functions are anonymous. Is that intentional? With the similarly named methods in the different interfaces (e.g. read), grepping may be easier if the interface name is included in the function name.

Actually, there is no point on applying function names (it was legacy practice to ensure we are getting proper stack trace reports). IDEs are capable to navigate between code and jsdoc (as well as show pop-up hints). Not sure how can we be more useful here in grep situation, also once we start moving the code to the ES6 we will probably completely abandon this naming conversion in favor of class definition.

@timvandermeij
Copy link
Contributor

/botio-linux preview

@pdfjsbot
Copy link

From: Bot.io (Linux)


Received

Command cmd_preview from @timvandermeij received. Current queue size: 0

Live output at: http://107.21.233.14:8877/23904393ada2d2e/output.txt

@timvandermeij
Copy link
Contributor

/botio test

@pdfjsbot
Copy link

From: Bot.io (Linux)


Received

Command cmd_test from @timvandermeij received. Current queue size: 0

Live output at: http://107.21.233.14:8877/bda3d85b7158ee5/output.txt

@pdfjsbot
Copy link

From: Bot.io (Windows)


Received

Command cmd_test from @timvandermeij received. Current queue size: 0

Live output at: http://107.22.172.223:8877/05a169b92a7efb1/output.txt

@pdfjsbot
Copy link

From: Bot.io (Windows)


Success

Full output at http://107.22.172.223:8877/05a169b92a7efb1/output.txt

Total script time: 20.30 mins

  • Font tests: Passed
  • Unit tests: Passed
  • Regression tests: Passed

@pdfjsbot
Copy link

From: Bot.io (Linux)


Failed

Full output at http://107.21.233.14:8877/bda3d85b7158ee5/output.txt

Total script time: 23.85 mins

  • Font tests: Passed
  • Unit tests: Passed
  • Regression tests: FAILED

Image differences available at: http://107.21.233.14:8877/bda3d85b7158ee5/reftest-analyzer.html#web=eq.log

@timvandermeij
Copy link
Contributor

/botio-linux test

@pdfjsbot
Copy link

From: Bot.io (Linux)


Received

Command cmd_test from @timvandermeij received. Current queue size: 0

Live output at: http://107.21.233.14:8877/e64f64ef17dec2d/output.txt

@pdfjsbot
Copy link

From: Bot.io (Linux)


Success

Full output at http://107.21.233.14:8877/e64f64ef17dec2d/output.txt

Total script time: 21.85 mins

  • Font tests: Passed
  • Unit tests: Passed
  • Regression tests: Passed

yurydelendik added a commit that referenced this pull request Mar 1, 2016
Makes PDF data reading Streams API friendly.
@yurydelendik yurydelendik merged commit 22341c0 into mozilla:master Mar 1, 2016
@Snuffleupagus
Copy link
Collaborator

Nice patch!

However, it seems that the issue @timvandermeij mentioned on IRC a few days back is still present in the patch that landed, please see http://logs.glob.uno/?c=mozilla%23pdfjs&s=26+Feb+2016&e=26+Feb+2016#c46715.

For example, open http://ftp.acc.umu.se/mirror/CTAN/info/lshort/english/lshort.pdf#disableRange=true&disableStream=true and check the browser console (Ctrl+Shift+J):

Error: pdfjs/core/network module is not loaded pdf.worker.js:2340:5
assert@resource://pdf.js/build/pdf.worker.js:2356:5
getPdfManager@resource://pdf.js/build/pdf.worker.js:41789:11
wphCreateDocumentHandler/setupDoc@resource://pdf.js/build/pdf.worker.js:41930:7
wphReady@resource://pdf.js/build/pdf.worker.js:42171:7
messageHandlerComObjOnMessage@resource://pdf.js/build/pdf.worker.js:3704:9
EventListener.handleEvent*MessageHandler@resource://pdf.js/build/pdf.worker.js:3710:3
wphCreateDocumentHandler@resource://pdf.js/build/pdf.worker.js:41719:19
wphSetupDoc@resource://pdf.js/build/pdf.worker.js:41706:14
messageHandlerComObjOnMessage/<@resource://pdf.js/build/pdf.worker.js:3681:18
promise callback*messageHandlerComObjOnMessage@resource://pdf.js/build/pdf.worker.js:3680:9
EventListener.handleEvent*MessageHandler@resource://pdf.js/build/pdf.worker.js:3710:3
initializeWorker@resource://pdf.js/build/pdf.worker.js:42218:17
pdfjsWrapper/<@resource://pdf.js/build/pdf.worker.js:42226:3
pdfjsWrapper/<@resource://pdf.js/build/pdf.worker.js:41395:5
pdfjsWrapper@resource://pdf.js/build/pdf.worker.js:41393:2
@resource://pdf.js/build/pdf.worker.js:40:13
@resource://pdf.js/build/pdf.worker.js:25:1
@resource://pdf.js/build/pdf.worker.js:18:2
 pdf.worker.js:2341:5
PDF 47642ae5bebee4e458b77187094e0f92 [1.5 pdfTeX-1.40.14 / LaTeX with hyperref package] (PDF.js: 1.4.109 [WebGL]) viewer.js:7135:7

@yurydelendik
Copy link
Contributor Author

@Snuffleupagus
Copy link
Collaborator

Hmm, worked for me at http://www.adobe.com/content/dam/Adobe/en/devnet/acrobat/pdfs/PDF32000_2008.pdf#disableAutoFetch=true&disableStream=true

In order to trigger this, both disableRange and disableStream need to be set (as in my link above), and you might need to do a skip-cache reload as well.
Another, simpler way, to trigger this in Firefox is to open a local file by dragging it into the browser window. In that case you don't need to set any disable... parameters.

Edit: I should probably have clarified one thing, the PDF file will still load/render correctly in the above cases, it's just that there's a lot of console spew because of the asserthere https://github.com/mozilla/pdf.js/blob/master/src/core/worker.js#L551.

Also, the error message isn't entirely correct since network.js is loaded, it's just that the code in question code is only included when //#if !(FIREFOX || MOZCENTRAL).

@Snuffleupagus
Copy link
Collaborator

As outlined in #6879 (comment), this is not a breaking error. But it'd be nice to get rid of all that console output (caused by the assert), so couldn't we just do something like the diff below?

diff --git a/src/core/worker.js b/src/core/worker.js
index c78b52e..4e4446e 100644
--- a/src/core/worker.js
+++ b/src/core/worker.js
@@ -548,8 +548,14 @@ var WorkerMessageHandler = PDFJS.WorkerMessageHandler = {
         if (source.chunkedViewerLoading) {
           pdfStream = new PDFWorkerStream(source, handler);
         } else {
+//#if (FIREFOX || MOZCENTRAL)
+//        pdfManagerCapability.reject(new Error('PDFNetworkStream is ' +
+//          'undefined in the addon/built-in version.'));
+//        return pdfManagerCapability.promise;
+//#else
           assert(PDFNetworkStream, 'pdfjs/core/network module is not loaded');
           pdfStream = new PDFNetworkStream(data);
+//#endif
         }
       } catch (ex) {
         pdfManagerCapability.reject(ex);

@yurydelendik
Copy link
Contributor Author

As outlined in #6879 (comment), this is not a breaking error. But it'd be nice to get rid of all that console output (caused by the assert), so couldn't we just do something like the diff below?

Problem that it shall not be called without source.chunkedViewerLoading set in add-on. I'm trying to understand how it's happening.

@yurydelendik
Copy link
Contributor Author

It's due to this case https://github.com/mozilla/pdf.js/blob/master/web/viewer.js#L512 and we already resolved a promise at https://github.com/mozilla/pdf.js/blob/master/src/core/worker.js#L537 for that (that's why it is working).

@@ -196,144 +541,113 @@ var WorkerMessageHandler = PDFJS.WorkerMessageHandler = {
} catch (ex) {
pdfManagerCapability.reject(ex);
}
return pdfManagerCapability.promise;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shall not be removed?

yurydelendik added a commit to yurydelendik/pdf.js that referenced this pull request Mar 2, 2016
yurydelendik added a commit that referenced this pull request Mar 2, 2016
Reverts back un-need change made at #6879.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants