Refactor getPage (in the worker), and attempt to use the Linearization dictionary to lookup the first Page #9931

Merged: 2 commits, Jul 29, 2018

Snuffleupagus
Collaborator

@Snuffleupagus Snuffleupagus commented Jul 27, 2018

As expected, using the `Linearization` data doesn't appear to have had any (noticeable) performance impact. This can probably be attributed to `Catalog.getPageDict` only fetching the required nodes, and not the entire `Pages` tree, in combination with caching of already resolved nodes (in `pageKidsCountCache`).

Edit: To clarify the above, for Linearized files this patch does result in (at most) a handful fewer invocations of that code, since the first Page can be accessed directly.
So while the amount of data loaded when fetching the first Page is reduced, the difference is tiny in practice. Unless the server/connection is really slow, it would most likely not be seen/felt. (Also keep in mind that the default viewer pre-renders the next/previous page.)
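
To illustrate why the regular lookup is already cheap, here is a minimal sketch of a page-tree descent that fetches only the nodes it needs and caches each node's page count. It is a simplified model, not the actual `Catalog.getPageDict` code: the `objects` map, `getPageRef`, and `fetchObject` are hypothetical names, and the real implementation is asynchronous.

```javascript
// Simplified model of a PDF page tree; NOT the actual PDF.js implementation.
// Each `Pages` node records how many leaf pages it contains (`count`).
const objects = new Map([
  ["root", { type: "Pages", count: 4, kids: ["a", "b"] }],
  ["a", { type: "Pages", count: 2, kids: ["p0", "p1"] }],
  ["b", { type: "Pages", count: 2, kids: ["p2", "p3"] }],
  ["p0", { type: "Page" }],
  ["p1", { type: "Page" }],
  ["p2", { type: "Page" }],
  ["p3", { type: "Page" }],
]);

// Returns the reference of the page at `pageIndex` (0-based).
// `fetchNode` stands in for an object lookup that may require loading data;
// `kidsCountCache` maps a reference to the number of pages below it.
function getPageRef(rootRef, pageIndex, fetchNode, kidsCountCache) {
  const stack = [rootRef];
  let currentIndex = 0;

  while (stack.length > 0) {
    const ref = stack.pop();

    // A cached count lets us skip a whole subtree without fetching it.
    const cachedCount = kidsCountCache.get(ref);
    if (cachedCount !== undefined && currentIndex + cachedCount <= pageIndex) {
      currentIndex += cachedCount;
      continue;
    }

    const node = fetchNode(ref);
    if (node.type === "Page") {
      kidsCountCache.set(ref, 1);
      if (currentIndex === pageIndex) {
        return ref;
      }
      currentIndex++;
      continue;
    }

    kidsCountCache.set(ref, node.count);
    if (currentIndex + node.count <= pageIndex) {
      // The requested page is not in this subtree; skip its kids entirely.
      currentIndex += node.count;
      continue;
    }
    // Push kids in reverse so that the first kid is visited first.
    for (let i = node.kids.length - 1; i >= 0; i--) {
      stack.push(node.kids[i]);
    }
  }
  throw new Error(`Page index ${pageIndex} not found.`);
}

// Usage: record which objects are actually fetched.
const fetchLog = [];
const fetchObject = (ref) => {
  fetchLog.push(ref);
  return objects.get(ref);
};
const kidsCountCache = new Map();

console.log(getPageRef("root", 0, fetchObject, kidsCountCache), fetchLog);
// fetched: root, a, p0 (subtree "b" is never touched)
```

In this toy tree the first page costs three fetches, so a direct lookup could save at most two of them, which matches the "handful" mentioned above.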

Fixes #9716.

@Snuffleupagus
Collaborator Author

/botio-windows test

…ent` instead

Addresses an existing TODO, and avoids having to pass in a `pageFactory` when creating `Catalog` instances.

…irst Page

Since PDF.js already supports range requests and streaming, not to mention chunked rendering, attempting to use the `Linearization` dictionary in `PDFDocument.getPage` probably isn't going to improve performance in any noticeable way.
Nonetheless, when `Linearization` data is available, it will allow looking up the first Page *directly* without having to descend into the `Pages` tree to find the correct object.
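
The idea can be sketched roughly as follows. This is a hedged illustration rather than the code from this patch: `getPageObject`, `firstPageIndex`, `firstPageObjectNumber`, and the two callbacks are hypothetical names, and the sketch is synchronous for brevity. It assumes the Linearization data may be wrong, so the result is validated and the regular `Pages` tree lookup is used as a fallback.

```javascript
// Hypothetical, simplified sketch; not the actual PDF.js code.
// `linearization` is assumed to expose the index of the first page and the
// object number of that page's dictionary, or to be null when absent.
function getPageObject(pageIndex, linearization, fetchByNumber, lookupViaTree) {
  if (linearization && linearization.firstPageIndex === pageIndex) {
    try {
      const obj = fetchByNumber(linearization.firstPageObjectNumber);
      // Linearization data can be stale or corrupt, so check the result
      // before trusting it.
      if (obj && obj.type === "Page") {
        return obj;
      }
    } catch (ex) {
      // Ignore the error and fall through to the regular lookup.
    }
  }
  // Regular path: descend into the `Pages` tree.
  return lookupViaTree(pageIndex);
}

// Usage with a toy object store.
const demoStore = new Map([
  [12, { type: "Page", label: "direct" }],
  [7, { type: "Font" }],
]);
const fetchFromStore = (num) => {
  if (!demoStore.has(num)) {
    throw new Error(`Object ${num} not found.`);
  }
  return demoStore.get(num);
};
const lookupViaTree = (index) => ({ type: "Page", label: `tree-${index}` });

const firstPage = getPageObject(
  0,
  { firstPageIndex: 0, firstPageObjectNumber: 12 },
  fetchFromStore,
  lookupViaTree
);
console.log(firstPage.label); // "direct"
```

Any failure on the direct path (missing object, wrong object type, or no Linearization data at all) ends up in the same tree lookup that was used before, so the shortcut cannot make lookups less reliable.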
@mozilla mozilla deleted a comment from pdfjsbot Jul 29, 2018
@mozilla mozilla deleted a comment from pdfjsbot Jul 29, 2018
@timvandermeij
Contributor

/botio-linux preview

@pdfjsbot

From: Bot.io (Linux m4)


Received

Command cmd_preview from @timvandermeij received. Current queue size: 0

Live output at: http://54.67.70.0:8877/77c2813403c7e04/output.txt

@pdfjsbot

From: Bot.io (Linux m4)


Success

Full output at http://54.67.70.0:8877/77c2813403c7e04/output.txt

Total script time: 2.85 mins

Published

@timvandermeij
Contributor

/botio test

@pdfjsbot

From: Bot.io (Linux m4)


Received

Command cmd_test from @timvandermeij received. Current queue size: 0

Live output at: http://54.67.70.0:8877/f773d67175a8472/output.txt

@pdfjsbot

From: Bot.io (Windows)


Received

Command cmd_test from @timvandermeij received. Current queue size: 0

Live output at: http://54.215.176.217:8877/7ef07cc460ed5a7/output.txt

@pdfjsbot

From: Bot.io (Linux m4)


Success

Full output at http://54.67.70.0:8877/f773d67175a8472/output.txt

Total script time: 19.49 mins

  • Font tests: Passed
  • Unit tests: Passed
  • Regression tests: Passed

@pdfjsbot

From: Bot.io (Windows)


Success

Full output at http://54.215.176.217:8877/7ef07cc460ed5a7/output.txt

Total script time: 26.97 mins

  • Font tests: Passed
  • Unit tests: Passed
  • Regression tests: Passed

@timvandermeij timvandermeij merged commit f45450b into mozilla:master Jul 29, 2018
@timvandermeij
Contributor

Good to have this; thanks!

@Snuffleupagus Snuffleupagus deleted the refactor-getPage branch July 30, 2018 08:19
Successfully merging this pull request may close these issues.

Take advantage of linearization dictionary to get the first page