Refactor `getPage` (in the worker), and attempt to use the `Linearization` dictionary to lookup the first Page #9931

Snuffleupagus · 2018-07-27T10:33:20Z

As expected, using the `Linearization` data doesn't appear to have any noticeable performance impact. This can probably be attributed to `Catalog.getPageDict` fetching only the required nodes, rather than the entire `Pages` tree, in combination with caching of already resolved nodes (in `pageKidsCountCache`).

Edit: To clarify the above: for linearized files this patch does result in (at most) a handful fewer invocations of this code, since the first Page can be accessed directly.
So while less data is loaded when fetching the first Page, the difference is tiny in practice. Unless the server/connection is really slow, it would most likely not be seen or felt. (Also keep in mind that the default viewer will pre-render the next/previous page.)
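
To illustrate why the saving is so small, here is a minimal sketch of a `Pages` tree lookup that only fetches the nodes it needs and caches subtree page counts. This is not the actual `Catalog.getPageDict` code: the `objects` store, the `createPageFinder` helper, the node shape (`type`, `count`, `kids`) and the string refs are all assumptions made for the example.

```javascript
// Hypothetical object store: ref -> dictionary. Intermediate "Pages" nodes
// list their children in `kids` and their number of leaf pages in `count`.
const objects = new Map([
  ["1R", { type: "Pages", count: 5, kids: ["2R", "3R"] }],
  ["2R", { type: "Pages", count: 3, kids: ["4R", "5R", "6R"] }],
  ["3R", { type: "Pages", count: 2, kids: ["7R", "8R"] }],
  ["4R", { type: "Page", label: "page 1" }],
  ["5R", { type: "Page", label: "page 2" }],
  ["6R", { type: "Page", label: "page 3" }],
  ["7R", { type: "Page", label: "page 4" }],
  ["8R", { type: "Page", label: "page 5" }],
]);

function createPageFinder(store, rootRef) {
  // Assumed stand-in for a kids-count cache: ref -> number of leaf pages.
  const kidsCountCache = new Map();
  let fetches = 0;

  function fetchNode(ref) {
    fetches++; // Count how many objects had to be loaded.
    return store.get(ref);
  }

  function getPage(pageIndex) {
    const stack = [rootRef];
    let pagesSeen = 0; // Leaf pages known to precede the current node.

    while (stack.length > 0) {
      const ref = stack.pop();

      // A cached count lets us skip a subtree without fetching it at all.
      const cachedCount = kidsCountCache.get(ref);
      if (cachedCount !== undefined && pagesSeen + cachedCount <= pageIndex) {
        pagesSeen += cachedCount;
        continue;
      }

      const node = fetchNode(ref);
      if (node.type === "Page") {
        if (pagesSeen === pageIndex) {
          return node;
        }
        kidsCountCache.set(ref, 1);
        pagesSeen++;
        continue;
      }

      kidsCountCache.set(ref, node.count);
      if (pagesSeen + node.count <= pageIndex) {
        pagesSeen += node.count; // The requested page is not in this subtree.
        continue;
      }
      // Push the kids in reverse, so that they are popped in document order.
      for (let i = node.kids.length - 1; i >= 0; i--) {
        stack.push(node.kids[i]);
      }
    }
    throw new Error(`Page index ${pageIndex} not found.`);
  }

  return { getPage, get fetches() { return fetches; } };
}
```

In this toy tree, finding the first page costs one fetch per tree level (three in total), which is the most a direct lookup could save.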

Fixes #9716.

Snuffleupagus · 2018-07-27T10:40:39Z

/botio-windows test

…ent` instead

Addresses an existing TODO, and avoids having to pass in a `pageFactory` when creating `Catalog` instances.

…irst Page

Since PDF.js already supports range requests and streaming, not to mention chunked rendering, attempting to use the `Linearization` dictionary in `PDFDocument.getPage` probably isn't going to improve performance in any noticeable way. Nonetheless, when `Linearization` data is available, it will allow looking up the first Page *directly* without having to descend into the `Pages` tree to find the correct object.
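
The direct lookup described above could be sketched roughly as follows. This is a simplified, synchronous illustration and not the code from this patch: `makeDocument`, `getPage`, and `lookupInPagesTree` are hypothetical helpers, and the `pageFirst`/`objectNumberFirst` field names are assumptions modelled on the `/P` and `/O` entries of the linearization parameter dictionary. The important part is the fallback to the regular `Pages` tree lookup when the linearization data turns out to be unusable.

```javascript
// Hypothetical document with two pages (objects 12 and 13) and one
// non-page object (20), used to simulate corrupt linearization data.
function makeDocument(linearization) {
  const objects = new Map([
    [12, { type: "Page", label: "first page" }],
    [13, { type: "Page", label: "second page" }],
    [20, { type: "Font" }],
  ]);
  const pageOrder = [12, 13];

  return {
    linearization,
    objects,
    // Stand-in for the regular (slower) descent into the Pages tree.
    lookupInPagesTree(pageIndex) {
      return objects.get(pageOrder[pageIndex]);
    },
  };
}

function getPage(doc, pageIndex) {
  const lin = doc.linearization;

  // Only the first page can be found through the linearization data.
  if (lin && lin.pageFirst === pageIndex) {
    const obj = doc.objects.get(lin.objectNumberFirst);
    if (obj && obj.type === "Page") {
      return { page: obj, via: "linearization" };
    }
    // The data does not point at a Page object; fall back below.
  }
  return { page: doc.lookupInPagesTree(pageIndex), via: "pages-tree" };
}
```

Validating that the referenced object really is a Page keeps a file with bad linearization data from failing to open, since the ordinary lookup is still available as a fallback.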

timvandermeij · 2018-07-29T14:55:56Z

/botio-linux preview

pdfjsbot · 2018-07-29T14:55:57Z

From: Bot.io (Linux m4)

Received

Command cmd_preview from @timvandermeij received. Current queue size: 0

Live output at: http://54.67.70.0:8877/77c2813403c7e04/output.txt

pdfjsbot · 2018-07-29T14:58:48Z

From: Bot.io (Linux m4)

Success

Full output at http://54.67.70.0:8877/77c2813403c7e04/output.txt

Total script time: 2.85 mins

Published

Viewer: http://54.67.70.0:8877/77c2813403c7e04/web/viewer.html

timvandermeij · 2018-07-29T14:58:54Z

/botio test

pdfjsbot · 2018-07-29T14:58:55Z

From: Bot.io (Linux m4)

Received

Command cmd_test from @timvandermeij received. Current queue size: 0

Live output at: http://54.67.70.0:8877/f773d67175a8472/output.txt

pdfjsbot · 2018-07-29T14:58:55Z

From: Bot.io (Windows)

Received

Command cmd_test from @timvandermeij received. Current queue size: 0

Live output at: http://54.215.176.217:8877/7ef07cc460ed5a7/output.txt

pdfjsbot · 2018-07-29T15:18:25Z

From: Bot.io (Linux m4)

Success

Full output at http://54.67.70.0:8877/f773d67175a8472/output.txt

Total script time: 19.49 mins

Font tests: Passed
Unit tests: Passed
Regression tests: Passed

pdfjsbot · 2018-07-29T15:25:54Z

From: Bot.io (Windows)

Success

Full output at http://54.215.176.217:8877/7ef07cc460ed5a7/output.txt

Total script time: 26.97 mins

Font tests: Passed
Unit tests: Passed
Regression tests: Passed

timvandermeij · 2018-07-29T17:34:08Z

Good to have this; thanks!

timvandermeij added the core label Jul 27, 2018

Snuffleupagus mentioned this pull request Jul 28, 2018

[Regression] Convert Catalog.builtInCMapCache into a Map, instead of an Object, to ensure that it's correctly reset (PR 8064 follow-up) #9935

Merged

Snuffleupagus added 2 commits July 28, 2018 22:23

Move getPage, on the worker side, from Catalog and into `PDFDocum…

fbb25ff

…ent` instead

Addresses an existing TODO, and avoids having to pass in a `pageFactory` when creating `Catalog` instances.

Snuffleupagus force-pushed the refactor-getPage branch from 915d26f to ec3728b Compare July 28, 2018 20:23

mozilla deleted a comment from pdfjsbot Jul 29, 2018

timvandermeij approved these changes Jul 29, 2018


timvandermeij merged commit f45450b into mozilla:master Jul 29, 2018

Snuffleupagus deleted the refactor-getPage branch July 30, 2018 08:19
