Convert `Catalog.getPageDict` to an `async` method #11312

Snuffleupagus · 2019-11-08T16:51:52Z

This makes it possible to remove the internal `next` helper function, and also gets rid of the need to manually resolve/reject a `PromiseCapability`.
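To make the change concrete, here is a minimal sketch of the two shapes on a toy page tree. It is not the pdf.js source. The plain-object node format, `fetchNode`, `createCapability`, `getPageOld` and `getPageNew` are all hypothetical names chosen for illustration. The first function mirrors the description above: a `next` helper is re-invoked from a `.then` callback, and the result is settled by hand through a capability. The second performs the same depth-first search inside an `async` function.

```javascript
// Minimal sketch, NOT the actual pdf.js implementation. Assumes a toy page
// tree of plain objects: { Type: "Pages", Kids: [...] } for intermediate
// nodes and { Type: "Page", id } for leaves.

// Hypothetical stand-in for asynchronously resolving a reference.
function fetchNode(node) {
  return Promise.resolve(node);
}

// Hypothetical helper with the shape of a PromiseCapability:
// a promise plus its resolve/reject functions, settled by hand.
function createCapability() {
  let resolve, reject;
  const promise = new Promise((res, rej) => {
    resolve = res;
    reject = rej;
  });
  return { promise, resolve, reject };
}

// Old style: an internal `next` helper re-invoked from a `.then` callback,
// i.e. a recursive function that is called asynchronously.
function getPageOld(root, pageIndex) {
  const capability = createCapability();
  const nodesToVisit = [root];
  let currentPageIndex = 0;

  function next() {
    if (nodesToVisit.length === 0) {
      capability.reject(new Error(`Page index ${pageIndex} not found.`));
      return;
    }
    fetchNode(nodesToVisit.pop()).then((node) => {
      if (node.Type === "Page") {
        if (currentPageIndex === pageIndex) {
          capability.resolve(node);
          return;
        }
        currentPageIndex++;
      } else if (node.Type === "Pages") {
        // Push kids in reverse order so the first kid is visited first.
        for (let i = node.Kids.length - 1; i >= 0; i--) {
          nodesToVisit.push(node.Kids[i]);
        }
      } else {
        capability.reject(new Error(`Unexpected /Type: ${node.Type}`));
        return;
      }
      next();
    }, capability.reject);
  }

  next();
  return capability.promise;
}

// New style: an `async` function with a plain loop. Returning resolves the
// promise and throwing rejects it, so no capability is needed.
async function getPageNew(root, pageIndex) {
  const nodesToVisit = [root];
  let currentPageIndex = 0;

  while (nodesToVisit.length > 0) {
    const node = await fetchNode(nodesToVisit.pop());
    if (node.Type === "Page") {
      if (currentPageIndex === pageIndex) {
        return node;
      }
      currentPageIndex++;
    } else if (node.Type === "Pages") {
      for (let i = node.Kids.length - 1; i >= 0; i--) {
        nodesToVisit.push(node.Kids[i]);
      }
    } else {
      throw new Error(`Unexpected /Type: ${node.Type}`);
    }
  }
  throw new Error(`Page index ${pageIndex} not found.`);
}

const tree = {
  Type: "Pages",
  Kids: [
    { Type: "Page", id: "p0" },
    { Type: "Pages", Kids: [{ Type: "Page", id: "p1" }, { Type: "Page", id: "p2" }] },
  ],
};

getPageOld(tree, 2).then((page) => console.log(page.id)); // "p2"
getPageNew(tree, 2).then((page) => console.log(page.id)); // "p2"
```

In the `async` version, `return` and `throw` take over the job of `resolve` and `reject`, which is what removes the need for a manually managed capability.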

As we've seen in numerous other cases, avoiding unnecessary function calls is never a bad thing (even if the effect is probably tiny here). With these changes we also avoid potentially two back-to-back `isDict` checks when evaluating possible Page nodes, and can also no longer accidentally pick a dictionary with an incorrect /Type.
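The `/Type` point can be illustrated with two predicates. This is a hypothetical sketch, not pdf.js code: the `Dict` class and `isDict` helper below are simplified stand-ins, and the "loose" predicate is an assumed example of how a fallback branch could end up accepting a dictionary with the wrong `/Type`.

```javascript
// Hypothetical, simplified stand-in for a PDF dictionary (NOT pdf.js's Dict).
class Dict {
  constructor(entries) {
    this.map = new Map(Object.entries(entries));
  }
  get(key) {
    return this.map.get(key);
  }
  has(key) {
    return this.map.has(key);
  }
}

// Hypothetical helper: true if `obj` is a Dict and, when `type` is given,
// its /Type entry equals `type`.
function isDict(obj, type) {
  return obj instanceof Dict && (type === undefined || obj.get("Type") === type);
}

// Assumed "loose" check: two back-to-back isDict calls, and the fallback
// accepts ANY dictionary without /Kids, whatever its /Type is.
function isPageLoose(obj) {
  return isDict(obj, "Page") || (isDict(obj) && !obj.has("Kids"));
}

// Stricter check: a single isDict call that also requires /Type === "Page".
function isPageStrict(obj) {
  return isDict(obj, "Page");
}

const page = new Dict({ Type: "Page" });
const font = new Dict({ Type: "Font" }); // wrong /Type, and no /Kids

console.log(isPageLoose(page), isPageLoose(font)); // true true
console.log(isPageStrict(page), isPageStrict(font)); // true false
```

The loose form can mistake a non-page leaf for a page, while the strict form performs one check and only accepts dictionaries whose `/Type` is `Page`.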

Snuffleupagus · 2019-11-08T16:56:15Z

/botio test

pdfjsbot · 2019-11-08T16:56:17Z

From: Bot.io (Linux m4)

Received

Command cmd_test from @Snuffleupagus received. Current queue size: 0

Live output at: http://54.67.70.0:8877/84f317acfc45ce7/output.txt

pdfjsbot · 2019-11-08T16:56:17Z

From: Bot.io (Windows)

Received

Command cmd_test from @Snuffleupagus received. Current queue size: 0

Live output at: http://54.215.176.217:8877/b6d98b39c223b9f/output.txt

pdfjsbot · 2019-11-08T17:14:59Z

From: Bot.io (Linux m4)

Success

Full output at http://54.67.70.0:8877/84f317acfc45ce7/output.txt

Total script time: 18.69 mins

Font tests: Passed
Unit tests: Passed
Regression tests: Passed

pdfjsbot · 2019-11-08T17:23:04Z

From: Bot.io (Windows)

Success

Full output at http://54.215.176.217:8877/b6d98b39c223b9f/output.txt

Total script time: 26.77 mins

Font tests: Passed
Unit tests: Passed
Regression tests: Passed

timvandermeij · 2019-11-08T21:39:09Z

Thank you for refactoring this! I found the `next` mechanism a bit hard to read.

Snuffleupagus · 2019-11-09T11:46:16Z

> I found the `next` mechanism a bit hard to read.

Yeah, the old setup was essentially a recursive function which was called asynchronously. This was also a case where the use of a `PromiseCapability` probably didn't help readability either.

I suppose the one criticism you could have of this patch is that it adds asynchronous behaviour inside a loop. However, I think all the skipped function calls and the general clean-up outweigh that.
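The trade-off of asynchronous behaviour inside a loop comes from how `await` is scheduled: each `await` suspends the function and resumes it from the microtask queue, even when the awaited value is already available. The sketch below (hypothetical names, not pdf.js code) shows that a loop awaiting plain numbers finishes only after unrelated synchronous work has run. It illustrates the scheduling behaviour only; it does not reproduce or quantify the regression reported later in this thread.

```javascript
// Hypothetical sketch: each `await` yields to the microtask queue, even when
// the awaited value is a plain number rather than a promise.
const log = [];

async function sumWithAwait(values) {
  let total = 0;
  for (const v of values) {
    total += await v; // suspends here on every iteration
  }
  log.push("async loop done");
  return total;
}

function sumSync(values) {
  let total = 0;
  for (const v of values) {
    total += v;
  }
  log.push("sync loop done");
  return total;
}

const values = [1, 2, 3];
const pending = sumWithAwait(values); // runs until the first `await`, then returns
log.push("after calling sumWithAwait");
sumSync(values); // completes entirely before the async loop resumes

pending.then((total) => {
  console.log(total); // 6
  console.log(log.join(" | "));
  // after calling sumWithAwait | sync loop done | async loop done
});
```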

Snuffleupagus · 2019-11-09T21:36:02Z

Sigh, it seems that I forgot to test this with really large files :-(

With a large file taken from an old GitHub issue, https://github.com/mozilla/pdf.js/files/876321/kjv.pdf (warning: the file is 16 MB), there's now a considerable "Page Request" regression for the second page upon document load, from 2000 ms to 3000 ms, which obviously doesn't look good.

In light of this, I unfortunately don't believe that it's a good idea to keep these changes, since the new way in which this code is asynchronous obviously performs worse in some cases; sorry about the mess here!

Snuffleupagus added 2 commits November 8, 2019 17:45

Convert Catalog.getPageDict to an async method

0d89006

Snuffleupagus force-pushed the async-getPageDict branch from e5f103a to 79d7c00 November 8, 2019 16:53

timvandermeij added the core label Nov 8, 2019

timvandermeij approved these changes Nov 8, 2019

timvandermeij merged commit b1440a1 into mozilla:master Nov 8, 2019

Snuffleupagus deleted the async-getPageDict branch November 8, 2019 21:58

Snuffleupagus mentioned this pull request Nov 9, 2019

Revert "Convert Catalog.getPageDict to an async method" #11314

Merged

Snuffleupagus mentioned this pull request Dec 10, 2021

Avoid overloading the worker-thread during eager page initialization in the viewer (PR 11263 follow-up) #14359

Merged