Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace the XRef.cache Array with a Map instead #11076

Merged
merged 2 commits into from
Aug 18, 2019

Conversation

Snuffleupagus
Copy link
Collaborator

Given that the different types of Streams will never be cached, this thus implies that the XRef.cache Array will always be more-or-less sparse.
Generally speaking, the longer the document the more sparse the XRef.cache will thus become. For example, looking at the pdf.pdf file from the test-suite: The length of the XRef.cache Array will be a few hundred thousand elements, with approximately 95% of them being empty.

Hence it seems pretty clear that an Array isn't really the best data-structure for this kind of cache, and this patch thus changes it to a Map instead.

This patch-series was tested using the PDF file from issue 2618, i.e. http://bugzilla-attachments.gnome.org/attachment.cgi?id=226471, with the following manifest file:

[
    {  "id": "issue2618",
       "file": "../web/pdfs/issue2618.pdf",
       "md5": "",
       "rounds": 200,
       "type": "eq"
    }
]

which gave the following results when comparing this patch-series against the master branch:

-- Grouped By browser, stat --
browser | stat         | Count | Baseline(ms) | Current(ms) | +/- |    %  | Result(P<.05)
------- | ------------ | ----- | ------------ | ----------- | --- | ----- | -------------
Firefox | Overall      |   200 |         2736 |        2736 |   1 |  0.02 |
Firefox | Page Request |   200 |            2 |           2 |   0 | -8.26 |        faster
Firefox | Rendering    |   200 |         2733 |        2734 |   1 |  0.03 |

The relevant methods are usually not hot enough for these changes to have an easily measurable effect, however there's been a lot of other cases where similiar inlining has helped performance. (And these changes may help offset the changes made in the next patch.)
Given that the different types of `Stream`s will never be cached, this thus implies that the `XRef.cache` Array will *always* be more-or-less sparse.
Generally speaking, the longer the document the more sparse the `XRef.cache` will thus become. For example, looking at the `pdf.pdf` file from the test-suite: The length of the `XRef.cache` Array will be a few hundred thousand elements, with approximately 95% of them being empty.

Hence it seems pretty clear that an Array isn't really the best data-structure for this kind of cache, and this patch thus changes it to a Map instead.

This patch-series was tested using the PDF file from issue 2618, i.e. http://bugzilla-attachments.gnome.org/attachment.cgi?id=226471, with the following manifest file:
```
[
    {  "id": "issue2618",
       "file": "../web/pdfs/issue2618.pdf",
       "md5": "",
       "rounds": 200,
       "type": "eq"
    }
]
```

which gave the following results when comparing this patch-series against the `master` branch:
```
-- Grouped By browser, stat --
browser | stat         | Count | Baseline(ms) | Current(ms) | +/- |    %  | Result(P<.05)
------- | ------------ | ----- | ------------ | ----------- | --- | ----- | -------------
Firefox | Overall      |   200 |         2736 |        2736 |   1 |  0.02 |
Firefox | Page Request |   200 |            2 |           2 |   0 | -8.26 |        faster
Firefox | Rendering    |   200 |         2733 |        2734 |   1 |  0.03 |
```
@Snuffleupagus
Copy link
Collaborator Author

/botio test

@pdfjsbot
Copy link

From: Bot.io (Windows)


Received

Command cmd_test from @Snuffleupagus received. Current queue size: 0

Live output at: http://54.215.176.217:8877/d177e255d8be3ae/output.txt

@pdfjsbot
Copy link

From: Bot.io (Linux m4)


Received

Command cmd_test from @Snuffleupagus received. Current queue size: 0

Live output at: http://54.67.70.0:8877/7f47b0cddab23cb/output.txt

@pdfjsbot
Copy link

From: Bot.io (Linux m4)


Success

Full output at http://54.67.70.0:8877/7f47b0cddab23cb/output.txt

Total script time: 17.43 mins

  • Font tests: Passed
  • Unit tests: Passed
  • Regression tests: Passed

@pdfjsbot
Copy link

From: Bot.io (Windows)


Success

Full output at http://54.215.176.217:8877/d177e255d8be3ae/output.txt

Total script time: 26.23 mins

  • Font tests: Passed
  • Unit tests: Passed
  • Regression tests: Passed

@timvandermeij timvandermeij merged commit 852fc95 into mozilla:master Aug 18, 2019
@timvandermeij
Copy link
Contributor

That's a really good find. Thank you!

@Snuffleupagus Snuffleupagus deleted the XRef-fetch-isRef/cache branch August 18, 2019 13:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants