Attempt to significantly reduce the number of `ChunkedStream.{ensureByte, ensureRange}` calls by inlining the `this.progressiveDataLength` checks at the call-sites
#10986
Conversation
The number of, in particular, `ChunkedStream.ensureByte` calls is often absolutely *huge* (on the order of a million calls) when loading and rendering even moderately complicated PDF files, which isn't entirely surprising considering that the `getByte`/`getBytes`/`peekByte`/`peekBytes` methods are used for essentially all data reading/parsing.

The idea implemented in this patch is to inline an inverted `progressiveDataLength` check at all of the `ensureByte`/`ensureRange` call-sites, which in practice will often result in *several* orders of magnitude fewer function calls.

Obviously this patch will only help if the browser supports streaming, which all reasonably modern browsers now do (including the Firefox built-in PDF viewer), and assuming that the user didn't set the `disableStream` option (e.g. for using `disableAutoFetch`). However, I think we should be able to improve performance for the default out-of-the-box use case, without worrying about e.g. older browsers (where this patch will thus incur *one* additional check before calling `ensureByte`/`ensureRange`).

This patch was inspired by the *first* commit in PR 5005, which was subsequently backed out in PR 5145 for causing regressions. Since the general idea of avoiding unnecessary function calls was really nice, I figured that re-attempting this in one way or another wouldn't be a bad idea. Given that streaming is now supported, which it wasn't back then, using `progressiveDataLength` seemed like an easier approach in general, since it also allowed supporting both `ensureByte` and `ensureRange`.

This sort of patch obviously needs data to back it up, hence I've benchmarked the changes using the following manifest file (with the default `tracemonkey` file):

```
[
  {
    "id": "tracemonkey-eq",
    "file": "pdfs/tracemonkey.pdf",
    "md5": "9a192d8b1a7dc652a19835f6f08098bd",
    "rounds": 250,
    "type": "eq"
  }
]
```

I get the following complete results when comparing this patch against the `master` branch:

```
-- Grouped By browser, stat --
browser | stat         | Count | Baseline(ms) | Current(ms) | +/- |    %  | Result(P<.05)
------- | ------------ | ----- | ------------ | ----------- | --- | ----- | -------------
Firefox | Overall      |  3500 |          140 |         134 |  -6 | -4.46 |        faster
Firefox | Page Request |  3500 |            2 |           2 |   0 | -0.10 |
Firefox | Rendering    |  3500 |          138 |         131 |  -6 | -4.54 |        faster
```

Here it's pretty clear that the patch does have a positive net effect, even for a PDF file of fairly moderate size and complexity. However, in this case it's probably interesting to also look at the results per page:

```
-- Grouped By page, stat --
page | stat         | Count | Baseline(ms) | Current(ms) | +/- |    %   | Result(P<.05)
---- | ------------ | ----- | ------------ | ----------- | --- | ------ | -------------
0    | Overall      |   250 |           74 |          75 |   1 |   0.69 |
0    | Page Request |   250 |            1 |           1 |   0 |  33.20 |
0    | Rendering    |   250 |           73 |          74 |   0 |   0.25 |
1    | Overall      |   250 |          123 |         121 |  -2 |  -1.87 |        faster
1    | Page Request |   250 |            3 |           2 |   0 | -11.73 |
1    | Rendering    |   250 |          121 |         119 |  -2 |  -1.67 |
2    | Overall      |   250 |           64 |          63 |  -1 |  -1.91 |
2    | Page Request |   250 |            1 |           1 |   0 |   8.81 |
2    | Rendering    |   250 |           63 |          62 |  -1 |  -2.13 |        faster
3    | Overall      |   250 |           97 |          97 |   0 |  -0.06 |
3    | Page Request |   250 |            1 |           1 |   0 |  25.37 |
3    | Rendering    |   250 |           96 |          95 |   0 |  -0.34 |
4    | Overall      |   250 |           97 |          97 |   0 |  -0.38 |
4    | Page Request |   250 |            1 |           1 |   0 |  -5.97 |
4    | Rendering    |   250 |           96 |          96 |   0 |  -0.27 |
5    | Overall      |   250 |           99 |          97 |  -3 |  -2.92 |
5    | Page Request |   250 |            2 |           1 |   0 | -17.20 |
5    | Rendering    |   250 |           98 |          95 |  -3 |  -2.68 |
6    | Overall      |   250 |           99 |          99 |   0 |  -0.14 |
6    | Page Request |   250 |            2 |           2 |   0 | -16.49 |
6    | Rendering    |   250 |           97 |          98 |   0 |   0.16 |
7    | Overall      |   250 |           96 |          95 |  -1 |  -0.55 |
7    | Page Request |   250 |            1 |           2 |   1 |  66.67 |        slower
7    | Rendering    |   250 |           95 |          94 |  -1 |  -1.19 |
8    | Overall      |   250 |           92 |          92 |  -1 |  -0.69 |
8    | Page Request |   250 |            1 |           1 |   0 | -17.60 |
8    | Rendering    |   250 |           91 |          91 |   0 |  -0.52 |
9    | Overall      |   250 |          112 |         112 |   0 |   0.29 |
9    | Page Request |   250 |            2 |           1 |   0 |  -7.92 |
9    | Rendering    |   250 |          110 |         111 |   0 |   0.37 |
10   | Overall      |   250 |          589 |         522 | -67 | -11.38 |        faster
10   | Page Request |   250 |           14 |          13 |   0 |  -1.26 |
10   | Rendering    |   250 |          575 |         508 | -67 | -11.62 |        faster
11   | Overall      |   250 |           66 |          66 |  -1 |  -0.86 |
11   | Page Request |   250 |            1 |           1 |   0 | -16.48 |
11   | Rendering    |   250 |           65 |          65 |   0 |  -0.62 |
12   | Overall      |   250 |          303 |         291 | -12 |  -4.07 |        faster
12   | Page Request |   250 |            2 |           2 |   0 |  12.93 |
12   | Rendering    |   250 |          301 |         289 | -13 |  -4.19 |        faster
13   | Overall      |   250 |           48 |          47 |   0 |  -0.45 |
13   | Page Request |   250 |            1 |           1 |   0 |   1.59 |
13   | Rendering    |   250 |           47 |          46 |   0 |  -0.52 |
```

Here it's clear that this patch *significantly* improves the rendering performance of the slowest pages, while not causing any big regressions elsewhere. As expected, this patch thus helps larger and/or more complex pages the most (which is also where even small improvements will be most beneficial).

There's obviously the question of whether this is *slightly* regressing simpler pages, but given just how short the times are in most cases, it's not inconceivable that the page results above are simply caused by e.g. limited `Date.now()` resolution and/or limited numerical precision.
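To make the call-site pattern concrete, here's a minimal, self-contained sketch of the inlined check. This is *not* the actual pdf.js code: the `ChunkedStreamSketch` name and the call-counting instrumentation are illustrative, and the real `ChunkedStream` tracks loaded chunks and throws a `MissingDataException` when data is missing.

```javascript
// Sketch (assumed names, not pdf.js itself) of inlining an inverted
// `progressiveDataLength` check so that `ensureByte` is only called for
// positions beyond the contiguously-streamed data prefix.
class ChunkedStreamSketch {
  constructor(bytes, progressiveDataLength) {
    this.bytes = bytes;
    this.pos = 0;
    this.end = bytes.length;
    // Length of the contiguous prefix already received via streaming.
    this.progressiveDataLength = progressiveDataLength;
    this.ensureByteCalls = 0; // Instrumentation, to show the savings.
  }

  ensureByte(pos) {
    // Stand-in for the real check that the chunk containing `pos` is
    // loaded; here we merely count how often the function is invoked.
    this.ensureByteCalls++;
  }

  getByte() {
    const pos = this.pos;
    if (pos >= this.end) {
      return -1;
    }
    // The inlined, inverted check: skip the `ensureByte` function call
    // entirely while reading within the contiguously-loaded prefix.
    if (pos >= this.progressiveDataLength) {
      this.ensureByte(pos);
    }
    return this.bytes[this.pos++];
  }
}
```

Reading all 1000 bytes of a stream whose first 900 bytes arrived contiguously triggers only 100 `ensureByte` calls in this sketch, instead of one call per byte read without the inlined check.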
/botio test

From: Bot.io (Windows)
Received: Command cmd_test from @Snuffleupagus received. Current queue size: 0
Live output at: http://54.215.176.217:8877/6624b687b393ddd/output.txt

From: Bot.io (Linux m4)
Received: Command cmd_test from @Snuffleupagus received. Current queue size: 0
Live output at: http://54.67.70.0:8877/80407d69cb65d05/output.txt

From: Bot.io (Linux m4)
Success: Full output at http://54.67.70.0:8877/80407d69cb65d05/output.txt
Total script time: 17.36 mins

From: Bot.io (Windows)
Failed: Full output at http://54.215.176.217:8877/6624b687b393ddd/output.txt
Total script time: 25.61 mins
Image differences available at: http://54.215.176.217:8877/6624b687b393ddd/reftest-analyzer.html#web=eq.log

/botio-linux preview

From: Bot.io (Linux m4)
Received: Command cmd_preview from @timvandermeij received. Current queue size: 0
Live output at: http://54.67.70.0:8877/dba2d2ac774d114/output.txt

From: Bot.io (Linux m4)
Success: Full output at http://54.67.70.0:8877/dba2d2ac774d114/output.txt
Total script time: 1.78 mins
Published

Nice work, and really good that you revisited the ideas from the original PR!