Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Attempt to significantly reduce the number of ChunkedStream.{ensureByte, ensureRange} calls by inlining the this.progressiveDataLength checks at the call-sites #10986

Merged

Commits on Jul 18, 2019

  1. Attempt to significantly reduce the number of `ChunkedStream.{ensureB…

    …yte, ensureRange}` calls by inlining the `this.progressiveDataLength` checks at the call-sites
    
    The number of in particular `ChunkedStream.ensureByte` calls is often absolutely *huge* (on the order of million calls) when loading and rendering even moderately complicated PDF files, which isn't entirely surprising considering that the `getByte`/`getBytes`/`peekByte`/`peekBytes` methods are used for essentially all data reading/parsing.
    
    The idea implemented in this patch is to inline an inverted `progressiveDataLength` check at all of the `ensureByte`/`ensureRange` call-sites, which in practice will often result in *several* orders of magnitude fewer function calls.
    Obviously this patch will only help if the browser supports streaming, which all reasonably modern browsers now do (including the Firefox built-in PDF viewer), and assuming that the user didn't set the `disableStream` option (e.g. for using `disableAutoFetch`). However, I think we should be able to improve performance for the default out-of-the-box use case, without worrying about e.g. older browsers (where this patch will thus incur *one* additional check before calling `ensureByte`/`ensureRange`).
    
    This patch was inspired by the *first* commit in PR 5005, which was subsequently backed out in PR 5145 for causing regressions. Since the general idea of avoiding unnecessary function calls was really nice, I figured that re-attempting this in one way or another wouldn't be a bad idea.
    Given that streaming is now supported, which it wasn't back then, using `progressiveDataLength` seemed like an easier approach in general since it also allowed supporting both `ensureByte` and `ensureRange`.
    
    This sort of patch obviously needs data to back it up, hence I've benchmarked the changes using the following manifest file (with the default `tracemonkey` file):
    ```
    [
        {  "id": "tracemonkey-eq",
           "file": "pdfs/tracemonkey.pdf",
           "md5": "9a192d8b1a7dc652a19835f6f08098bd",
           "rounds": 250,
           "type": "eq"
        }
    ]
    ```
    
    I get the following complete results when comparing this patch against the `master` branch:
    ```
    -- Grouped By browser, stat --
    browser | stat         | Count | Baseline(ms) | Current(ms) | +/- |    %  | Result(P<.05)
    ------- | ------------ | ----- | ------------ | ----------- | --- | ----- | -------------
    Firefox | Overall      |  3500 |          140 |         134 |  -6 | -4.46 |        faster
    Firefox | Page Request |  3500 |            2 |           2 |   0 | -0.10 |
    Firefox | Rendering    |  3500 |          138 |         131 |  -6 | -4.54 |        faster
    ```
    
    Here it's pretty clear that the patch does have a positive net effect, even for a PDF file of fairly moderate size and complexity. However, in this case it's probably interesting to also look at the results per page:
    ```
    -- Grouped By page, stat --
    page | stat         | Count | Baseline(ms) | Current(ms) | +/- |     %  | Result(P<.05)
    ---- | ------------ | ----- | ------------ | ----------- | --- | ------ | -------------
    0    | Overall      |   250 |           74 |          75 |   1 |   0.69 |
    0    | Page Request |   250 |            1 |           1 |   0 |  33.20 |
    0    | Rendering    |   250 |           73 |          74 |   0 |   0.25 |
    1    | Overall      |   250 |          123 |         121 |  -2 |  -1.87 |        faster
    1    | Page Request |   250 |            3 |           2 |   0 | -11.73 |
    1    | Rendering    |   250 |          121 |         119 |  -2 |  -1.67 |
    2    | Overall      |   250 |           64 |          63 |  -1 |  -1.91 |
    2    | Page Request |   250 |            1 |           1 |   0 |   8.81 |
    2    | Rendering    |   250 |           63 |          62 |  -1 |  -2.13 |        faster
    3    | Overall      |   250 |           97 |          97 |   0 |  -0.06 |
    3    | Page Request |   250 |            1 |           1 |   0 |  25.37 |
    3    | Rendering    |   250 |           96 |          95 |   0 |  -0.34 |
    4    | Overall      |   250 |           97 |          97 |   0 |  -0.38 |
    4    | Page Request |   250 |            1 |           1 |   0 |  -5.97 |
    4    | Rendering    |   250 |           96 |          96 |   0 |  -0.27 |
    5    | Overall      |   250 |           99 |          97 |  -3 |  -2.92 |
    5    | Page Request |   250 |            2 |           1 |   0 | -17.20 |
    5    | Rendering    |   250 |           98 |          95 |  -3 |  -2.68 |
    6    | Overall      |   250 |           99 |          99 |   0 |  -0.14 |
    6    | Page Request |   250 |            2 |           2 |   0 | -16.49 |
    6    | Rendering    |   250 |           97 |          98 |   0 |   0.16 |
    7    | Overall      |   250 |           96 |          95 |  -1 |  -0.55 |
    7    | Page Request |   250 |            1 |           2 |   1 |  66.67 |        slower
    7    | Rendering    |   250 |           95 |          94 |  -1 |  -1.19 |
    8    | Overall      |   250 |           92 |          92 |  -1 |  -0.69 |
    8    | Page Request |   250 |            1 |           1 |   0 | -17.60 |
    8    | Rendering    |   250 |           91 |          91 |   0 |  -0.52 |
    9    | Overall      |   250 |          112 |         112 |   0 |   0.29 |
    9    | Page Request |   250 |            2 |           1 |   0 |  -7.92 |
    9    | Rendering    |   250 |          110 |         111 |   0 |   0.37 |
    10   | Overall      |   250 |          589 |         522 | -67 | -11.38 |        faster
    10   | Page Request |   250 |           14 |          13 |   0 |  -1.26 |
    10   | Rendering    |   250 |          575 |         508 | -67 | -11.62 |        faster
    11   | Overall      |   250 |           66 |          66 |  -1 |  -0.86 |
    11   | Page Request |   250 |            1 |           1 |   0 | -16.48 |
    11   | Rendering    |   250 |           65 |          65 |   0 |  -0.62 |
    12   | Overall      |   250 |          303 |         291 | -12 |  -4.07 |        faster
    12   | Page Request |   250 |            2 |           2 |   0 |  12.93 |
    12   | Rendering    |   250 |          301 |         289 | -13 |  -4.19 |        faster
    13   | Overall      |   250 |           48 |          47 |   0 |  -0.45 |
    13   | Page Request |   250 |            1 |           1 |   0 |   1.59 |
    13   | Rendering    |   250 |           47 |          46 |   0 |  -0.52 |
    ```
    
    Here it's clear that this patch *significantly* improves the rendering performance of the slowest pages, while not causing any big regressions elsewhere. As expected, this patch thus helps larger and/or more complex pages the most (which is also where even small improvements will be most beneficial).
    There's obviously the question if this is *slightly* regressing simpler pages, but given just how short the times are in most cases it's not inconceivable that the page results above are simply caused be e.g. limited `Date.now()` and/or limited numerical precision.
    Snuffleupagus committed Jul 18, 2019
    Configuration menu
    Copy the full SHA
    b5254f2 View commit details
    Browse the repository at this point in the history