When can we trust a /Pages dictionary to contain Page references directly? #18637
richard-smith-preservica
started this conversation in
General
Related: my patch in #18627 and the discussion on issue #14570.
Use case: render-on-demand (i.e. disableStream and disableAutoFetch are set, and the server supports ranged requests) on a large document, such as a PDF where each page is an image, where we want to show something to the user quickly. Such documents are common outputs of digitisation.
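For concreteness, a minimal sketch of the loading parameters this use case assumes. `disableStream` and `disableAutoFetch` are the options named above; the helper name `renderOnDemandParams` and the example URL are hypothetical.

```javascript
// Hypothetical helper: builds the parameter object passed to
// pdfjsLib.getDocument() for render-on-demand loading.
function renderOnDemandParams(url) {
  return {
    url,
    // Do not stream the whole file in the background.
    disableStream: true,
    // Do not pre-fetch the rest of the file once the first chunks arrive.
    disableAutoFetch: true,
  };
}

// Usage (assumes pdfjsLib is already loaded and the server honours
// HTTP Range requests; "/docs/scan.pdf" is a placeholder URL):
//   const pdf = await pdfjsLib.getDocument(renderOnDemandParams("/docs/scan.pdf")).promise;
//   const lastPage = await pdf.getPage(pdf.numPages);
```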
When the PDF has (i) all the /Page dicts referenced from the top-level /Pages dict and (ii) /Page dicts interleaved with content, loading the last page causes an XHR to be sent for every prior page (see my comment on #14570 for more detail).
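To illustrate the cost, here is a toy model of an index-based page lookup that has to inspect each kid to learn whether it is a /Page or a nested /Pages. It is a simplified sketch, not pdf.js's actual traversal code: the object store, `makeFlatDocument` and `findPageByWalking` are all hypothetical, and each `load` call stands in for one ranged request in the worst case where no two objects share a chunk.

```javascript
// Toy object store: ref number -> dictionary. Ref 1 is the root /Pages.
function makeFlatDocument(pageCount) {
  const objects = new Map();
  const kids = [];
  for (let i = 0; i < pageCount; i++) {
    const ref = i + 2;
    objects.set(ref, { Type: "Page" });
    kids.push(ref);
  }
  objects.set(1, { Type: "Pages", Count: pageCount, Kids: kids });
  return objects;
}

// Walk the tree depth-first until the page at targetIndex is reached,
// counting how many objects had to be loaded along the way.
function findPageByWalking(objects, rootRef, targetIndex) {
  let fetches = 0;
  const load = ref => {
    fetches++;
    return objects.get(ref);
  };
  let seen = 0; // number of pages known to precede the current node
  const stack = [rootRef];
  while (stack.length > 0) {
    const ref = stack.pop();
    const dict = load(ref);
    if (dict.Type === "Pages") {
      if (seen + dict.Count <= targetIndex) {
        seen += dict.Count; // target is not in this subtree, skip it
        continue;
      }
      // Push kids in reverse so the first kid is visited first.
      for (let i = dict.Kids.length - 1; i >= 0; i--) {
        stack.push(dict.Kids[i]);
      }
    } else {
      if (seen === targetIndex) {
        return { ref, fetches };
      }
      seen++;
    }
  }
  throw new Error("page index out of range");
}
```

In this model, asking for the last page of a flat 1000-page document loads the root plus all 1000 /Page dicts, because nothing short of loading a kid reveals whether it is a leaf.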
If we could trust that the /Kids of the top-level /Pages dict contained exactly the refs to /Page dicts and nothing else, we could avoid loading all the previous pages, and fetching a single page would be quick. The team rightly don't want to be too trusting of that, since lots of PDFs get created wrongly (see all the justifications on tickets mentioned here #14570 (comment)). The question in this discussion is: are there circumstances where we can trust that?
This change master...richard-smith-preservica:pdf.js:rcs/assume-all-pages-in-top-level-when-likely contains the core of the idea: if the page count and the number of children of the top-level /Pages align, then assume it's 1:1 and fetch pages independently.
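A minimal sketch of that check, written against a plain-object stand-in for the root /Pages dictionary rather than pdf.js's internal types. It is not the code in the linked branch; the function names and the string refs are hypothetical.

```javascript
// pagesDict is a plain-object stand-in for the root /Pages dictionary,
// e.g. { Count: 3, Kids: ["3 0 R", "5 0 R", "7 0 R"] }.
function topLevelKidsLookLikePages(pagesDict) {
  const { Count, Kids } = pagesDict;
  return (
    Number.isInteger(Count) && Array.isArray(Kids) && Kids.length === Count
  );
}

// Returns the ref guessed to hold the page at pageIndex, or null when the
// caller should fall back to walking the page tree.
function guessPageRef(pagesDict, pageIndex) {
  if (!topLevelKidsLookLikePages(pagesDict)) {
    return null;
  }
  if (pageIndex < 0 || pageIndex >= pagesDict.Kids.length) {
    return null;
  }
  return pagesDict.Kids[pageIndex];
}
```

Matching counts do not prove the layout is flat: for example, a root with two kids, one a /Pages holding two pages and the other an empty /Pages, also has /Count equal to the number of kids. One option (my assumption, not something the branch is known to do) is to treat the result as a guess, check that the fetched object really is a /Page, and fall back to the full walk otherwise.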
Possible criteria for 'this is ok'
... but really I'm opening this to see if anyone has more robust suggestions for ways we can avoid fetching every /Page dictionary in this case.