Get "unnecessary" range on first page #14570
Comments
Please note that disabling of streaming will generally lead to worse performance, compared to the default values.
Unfortunately this is the expected result of PRs such as #14311, #14335, #14358, and #14411 which were necessary in order to avoid serious problems in corrupt PDF documents.
Closing as INVALID, since we cannot "fix" this without breaking other PDF documents, and since this issue is specific to badly generated PDF documents.
@Snuffleupagus I already knew that pdfjs need Lines 177 to 179 in c37d785
In order to improve initialization performance, If I can probably get a correct Lines 1362 to 1366 in 263c895
become 👇
Having the same issue with a 2000+ page PDF that is definitely linearized. I'm also in the same position as @liu-dongyu in that we do a server-side integrity check already. Removing the line
solves the problem for the first page but not for subsequent pages. For example, loading page 1,000 requires loading everything prior to page 1,000. I think it's hard to say this is not a bug that is worth addressing. I think most people would prefer that all linearized PDFs load pages on demand, which is the point of linearizing, rather than have foolproof error checking to deal with corrupt PDFs. Couldn't we be given the option to skip this integrity check as one of the
Leaving a comment for those that end up here from a search or a link on Stack Overflow. We also had this issue with PDFs that we were creating and linearizing with QPDF. We found that if we force
The test.pdf in the original post is at PDF version 1.3. I downloaded it and used QPDF to transform it to version 1.6. That document renders the first page after 3 range requests. For anyone looking at this, I believe you should still consider Snuffleupagus' comment:
If the user is likely to read or scroll through most of the PDF, or perform a search, then the streaming mode is probably better. Edit: My testing was performed with Windows 11, PDF.js 3.7.107, Chrome 115.0.5790.110, and an Apache 2.4 web server.
Leaving a comment here as well, as I have a similar issue. Rewriting problem PDFs with QPDF does help, but it's not related to the PDF version. It's because QPDF restructures the file. A summary of the cause of the problem:
/Pages << /Kids [ all the /Page refs ] >>
In addition, pdf.js awaits each of those requests in turn, so for a 300-page document you are waiting for 300 XHRs one after the other. I recently had a PR accepted (#18627) that sends the XHRs in parallel, which will mitigate the issue, but you'll still be sending 300 XHRs. QPDF fixes this because when it rewrites the PDF the structure is /Pages << /Kids [ all the /Page refs ] >> ... and so all the /Page dicts are within a small number of chunks. Although there's an argument that such interleaved PDFs are 'bad', they seem to be quite common from digitisation. I will start a discussion about whether there are circumstances where we can trust a PDF enough to use the /Pages->/Kids as a random access lookup table. Edit: #18637
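To make the chunk argument above concrete, here is a small sketch that counts how many distinct range chunks must be fetched to read every /Page dict, and contrasts awaiting requests in turn with sending them in parallel. The byte offsets are made-up illustrative values, not measurements from any real PDF, and `chunksNeeded`, `fetchSequential`, and `fetchParallel` are hypothetical helpers, not PDF.js internals.

```javascript
// Map a byte offset to the index of the range chunk that contains it.
function chunkIndex(offset, chunkSize) {
  return Math.floor(offset / chunkSize);
}

// Distinct chunks that must be fetched to read objects at the given offsets.
function chunksNeeded(offsets, chunkSize) {
  return new Set(offsets.map((o) => chunkIndex(o, chunkSize)));
}

const chunkSize = 65536;

// Assumed "interleaved" layout: each /Page dict sits ~200 KB after the
// previous one, e.g. because page content is stored between the dicts.
const interleaved = Array.from({ length: 300 }, (_, i) => i * 200000);

// Assumed "grouped" layout: all /Page dicts are packed close together.
const grouped = Array.from({ length: 300 }, (_, i) => 1000 + i * 150);

console.log(chunksNeeded(interleaved, chunkSize).size); // 300
console.log(chunksNeeded(grouped, chunkSize).size);     // 1

// Awaiting each chunk in turn: total time is roughly the sum of all round trips.
async function fetchSequential(chunks, fetchChunk) {
  const results = [];
  for (const c of chunks) {
    results.push(await fetchChunk(c));
  }
  return results;
}

// Issuing all requests at once: same number of requests, but they overlap.
function fetchParallel(chunks, fetchChunk) {
  return Promise.all([...chunks].map((c) => fetchChunk(c)));
}
```

Here `fetchChunk` stands for whatever function performs one range request. Parallel fetching shortens the wait, but only a layout with the /Page dicts grouped together reduces the number of requests.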
Attach (recommended) or Link to PDF file here:
https://github.com/liu-dongyu/demo/blob/main/pdf/test.pdf
Configuration:
Steps to reproduce the problem:
1. Pass the following options to getDocument:
   disableAutoFetch: true
   disableStream: true
   rangeChunkSize: 65536 * 16
2. Call getPage(1)
3. Then render
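The reproduction steps can be sketched roughly as follows. This is a minimal sketch, assuming `pdfjsLib` is the already loaded PDF.js library object and `canvas` is an existing canvas element. The helper names `buildLoadOptions` and `renderFirstPage` are hypothetical and not part of PDF.js.

```javascript
// 65536 * 16 = 1048576 bytes (1 MiB) per range request, as in the report.
const RANGE_CHUNK_SIZE = 65536 * 16;

// Hypothetical helper: collects the options listed in the steps above.
function buildLoadOptions(url) {
  return {
    url,
    disableAutoFetch: true, // do not prefetch the rest of the file
    disableStream: true,    // rely on range requests only, no streaming
    rangeChunkSize: RANGE_CHUNK_SIZE,
  };
}

// Hypothetical helper: loads the document and renders page 1.
// `pdfjsLib` and `canvas` are supplied by the caller (assumptions).
async function renderFirstPage(pdfjsLib, url, canvas) {
  const pdf = await pdfjsLib.getDocument(buildLoadOptions(url)).promise;
  const page = await pdf.getPage(1);
  const viewport = page.getViewport({ scale: 1 });
  canvas.width = viewport.width;
  canvas.height = viewport.height;
  await page.render({
    canvasContext: canvas.getContext("2d"),
    viewport,
  }).promise;
  return page;
}
```

With the browser's network panel open, the number of range requests issued before the first page appears can then be compared between PDF.js versions.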
What is the expected behavior? (add screenshot)
With v2.12.313, the first page needs 60 range requests (online demo, source code).
With v2.3.200, the first page needs only 3 range requests, which is the expected number (online demo, source code).
What went wrong? (add screenshot)
Why does v2.12.313 need so many chunks to render the first page, and is it possible to optimize this?