Major performance issues for a simple PDF file #6961
Comments
The CPU profiler shows that most of the time is spent in Lines 144 to 200 in a0aa781
To profile (using Chrome), simply:
|
Thank you for profiling this! This code seems to loop over (potentially) all ExtGState and XObject dictionaries, and there are a lot of them in this PDF file. I'm afraid the |
I have narrowed down the flaw a bit. For the first page, Line 182 in a0aa781
|
This seems to do the trick, but I've not had time to run tests yet: master...Snuffleupagus:issue-6961. |
Funny, I also tried using |
So, my patch seems to pass all tests locally, but there are two problems:
|
The only test I can imagine is a unit test where we assert that the number of processed XObjects is less than it is currently. It's not great, but it's the only kind of test I can come up with since measuring the runtime is not an option. Otherwise I think it suffices to review the patch, test it manually and make sure that the test suite passes. Regarding the second point, I would have to look into this more. If they are equal then the current code should do, so I'm not yet sure what the difference is with your patch. |
27 seconds for such a simple PDF is excessive. We could create a suite of PDFs whose rendering time is measured (can be as simple as subtracting two time stamps), and then report the results to some central place. If we occasionally look at the results (e.g. a table, or a fancy graph), then we should get a good picture of what rendering times are normal and detect performance regressions.
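As a rough illustration of the "subtract two time stamps" idea, here is a minimal sketch. The helper names `timeRender` and `findRegressions`, the 1.5x tolerance, and the stand-in workload `fakeRender` are all assumptions made up for this example; none of them is PDF.js API, and a real harness would time the actual page rendering instead.

```javascript
// Hypothetical helper: time one synchronous workload by subtracting two timestamps.
function timeRender(label, renderFn) {
  const start = Date.now();
  renderFn();
  const elapsedMs = Date.now() - start;
  return { label, elapsedMs };
}

// Hypothetical helper: report results that are slower than a recorded
// baseline by more than `tolerance` (an arbitrary factor chosen for this sketch).
function findRegressions(results, baselineMs, tolerance = 1.5) {
  return results.filter(
    (r) =>
      baselineMs[r.label] !== undefined &&
      r.elapsedMs > baselineMs[r.label] * tolerance
  );
}

// Stand-in for "render the first page of a test PDF"; not a real renderer.
function fakeRender() {
  let sum = 0;
  for (let i = 0; i < 100000; i++) {
    sum += i;
  }
  return sum;
}

const results = [timeRender("test.pdf#page1", fakeRender)];
console.log(results);
```

Collecting such `{ label, elapsedMs }` records per run and comparing them against earlier runs would be one way to build the table or graph mentioned above.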
What exactly is unclear? Replacing getAll with getKeys seems like an obvious boost, is there something else with your patch that magically improves the runtime? |
The
|
Using (this is my guess, I didn't check whether it is really the reason). |
The reason above is somewhat right. We had thought about disabling getAll(), not for performance but because it can recursively pull unneeded data into the operator list. We probably need to review all getAll usages and remove them. |
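To make the difference concrete, here is a toy model. `ToyDict` is a hypothetical class written only for this sketch and is not PDF.js's actual `Dict`; it merely assumes that entries are lazy references which cost something to resolve, and it counts how many get resolved under each approach.

```javascript
// Toy model, NOT the real PDF.js Dict: each entry is either plain data
// or a lazy reference of the form { ref: id }.
class ToyDict {
  constructor(entries, resolveRef) {
    this.entries = entries; // name -> value or { ref: id }
    this.resolveRef = resolveRef; // called whenever a reference is fetched
  }

  // Lists the entry names without resolving anything.
  getKeys() {
    return Object.keys(this.entries);
  }

  // Resolves a single entry on demand.
  get(key) {
    const value = this.entries[key];
    return value && value.ref !== undefined
      ? this.resolveRef(value.ref)
      : value;
  }

  // Resolves every entry up front, needed or not.
  getAll() {
    const all = {};
    for (const key of this.getKeys()) {
      all[key] = this.get(key);
    }
    return all;
  }
}

let resolved = 0;
const resolveRef = (id) => {
  resolved++;
  return { id };
};

const entries = {};
for (let i = 0; i < 1000; i++) {
  entries["X" + i] = { ref: i };
}
const xobjects = new ToyDict(entries, resolveRef);

xobjects.getAll();
const eagerCount = resolved; // every entry was resolved

resolved = 0;
const wanted = xobjects.getKeys().filter((k) => k === "X7");
wanted.forEach((k) => xobjects.get(k));
const lazyCount = resolved; // only the entry actually used was resolved

console.log({ eagerCount, lazyCount });
```

In this toy, iterating over the keys and fetching only what is needed resolves one reference instead of a thousand. A counter like `resolved` is also the kind of thing a unit test could assert on, since runtime itself is hard to test.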
@Rob--W You are absolutely correct, I don't know how I missed that myself. Thank you! |
When an XObject is a group, we were applying the matrix and bounding box twice. This improves mozilla#6961 quite a bit, but the indentation in the ruler is still missing.
In `beginGroup` we create a new canvas that is the size of the bounding box, and we translate it to the offset. This means we don't need to also apply the bounding box during `paintFormXObjectBegin`. This improves mozilla#6961 quite a bit, but the indentation in the ruler is still missing.
I have created a simple PDF file using Scribus 1.5.0svn. The file size is large for a two-page PDF file, so I suspect that Scribus is doing something really inefficient when exporting it. Nevertheless, the PDF file below renders almost instantly (within 0.5 seconds) in both Adobe Acrobat Reader DC and Foxit Reader, whereas PDF.js takes 27 seconds to render just the first page. I have no idea why PDF.js takes such an excessive amount of time, but we need to do better here, given that other viewers have no problems with this file. Are there perhaps inefficient patterns in this file that the optimizer could remove? I notice that the Resources dictionary of each page contains an excessive number of resources (XObject, Font, Pattern and ExtGState).
Below is the PDF file. I made this myself, so anyone is free to use this as a test case in a PR that addresses this issue:
test.pdf