Huge increase in PDF filesize since 1.3.12 #450

Closed
renber opened this issue Nov 4, 2020 · 25 comments · Fixed by #451 or #456

@renber
Contributor

renber commented Nov 4, 2020

Describe the bug
After updating to the latest version, we noticed that the PDFs we create with OpenPDF have increased massively in size (e.g. a PDF with ~200 pages containing nested tables was 10 MB and is now 150 MB).
I have traced it back to changes in 1.3.12. If I use an OpenPDF version < 1.3.12, our software outputs a 10 MB file; if I update OpenPDF to anything >= 1.3.12, the very same code produces a 150 MB file.

To Reproduce
I currently have no example, since this code runs in production. However, the increase depends solely on changes in OpenPDF: reverting to an older version results in smaller PDFs without changing any of our PDF generation code.
Does anyone have an idea where this behavior might stem from?
Preparing an example is possible, but it will take some time.

@renber renber added the bug label Nov 4, 2020
@andreasrosdal
Contributor

Thank you for reporting.

These are the release notes for 1.3.12:
https://github.com/LibrePDF/OpenPDF/releases/tag/1.3.12

@andreasrosdal
Contributor

Can you please submit a PDF file generated with each of the two relevant versions of OpenPDF? It will help us understand the problem.

@renber
Contributor Author

renber commented Nov 6, 2020

I have created two PDF files (with sensitive information removed) for comparison. Although the actual document has 198 pages, I uploaded only 20 pages of each, since the difference is already visible.

File created with OpenPDF 1.2.17 ~1MB
File created with OpenPDF 1.3.22 ~14MB

When opened in a PDF Viewer the files look the same.

@mkl-public
Contributor

The large PDF has more than 180,000 indirect objects, and all but at most 150 of them are extended graphics state dictionaries setting the stroking (/CA) or fill (/ca) opacity to 1.
The small PDF has merely 100 objects, none of which is such an extended graphics state dictionary.

From the release notes the obvious candidate is

Fix PdfPCell background color ignoring alpha channel of an rgba #282

@sixdouglas
Copy link
Contributor

@mkl-public what tool do you use to see these indirect objects?

@mkl-public
Contributor

> what tool do you use to see these indirect objects?

A standard text viewer (in my case the one built into Total Commander, but most text viewers should do).

109 0 obj
<<
/ca 1
>>
endobj
110 0 obj
<<
/ca 1
>>
endobj
111 0 obj
<<
/CA 1
>>
endobj
112 0 obj
<<
/ca 1
>>
endobj 
...
180567 0 obj
<<
/CA 1
>>
endobj
180568 0 obj
<<
/ca 1
>>
endobj
180569 0 obj
<<
/ca 1
>>
endobj 
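The count above was done by eye in a text viewer. A rough Java sketch of the same check could look like the following. It assumes the dictionaries are stored uncompressed exactly as in the dump (objects packed into compressed object streams are invisible to it), and it is a regex heuristic rather than a PDF parser; the class name `OpacityGStateScanner` is made up for illustration.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical diagnostic, not part of OpenPDF: counts indirect objects whose
// dictionary holds nothing but one opacity entry (/CA = stroking, /ca = fill).
class OpacityGStateScanner {

    // Matches e.g. "180568 0 obj\n<<\n/ca 1\n>>\nendobj" from the dump above.
    private static final Pattern OPACITY_ONLY_DICT = Pattern.compile(
            "\\d+\\s+\\d+\\s+obj\\s*<<\\s*/(CA|ca)\\s+([0-9.]+)\\s*>>\\s*endobj");

    // Returns counts keyed by the dictionary body, e.g. "/ca 1" -> 90000.
    static Map<String, Integer> countOpacityOnlyDicts(String pdfText) {
        Map<String, Integer> counts = new TreeMap<>();
        Matcher m = OPACITY_ONLY_DICT.matcher(pdfText);
        while (m.find()) {
            counts.merge("/" + m.group(1) + " " + m.group(2), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        byte[] bytes = Files.readAllBytes(Paths.get(args[0]));
        // ISO-8859-1 maps every byte to one char, so binary stream data cannot break decoding.
        String text = new String(bytes, StandardCharsets.ISO_8859_1);
        countOpacityOnlyDicts(text).forEach((dict, n) -> System.out.println(dict + ": " + n));
    }
}
```

Running it as `java OpacityGStateScanner.java some.pdf` (Java 11+ single-file launch) on files produced by the two OpenPDF versions gives a quick before/after comparison.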

@sixdouglas
Contributor

@renber do you think you could provide a simple Java program that reproduces these tables?

@sixdouglas
Contributor

@renber could you give this branch build a try?

@asturio asturio closed this as completed in 0abe991 Nov 9, 2020
@asturio asturio linked a pull request Nov 9, 2020 that will close this issue
@asturio
Member

asturio commented Nov 9, 2020

Or try the SNAPSHOT version :-) It's already merged.

@asturio asturio reopened this Nov 9, 2020
@renber
Contributor Author

renber commented Nov 9, 2020

Thank you.
I have tested the latest master, and there is already an improvement. Unfortunately, the resulting file is still around 7 times larger than the one created with OpenPDF 1.2.17, so something else seems to be contributing to the size.

File created with OpenPDF 1.3.24-SNAPSHOT (2020/11/09) ~7MB

I can prepare a sample program, but it will take some time.

@sixdouglas
Contributor

@renber it would be really nice to have this sample, because your tables are quite complicated to reproduce.

@renber
Copy link
Contributor Author

renber commented Nov 10, 2020

I have created a sample repository with a program that replicates the table structure (as closely as possible, with everything unnecessary stripped away): https://github.com/renber/OpenPDF_Issue450

@sixdouglas
Contributor

@renber
Great, thank you!
I'll have a look at it.

sixdouglas pushed a commit to sixdouglas/OpenPDF that referenced this issue Nov 10, 2020
@sixdouglas
Contributor

sixdouglas commented Nov 10, 2020

@renber the new version from my branch should be good for you. It would be great if you could give it a try and let me know.

@asturio
Member

asturio commented Nov 10, 2020

@renber Please check the latest SNAPSHOT. Is the problem solved now?

@asturio asturio reopened this Nov 10, 2020
@renber
Contributor Author

renber commented Nov 11, 2020

It is a further improvement, thank you. The PDF from the sample repository is fine now.
However, our production code still produces a larger file than with 1.2.17.
Although the file size has been reduced from 150 MB to 40 MB, it is unfortunately still 4 times larger than the old file.
Something else still seems to be contributing to the size.

How should we proceed in debugging this?

@arnthom

arnthom commented Nov 11, 2020

Hi renber, have you tried setting the compression level of the PDF?
Maybe the default compression has changed since 1.2.17?
Use writer.setFullCompression() or writer.setCompressionLevel(...)
with PdfStream.BEST_COMPRESSION or PdfStream.DEFAULT_COMPRESSION.

@renber
Contributor Author

renber commented Nov 12, 2020

@arnthom: Unfortunately, neither setFullCompression() nor BEST_COMPRESSION led to a change in file size (still 40 MB).

@renber
Contributor Author

renber commented Nov 12, 2020

I have investigated this a bit further. When using OpenPDF 1.3.12 unmodified, the resulting file is 150 MB. Reverting only the changes from the two commits of PR #282 brings the file size back down to 10 MB, as before. So something in this feature still does not play well with our PDFs (and has not yet been caught by the changes in 5feac69 and a2f5e3b).

@sixdouglas
Copy link
Contributor

@renber is it possible for you to update your sample so that it is closer to the PDF generated in production? That way I'll be able to investigate further.

@renber
Contributor Author

renber commented Nov 12, 2020

@sixdouglas: I created the sample by stripping away sensitive (and hard-to-port) parts, so I am afraid I cannot provide a better one within a reasonable time.

I made some changes to the OpenPDF code of 1.2.12 myself (based on the commit diff), and if I turn saveColorStroke and saveColorFill into no-ops, I get the old behavior (a 10 MB file) and the PDF looks the same (at least to me). Since the original feature PR added many calls to these methods, I suppose this is the problem: every call seems to write a new PdfGState to the PDF. Your last commits already reduced the number of PdfGState objects created, but it seems they are not needed for my PDF at all. Maybe more conditions could be added to avoid creating these states, since we don't use any opacity-related table features in our code?
Unfortunately, I have no expertise with the PDF format, so I do not know why these calls might be needed in the first place.
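One general way to keep the object count bounded, no matter how often fill and stroke colors are saved, is to reuse a single ExtGState per distinct opacity setting instead of writing a new dictionary on every call. The following is a hypothetical, standalone model of that idea: `ExtGStateRegistry`, `nameFor`, and the `GSn` naming are invented for illustration, they are not OpenPDF API, and this is not the change that was eventually merged.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical model, not OpenPDF code: hands out one ExtGState resource
// name per distinct opacity setting instead of one per call.
class ExtGStateRegistry {
    // Dictionary body such as "/ca 1.0" -> resource name such as "GS1",
    // kept in insertion order so names stay stable.
    private final Map<String, String> nameByDict = new LinkedHashMap<>();

    // Returns the name to use with the content-stream operator "gs";
    // a new ExtGState is registered only the first time a setting is seen.
    String nameFor(boolean stroking, float alpha) {
        String dict = (stroking ? "/CA " : "/ca ") + alpha;
        String name = nameByDict.get(dict);
        if (name == null) {
            name = "GS" + (nameByDict.size() + 1);
            nameByDict.put(dict, name);
        }
        return name;
    }

    // Number of ExtGState dictionaries that would end up in the file.
    int objectCount() {
        return nameByDict.size();
    }

    // Builds an /ExtGState resource dictionary, e.g. "<< /GS1 << /ca 1.0 >> >>".
    String resourceDictionary() {
        StringBuilder sb = new StringBuilder("<<");
        nameByDict.forEach((dict, name) ->
                sb.append(" /").append(name).append(" << ").append(dict).append(" >>"));
        return sb.append(" >>").toString();
    }

    public static void main(String[] args) {
        ExtGStateRegistry registry = new ExtGStateRegistry();
        // Simulate many table cells that each save fill and stroke opacity 1.
        for (int cell = 0; cell < 10_000; cell++) {
            registry.nameFor(false, 1f);
            registry.nameFor(true, 1f);
        }
        System.out.println(registry.objectCount());        // 2
        System.out.println(registry.resourceDictionary()); // << /GS1 << /ca 1.0 >> /GS2 << /CA 1.0 >> >>
    }
}
```

In this model, any number of requests for fill and stroke opacity 1 yields only two dictionaries, whereas writing a fresh one per call grows linearly with the number of cells.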

@sixdouglas
Contributor

@renber can you provide a sample of your production file, so that I can see how to take this further?

@sixdouglas
Contributor

@renber: I pushed a new commit to my branch.
In this commit, I try to avoid using the new GraphicState created during the saveState() call by resetting the values on the existing object instead.
Can you give it a try?
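The general idea of not producing a fresh graphics state on every save can be illustrated with a state-tracking technique: remember the opacity currently in effect, save and restore it together with the PDF q/Q operators, and write a `gs` operator only when a requested value actually differs. The following is a hypothetical standalone sketch (`OpacityTracker` and its methods are invented for illustration); it is not the code from that commit.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical model, not OpenPDF code: remembers which opacity is in effect
// across q/Q so that redundant "gs" operators are simply not written.
class OpacityTracker {
    // Each page starts with stroking (CA) and fill (ca) opacity 1.0.
    private float fillAlpha = 1f;
    private float strokeAlpha = 1f;
    // Values saved by q and restored by Q, mirroring the PDF graphics state stack.
    private final Deque<float[]> saved = new ArrayDeque<>();
    private final StringBuilder content = new StringBuilder();
    private int emitted = 0;

    void saveState() {
        saved.push(new float[] {fillAlpha, strokeAlpha});
        content.append("q\n");
    }

    void restoreState() {
        float[] previous = saved.pop(); // throws NoSuchElementException if q/Q are unbalanced
        fillAlpha = previous[0];
        strokeAlpha = previous[1];
        content.append("Q\n");
    }

    void setFillAlpha(float alpha) {
        if (alpha == fillAlpha) {
            return; // already in effect: no ExtGState, no gs operator
        }
        fillAlpha = alpha;
        emitGs("/ca " + alpha);
    }

    void setStrokeAlpha(float alpha) {
        if (alpha == strokeAlpha) {
            return;
        }
        strokeAlpha = alpha;
        emitGs("/CA " + alpha);
    }

    // Writes "/GSn gs"; the trailing PDF comment shows what GSn would contain.
    private void emitGs(String dict) {
        emitted++;
        content.append("/GS").append(emitted).append(" gs % ").append(dict).append('\n');
    }

    int gsCount() { return emitted; }

    String content() { return content.toString(); }

    public static void main(String[] args) {
        OpacityTracker tracker = new OpacityTracker();
        for (int cell = 0; cell < 3; cell++) { // opaque cells: only q/Q are written
            tracker.saveState();
            tracker.setFillAlpha(1f);
            tracker.restoreState();
        }
        tracker.saveState();
        tracker.setFillAlpha(0.5f); // a semi-transparent background needs one gs
        tracker.restoreState();
        System.out.print(tracker.content());
    }
}
```

Because the tracker restores the saved values on Q, a later request for opacity 1 after a transparent cell emits nothing either.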

@renber
Contributor Author

renber commented Nov 20, 2020

@sixdouglas: That's it. I ran our PDF generation with the code from your fix_#450/fixPdfSize branch, and the resulting file is back at ~10 MB. So this issue seems to be resolved by your changes. Thank you, this is much appreciated.
Looking forward to the next release ;)

@sixdouglas
Contributor

It really was a wild guess, but if it works for you, it's good for me! 😉

6 participants