Parse error in Acrobat Reader #29

Closed
kiwiwings opened this issue Nov 28, 2020 · 12 comments

Comments

@kiwiwings

Hello Emmeran,

thank you for providing 0.29.

I have another request, which I haven't yet tried to investigate thoroughly myself.
In #64876 the user mentions an error when opening in Adobe Reader.
WebKit-based PDF renderers (e.g. Chrome) and typical Linux PDF viewers don't complain and open the file just fine.

I've opened the referenced file in Adobe Acrobat 6.0 and it fails with an error mentioning "Type 3 fonts".

In PdfBoxGraphics2DFontTextDrawer.mapFont I see calls to PDType0Font.load(), whereas the PDFBox page mentions PDTrueTypeFont.loadTTF() and PDType1AfmPfbFont::new.

Currently Apache POI extends PdfBoxGraphics2DFontTextDrawer via PDFFontMapper, which adds fonts on the fly. hasDynamicFontMapping() is currently not overridden - in case this might affect the font embedding.

Thank you, Andi.

test4.pdf

@rototor
Owner

rototor commented Nov 28, 2020

Hi Andi,

Acrobat 6 at least gives a hint; the current Acrobat Reader DC just shows an error dialog (screenshot), which is not helpful at all.

Regarding PDFFontMapper:

Be aware that system-installed fonts may fail to embed with the current PDFBox library, because PDFBox now respects the "allow embed" flag in the font file. I had to adjust my font fallback logic for that (see 3577568). I.e. registerFont() will always work, but when drawing text, mapFont() will fail with an exception if the font used is not allowed to be embedded.

Using the fonts installed in the system seems sensible. But you must be prepared to encounter "bad" fonts. There are many different versions of the same font (e.g. Arial) depending on the OS and OS version used. You will not be able to reproduce problems if you don't have the same font...

Regarding PDTrueTypeFont.loadTTF() vs PDType0Font.load(): it's rather a bug that they list PDTrueTypeFont.loadTTF(), as it does not support Unicode. PDTrueTypeFont.load() mentions this in its comment. So for TrueType fonts PDType0Font.load() should be the way to go.

The current PdfBoxGraphics2DFontTextDrawer does not support loading .afm (PostScript) fonts. Supporting those would not be a problem. If someone has such a font and really wants to support it, they can just override mapFont() and load it. AFAIK .afm fonts are kind of "legacy"; nowadays you just use .ttf fonts.

And well, you linked to the PDFBox 1.8 documentation. We are using PDFBox 2. Yes, the online documentation for PDFBox 2 is not that great at the moment...

==> This seems more like a PDFBox problem. Something goes wrong when embedding the font.

To investigate that further, the underlying original .ttf is required. If I understand you correctly, you cannot reproduce the problem because you don't have a matching pptx file; the one included in the bug report seems to be empty (at least for me when trying to open it in LibreOffice). And I'm pretty sure that you may even be able to correctly convert the pptx to pdf, as you might have other (not broken?) fonts installed locally.

So: without the font file and maybe even a sample .pptx file there is nothing I can do here. It also does not make sense to raise an issue at PDFBox, as we cannot reproduce the problem.

As a side note: you should set the PDF version to at least 1.5, better 1.7 (i.e. document.setVersion(1.7f)). Most PDF readers just ignore this setting, but some features were not possible with 1.4, e.g. 16-bit images.

Greetings and have a nice weekend

Emmeran

@kiwiwings kiwiwings changed the title Type 3 font error in Acrobat Reader Parse error in Acrobat Reader Nov 30, 2020
@kiwiwings
Author

I couldn't reproduce the font error with a simple test case.
Therefore I've generated the original pptx without the drawString() calls - i.e. no fonts (/texts) are embedded.
So it's probably related to the shapes somehow.

test-pptx.pdf

@rototor
Owner

rototor commented Nov 30, 2020

So I suspect it might be some error when mapping the paints. Can you please attach the source pptx here? I can not edit a PDF directly, I can only "regenerate" it and see if I can isolate the error.

I would get a POI 5.0 snapshot and use PPTX2PNG to trigger the conversion.

Thanks

@kiwiwings
Author

Sorry for keeping you waiting, here's a modified version. The only text left is in the embedded wmf.
test.pptx
I suspect the error is in a non-closed shape/path.

@rototor
Owner

rototor commented Dec 14, 2020

Thanks, I'll try to look into this, but I likely won't have time before the holidays.

@kiwiwings
Author

I have another version which doesn't contain the wmf. I'm currently rewiring a DummyGraphics2D context to log the statements.
Maybe I can come up with something with fewer dependencies.
test.mod.pptx

rototor added a commit that referenced this issue Dec 16, 2020
rototor added a commit that referenced this issue Dec 16, 2020
@rototor
Owner

rototor commented Dec 16, 2020

@kiwiwings I was able to build a small test using POI 4.1.2 which shows this problem using the test.mod.pptx in Acrobat Reader. So the problem is in the Graphics2d adapter.

@kiwiwings
Author

Find attached an AWT only example - it's still a bit noisy, but I think you can strip it down easily to find the culprit.
PDFTest2.java.zip

rototor added a commit that referenced this issue Dec 17, 2020
The used font does not matter at all.
@rototor
Owner

rototor commented Dec 17, 2020

I've integrated your test, but still have to investigate it. Would you be so kind as to provide your DummyGraphics2D? That would make a useful tool for investigating future problems like this. (I would put it into a pdfbox-graphics2d:tools module.)

@rototor
Owner

rototor commented Dec 17, 2020

This problem is somewhat of a "user error". The test has some BasicStrokes with a miter limit of 0.25. This violates the contract of the BasicStroke constructor, as the minimum value specified there is 1f. BasicStroke maps 1:1 to PDF (i.e. the values can be written 1:1 into the content stream of the PDF), and so, it seems, do the limits.
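To illustrate the 1:1 mapping mentioned above, here is a minimal sketch that formats a BasicStroke as the PDF graphics-state operators for line width (w), line cap (J), line join (j) and miter limit (M). The class and method names are hypothetical, and this is not how pdfbox-graphics2d actually writes its content stream; it only shows that the AWT values carry over unchanged, including an out-of-range miter limit.

```java
import java.awt.BasicStroke;
import java.util.Locale;

// Hypothetical helper, for illustration only (not part of pdfbox-graphics2d).
final class StrokeOperatorsSketch {

    // AWT's cap constants (BUTT=0, ROUND=1, SQUARE=2) and join constants
    // (MITER=0, ROUND=1, BEVEL=2) use the same numbering as the PDF
    // line cap / line join styles, so they are passed through as-is.
    static String toPdfOperators(BasicStroke s) {
        return String.format(Locale.ROOT, "%.2f w %d J %d j %.2f M",
                s.getLineWidth(), s.getEndCap(), s.getLineJoin(), s.getMiterLimit());
    }

    public static void main(String[] args) {
        // With a non-miter join the constructor does not check the miter
        // limit, so 0.25 ends up in the generated operator string.
        BasicStroke stroke = new BasicStroke(
                2f, BasicStroke.CAP_BUTT, BasicStroke.JOIN_ROUND, 0.25f);
        System.out.println(toPdfOperators(stroke));
    }
}
```

A value below 1 after the M operator is presumably what Acrobat trips over, while more lenient viewers ignore it.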

I fixed this by clipping the miter limit with Math.max(1f, miterLimit). It is nevertheless an error in POI to have a miterLimit < 1f.
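For callers who want to apply the same clamp on their side, a minimal sketch using only java.awt could look like this. The class and method names are made up for the example, and it assumes the bad value slipped in through a non-miter join: the BasicStroke constructor only rejects a miter limit below 1 when the join is JOIN_MITER.

```java
import java.awt.BasicStroke;

// Hypothetical helper, for illustration only.
final class MiterLimitClampSketch {

    // Returns a copy of the stroke whose miter limit is raised to at least 1;
    // all other attributes are carried over unchanged.
    static BasicStroke sanitize(BasicStroke in) {
        float miterLimit = Math.max(1f, in.getMiterLimit());
        return new BasicStroke(in.getLineWidth(), in.getEndCap(), in.getLineJoin(),
                miterLimit, in.getDashArray(), in.getDashPhase());
    }

    public static void main(String[] args) {
        // Accepted by the constructor because the join is not JOIN_MITER.
        BasicStroke lenient = new BasicStroke(
                2f, BasicStroke.CAP_BUTT, BasicStroke.JOIN_ROUND, 0.25f);
        System.out.println(lenient.getMiterLimit());           // 0.25
        System.out.println(sanitize(lenient).getMiterLimit()); // 1.0

        try {
            // The same value with JOIN_MITER is rejected up front.
            new BasicStroke(2f, BasicStroke.CAP_BUTT, BasicStroke.JOIN_MITER, 0.25f);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```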

Note: I will make a new release after PDFBox 2.0.22 is released. The release votes are currently running, should be a matter of days.

@kiwiwings
Author

Thank you for finding the source, I wasn't aware of this ... I'll clip it accordingly then.
Not sure if this is of any help ... but here is the makeshift DummyGraphics2d.java.zip

@rototor
Owner

rototor commented Dec 20, 2020

PDFBox 2.0.22 was released yesterday, I've just released version 0.30 of this project.
