PDF size when using OTF fonts. #36

Open
pveber opened this issue Feb 15, 2024 · 4 comments
@pveber

pveber commented Feb 15, 2024

vg embeds OTF fonts in full when it generates PDF files. This leads to rather large files compared to what can be obtained with e.g. matplotlib, where only the glyphs actually used in the document are included. I'd like to have a go at this; would you consider such a contribution?

@dbuenzli
Owner

Why not, but I'm not sure font subsetting is an entirely trivial task.

So before getting into subsetting, I'd rather have basic compression, which would also benefit the vector data of images, which sometimes grows quite big with the current renderer [1]. Font data also compresses quite well. Did you try the transform mentioned here on the files you generate? Would the results satisfy you?

If that is the case, I think it would be easier, and bring more benefits, to extend the PDF renderer with an optional argument: a string -> (string, string) result function (which can be plugged with e.g. Zipc_deflate.deflate or your favourite deflate implementation) that, when present, is used to deflate the object streams.
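To make the proposed hook concrete, here is a minimal OCaml sketch of how a renderer could use such an optional string -> (string, string) result argument when emitting a PDF stream object. write_stream and ?compress are hypothetical names, not part of Vgr_pdf. The sketch assumes the compressor's output is something /FlateDecode can decode, and falls back to an uncompressed stream when it returns Error _.

```ocaml
(* Hypothetical sketch: [write_stream] and [?compress] are illustrative
   names, not vg's actual Vgr_pdf API. [compress] is intended to have
   type string -> (string, string) result. *)
let write_stream ?compress (buf : Buffer.t) (data : string) : unit =
  (* Try to compress; on [Error _] keep the raw bytes and no filter. *)
  let filter, bytes =
    match compress with
    | None -> None, data
    | Some f ->
        (match f data with
         | Ok z -> Some "/FlateDecode", z
         | Error _ -> None, data)
  in
  (* /Length counts the stream bytes only, not the EOL before endstream. *)
  Buffer.add_string buf "<< /Length ";
  Buffer.add_string buf (string_of_int (String.length bytes));
  (match filter with
   | Some name -> Buffer.add_string buf (" /Filter " ^ name)
   | None -> ());
  Buffer.add_string buf " >>\nstream\n";
  Buffer.add_string buf bytes;
  Buffer.add_string buf "\nendstream"
```

Passing no compressor keeps the current uncompressed output, so such an argument would not change behaviour for existing users.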

What do you think?

Footnotes

  1. In one of my uses of vg I generate 129 MB PDFs which, after stream compression via the gs rune, are reduced to ~15 MB.

@pveber
Author

pveber commented Feb 16, 2024

Thanks for your feedback!

> Did you try the transform mentioned here on the files you generate? Would the results satisfy you?

Damn, I missed that paragraph. In my use case, the document shrinks from 2.4 MB to 517 KB with cpdf and to 62 KB with gs. This is super nice, and it comes for free!

> What do you think?

For my current need, the case is settled. I think both compression and subsetting would be nice, in particular for the numerous crowd (of which I'm a sorry representative) that does not read the manual to the end. Also, having to make an external call to get the PDF right might not be convenient. Your proposal seems lighter to implement than subsetting and is nicely composable; it looks promising to me. On the other hand, subsetting does look important to me: for instance, in the DejaVu font I counted more than 3000 glyphs, while a typical plot is unlikely to use more than 30 of them. With compression only (see the figures above, using cpdf), the generated files would still remain abnormally large.

Having looked at the code in Vgr_pdf, it's true that subsetting requires quite some changes. I'll have a look at it next week and report on how it went.

Thanks again Daniel!

@dbuenzli
Owner

> On the other hand subsetting does look important to me: for instance in the DejaVu font, I counted more than 3000 glyphs, while in a typical plot I'm unlikely to see more than 30 effectively used.

I think you should be able to evaluate the gain with the -dSubsetFonts option of gs.

> Having looked at the code in Vgr_pdf, it's true that subsetting requires quite some changes.

I'm not sure that's the complicated bit; I expect you simply need to add state to the renderer that collects the glyph ids used per font.
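A minimal sketch of that renderer state, assuming fonts can be keyed by a name string (the real renderer would more likely key on vg's own font values): a table mapping each font to the set of glyph ids drawn with it. glyph_usage, record and used are hypothetical names. Glyph 0 is always kept because OpenType requires glyph index 0 to be .notdef.

```ocaml
(* Hypothetical sketch of per-font glyph-id collection for subsetting.
   Fonts are keyed by a name string here, which is an assumption. *)
module Iset = Set.Make (Int)

type glyph_usage = (string, Iset.t) Hashtbl.t

let create () : glyph_usage = Hashtbl.create 8

(* Record the glyph ids emitted by one text-drawing operation. *)
let record (u : glyph_usage) ~font (glyphs : int list) : unit =
  let prev = Option.value (Hashtbl.find_opt u font) ~default:Iset.empty in
  let next = List.fold_left (fun s g -> Iset.add g s) prev glyphs in
  Hashtbl.replace u font next

(* Sorted glyph ids to keep for [font]; glyph 0 (.notdef) is always
   retained since OpenType requires it as the first glyph. *)
let used (u : glyph_usage) ~font : int list =
  let s = Option.value (Hashtbl.find_opt u font) ~default:Iset.empty in
  Iset.elements (Iset.add 0 s)
```

At the end of rendering, used gives, for each font, the glyph list a subsetter would keep.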

The complicated bit is rather how to do it correctly in PDF. There is a bit about it in §9.6.4 of ISO 32000-1:2008. The other problem is that you will likely need to re-encode OpenType tables, which otfm doesn't support at the moment, and make sure all the tables required by PDF are there (see Table 126).
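One concrete requirement from §9.6.4: the PostScript name of a font subset (the BaseFont entry and the font descriptor's FontName) must begin with a tag of exactly six uppercase letters followed by a plus sign, and different subsets in the same file must use different tags; the choice of letters is otherwise arbitrary. The sketch below derives tags from a per-document subset counter. The counter scheme and the names subset_tag and subset_font_name are assumptions for illustration only.

```ocaml
(* Hypothetical sketch of the subset-tag rule of ISO 32000-1 §9.6.4.
   Tags are the base-26 digits (A..Z) of a per-document counter, which
   guarantees distinct tags for fewer than 26^6 subsets. *)
let subset_tag (n : int) : string =
  if n < 0 || n >= 308_915_776 (* 26^6 *) then invalid_arg "subset_tag";
  let b = Bytes.make 6 'A' in
  (* Fill from the rightmost letter, least significant digit first. *)
  let rec fill i n =
    if i >= 0 then begin
      Bytes.set b i (Char.chr (Char.code 'A' + n mod 26));
      fill (i - 1) (n / 26)
    end
  in
  fill 5 n;
  Bytes.to_string b

(* Name to use for BaseFont and FontName of the [index]-th subset. *)
let subset_font_name ~index ~postscript_name =
  subset_tag index ^ "+" ^ postscript_name
```

For example, subset_font_name ~index:1 ~postscript_name:"DejaVuSans" gives "AAAAAB+DejaVuSans".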

@pveber
Copy link
Author

pveber commented Feb 22, 2024

> I think you should be able to evaluate the gain with the -dSubsetFonts option of gs.

Alas, it seems gs ignores -dSubsetFonts for PDF files (that's what I observe too).

Right, maybe it's a bit more than I can chew at the moment. Maybe a useful intermediate step would be to add an encoder to otfm?
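As a taste of what an encoder in otfm would need to produce, here is a sketch of the checksum stored for each table in an OpenType table directory: the table is read as big-endian uint32 words, zero-padded to a 4-byte boundary, and the words are summed modulo 2^32. table_checksum is a hypothetical name, not part of otfm, and the sketch assumes a 64-bit OCaml where int can hold unsigned 32-bit values. A full encoder would also have to handle the head table's checkSumAdjustment field separately.

```ocaml
(* Hypothetical sketch: OpenType table checksum (sum of big-endian
   uint32 words, zero-padded, modulo 2^32). Assumes 64-bit OCaml. *)
let table_checksum (t : string) : int =
  let len = String.length t in
  (* Bytes past the end act as zero padding. *)
  let byte i = if i < len then Char.code t.[i] else 0 in
  let rec loop i acc =
    if i >= len then acc
    else
      let w =
        (byte i lsl 24) lor (byte (i + 1) lsl 16)
        lor (byte (i + 2) lsl 8) lor byte (i + 3)
      in
      loop (i + 4) ((acc + w) land 0xFFFF_FFFF)
  in
  loop 0 0
```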
