Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[#683] Font reuse and override introduced to support multiple runs against the same target PDF document #684

Open
wants to merge 3 commits into
base: open-dev-v1
Choose a base branch
from

Conversation

stechio
Copy link

@stechio stechio commented Apr 3, 2021

This is the implementation of the proposal discussed in #683 to solve the issues impairing multiple runs against the same target PDF document, in particular:

  • font reuse: this implementation introduces a concrete-font cache (com.openhtmltopdf.pdfboxout.PdfBoxFontResolver.FontCache) that keeps track of org.apache.pdfbox.pdmodel.font.PDFont instances realized by corresponding com.openhtmltopdf.pdfboxout.PdfBoxFontResolver.FontDescription objects across multiple runs, preventing font structures from being reimported at each run;
  • font override: this implementation introduces CSS-imported fonts overriding via user-supplied fonts, allowing users to programmatically substitute fonts at will.

- font structure duplication on multiple runs against the same target PDF document fixed
- CSS-imported fonts overriding via user-supplied fonts implemented
@stechio
Copy link
Author

stechio commented Apr 3, 2021

[PR commit: 7115b0a9985ad9eafcec0dba412d1dea810d0862]

Here it is a demonstration of its use (see generating code below), applying the same source HTML file for 3 runs against the same target PDF document: run 1 and 2 generate two identical pages with overridden font (see left thumbnail), while run 3 generates a page with the original CSS-imported font (see right thumbnail):

683-font_reuse_1 683-font_reuse_2

The resulting PDF document, contrary to the current OHTP implementation, doesn't have any font structure duplication:
683-font_reuse.pdf

Source HTML:
683-font_reuse.html

Generating code:

import java.io.File;
import java.io.FileOutputStream;
import java.io.OutputStream;

import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.w3c.dom.Element;

import com.openhtmltopdf.extend.impl.FSDefaultCacheStore;
import com.openhtmltopdf.pdfboxout.PdfBoxRenderer;
import com.openhtmltopdf.pdfboxout.PdfRendererBuilder;
import com.openhtmltopdf.pdfboxout.PdfRendererBuilder.CacheStore;

public class FontReuseCase {
    public static void main(String[] args) throws Exception {
        /*
         * NOTE: In this case we use as source an HTML file featuring two paragraphs styled with
         * distinct fonts ('MyFont' and 'AnotherFont'), both imported via @font-face rules.
         * 
         * The target PDF document is built through 3 runs sharing the same state (font cache
         * included). The objective is to generate a PDF document without font structure duplica-
         * tions and enabling font override.
         */
        try (PDDocument document = new PDDocument()) {
            /*-
             *  Run 1: Renderer configuration is prepared from scratch.
             *
             *  - 'MyFont' import via @font-face is OVERRIDDEN
             *  - 'AnotherFont' import via @font-face is ACTIVE
             */
            PdfRendererBuilder rendererBuilder = new PdfRendererBuilder().usePDDocument(document)
                    .useCacheStore(CacheStore.PDF_FONT_METRICS, new FSDefaultCacheStore())
                    // Override font 'MyFont' declared by CSS @font-face!
                    .useFont(new File(
                            /*-
                             * Font name: Gentium
                             * Copyright: Copyright (c) 2003-2005, SIL International 
                             *            (http://scripts.sil.org/). All Rights Reserved.
                             * License: SIL Open Font License, Version 1.0.
                             */
                            "GenR102.TTF"),
                            "MyFont")
                    // Update 'myFont' paragraph content to signal that its font was overridden!
                    .addDOMMutator(dom -> {
                        try {
                            XPath xPath = XPathFactory.newInstance().newXPath();
                            ((Element) xPath.evaluate("//p[@id='myFont']", dom,
                                    XPathConstants.NODE))
                                            .setTextContent("This is MyFont (overridden)");
                        } catch (XPathExpressionException e) {
                            e.printStackTrace();
                        }
                    });
            try (PdfBoxRenderer renderer = rendererBuilder
                    .withFile(new java.io.File("683-font_reuse.html"))
                    .buildPdfRenderer()) {
                renderer.createPDFWithoutClosing();

                // Create a new builder for run 2, inheriting the current configuration!
                rendererBuilder = renderer.toBuilder();
            }

            /*-
             *  Run 2: Renderer configuration is reused as-is (the new page should be identical to
             *  the previous iteration, without font structure duplication).
             *
             *  - 'MyFont' import via @font-face is OVERRIDDEN
             *  - 'AnotherFont' import via @font-face is ACTIVE
             */
            try (PdfBoxRenderer renderer = rendererBuilder
                    .withFile(new java.io.File("683-font_reuse.html"))
                    .buildPdfRenderer()) {
                renderer.createPDFWithoutClosing();

                // Create a new builder for run 3, inheriting the current configuration!
                rendererBuilder = renderer.toBuilder();
            }

            /*-
             *  Run 3: Renderer configuration is reused dropping the font override (the new page
             *  should show the original font instead of the overriding one).
             *
             *  - 'MyFont' import via @font-face is ACTIVE
             *  - 'AnotherFont' import via @font-face is ACTIVE
             */
            // Remove the font override (next rendering will use the font declared by CSS @font-face)!
            rendererBuilder.dropFont("MyFont");
            rendererBuilder.dropDOMMutators() /* Removes 'myFont' paragraph content updater */;
            try (PdfBoxRenderer renderer = rendererBuilder
                    .withFile(new java.io.File("683-font_reuse.html"))
                    .buildPdfRenderer()) {
                renderer.createPDFWithoutClosing();
            }
            try (OutputStream os = new FileOutputStream("683-font_reuse.pdf")) {
                document.save(os);
            }
        }
    }
}

@stechio stechio changed the title #683 Font reuse and override introduced to support multiple runs against the same target PDF document [#683] Font reuse and override introduced to support multiple runs against the same target PDF document Apr 3, 2021
@danfickle
Copy link
Owner

Firstly huge thanks for contributing and I'm sorry it took me a month to respond. There is obviously a large amount of work investigating and coding this PR.

When I first read through your issue I wasn't sure it was possible to share fonts at all so I created some code in #695 and found it is possible but not for @font-face fonts.

I understand this PR is about addressing that gap. However, I think it is a lot of code to maintain for a use case that may be quite rare. This project is not a general purpose browser that will handle any HTML, effectively HTML must be crafted for it. Therefore, I'm thinking that most users will be able to use code to add fonts, especially in the advanced case of putting multiple html documents into one PDF. The problem is that the font code has already become too complex (my fault).

Can you convince me that there are more use-cases that I'm missing?

Thanks again,
@danfickle.

@stechio
Copy link
Author

stechio commented May 5, 2021

Hi @danfickle,
glad that you finally popped up.

[...] it is possible but not for @font-face fonts.

[...] This project is not a general purpose browser that will handle any HTML, effectively HTML must be crafted for it. Therefore, I'm thinking that most users will be able to use code to add fonts [...]

Since, as you know, @font-face is the standard, basic mechanism for font definition in modern HTML, and this library is about "rendering arbitrary well-formed XML/XHTML (and even HTML5) using CSS 2.1 for layout and formatting" (as claimed in the project overview), the exclusion of that standard, basic mechanism from caching on the assumption that users should ditch that mechanism in favor of font addition via custom code sounds more like an ugly workaround to an arbitrary functional limitation rather than a solid design choice. IMO, font management should behave consistently, no matter if fonts are user-supplied or CSS-defined. I agree with you that, currently, font code (including latest FontStores refactoring) is affected by unnecessary complexity, but that can't be an excuse to fend innovation: if font code needs simplification, it has to be simplified; otherwise, its weakness is going to hinder, sooner or later, its future developments. As Frank Lloyd Wright used to say, an architect can only advise his client to plant vines to hide his/her mistakes... but a programmer can do much better ;)
(BTW, along with cache support to @font-face, this PR provides also support to font overriding, which allows users to apply arbitrary fonts in place of the ones defined in the input HTML file)

[...] most users will be able to use code to add fonts, especially in the advanced case of putting multiple html documents into one PDF.

Putting multiple HTML documents into one PDF should be considered an obvious generalization, NOT a special case: the ability to smoothly accommodate disparate cases through generalization is what makes software not just workable, but actually powerful. This PR is about the same use case as your implementation for last y position related to #427 and #662: allowing multiple contents (possibly from multiple documents) to get appended to the same target document. IMO, generalizing your engine workflow to support multiple runs over the same target document is a big bonus which requires only limited, backward-compatible tweaks to the existing codebase (contrary to your perception, the changes introduced by this PR are really simple and neat, just an optional font cache used to back FontDescription.realizeFont(..) method (which, as a by-product, has been consolidated into a more robust algorithm)).

@danfickle
Copy link
Owner

OK, you make a powerful case! Perhaps I was being a bit lazy (not like me at all!).

I still have some minor issues before merging.

  • I'm happy with the font cache code other than the public API names. useFontCache(FontCache fontCache) sound like it is a generic font cache that can be used between multiple PDDocument instances. I'm struggling to think of an elegant name, maybe useMultiDocumentFontShare, you'll probably have something better. Also FontCache needs at least to be FSFontCache. Can we add a javadoc to this method?
  • I'm not so happy with the code that makes the builder partially reusable and cloneable. It complicates the builder (which already has a lot of methods) and is not needed. Can you revert changes to the builder (other than the font cache setter), builder state and the toBuilder method of the renderer?
  • _fontDescriptions.add(0, descr); in FontFamily I think needs to be reverted. We don't have many tests for font-fallback but it may break client code in the wild. If we need a method to override @font-face fonts from code, we could add an enum member to FSFontUseCase perhaps instead?
  • Finally we need a test of the font-cache. If you've got time, you could add one along the lines of #683 #684 Test of avoiding double embedding fonts #695. If not, I can do it after merging this PR in #683 #684 Test of avoiding double embedding fonts #695.

Thanks again.

P.S. I need to change that description in the readme! We clearly now only render a "reasonable subset" of the HTML/CSS standards.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants