[#683] Font reuse and override introduced to support multiple runs against the same target PDF document #684
base: open-dev-v1
- font structure duplication on multiple runs against the same target PDF document fixed
- CSS-imported fonts overriding via user-supplied fonts implemented
[PR commit: 7115b0a9985ad9eafcec0dba412d1dea810d0862]
Here is a demonstration of its use (see generating code below), applying the same source HTML file for 3 runs against the same target PDF document: runs 1 and 2 generate two identical pages with the overridden font (see left thumbnail), while run 3 generates a page with the original CSS-imported font (see right thumbnail).
The resulting PDF document, contrary to the current OHTP implementation, doesn't have any font structure duplication.
Source HTML:
Generating code:
import java.io.File;
import java.io.FileOutputStream;
import java.io.OutputStream;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.w3c.dom.Element;
import com.openhtmltopdf.extend.impl.FSDefaultCacheStore;
import com.openhtmltopdf.pdfboxout.PdfBoxRenderer;
import com.openhtmltopdf.pdfboxout.PdfRendererBuilder;
import com.openhtmltopdf.pdfboxout.PdfRendererBuilder.CacheStore;
public class FontReuseCase {
    public static void main(String[] args) throws Exception {
        /*
         * NOTE: In this case we use as source an HTML file featuring two paragraphs styled with
         * distinct fonts ('MyFont' and 'AnotherFont'), both imported via @font-face rules.
         *
         * The target PDF document is built through 3 runs sharing the same state (font cache
         * included). The objective is to generate a PDF document without font structure
         * duplications and enabling font override.
         */
        try (PDDocument document = new PDDocument()) {
            /*-
             * Run 1: Renderer configuration is prepared from scratch.
             *
             * - 'MyFont' import via @font-face is OVERRIDDEN
             * - 'AnotherFont' import via @font-face is ACTIVE
             */
            PdfRendererBuilder rendererBuilder = new PdfRendererBuilder().usePDDocument(document)
                    .useCacheStore(CacheStore.PDF_FONT_METRICS, new FSDefaultCacheStore())
                    // Override font 'MyFont' declared by CSS @font-face!
                    .useFont(new File(
                            /*-
                             * Font name: Gentium
                             * Copyright: Copyright (c) 2003-2005, SIL International
                             *            (http://scripts.sil.org/). All Rights Reserved.
                             * License:   SIL Open Font License, Version 1.0.
                             */
                            "GenR102.TTF"),
                            "MyFont")
                    // Update 'myFont' paragraph content to signal that its font was overridden!
                    .addDOMMutator(dom -> {
                        try {
                            XPath xPath = XPathFactory.newInstance().newXPath();
                            ((Element) xPath.evaluate("//p[@id='myFont']", dom,
                                    XPathConstants.NODE))
                                            .setTextContent("This is MyFont (overridden)");
                        } catch (XPathExpressionException e) {
                            e.printStackTrace();
                        }
                    });
            try (PdfBoxRenderer renderer = rendererBuilder
                    .withFile(new File("683-font_reuse.html"))
                    .buildPdfRenderer()) {
                renderer.createPDFWithoutClosing();
                // Create a new builder for run 2, inheriting the current configuration!
                rendererBuilder = renderer.toBuilder();
            }

            /*-
             * Run 2: Renderer configuration is reused as-is (the new page should be identical to
             * the previous iteration, without font structure duplication).
             *
             * - 'MyFont' import via @font-face is OVERRIDDEN
             * - 'AnotherFont' import via @font-face is ACTIVE
             */
            try (PdfBoxRenderer renderer = rendererBuilder
                    .withFile(new File("683-font_reuse.html"))
                    .buildPdfRenderer()) {
                renderer.createPDFWithoutClosing();
                // Create a new builder for run 3, inheriting the current configuration!
                rendererBuilder = renderer.toBuilder();
            }

            /*-
             * Run 3: Renderer configuration is reused dropping the font override (the new page
             * should show the original font instead of the overriding one).
             *
             * - 'MyFont' import via @font-face is ACTIVE
             * - 'AnotherFont' import via @font-face is ACTIVE
             */
            // Remove the font override (next rendering will use the font declared by CSS @font-face)!
            rendererBuilder.dropFont("MyFont");
            // Remove the 'myFont' paragraph content updater!
            rendererBuilder.dropDOMMutators();
            try (PdfBoxRenderer renderer = rendererBuilder
                    .withFile(new File("683-font_reuse.html"))
                    .buildPdfRenderer()) {
                renderer.createPDFWithoutClosing();
            }

            try (OutputStream os = new FileOutputStream("683-font_reuse.pdf")) {
                document.save(os);
            }
        }
    }
}
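For readers who just want the override semantics exercised by the three runs above, here is a minimal, library-free sketch. It is NOT the PR's implementation: the class FontOverrideSketch, its methods and the CSS font file name "MyFont-from-css.ttf" are hypothetical, chosen only to mirror the useFont/dropFont calls in the demonstration. It assumes that a user-supplied font takes precedence over a same-named @font-face import, and that dropping the override makes the CSS-imported font active again.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical model of font-family resolution; not OpenHTMLtoPDF code. */
class FontOverrideSketch {
    // family -> font source declared by CSS @font-face
    private final Map<String, String> cssFonts = new HashMap<>();
    // family -> font source supplied by the user (override)
    private final Map<String, String> userFonts = new HashMap<>();

    void declareCssFont(String family, String source) {
        cssFonts.put(family, source);
    }

    void useFont(String family, String source) {
        userFonts.put(family, source);
    }

    void dropFont(String family) {
        userFonts.remove(family);
    }

    /** User-supplied fonts win; otherwise fall back to the CSS import (null if unknown). */
    String resolve(String family) {
        String overridden = userFonts.get(family);
        return overridden != null ? overridden : cssFonts.get(family);
    }

    public static void main(String[] args) {
        FontOverrideSketch fonts = new FontOverrideSketch();
        fonts.declareCssFont("MyFont", "MyFont-from-css.ttf"); // hypothetical file name
        fonts.useFont("MyFont", "GenR102.TTF");

        // Runs 1 and 2: the override is in place.
        System.out.println(fonts.resolve("MyFont")); // prints GenR102.TTF
        System.out.println(fonts.resolve("MyFont")); // prints GenR102.TTF

        // Run 3: the override is dropped, so the CSS import is active again.
        fonts.dropFont("MyFont");
        System.out.println(fonts.resolve("MyFont")); // prints MyFont-from-css.ttf
    }
}
```

Keeping the two maps separate is what makes dropping an override cheap in this sketch: the CSS declaration is never overwritten, only shadowed.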
…skipping state._caches initialization)
Firstly, huge thanks for contributing, and I'm sorry it took me a month to respond. There is obviously a large amount of work in investigating and coding this PR. When I first read through your issue I wasn't sure it was possible to share fonts at all, so I created some code in #695 and found it is possible, but not for … I understand this PR is about addressing that gap. However, I think it is a lot of code to maintain for a use case that may be quite rare. This project is not a general-purpose browser that will handle any HTML; effectively, HTML must be crafted for it. Therefore, I'm thinking that most users will be able to use code to add fonts, especially in the advanced case of putting multiple HTML documents into one PDF. The problem is that the font code has already become too complex (my fault). Can you convince me that there are more use cases that I'm missing? Thanks again,
Hi @danfickle,
Since, as you know,
Putting multiple HTML documents into one PDF should be considered an obvious generalization, NOT a special case: the ability to smoothly accommodate disparate cases through generalization is what makes software not just workable, but actually powerful. This PR is about the same use case as your implementation for last y position related to #427 and #662: allowing multiple contents (possibly from multiple documents) to get appended to the same target document. IMO, generalizing your engine workflow to support multiple runs over the same target document is a big bonus which requires only limited, backward-compatible tweaks to the existing codebase (contrary to your perception, the changes introduced by this PR are really simple and neat, just an optional font cache used to back
OK, you make a powerful case! Perhaps I was being a bit lazy (not like me at all!). I still have some minor issues before merging.
Thanks again. P.S. I need to change that description in the readme! We clearly now only render a "reasonable subset" of the HTML/CSS standards.
This is the implementation of the proposal discussed in #683 to solve the issues impairing multiple runs against the same target PDF document, in particular:
- font cache (com.openhtmltopdf.pdfboxout.PdfBoxFontResolver.FontCache) that keeps track of org.apache.pdfbox.pdmodel.font.PDFont instances realized by corresponding com.openhtmltopdf.pdfboxout.PdfBoxFontResolver.FontDescription objects across multiple runs, preventing font structures from being reimported at each run;
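To illustrate the caching idea described above, here is a minimal, library-free sketch. It is an assumption-laden model, not the actual FontCache from this PR: FontCacheSketch, RealizedFont and the string keys are hypothetical stand-ins for the cache, for PDFont and for FontDescription respectively. The point it shows is that a cache shared across runs realizes each font once, no matter how many runs request it.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical model of a font cache shared across runs; not OpenHTMLtoPDF code. */
class FontCacheSketch {
    /** Stand-in for a font structure realized in the target PDF (PDFont in PDFBox). */
    static final class RealizedFont {
        final String descriptionKey;

        RealizedFont(String descriptionKey) {
            this.descriptionKey = descriptionKey;
        }
    }

    // description key -> font already realized in the target document
    private final Map<String, RealizedFont> cache = new HashMap<>();
    // number of times a font structure was actually created
    private int realizedCount = 0;

    /** Returns the cached font for the key, realizing it only on first request. */
    RealizedFont realize(String descriptionKey) {
        return cache.computeIfAbsent(descriptionKey, key -> {
            realizedCount++;
            return new RealizedFont(key);
        });
    }

    int realizedCount() {
        return realizedCount;
    }

    public static void main(String[] args) {
        FontCacheSketch sharedCache = new FontCacheSketch();

        // Three runs against the same target document, each requesting both fonts.
        for (int run = 1; run <= 3; run++) {
            sharedCache.realize("MyFont");
            sharedCache.realize("AnotherFont");
        }

        System.out.println("Fonts realized: " + sharedCache.realizedCount()); // prints Fonts realized: 2
    }
}
```

Without the shared cache, each of the three runs would realize both fonts again, giving six font structures instead of two; that duplication is what the PR description refers to.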