Generation of pdf is too slow for large html #506
Comments
hi @Infinity821, using the master branch and version 1.0.3, I've been able to generate the PDF from the attached test HTML. Code for PDF generation (note, I was not able to find the correct font for
try (OutputStream os = new FileOutputStream("out.pdf")) {
    PdfRendererBuilder builder = new PdfRendererBuilder();
    // The same font file is registered under both family names.
    builder.useFont(new File("PMINGLIU.ttf"), "PMingLiU");
    builder.useFont(new File("PMINGLIU.ttf"), "PMingLiU-ExtB");
    builder.useFont(new File("seguiemj.ttf"), "Segoe UI Emoji");
    builder.useFastMode();
    builder.withFile(new File("test" + ".html"));
    builder.toStream(os);
    builder.run();
}
resulting pdf:
By the way, have you tried version 1.0.3? (On a Ryzen 1700 PC with 16 GB RAM, Java 11, and the default heap configuration, execution time was 5716 ms.) |
I've noticed that with heavily mixed-font text, up to 80% of CPU self-time is spent initialising the P.s. According to VisualVM. @Infinity821, can you try CPU sampling with VisualVM and post a screenshot of the hotspots? |
I've got the same issue with some very (very) large HTML files (up to 600 MB). Several of my files end up in an OOM, so I had to test with smaller files (~22 MB). I can confirm that many IllegalArgumentExceptions are raised, as seen in the following screenshot (from a JFR recording): Unfortunately I can't test a larger file because of the memory limitation (-Xmx13g -XX:+UseG1GC). Here are some other useful metrics: Is there any way to prevent the OOM, even if the generation takes longer? @danfickle, I'm willing to provide some HTML samples by PM if you need them. |
The biggest problem seems to be caused by the numerous zero-width space characters inserted for whitespace contained within the HTML. That character is not available in Helvetica, and its width should simply be zero, as the name says. I checked the HTML for any zero-width spaces that I could remove, but they seem to be inserted internally. 🤷♂️
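For anyone who wants to repeat the check on their own input, here is a minimal sketch that counts literal zero-width space characters (U+200B) in an HTML file. The class name `ZeroWidthSpaceCounter` and the default file name `test.html` are only illustrative, and the sketch assumes the file is UTF-8 and small enough to read into memory. It does not detect numeric entities such as `&#8203;`, and it only inspects the source file, so a count of zero is consistent with the characters being inserted internally during layout.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class ZeroWidthSpaceCounter {
    // U+200B ZERO WIDTH SPACE
    private static final char ZERO_WIDTH_SPACE = '\u200B';

    // Counts literal U+200B characters in the given text.
    public static long countZeroWidthSpaces(String text) {
        return text.chars().filter(c -> c == ZERO_WIDTH_SPACE).count();
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical default input name; pass a path as the first argument to override.
        Path input = Path.of(args.length > 0 ? args[0] : "test.html");
        // Assumes UTF-8 and that the whole file fits comfortably in the heap.
        String html = Files.readString(input, StandardCharsets.UTF_8);
        System.out.println(countZeroWidthSpaces(html) + " zero-width space(s) found in " + input);
    }
}
```

Requires Java 11 or later for `Path.of` and `Files.readString`.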
|
I'm trying to convert a 30 MB HTML file and it takes around 50 seconds. Is there any way to improve this? |
I am now using version 1.0.2, but the PDF build still hangs.
The size of the HTML is 13241929 bytes.
I have tried many times and increased the heap size to 4 GB.
My machine is an i5 4460 with 16 GB RAM.
The test HTML is attached:
test.txt
My code for PDF generation is as follows:
Originally posted by @Infinity821 in #180 (comment)