rendering a w3cdom document : infinite loop creation of TableCellBox #466

Closed
AlexisCothenet opened this issue Apr 16, 2020 · 4 comments


AlexisCothenet commented Apr 16, 2020

Hello,

I hit an OutOfMemoryError (OOM) but cannot understand the reason. It seems that a cascade of TableCellBox instances is created when rendering this HTML (I tried to keep it small, but it seems the number of td elements inside the first tr is required, as are the two other tr elements):

String bodyhtml=
                "<table style=\"border-collapse:separate;border:none;padding:0;margin:0;table-layout:fixed;width:711px\" width=\"711\" border=\"0\" cellspacing=\"0\" cellpadding=\"0\">\n" +
                        "<tbody>\n" +
                        "<tr style=\"height:1px\">"+
                        "<td style=\"border:none;padding:0\" width=\"91\"></td>"+
                        "<td style=\"border:none;padding:0\" width=\"45\"></td>"+
                        "<td style=\"border:none;padding:0\" width=\"1\"></td>"+
                        "<td style=\"border:none;padding:0\" width=\"1\"></td>"+
                        "<td style=\"border:none;padding:0\" width=\"75\"></td>"+
                        "<td style=\"border:none;padding:0\" width=\"20\"></td>"+
                        "<td style=\"border:none;padding:0\" width=\"52\"></td>"+
                        "<td style=\"border:none;padding:0\" width=\"17\"></td>"+
                        "<td style=\"border:none;padding:0\" width=\"55\"></td>"+
                        "<td style=\"border:none;padding:0\" width=\"17\"></td>"+
                        "<td style=\"border:none;padding:0\" width=\"74\"></td>"+
                        "<td style=\"border:none;padding:0\" width=\"15\"></td>"+
                        "<td style=\"border:none;padding:0\" width=\"2\"></td>"+
                        "<td style=\"border:none;padding:0\" width=\"74\"></td>"+
                        "<td style=\"border:none;padding:0\" width=\"17\"></td>"+
                        "<td style=\"border:none;padding:0\" width=\"21\"></td>"+
                        "<td style=\"border:none;padding:0\" width=\"87\"></td>"+
                        "</tr>" +
                        "<tr style=\"height:4px\">" +
                        "<td style=\"font-style:normal;font-family:Arial;font-size:1px;color:#000000;background-color:#ffffff;text-align:Left;vertical-align:Top;word-wrap:break-word;overflow:hidden;border-collapse:separate;border:none;padding-left:2px;padding-right:2px;padding-top:1px;padding-bottom:1px\" colspan=\"2\" rowspan=\"2\"> </td>" +
                        "</tr>" +
                        "<tr style=\"height:34px\">" +
                        "<td style=\"font-style:normal;font-family:Arial;font-size:1px;color:#000000;background-color:#ffffff;text-align:Left;vertical-align:Top;word-wrap:break-word;overflow:hidden;border-collapse:separate;border:none;padding-left:2px;padding-right:2px;padding-top:1px;padding-bottom:1px\"> </td>" +
                        "</tr>" +
                        "</tbody></table>";
Document doc = Jsoup.parse(bodyhtml);
PdfRendererBuilder builder = new PdfRendererBuilder();
builder.useFastMode();
builder.withW3cDocument(new W3CDom().fromJsoup(doc), "");
builder.toStream(outStream);
builder.run();

I am using openhtmltopdf 1.0.2 (with jsoup 1.13.1).

Here is a snapshot of the profiler's heap dump analysis:
[Image: OOM_htmltopdf]

AlexisCothenet changed the title from "Out Of Memory error : rendering a w3cdom document" to "rendering a w3cdom document : infinite loop creation of TableCellBox" on Apr 17, 2020
danfickle added a commit that referenced this issue Apr 21, 2020
@danfickle
Owner

Hi @AlexisCothenet,

This bug is very concerning as it involves text breaking. I was able, after much trial and error, to reduce your test case to the following (no Jsoup needed):

<table style="width: 3px;table-layout: fixed;">
<tr>
 <td colspan="2"></td>
 <td style="word-wrap: break-word;">ABC</td>
</tr>
</table>

Now that I have narrowed it down to fixed table layout combined with colspan (or rowspan) and break-word, I'll try to find the root cause and fix it.

As always, thanks for reporting.

@syjer
Contributor

syjer commented Apr 28, 2020

Hi @danfickle, it keeps looping inside https://github.com/danfickle/openhtmltopdf/blob/open-dev-v1/openhtmltopdf-core/src/main/java/com/openhtmltopdf/layout/InlineBoxing.java#L160.

It keeps trying to handle the "ABC" string.

lbContext.isFinished() never becomes true, and lbContext.getStartSubstring().length() is never 0.
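
To illustrate the symptom described above, here is a minimal standalone sketch of a line-breaking loop that stops making progress when no character fits in the available width. It is not the openhtmltopdf code: the class `ZeroWidthBreakSketch`, the methods `fitCount` and `breakLines`, the one-unit-per-character metric, and the iteration cap are all hypothetical and exist only for this illustration. Without a guard, every pass adds another empty line and the remaining text never shrinks, which would be consistent with the ever-growing number of boxes seen in the heap dump.

```java
import java.util.ArrayList;
import java.util.List;

final class ZeroWidthBreakSketch {

    /** Hypothetical font metric: every character is exactly 1 unit wide. */
    static int fitCount(String text, int availWidth) {
        return Math.min(text.length(), Math.max(availWidth, 0));
    }

    /**
     * Splits text into lines of at most availWidth units.
     * When availWidth is 0 and forceProgress is false, fitCount always
     * returns 0, so the remaining text never shrinks. The iteration cap
     * (an assumption of this sketch) turns the endless loop into an exception.
     */
    static List<String> breakLines(String text, int availWidth,
                                   boolean forceProgress, int maxIterations) {
        List<String> lines = new ArrayList<>();
        String remaining = text;
        int iterations = 0;
        while (!remaining.isEmpty()) {
            if (++iterations > maxIterations) {
                throw new IllegalStateException(
                        "no progress after " + maxIterations + " iterations");
            }
            int n = fitCount(remaining, availWidth);
            if (n == 0 && forceProgress) {
                n = 1; // guard: always consume at least one character per line
            }
            lines.add(remaining.substring(0, n));
            remaining = remaining.substring(n);
        }
        return lines;
    }

    public static void main(String[] args) {
        // With the guard, a zero-width box still terminates.
        System.out.println(breakLines("ABC", 0, true, 1000));
        try {
            // Without the guard, the loop only ends because of the cap.
            breakLines("ABC", 0, false, 1000);
        } catch (IllegalStateException e) {
            System.out.println("Stuck: " + e.getMessage());
        }
    }
}
```

Forcing at least one character onto each line is only one possible way to guarantee termination; the actual fix in the library may take a different approach.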

@danfickle
Owner

Well, this is embarrassing...

It turns out that reproducing it is as simple as:

<div style="width: 0; word-wrap: break-word;">ABC</div>

I.e., any zero-width box with content and break-word will trigger it. This is a significant bug, so I'll try to do a release with the fix soon. In the meantime, avoid break-word or make sure you do not have any boxes with zero width (calculated or explicit), such as in tables.
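
As a stopgap for the "avoid break-word" advice above, one rough option is to strip the declaration from the HTML string before handing it to the renderer. The sketch below is an assumption-laden illustration, not part of openhtmltopdf: the class `BreakWordWorkaround` and the method `stripBreakWord` are hypothetical, and the regex only targets the literal `word-wrap: break-word` declaration. It would also match that text if it appeared outside a style attribute, and it does nothing for rules in external stylesheets.

```java
import java.util.regex.Pattern;

final class BreakWordWorkaround {

    // Matches "word-wrap: break-word" with flexible whitespace and an
    // optional trailing semicolon, ignoring case.
    private static final Pattern BREAK_WORD = Pattern.compile(
            "word-wrap\\s*:\\s*break-word\\s*;?", Pattern.CASE_INSENSITIVE);

    /** Returns the HTML with every matching break-word declaration removed. */
    static String stripBreakWord(String html) {
        return BREAK_WORD.matcher(html).replaceAll("");
    }

    public static void main(String[] args) {
        String html = "<div style=\"width: 0; word-wrap: break-word;\">ABC</div>";
        // The cleaned string can then be passed to the renderer as usual.
        System.out.println(stripBreakWord(html));
    }
}
```

This changes how long words wrap in the output, so it is only worth considering until a fixed release is available.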

And yes, I should have tested this edge case when implementing break-word.

Thanks everyone.

@AlexisCothenet
Author

Hello @danfickle,

Is a release planned soon for this problem?
Thank you.
