OutOfMemoryError for HTML with long line(s) #482

Closed
swarl opened this issue May 14, 2020 · 16 comments

@swarl

swarl commented May 14, 2020

Good evening

Version 1.0.2
The following HTML sent to my application produces a hell of an OutOfMemoryError:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml"><head><title>lineOfDeath</title></head><body><div class="nl2go_preheader" style="display: none !important; mso-hide:all !important; mso-line-height-rule: exactly; visibility: hidden !important; line-height: 0px !important; font-size: 0px !important;">Wohoo, I have some very important news<div style="display: none; max-height: 0px; overflow: hidden;">&nbsp;‌&nbsp;‌&nbsp;‌&nbsp;‌&nbsp;‌&nbsp;‌&nbsp;‌&nbsp;‌&nbsp;‌&nbsp;‌&nbsp;‌&nbsp;‌&nbsp;‌&nbsp;‌&nbsp;‌&nbsp;‌&nbsp;‌&nbsp;‌&nbsp;‌&nbsp;‌&nbsp;‌&nbsp;‌&nbsp;‌&nbsp;‌&nbsp;‌&nbsp;‌&nbsp;‌&nbsp;‌&nbsp;‌</div><br><img src="http://news.some.url/5fhrqwkz-tb0bc4v7-wmg82uhc-11n6.gif" style="display:none" width="1" height="1" alt="" title=""></div> <table cellspacing="0" cellpadding="0" border="0" role="presentation" style="background-color: #C4C4C4; width: 100%;" class="nl2go-body-table" width="100%"> <tr> <td align="center" class="r0-c"> <table cellspacing="0" cellpadding="0" border="0" role="presentation" width="660" style="table-layout: fixed; width: 660px;" class="r1-o">  <tr> <td style="background-color: #ffffff;" valign="top" class="r2-i"> <table width="100%" cellspacing="0" cellpadding="0" border="0" role="presentation"> <tr> <td class="r3-c"> <table cellspacing="0" cellpadding="0" border="0" role="presentation" width="100%" style="table-layout: fixed; width: 100%;" class="r4-o"> <!-- --> <tr class="nl2go-responsive-hide"> <td style="font-size: 8px; line-height: 8px; background-color: #C4C4C4;" height="8">­</td> </tr> <tr> <td style="background-color: #C4C4C4;" class="r5-i"> <table width="100%" cellspacing="0" cellpadding="0" border="0" role="presentation"> <tr> <th width="100%" valign="top" class="r6-c"> <table cellspacing="0" cellpadding="0" border="0" role="presentation" width="100%" style="table-layout: fixed; width: 100%;" class="r4-o"> <!-- --> <tr> <td style="" valign="top" class="r7-i"> <table width="100%" cellspacing="0" cellpadding="0" border="0" 
role="presentation"> <tr> <td class="r8-c" align="left"> <table cellspacing="0" cellpadding="0" border="0" role="presentation" width="100%" style="table-layout: fixed; width: 100%;" class="r9-o">  <tr> <td align="center" valign="top" style="line-height: 16px; text-align: center; color: #333333; font-family: Arial, Helvetica, sans-serif; font-size: 15px;" class="r10-i nl2go-default-textstyle"> <span class="nl2go_class_12_dark-blue_reg" style="color: #1E3650; font-family: Arial, Helvetica, sans-serif; font-size: 12px;">I am the newsletter of death <a href="http://news.some.url/5fhrqwkz-wmg82uhc-sa8xybo2-pvk" style="color: #4a769a; text-decoration: none;"><span class="nl2go_class_12_ocher_reg_u" style="color: #977537; font-family: Arial, Helvetica, sans-serif; font-size: 12px; text-decoration: underline;">hier</span><span class="nl2go_class_12_ocher_reg_u" style="color: #977537; font-family: Arial, Helvetica, sans-serif; font-size: 12px; text-decoration: underline;">.</span></a></span> </td> </tr> </table> </td> </tr> </table> </td> </tr> </table> </th> </tr> </table> </td> </tr> <tr class="nl2go-responsive-hide"> <td style="font-size: 8px; line-height: 8px; background-color: #C4C4C4;" height="8">­</td> </tr> </table> </td> </tr> <tr> <td class="r11-c" align="center"> <table cellspacing="0" cellpadding="0" border="0" role="presentation" width="100%" style="table-layout: fixed; width: 100%;" class="r12-o"> <!-- --> <tr class="nl2go-responsive-hide"> <td style="font-size: 10px; line-height: 10px;" height="10" width="10">­</td> <td style="font-size: 10px; line-height: 10px;" height="10">­</td> <td style="font-size: 10px; line-height: 10px;" height="10" width="10">­</td> </tr> <tr> <td style="font-size: 0px; line-height: 0px;" class="nl2go-responsive-hide" width="10">­</td> <td style="" class="r13-i"> <table width="100%" cellspacing="0" cellpadding="0" border="0" role="presentation"> <tr> <th width="" valign="top" class="r6-c"> <table cellspacing="0" cellpadding="0" 
border="0" role="presentation" width="100%" style="table-layout: fixed; width: 100%;" class="r14-o"> <!-- --> <tr> <td style="font-size: 0px; line-height: 0px;" class="nl2go-responsive-hide" width="10">­</td> <td style="" valign="top" class="r15-i"> <table width="100%" cellspacing="0" cellpadding="0" border="0" role="presentation"> <tr> <td class="r16-c" align="center"> <table cellspacing="0" cellpadding="0" border="0" role="presentation" width="171" style="table-layout: fixed; width: 171px;" class="r17-o">  <tr> <td style="font-size: 0px; line-height: 0px;" class=""> <img src="http://news.some.url/lbp9n0eb/s_5fhrqwkz/files/HeaderLogo_1.png" width="171" border="0" style="display: block; width: 100%;" class=""> </td> </tr> </table> </td> </tr> </table> </td> <td style="font-size: 0px; line-height: 0px;" class="nl2go-responsive-hide" width="10">­</td> <td style="font-size: 0px; line-height: 0px;" class="nl2go-responsive-hide" width="20">­</td> </tr> </table> </th> <th width="66.67%" valign="top" class="r6-c"> <table cellspacing="0" cellpadding="0" border="0" role="presentation" width="100%" style="table-layout: fixed; width: 100%;" class="r4-o"> <!-- --> <tr> <td style="font-size: 0px; line-height: 0px;" class="nl2go-responsive-hide" width="10">­</td> <td style="" valign="top" class="r15-i"> <table width="100%" cellspacing="0" cellpadding="0" border="0" role="presentation"> <tr> <td class="r18-c" align="center"> <table cellspacing="0" cellpadding="0" border="0" role="presentation" width="100%" style="table-layout: fixed; width: 100%;" class="r12-o">  <tr class="nl2go-responsive-hide"> <td style="font-size: 10px; line-height: 10px;" height="10">­</td> <td style="font-size: 10px; line-height: 10px;" height="10" width="10">­</td> </tr> <tr> <td style="" valign="top" class="r19-i"> <table width="100%" cellspacing="0" cellpadding="0" border="0" role="presentation"> <tr> <td class="r20-c" align="center"> <table cellspacing="0" cellpadding="0" border="0" role="presentation" 
width="100%" style="table-layout: fixed; width: 100%;" class=""> <!-- --> <tr class="nl2go-responsive-hide"> <td style="font-size: 10px; line-height: 10px;" height="10">­</td> </tr> <tr> <td style="" class=""> <table width="100%" cellspacing="0" cellpadding="0" border="0" role="presentation"> <tr> <th width="50%" valign="top" class="r21-c"> <table cellspacing="0" cellpadding="0" border="0" role="presentation" width="100%" style="table-layout: fixed; width: 100%;" class="r4-o"> <!-- --> <tr> <td style="" valign="top" class="r7-i"> <table width="100%" cellspacing="0" cellpadding="0" border="0" role="presentation"> <tr> <td class="r22-c" align="right"> <table cellspacing="0" cellpadding="0" border="0" role="presentation" width="100" style="table-layout: fixed; width: 100px;" class="r23-o">  <tr> <td align="right" valign="top" style="text-align: right; color: #333333; font-family: Arial, Helvetica, sans-serif; font-size: 15px;" class="r24-i nl2go-default-textstyle"> <font color="#4a769a"><span style="font-size: 12px;"><a href="http://news.some.url/5fhrqwkz-wmg82uhc-yn4ivcoz-xdy" style="color: #4a769a; text-decoration: none;">K</a></span></font><a href="http://news.some.url/5fhrqwkz-wmg82uhc-5z8jvpxe-11uc" style="color: #4a769a; text-decoration: none;"><font color="#4a769a"><span style="font-size: 12px;">ommentierte Mustereingaben</span></font></a> </td> </tr> </table> </td> </tr> </table> </td> </tr> </table> </th> <th width="50%" valign="top" class="r21-c"> <table cellspacing="0" cellpadding="0" border="0" role="presentation" width="100%" style="table-layout: fixed; width: 100%;" class="r4-o"> <!-- --> <tr> <td style="" valign="top" class="r7-i"> <table width="100%" cellspacing="0" cellpadding="0" border="0" role="presentation"> <tr> <td class="r22-c" align="right"> <table cellspacing="0" cellpadding="0" border="0" role="presentation" width="100" style="table-layout: fixed; width: 100px;" class="r23-o">  <tr> <td align="right" valign="top" style="text-align: right; 
color: #333333; font-family: Arial, Helvetica, sans-serif; font-size: 15px;" class="r24-i nl2go-default-textstyle"> <font color="#4a769a"><span style="font-size: 12px;"><a href="http://news.some.url/5fhrqwkz-wmg82uhc-73ogv4gy-1bjh" style="color: #4a769a; text-decoration: none;">Some-<br>thing</a></span></font> </td> </tr> </table> </td> </tr> </table> </td> </tr> </table> </th> </tr> </table> </td> </tr> <tr class="nl2go-responsive-hide"> <td style="font-size: 10px; line-height: 10px;" height="10">­</td> </tr> </table> </td> </tr> </table> </td> <td style="font-size: 0px; line-height: 0px;" class="nl2go-responsive-hide" width="10">­</td> </tr> <tr class="nl2go-responsive-hide"> <td style="font-size: 10px; line-height: 10px;" height="10">­</td> <td style="font-size: 10px; line-height: 10px;" height="10" width="10">­</td> </tr> </table> </td> </tr> </table> </td> <td style="font-size: 0px; line-height: 0px;" class="nl2go-responsive-hide" width="10">­</td> </tr> </table> </th> </tr> </table> </td> <td style="font-size: 0px; line-height: 0px;" class="nl2go-responsive-hide" width="10">­</td> </tr> <tr class="nl2go-responsive-hide"> <td style="font-size: 10px; line-height: 10px;" height="10" width="10">­</td> <td style="font-size: 10px; line-height: 10px;" height="10">­</td> <td style="font-size: 10px; line-height: 10px;" height="10" width="10">­</td> </tr> </table> </td> </tr> <tr> <td class="r25-c"> <table cellspacing="0" cellpadding="0" border="0" role="presentation" width="100%" style="table-layout: fixed; width: 100%;" class="r4-o"> <!-- --> <tr> <td style="font-size: 0px; line-height: 0px;" class=""> <img src="http://news.some.url/3odbztg3/s_5fhrqwkz/files/Schulthess_Neutral_Header_660.png" width="" border="0" style="display: block; width: 100%;" class=""> </td> </tr> </table> </td> </tr> <tr> <td class="r22-c" align="right"> <table cellspacing="0" cellpadding="0" border="0" role="presentation" width="100%" style="table-layout: fixed; width: 100%;" class="r23-o">  
<tr class="nl2go-responsive-hide"> <td style="font-size: 20px; line-height: 20px;" height="20" width="20">­</td> <td style="font-size: 20px; line-height: 20px;" height="20">­</td> <td style="font-size: 20px; line-height: 20px;" height="20" width="20">­</td> </tr> <tr> <td style="font-size: 0px; line-height: 0px;" class="nl2go-responsive-hide" width="20">­</td> <td align="right" valign="top" style="text-align: right; color: #333333; font-family: Arial, Helvetica, sans-serif; font-size: 15px;" class="r26-i nl2go-default-textstyle"> Location, Date </td> <td style="font-size: 0px; line-height: 0px;" class="nl2go-responsive-hide" width="20">­</td> </tr> </table> </td> </tr> <tr> <td class="r3-c"> <table cellspacing="0" cellpadding="0" border="0" role="presentation" width="100%" style="table-layout: fixed; width: 100%;" class="r4-o"> <!-- --> <tr class="nl2go-responsive-hide"> <td style="font-size: 20px; line-height: 20px;" height="20" width="20">­</td> <td style="font-size: 20px; line-height: 20px;" height="20">­</td> <td style="font-size: 20px; line-height: 20px;" height="20" width="20">­</td> </tr> <tr> <td style="font-size: 0px; line-height: 0px;" class="nl2go-responsive-hide" width="20">­</td> <td style="" class="r27-i"> <table width="100%" cellspacing="0" cellpadding="0" border="0" role="presentation"> <tr> <th width="100%" valign="top" class="r6-c"> <table cellspacing="0" cellpadding="0" border="0" role="presentation" width="100%" style="table-layout: fixed; width: 100%;" class="r4-o"> <!-- --> <tr> <td style="" valign="top" class="r7-i"> <table width="100%" cellspacing="0" cellpadding="0" border="0" role="presentation"> <tr> <td class="r8-c" align="left"> <table cellspacing="0" cellpadding="0" border="0" role="presentation" width="100%" style="table-layout: fixed; width: 100%;" class="r9-o">  <tr> <td align="left" valign="top" style="line-height: 26px; text-align: left; color: #333333; font-family: Arial, Helvetica, sans-serif; font-size: 15px;" class="r10-i 
nl2go-default-textstyle"> <font color="#4a769a"><span style="font-size: 22px;"><b>Something</b></span></font> </td> </tr> </table> </td> </tr> </table> </td> </tr> </table> </th> </tr> </table> </td> <td style="font-size: 0px; line-height: 0px;" class="nl2go-responsive-hide" width="20">­</td> </tr> </table> </td> </tr> <tr> <td class="r8-c" align="left"> <table cellspacing="0" cellpadding="0" border="0" role="presentation" width="100%" style="table-layout: fixed; width: 100%;" class="r9-o">  <tr class="nl2go-responsive-hide"> <td style="font-size: 20px; line-height: 20px;" height="20" width="20">­</td> <td style="font-size: 20px; line-height: 20px;" height="20">­</td> <td style="font-size: 20px; line-height: 20px;" height="20" width="20">­</td> </tr> <tr> <td style="font-size: 0px; line-height: 0px;" class="nl2go-responsive-hide" width="20">­</td> <td align="left" valign="top" style="line-height: 20px; text-align: left; color: #333333; font-family: Arial, Helvetica, sans-serif; font-size: 15px;" class="r28-i nl2go-default-textstyle"> <span style="color: rgb(51, 51, 51); font-size: 15px;">Hello</span><br><br>There is&nbsp;<span style="font-weight: bold;">Some</span> text&nbsp;<span style="font-weight: bold;">More of it</span>. 
Still very long.<br>And important <a href="http://news.some.url/5fhrqwkz-wmg82uhc-bmo3i836-12zm" style="color: #4a769a; text-decoration: none;">Bundle </a>available.<br><br>I like tables.&nbsp;<br><br>And rows!<br><br><span style="color: rgb(51, 51, 51); font-size: 15px;">Yeah<br>Greeting</span> </td> <td style="font-size: 0px; line-height: 0px;" class="nl2go-responsive-hide" width="20">­</td> </tr> </table> </td> </tr> <tr> <td class="r3-c"> <table cellspacing="0" cellpadding="0" border="0" role="presentation" width="100%" style="table-layout: fixed; width: 100%;" class="r4-o"> <!-- --> <tr class="nl2go-responsive-hide"> <td style="font-size: 20px; line-height: 20px;" height="20" width="20">­</td> <td style="font-size: 20px; line-height: 20px;" height="20">­</td> <td style="font-size: 20px; line-height: 20px;" height="20" width="20">­</td> </tr> <tr> <td style="font-size: 0px; line-height: 0px;" class="nl2go-responsive-hide" width="20">­</td> <td style="" class="r27-i"> <table width="100%" cellspacing="0" cellpadding="0" border="0" role="presentation"> <tr> <th width="100%" valign="top" class="r6-c"> <table cellspacing="0" cellpadding="0" border="0" role="presentation" width="100%" style="table-layout: fixed; width: 100%;" class="r4-o"> <!-- --> <tr> <td style="" valign="top" class="r7-i"> <table width="100%" cellspacing="0" cellpadding="0" border="0" role="presentation"> <tr> <td class="r18-c" align="center"> <table cellspacing="0" cellpadding="0" border="0" role="presentation" width="620" style="table-layout: fixed;" class="r12-o">  <tr> <td style="height: 4px;" class=""> <table width="100%" cellspacing="0" cellpadding="0" border="0" role="presentation"> <tr> <td><table width="100%" cellspacing="0" cellpadding="0" border="0" role="presentation" valign="" class="" height="4" style="border-top-style: solid; background-clip: border-box; border-top-color: #4A769A; border-top-width: 4px; font-size: 4px; line-height: 4px;"> <tr> <td height="0" style="font-size: 0px; 
line-height: 0px;">­</td> </tr> </table></td> </tr> </table> </td> </tr> </table> </td> </tr> </table> </td> </tr> </table> </th> </tr> </table> </td> <td style="font-size: 0px; line-height: 0px;" class="nl2go-responsive-hide" width="20">­</td> </tr> </table> </td> </tr> <tr> <td class="r3-c"></td></tr></table></td></tr></table></td></tr></table></body></html></body></html>
java.lang.StringBuilder.toString(Unknown Source)
	at com.openhtmltopdf.css.newmatch.CascadedStyle.getFingerprint(CascadedStyle.java:242)
	at com.openhtmltopdf.css.style.CalculatedStyle.deriveStyle(CalculatedStyle.java:185)
	at com.openhtmltopdf.layout.SharedContext.getStyle(SharedContext.java:503)
	at com.openhtmltopdf.layout.SharedContext.getStyle(SharedContext.java:482)
	at com.openhtmltopdf.layout.BoxBuilder.createChildren(BoxBuilder.java:1093)
	at com.openhtmltopdf.layout.BoxBuilder.createChildren(BoxBuilder.java:130)
	at com.openhtmltopdf.render.BlockBox.ensureChildren(BlockBox.java:1184)
	at com.openhtmltopdf.layout.BoxBuilder.createChildren(BoxBuilder.java:1194)
	at com.openhtmltopdf.layout.BoxBuilder.createChildren(BoxBuilder.java:130)
	at com.openhtmltopdf.render.BlockBox.ensureChildren(BlockBox.java:1184)
	at com.openhtmltopdf.render.BlockBox.isVerticalMarginsAdjoin(BlockBox.java:1523)
	at com.openhtmltopdf.render.BlockBox.collapseMargins(BlockBox.java:1342)
	at com.openhtmltopdf.render.BlockBox.layout(BlockBox.java:1022)
	at com.openhtmltopdf.render.BlockBox.layout(BlockBox.java:973)
	at com.openhtmltopdf.pdfboxout.PdfBoxRenderer.layout(PdfBoxRenderer.java:335)
	at com.openhtmltopdf.pdfboxout.PdfRendererBuilder.run(PdfRendererBuilder.java:40)
	at myCode

Seems that the incredibly long line is the problem. After reformatting / pretty printing it, everything is fine. Is this fixable / preventable? An OutOfMemoryError is never nice.

Greetings and thank you for looking into it
Joe

swarl changed the title from "OutOfMemory in" to "OutOfMemoryError for HTML with long line(s)" on May 14, 2020
@rototor
Contributor

rototor commented May 14, 2020

The HTML you gave in your report is not valid XHTML. It even ends with two sets of closing tags:

</body></html></body></html>

After fixing the not closed img and br tags and removing the duplicate </body></html> I could generate a report from it without an OOM. But it could not fetch the images at http://news.some.url.

What platform / JDK are you on? This sample without the images works for me on JDK 1.8 / MacOS.

If you are using OpenJDK, set a breakpoint on OutOfMemoryError in your IDE, then connect VisualVM to your program. When the breakpoint triggers, take a heap dump in VisualVM and analyze it. I'm pretty sure that CascadedStyle.getFingerprint() is not the root cause of the OOM; it's just the last allocation, which happened to get no memory. The real cause can be anything, even very big pictures at http://news.some.url. Is your process single-threaded, or are you doing this in some web server? Does it always happen, or only when the server is under load?

If you can provide a "working" test case that triggers the OOM, I'll look into it. The best way to provide that would be a small project with all the needed files in it. Attach it as a ZIP here or, even better, provide a GitHub URL so that I can check it out.
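As a complement to the VisualVM workflow described above, a heap dump can also be taken from inside the process. The following is only a minimal sketch, assuming a HotSpot-based JVM (OpenJDK or Oracle JDK); the class name `HeapDumper` and the temp-file location are made up for illustration. Starting the JVM with `-XX:+HeapDumpOnOutOfMemoryError` is another way to get a dump at the moment of the error.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.lang.management.ManagementFactory;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not part of openhtmltopdf.
class HeapDumper {

    /** Writes a heap dump of the current JVM into a fresh temp directory and returns the file. */
    static Path dumpToTempFile() {
        try {
            Path dir = Files.createTempDirectory("heapdump");
            // Newer JDKs require the ".hprof" suffix, and the file must not exist yet.
            Path target = dir.resolve("snapshot.hprof");
            HotSpotDiagnosticMXBean bean =
                    ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
            // live = true: dump only objects that are still reachable.
            bean.dumpHeap(target.toString(), true);
            return target;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path file = dumpToTempFile();
        System.out.println("Heap dump written to " + file);
    }
}
```

The resulting `.hprof` file can be opened in VisualVM in the same way as a dump taken through its UI.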

@swarl
Author

swarl commented May 14, 2020

Thanks for the incredibly fast answer. Yeah, the doubled tags come from trying to cut it down to the minimum, sorry. Well, what happens if you don't do this: "After fixing the not closed img and br tags"? And remember: reformatting solves the problem.
I will provide a project as soon as possible, but with the img and br tags as they are, because the HTML is not under my control.

Thanks and greetings

@rototor
Contributor

rototor commented May 14, 2020

If I don't fix the HTML I just get an XML parser error:

Exception in thread "main" com.openhtmltopdf.util.XRRuntimeException: Can't load the XML resource (using TRaX transformer). org.xml.sax.SAXParseException; lineNumber: 2; columnNumber: 738; The element type "br" must be terminated by the matching end-tag "</br>".
	at com.openhtmltopdf.resource.XMLResource$XMLResourceBuilder.createXMLResource(XMLResource.java:274)
...

So you must be using JTidy or something like that to clean up the HTML. Only correct XHTML can be parsed by openhtmltopdf.
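To make the point about unclosed tags concrete, here is a small sketch that checks whether a string is well-formed XML using only the JDK's built-in JAXP parser. It does not use openhtmltopdf at all; the class name `WellFormedCheck` and the sample strings are invented for illustration, and the inputs deliberately omit the DOCTYPE so that no external DTD is fetched.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper, not part of openhtmltopdf.
class WellFormedCheck {

    /** Returns true if the given markup parses as well-formed XML. */
    static boolean isWellFormed(String markup) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors and keeps the parser from printing to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(markup)));
            return true;
        } catch (SAXException e) {
            // Raised for markup such as <br> without a matching end tag.
            return false;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isWellFormed("<p>line one<br>line two</p>"));
        System.out.println(isWellFormed("<p>line one<br/>line two</p>"));
    }
}
```

A check like this can be run on the input before rendering to tell a parser problem apart from a layout problem.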

@swarl
Author

swarl commented May 15, 2020

I tricked myself - again :-( I use an HTML file as input for a JUnit test and forgot that the app does quite some work before rendering the PDF...
Here is the repo: https://github.com/swarl/html2pdf.git
The test will show this:
[image]

Removing

                "    .content {\n" +
                "        word-wrap: break-word;\n" +
                "        white-space:pre-wrap;\n" +
                "    }\n" +

fixes the problem. I also removed the img tags and closed the br tags.

@syjer
Contributor

syjer commented May 15, 2020

Hi @swarl, I think the issue you are facing is #466 (since removing word-wrap: break-word fixes yours). It has been fixed on master in commit c5d5452.

@rototor
Contributor

rototor commented May 15, 2020

@syjer Yes, this seems to be related to word-wrap handling, as that loops endlessly. I've created pull request #483 to integrate this as a test case.

But it does not work for me on master. Maybe some other corner case is not handled correctly here? I'm not familiar with the word-wrapping code, so it would be cool if someone with more knowledge of it could look into it.

@syjer
Contributor

syjer commented May 15, 2020

@rototor you are right, there is another corner case that gets it stuck inside the do {} while loop at:

https://github.com/danfickle/openhtmltopdf/blob/open-dev-v1/openhtmltopdf-core/src/main/java/com/openhtmltopdf/layout/InlineBoxing.java#L160

I guess the next step will be to trim down the HTML file.

@syjer
Contributor

syjer commented May 15, 2020

I think I found the minimal test case:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html>
<head>
    <style>
		* {
			margin: 0;
			padding: 0;
		}

		.content {
			word-wrap: break-word;
			white-space:pre-wrap;
		}
    </style>
</head>
<body>
<table class="content">
    <tr>
        <td>­</td>
    </tr>
</table>
</body>
</html>

It seems that the CSS rule * {...} is complicit :); removing it makes it work.

Note that the <td> is not empty: it contains the Unicode character '\u00AD'.

If you copy and paste the HTML, you should see a character in your editor like this:

[image: Screenshot from 2020-05-15 15-37-10]

edit:

The character in question is SOFT_HYPHEN, so there is an issue if the string is composed of only a SOFT_HYPHEN :D.

edit2: this seems to be an issue in Breaker.doBreakText(). It has a special case for handling soft hyphens, but apparently, if the text is composed only of soft hyphens, it causes the infinite loop.
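Because the soft hyphen is invisible in most editors, it can help to detect it in code when hunting for the triggering input. The following is a minimal sketch of such a check; `SoftHyphenCheck` and `isOnlySoftHyphens` are hypothetical names and are not part of openhtmltopdf or its Breaker class.

```java
// Hypothetical diagnostic helper for spotting the triggering input.
class SoftHyphenCheck {

    /** U+00AD SOFT HYPHEN, the character found inside the <td> of the test case. */
    static final char SOFT_HYPHEN = '\u00AD';

    /** Returns true if the text is non-empty and consists of soft hyphens only. */
    static boolean isOnlySoftHyphens(String text) {
        return !text.isEmpty() && text.chars().allMatch(c -> c == SOFT_HYPHEN);
    }

    public static void main(String[] args) {
        System.out.println(isOnlySoftHyphens("\u00AD"));
        System.out.println(isOnlySoftHyphens("co\u00ADoperate"));
    }
}
```

Running this over the text nodes of a suspicious document is one way to find cells like the one in the minimal test case.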

rototor added a commit to rototor/openhtmltopdf that referenced this issue May 18, 2020
rototor added a commit to rototor/openhtmltopdf that referenced this issue May 18, 2020
This is a special case for this single situation only: the line width is 0 and
the line consists of only a soft hyphen. In this case we don't need to try to
break on the character level, as this will not work and will only lead to an
endless loop. So we pretend we already tried to break on the character
level. This ends the loop for this line.
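The idea in this commit message, guaranteeing that the loop makes progress even when nothing fits on the line, can be illustrated with a toy breaker. This is only a sketch under simplifying assumptions: it measures width in characters rather than font metrics, and `ToyLineBreaker` is an invented name, not the library's Breaker.

```java
import java.util.ArrayList;
import java.util.List;

// Toy illustration only; the real layout code works with font metrics and boxes.
class ToyLineBreaker {

    /**
     * Splits text into chunks of at most maxChars characters. If not even one
     * character fits (maxChars <= 0), one character is placed on the line anyway
     * and allowed to overflow, so every pass consumes input and the loop ends.
     */
    static List<String> breakLines(String text, int maxChars) {
        List<String> lines = new ArrayList<>();
        int start = 0;
        while (start < text.length()) {
            int end = Math.min(text.length(), start + maxChars);
            if (end <= start) {
                // No progress possible at this width: force one character through.
                end = start + 1;
            }
            lines.add(text.substring(start, end));
            start = end;
        }
        return lines;
    }

    public static void main(String[] args) {
        System.out.println(breakLines("abcdef", 4));
        System.out.println(breakLines("\u00AD", 0).size());
    }
}
```

Without the forced step, a width of 0 would leave `start` unchanged forever, which is the shape of the endless loop described above.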
@rototor
Contributor

rototor commented May 18, 2020

@syjer Thanks for the analysis. I've pushed a special fix for this case into the #483 pull request.

rototor added a commit to rototor/openhtmltopdf that referenced this issue May 18, 2020
…tter how many they are.

Also extended the test file. Thanks to @syjer for bringing these cases up.
rototor added a commit to rototor/openhtmltopdf that referenced this issue May 18, 2020
@swarl
Author

swarl commented May 26, 2020

Hi
First: thank you for the super fast fix and release. I just tried 1.0.3, but the OutOfMemoryError is still there:
[image: VisualVM 1 4 3_2020-05-26_13-31-37]

Any ideas?
Thank you and greetings

@syjer
Contributor

syjer commented May 26, 2020

Hi @swarl, could you provide the exact input that causes your issue?

There is likely another corner case which has not been covered.

Thank you.

@swarl
Author

swarl commented May 26, 2020

Hi @syjer
Just updated the example from above to version 1.0.3: https://github.com/swarl/html2pdf.git
Does this help?
Thanks

@syjer
Contributor

syjer commented May 26, 2020

@swarl thank you, I'll have a look asap :)

@syjer
Contributor

syjer commented May 26, 2020

This is quite interesting: I can confirm that I'm able to reproduce the error.

Additionally, removing the loading of the font fixes it. So it must be some interesting combination :)

@swarl
Author

swarl commented May 26, 2020

@syjer just wanted to say: you're awesome! You see problems as an opportunity to learn instead of a pain in the ass, which is highly inspiring :-)

@syjer
Contributor

syjer commented May 26, 2020

haha thank you @swarl,

I've narrowed the problematic input file down to:

EDIT: updated the narrowed file

<html>
<head>
<style>
    .content {
        font-family: 'Liberation Sans', sans-serif;
        word-wrap: break-word;
        width:10px;
    }
</style>
</head>
<body>
<div class="content">­</div>
</body>
</html>

Note:

  • there is a SOFT_HYPHEN inside the div element
  • loading the font with builder.useFont(() -> SimpleUsage.class.getClassLoader().getResourceAsStream("org/apache/pdfbox/resources/ttf/LiberationSans-Regular.ttf"), "Liberation Sans"); is required to cause the issue.
  • the issue is a combination of width:10px, the SOFT_HYPHEN, and the fact that its width in the Liberation Sans font is not 0.

danfickle added a commit that referenced this issue May 26, 2020
add failing test that highlights the infinite loop issue in the inline box layout algorithm #482
danfickle added a commit that referenced this issue May 26, 2020
…phen overflowing line.

The core problem seems to be the under-reporting of width when we have a soft hyphen that is found to be unbreakable.
danfickle added a commit that referenced this issue Jun 1, 2020
Plus minor behaviour change for word break method to avoid setting ends-on-soft-hyphen flag for soft hyphen at end of box.
danfickle added a commit that referenced this issue Jun 7, 2020
This is to make sure infinite loop fixes do not break this functionality.
danfickle added a commit that referenced this issue Jun 7, 2020
…w line.

With test to prove that we don't trigger safety valve with lots of unbreakable words. Safety valve should only be triggered if there is still a bug in our code.
danfickle added a commit that referenced this issue Jun 9, 2020
This should ensure no infinite loop bugs creep in over time.
danfickle added a commit that referenced this issue Jun 13, 2020
#482 #483 #491 Make word breaker testable and start writing tests.
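
The "safety valve" mentioned in the commit messages above can be pictured as an iteration cap around a loop that should always terminate on its own. The sketch below is a generic illustration, not the code from those commits; `SafetyValve`, `runWithValve`, and the chosen limits are assumptions made for the example.

```java
import java.util.function.BooleanSupplier;

// Generic illustration of an iteration cap; not taken from the openhtmltopdf sources.
class SafetyValve {

    /**
     * Calls step repeatedly while it reports that more work remains. Throws if
     * the step has reported remaining work more than maxIterations times, which
     * in a correct program should only happen because of a bug.
     * Returns how many times the step reported remaining work.
     */
    static int runWithValve(BooleanSupplier step, int maxIterations) {
        int iterations = 0;
        while (step.getAsBoolean()) {
            iterations++;
            if (iterations > maxIterations) {
                throw new IllegalStateException(
                        "Loop exceeded " + maxIterations + " iterations; likely an infinite loop");
            }
        }
        return iterations;
    }

    public static void main(String[] args) {
        int[] remaining = {3};
        // A well-behaved loop: counts down and stops by itself.
        System.out.println(runWithValve(() -> --remaining[0] > 0, 1000));
    }
}
```

The limit is meant to sit far above anything legitimate input can reach, so that hitting it turns a hang or an OutOfMemoryError into a clear failure.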