OutOfMemoryError for HTML with long line(s) #482
Comments
The HTML you gave in your report is not valid XHTML. It even ends with two closing tags
After fixing the not closed img and br tags and removing the duplicate
What platform / JDK are you on? This sample without the images works for me on JDK 1.8 / macOS. If you are using OpenJDK, set a breakpoint on OutOfMemoryError in your IDE, then start VisualVM and connect it to your program. When the OutOfMemoryError triggers in the IDE, take a heap dump in VisualVM and analyze it. I'm pretty sure that
If you can provide a "working" test case which triggers the OOM, I'll look into it. The best way to provide that would be a small project with all needed files in it. Attach that as a ZIP here or, even better, provide a GitHub URL so that I can check it out.
Thanks for the incredibly fast answer. Yeah, the doubled tags come from trying to cut it down to the minimum, sorry. Well, what happens if you don't do this: "After fixing the not closed img and br tags"? And remember: reformatting solves the problem. Thanks and greetings
If I don't fix the HTML I just get an XML parser error:
So you must be using JTidy or something like that to clean up the HTML. Only correct XHTML can be parsed by openhtmltopdf.
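For anyone who wants to check their input before handing it to the renderer, here is a minimal sketch of a well-formedness test. It assumes only the XML parser that ships with the JDK; `WellFormedCheck` and `isWellFormed` are hypothetical names for illustration and are not part of openhtmltopdf. It does not reproduce how the library itself parses documents.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical helper for illustration; not part of openhtmltopdf. */
final class WellFormedCheck {

    /** Returns true if the markup parses as well-formed XML, false otherwise. */
    static boolean isWellFormed(String markup) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // Resolve every external entity (such as a remote DTD) to an empty
            // document so the check never touches the network. Named entities
            // that only the DTD defines will therefore be reported as errors.
            builder.setEntityResolver((publicId, systemId) -> new InputSource(new StringReader("")));
            // DefaultHandler rethrows fatal errors without printing to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(markup)));
            return true;
        } catch (SAXException e) {
            return false; // not well-formed, e.g. an unclosed <br> or <img>
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException("XML parser could not be used", e);
        }
    }
}
```

With this sketch, `<br/>` passes while a bare `<br>` or a second closing root tag is rejected, which matches the kind of problems described above.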
I tricked myself - again :-( I use an HTML file as input for a JUnit test and forgot that the app does quite some work before rendering the PDF... Removing
fixes the problem. I also removed the img tags and closed the br tags.
@syjer Yes, this seems to be related to word-wrap handling, as that loops endlessly. I've created the pull request #483 to integrate that as a test case. But it does not work for me on master. Maybe some other corner cases are not handled correctly here? I'm not familiar with the word wrapping code, so it would be cool if someone with more knowledge of it could look into it.
@rototor you are right, there is another corner case that makes it get stuck inside the do {} while loop at:
I guess the next step will be to trim down the HTML file.
I think I found the minimal test case:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html>
<head>
<style>
* {
margin: 0;
padding: 0;
}
.content {
word-wrap: break-word;
white-space:pre-wrap;
}
</style>
</head>
<body>
<table class="content">
<tr>
<td></td>
</tr>
</table>
</body>
</html>
It seems that the CSS rule
To be noted, the
If you copy and paste the HTML, you should see a character in your editor like this:
edit: the character in question is
edit2: seems to be an issue in
This is a special case for exactly one situation: the line width is 0 and the line consists only of a soft hyphen. In this case we don't need to try to break on the character level, as this will not work and will only lead to an endless loop. So we pretend we already tried to break on the character level. This ends the loop for this line.
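The idea of "pretending the fallback was already tried" can be illustrated with a toy model. This is a sketch under loud assumptions: it is not the openhtmltopdf breaking code, every name in it (`SoftHyphenLoopSketch`, `wordBreak`, `charBreak`, `passesToBreak`) is hypothetical, each character is assumed to be one unit wide, and a pass cap exists only so the unguarded variant can be demonstrated without hanging.

```java
/** Toy model of a break-retry loop; hypothetical, not the library's code. */
final class SoftHyphenLoopSketch {
    static final char SOFT_HYPHEN = '\u00AD';
    /** Cap used only so the unguarded variant can be shown safely. */
    static final int MAX_PASSES = 1000;

    static boolean onlySoftHyphens(String s) {
        return !s.isEmpty() && s.chars().allMatch(c -> c == SOFT_HYPHEN);
    }

    /** Word level: longest prefix ending at a word boundary that fits, or 0. */
    static int wordBreak(String text, int lineWidth) {
        int end = 0;
        for (int i = 1; i <= text.length() && i <= lineWidth; i++) {
            if (i == text.length() || text.charAt(i) == ' ') {
                end = i;
            }
        }
        return end;
    }

    /**
     * Character level: as many characters as fit. If nothing fits, one visible
     * character is forced onto the line, but a soft hyphen is never forced.
     */
    static int charBreak(String text, int lineWidth) {
        int fit = Math.min(text.length(), lineWidth);
        if (fit > 0) {
            return fit;
        }
        return (!text.isEmpty() && text.charAt(0) != SOFT_HYPHEN) ? 1 : 0;
    }

    /** Returns the number of passes used, or -1 if the pass cap was hit. */
    static int passesToBreak(String text, int lineWidth, boolean applyGuard) {
        if (text.isEmpty()) {
            return 0; // nothing to lay out
        }
        // The guard: zero width plus soft hyphens only means the character-level
        // retry cannot make progress, so mark it as already attempted.
        boolean pretendTried = applyGuard && lineWidth == 0 && onlySoftHyphens(text);
        boolean tryCharBreak = false;
        int passes = 0;
        int end;
        do {
            if (++passes > MAX_PASSES) {
                return -1; // the loop would never end on its own
            }
            end = tryCharBreak ? charBreak(text, lineWidth) : wordBreak(text, lineWidth);
            tryCharBreak = true; // fall back to character level on the next pass
        } while (end == 0 && !pretendTried);
        return passes;
    }
}
```

In this model a lone soft hyphen on a zero-width line never makes progress, so only the guard ends the loop; ordinary text still goes through the normal word-level and character-level passes.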
…tter how many there are. Also extended the test file. Thanks to @syjer for bringing these cases up.
Hi @swarl, could you provide the exact input which causes your issue? There is likely another corner case which has not been covered. Thank you.
Hi @syjer
@swarl thank you, I'll have a look asap :)
This is quite interesting. I can confirm that I'm able to reproduce the error. Additionally, removing the loading of the font fixes it. So it must be some interesting combination :)
@syjer just wanted to say: you're awesome! You see problems as an opportunity to learn instead of a pain in the ass, which is highly inspiring :-)
Haha, thank you @swarl, I've narrowed down the problematic input file to the following (EDIT: updated the narrowed file):
<html>
<head>
<style>
.content {
font-family: 'Liberation Sans', sans-serif;
word-wrap: break-word;
width:10px;
}
</style>
</head>
<body>
<div class="content"></div>
</body>
</html>
To be noted:
Add failing test that highlights the infinite loop issue in the inline box layout algorithm #482
…phen overflowing line. The core problem seems to be the under-reporting of width when we have a soft hyphen that is found to be unbreakable.
Plus a minor behaviour change in the word break method to avoid setting the ends-on-soft-hyphen flag for a soft hyphen at the end of a box.
This is to make sure infinite loop fixes do not break this functionality.
This should ensure no infinite loop bugs creep in over time.
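The commit messages above do not show how this is enforced. One common way to keep an endless loop from silently hanging a test suite is to put a time limit around the call under test. The following is a minimal sketch of that idea using only `java.util.concurrent`; `TimeoutGuard` and `callWithTimeout` are hypothetical names, and nothing here is taken from the project's actual tests.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical test helper for illustration; not part of openhtmltopdf. */
final class TimeoutGuard {

    /** Runs the task and fails with an AssertionError if it exceeds the limit. */
    static <T> T callWithTimeout(Callable<T> task, long timeoutMillis) {
        // A daemon thread cannot keep the JVM alive if the task never returns.
        ExecutorService executor = Executors.newSingleThreadExecutor(runnable -> {
            Thread thread = new Thread(runnable, "task-under-test");
            thread.setDaemon(true);
            return thread;
        });
        try {
            Future<T> future = executor.submit(task);
            try {
                return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
            } catch (TimeoutException e) {
                future.cancel(true); // interrupt the worker if it is interruptible
                throw new AssertionError("task did not finish within " + timeoutMillis + " ms", e);
            } catch (ExecutionException e) {
                throw new IllegalStateException("task failed", e.getCause());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for task", e);
            }
        } finally {
            executor.shutdownNow();
        }
    }
}
```

A rendering call for a problematic document could be wrapped in such a helper so that a regression shows up as a failed test after a few seconds instead of an OutOfMemoryError much later.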
Good evening
Version 1.0.2
The following HTML sent to my application produces a hell of an OutOfMemoryError:
Seems that the incredibly long line is the problem. After reformatting / pretty printing it, everything is fine. Is this fixable / preventable? An OutOfMemoryError is never nice.
Greetings and thank you for looking into it
Joe