Skip to content

Infinite loop parsing in tidy #380

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
gaa-cifasis opened this issue Mar 1, 2016 · 7 comments
Closed

Infinite loop parsing in tidy #380

gaa-cifasis opened this issue Mar 1, 2016 · 7 comments

Comments

@gaa-cifasis
Copy link

If you try to parse a small fragment of HTML code, tidy will loop. It was tested with release 5.1.25. For instance:

echo "<a/</li><li><a href=x" | tidy

A similar example was discovered in revision 03a643f.

@geoffmcl
Copy link
Contributor

geoffmcl commented Mar 2, 2016

@gaa-cifasis thanks for your continued support... tidy needs this... keep em coming ;=)) tidy loves your support!

Unfortunately at this stage we can not go back to fix the 5.1.25 release! But I can see the problem with release 5.1.25... It does get locked in an infinite loop! Very BAD!!

Hopefully this will be addressed in future releases, where there should be a branch for each release... and thus we can retest, and push back later important fixes to such a release, bumping the release number... read README/VERSION.md for the general idea. But 5.1.25 was an odd man out! No branch was created!

But testing your fragment with the latest master 5.1.44 I can not repeat the problem with this latest... Thus can do nothing about it...

Test case input: input5\in_380.html

<a/</li><li><a href=x

Config: $ tidy --show-info no input5\in_380.html

Output:

<!DOCTYPE html>
<html>
<head>
<meta name="generator" content=
"HTML Tidy for HTML5 for Windows version 5.1.44">
<title></title>
</head>
<body>
</body>
</html>

Message output:

line 1 column 1 - Warning: <a> attribute "/" lacks value
line 1 column 1 - Warning: <a> missing '>' for end of tag
line 1 column 4 - Warning: discarding unexpected </li>
line 1 column 1 - Warning: missing </a> before <li>
line 1 column 9 - Warning: inserting implicit <ul>
line 1 column 22 - Warning: <a> end of file while parsing attributes
line 1 column 9 - Warning: missing </li> before <a>
line 1 column 9 - Warning: missing </ul>
line 1 column 1 - Warning: missing </a>
line 1 column 9 - Warning: trimming empty <li>
line 1 column 9 - Warning: trimming empty <ul>
line 1 column 1 - Warning: trimming empty <a>
Tidy found 12 warnings and 0 errors!

Info: Document content looks like HTML5

Even adding --drop-empty-elements no did not change this except it adds the empty elements...

Please re-test with the latest, always stable master branch, find a problem, and we will certainly look into it further...

Again, thanks for your support... tidy need this type of testing... at this stage I can only mark this as Technical Support, with an indefinite future...

Find a repeatable problem case, with the latest, and this will quickly change... thanks...

@gaa-cifasis
Copy link
Author

Hi,

You can find a test case of an infinite loop to reproduce in the last revision here.

@geoffmcl
Copy link
Contributor

geoffmcl commented Mar 3, 2016

@gaa-cifasis wow, thanks I think ;=))

It is certainly a weird document, with SUB and NUL chars, but yes it seems to repeat forever...

Will look at it soonest... unless someone beats me to it with a patch or PR... thanks...

@geoffmcl geoffmcl modified the milestones: 5.2, Indefinite future Mar 3, 2016
geoffmcl added a commit that referenced this issue Mar 4, 2016
geoffmcl added a commit that referenced this issue Mar 4, 2016
Added more debug code to try to track this bug!
@geoffmcl
Copy link
Contributor

geoffmcl commented Mar 4, 2016

@gaa-cifasis, first I have added a lot a MSVC debug code to master, and bumped the version to 5.1.45.

But unfortunately this debug is only available with a Windows MSVC compiler, and does not exist in the Release. I must do something about that one day... make it available with other compilers... it is very helpful tracking down where, and when...

I have narrowed it down where the problem starts... As suspected it is nothing to do with the odd character values in your file. As previously mentioned tidy does just drop all these...

It is the sequence of events, and have found a relatively simple html sample, my in_380-3.html, that can trigger it, and still trying to reduce that sample even more...

Have found a patch, but it chops some code added in 2004, so I am not sure this is the full answer... or what all the consequences of that are...

It does not seem present in the last CVS release in 2009, nor in tidy-4.9.13, circa 08/02/2015, so something changed, added, modified, since then, and am trying to work backward to see what that is/was... a slow painful process...

This is just an update... moving forward slowly... thanks for this interesting bug ;=))

geoffmcl added a commit that referenced this issue Mar 5, 2016
Added more debug code to try to track this bug!
geoffmcl added a commit that referenced this issue Mar 5, 2016
@geoffmcl
Copy link
Contributor

geoffmcl commented Mar 5, 2016

@gaa-cifasis have found this occurred from 4.9.17 to 4.9.18. Here commit 86f626c reverted the anchor tag to just CM_INLINE to fix issue #167. This reverted a fix for issue #169, while trying to keep open the dual parsing of html4-- and html5++. So this fix must stay.

So now I have identified exactly when a certain set of events can cause this infinite loop, but that does not provide a simple solution.

As mentioned I have found a patch, created an issue-380 branch, and pushed it.

Will now explore that further... moving forward...

It would be much appreciated if you get a chance to checkout and test that patch, issue-380 branch, version 5.1.45-Exp1... thanks...

@geoffmcl
Copy link
Contributor

geoffmcl commented Mar 6, 2016

@gaa-cifasis after adding in the patch for #379 have bumped the version to 5.1.45-Exp2 in the issue-380 branch...

Everyone's help in fully testing this branch would be most appreciated... thanks...

@geoffmcl
Copy link
Contributor

@gaa-cifasis this is now in master, where I have continued testing, and find no problems, so closing this...

Feel free to re-open, or file a new issue... thanks...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants