Missing </head> doesn't generate error #327

benkasminbullock · 2015-11-29T00:01:26Z

The following HTML is missing </head>:

<!DOCTYPE html>
<html>
<head>
<title>Monkey</title>
<body>
<p> Born from an egg on a mountain top.</p>
</body>
</html>

Run with tidy dd4eb46, built from source, the output is this:

$ ./tidy --quiet yes < ~/head-body.html
<!DOCTYPE html>
<html>
<head>
<meta name="generator" content=
"HTML Tidy for HTML5 for FreeBSD version 5.1.27">
<title>Monkey</title>
</head>
<body>
<p>Born from an egg on a mountain top.</p>
</body>
</html>

It's notable that there is no error message printed here although the </head> is inserted correctly. I would have expected an error message here.

The text was updated successfully, but these errors were encountered:

balthisar · 2015-11-29T05:20:53Z

@benkasminbullock, thanks for the report. Would you check to see if 5.1.28 provides the expected behavior?

benkasminbullock · 2015-11-29T09:33:14Z

The branch seems to have been deleted.

balthisar · 2015-11-29T09:38:38Z

Sorry, I guess editing my comment wasn't fast enough. Actually look at master, which should be at 5.1.28 now.

geoffmcl · 2015-11-29T12:14:40Z

@balthisar fast work ;=))

@benkasminbullock I pulled 5.1.28 and now see a warning... guess we can close this... thanks...

benkasminbullock · 2015-11-29T23:36:17Z

Thanks @geoffmcl and @balthisar. Since you've gone to the trouble of making this, it would be a nice if there was a test confirming both the insertion of and the production of the error message, so the work doesn't get undone later. I would provide one, but I am currently struggling to understand the testing scripts.

geoffmcl · 2015-11-30T18:29:14Z

@benkasminbullock well this was the simple addition of an error message, so not sure of the need for a test case, but maybe...

struggling to understand the testing scripts

This is the main problem. The current testing procedure and scripts are not well documented, and mostly do not do what people think! In their present form they are certainly not any form of unit testing, or any other formal testing method...

There are some 228 tests in test/input. Some are repeats which need to be pruned. Some were a segfault. Some can only be visually compared to decide pass or fail. Some are to compare the output. At present the tests only compare the exit code with an expected exit code. In most case that is not sufficient.

As it turns out, this case could be detected by the current procedure in that previously the exit code was 0, now it is a 1. And that is all the scripts presently compare.

Recently I did add a test/testbase folder which contains the expected output, thus could do a simple diff and check that. But even that is not complete in that right now for instance there are 2 or 3 differences, which only a developer would know to except. That is they are not a bug. See #266, and others. So at present these tests are not recommended for the casual user.

FWIIW I have been collecting a new set of tests, about 230 plus, since around Jan 2015, and for sure I added this case in that. In two forms - one without </head> to test the warning, and one with to make sure the warning went away. But again have not shared that yet in that it is very much WIP. But could...

It would certainly be appreciated if someone were to take this up. Define a group of tests, clearly indicating what is tested. Prune, combine, fix, sort the current input tests. Like dividing them say into categories, or something... maybe designing new tests... fix up and document the scripts... etc, etc...

One suggestion by @vielmetti has been to use say travis-ci or the like - see #269 - We have rejected this automation in the past, but if the tests could be defined as in the above, fixed to specifically test something, then maybe it is time to reconsider this method...

This testing is certainly an area of Tidy that could do with some TLC ;=))

geoffmcl · 2015-11-30T19:50:17Z

Oops, found a simple case where this warning should not be issued when the following processed with --show-body-only yes!

Input:

<p>This a minimal example for<strong>
TidyTidyTidyTidyTidyTidyTidyTidyTidy</strong> is another nice open source tool :)</p>

Output:

line 1 column 1 - Warning: missing </head>
Info: Document content looks like HTML5
1 warning, 0 errors were found!

<p>This a minimal example
for<strong>TidyTidyTidyTidyTidyTidyTidyTidyTidy</strong> is another
nice open source tool :)</p>

And this sample code exposes another feature request. Most browser will show a space between for and Tidy... while Tidy loses that newline, which I assume the browsers take as a space.

This is an old bug https://sourceforge.net/p/tidy/bugs/949/ posted 2014-03-01 by @Yvand, now moved here...

yapper-git · 2015-11-30T21:05:54Z

@geoffmcl : Thanks for the ticket "garbage collector" on sourceforge. It didn't know you migrate to GitHub.

So I created a new issue on GitHub for the corresponding issue, see #329.

benkasminbullock · 2015-12-01T08:32:28Z

I'm not sure what you're thinking about in terms of tests but just to get the ball rolling I've gathered a few github issues into a simple set of tests here:

https://github.com/benkasminbullock/tidy-html5/tree/bkbtest

You need to run the tests like this:

$ prove bkbtest/test.t

geoffmcl · 2015-12-01T16:35:22Z

@benkasminbullock wow, thank you for the quick test... sort of a proof of concept...

I have pulled and tried your branch in Window, and was pleasantly surprised that the perl install includes a prove.bat which seems to work...

There were some windows TIDY exe issues. Like the current test scripts maybe this can be externalized as an environment variable, TIDY, or something... there are times I need to test with other than the currently built tidy...

I am particularly interested in how you will handle an output compare, where just testing the exit code is not sufficient? And can we not use, leverage, the files in input and testbase? Must they all be done again? What is the process needed to add a new test? Assume this can be moved to the current test folder, and not be a new one?

I do have some other questions, concerns, but today is my very short computer day, so will get into that tomorrow, so do not suggest you put too much more time into this until I can reply...

Again, really thank you for helping in this... Maybe this should all be moved to a new important issue?

geoffmcl · 2015-12-02T02:14:30Z

@benkasminbullock have moved the test procedures to #330

geoffmcl · 2015-12-02T13:41:05Z

@benkasminbullock, @balthisar have now done the full regression tests, and this fix changes MANY message outputs! Of course it does not effect the html output, just the message output.

There are many cases where the test sample does not have a doctype, head, or title. In the past tidy would only warn no doctype, but it has never warned that it added an implied head, added a title. But now it warns that it is missing a </head>!

Typical change is -

diff -u testbase\msg_1050673.txt temp-5\msg_1050673.txt
--- testbase\msg_1050673.txt    Mon Nov 23 16:09:01 2015
+++ temp-5\msg_1050673.txt  Wed Dec 02 12:50:06 2015
@@ -1,4 +1,5 @@
 line 1 column 1 - Warning: missing <!DOCTYPE> declaration
+line 1 column 1 - Warning: missing </head>
 line 1 column 1 - Warning: inserting implicit <body>
 line 1 column 1 - Warning: missing </b> before <ul>
 line 3 column 1 - Warning: missing <li>
@@ -12,7 +13,7 @@
 line 2 column 1 - Warning: trimming empty <ul>
 line 4 column 1 - Warning: trimming empty <b>
 Info: Document content looks like HTML5
-13 warnings, 0 errors were found!
+14 warnings, 0 errors were found!

 About HTML Tidy: https://github.com/htacg/tidy-html5
 Bug reports and comments: https://github.com/htacg/tidy-html5/issues

This effects some 35 tests - 1050673 1056023 1062511 1063256 1072528 1210752 1266647 1398397 1408034 1415137 1603538-1 1603538-2 1632470 1707836 1715153 427836 427840 431874 431889 433666 435903 438956 441508 487204 511679 566542 620531 640473 647255 658230 676156 678268 765852 795643-1 795643-2

Further, in the following sample tidy does not report a missing </body>, nor a missing </html>, nor the missing </p>!

<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<title>Issue #327 - missing head warning</title>
<body>
<p>Born from an egg on a mountain top.

BTW that above sample passes the W3C validator!

So why should tidy now start reporting a missing </head>?

So this is not only a problem when the option --show-body-only yes is added. It even shows this missing </head> if you add the option --omit-optional-tags yes when no implied <head> would be output!!!

I am all for reverting commit 61cfcb1!

And close this issue with a statement like - "Yes, tidy does not warn about a missing </head>, nor many other non-essential closing tags, although it will add them to the output unless the option --omit-optional-tags yes is used."

Thoughts?

benkasminbullock · 2015-12-02T23:17:00Z

Further, in the following sample tidy does not report a missing </body>, nor a missing </html>, nor the missing </p>!

Shouldn't it do that though?

BTW that above sample passes the W3C validator!

I cannot reach the W3C validator at the moment to check this, but I don't think it should do that. A typical error in generated HTML files from templates is producing an error during the file generation and leaving off the </body> tag and possibly other parts of the content. Not warning about this kind of thing makes it less valuable as a check of the HTML.

So why should tidy now start reporting a missing </head>?

tidy usually warns on insertion of other missing tags like </p>. It seems this behaviour is not very consistent.

geoffmcl · 2015-12-03T13:26:08Z

@benkasminbullock well I hope you get access to the W3C validator soon. Today it is one of the quick tests I always try to run, just to check what the validator says. If you "don't think it should do that" then file a bug on the validator, after you very carefully check the W3C documentation.

While reading W3C specs often makes my head spin, this page on the-p-element makes it clear the end tag may be omitted in certain circumstances. And this is true for a number of elements...

In tags.c there are some 39 elements with the CM_OPT bit set, which indicates the end tag is optional. These will not be printed if you set --omit-optional-tags yes. And assume the person who built that table knew what they were talking about.

Please check the validator... please read the documentation... please present a simple sample... before you suggest "this behaviour is not very consistent"! As it stands this is just speading FUD!

And please remember it is not as simple as "what you think"!

I still think reverting commit 61cfcb1 is the correct thing! Right now this is the inconsistent behaviour!

benkasminbullock · 2015-12-03T23:25:28Z

Thanks for the insight @geoffmcl. The validator started working right after I wrote the above. I can confirm that it does not report errors, and neither does Tidy usually report about closing </p> tags, so you are correct in saying that the warning on addition of </head> elements is inconsistent.

geoffmcl · 2015-12-04T18:03:24Z

@balthisar it is your commit... can we revert it?

balthisar · 2015-12-04T23:00:06Z

We can, but no computer for me until Sunday night. You can find the commit?

Sent from a mobile device.

On Dec 5, 2015, at 2:03 AM, Geoff McLane notifications@github.com wrote:

@balthisar it is your commit... can we revert it?

—
Reply to this email directly or view it on GitHub.

This reverts commit 61cfcb1. This added an inconsistent warning about a missing optional close tag. In general tidy does not report such optional close tags. See issue #327 for some discussion on this.

geoffmcl · 2015-12-05T12:53:50Z

@benkasminbullock @balthisar commit reverted, and version bumped to 5.1.31...

Changed this from a bug to technical support on the tidy way ;=))

Assume this can be closed...

balthisar added a commit that referenced this issue Nov 29, 2015

Fix for head closing tag not reported (#327)

61cfcb1

geoffmcl added the Bug label Nov 29, 2015

geoffmcl added this to the 5.1 milestone Nov 29, 2015

geoffmcl closed this as completed Nov 29, 2015

geoffmcl reopened this Nov 30, 2015

geoffmcl mentioned this issue Dec 2, 2015

Tidy needs some test procedures! #330

Closed

geoffmcl added a commit that referenced this issue Dec 5, 2015

Reverted #327 bumped to 5.1.31

5f8aac9

geoffmcl added Technical Discussion and removed Bug labels Dec 5, 2015

geoffmcl closed this as completed Dec 5, 2015

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Missing </head> doesn't generate error #327

Missing </head> doesn't generate error #327

benkasminbullock commented Nov 29, 2015

balthisar commented Nov 29, 2015

benkasminbullock commented Nov 29, 2015

balthisar commented Nov 29, 2015

geoffmcl commented Nov 29, 2015

benkasminbullock commented Nov 29, 2015

geoffmcl commented Nov 30, 2015

geoffmcl commented Nov 30, 2015

yapper-git commented Nov 30, 2015

benkasminbullock commented Dec 1, 2015

geoffmcl commented Dec 1, 2015

geoffmcl commented Dec 2, 2015

geoffmcl commented Dec 2, 2015

benkasminbullock commented Dec 2, 2015

geoffmcl commented Dec 3, 2015

benkasminbullock commented Dec 3, 2015

geoffmcl commented Dec 4, 2015

balthisar commented Dec 4, 2015

geoffmcl commented Dec 5, 2015

Missing </head> doesn't generate error #327

Missing </head> doesn't generate error #327

Comments

benkasminbullock commented Nov 29, 2015

balthisar commented Nov 29, 2015

benkasminbullock commented Nov 29, 2015

balthisar commented Nov 29, 2015

geoffmcl commented Nov 29, 2015

benkasminbullock commented Nov 29, 2015

geoffmcl commented Nov 30, 2015

geoffmcl commented Nov 30, 2015

yapper-git commented Nov 30, 2015

benkasminbullock commented Dec 1, 2015

geoffmcl commented Dec 1, 2015

geoffmcl commented Dec 2, 2015

geoffmcl commented Dec 2, 2015

benkasminbullock commented Dec 2, 2015

geoffmcl commented Dec 3, 2015

benkasminbullock commented Dec 3, 2015

geoffmcl commented Dec 4, 2015

balthisar commented Dec 4, 2015

geoffmcl commented Dec 5, 2015