Nested a tags should be an error? #307

Closed
benkasminbullock opened this issue Nov 18, 2015 · 5 comments

@benkasminbullock
Contributor

At the moment a document of this type:

<!DOCTYPE html>
<html>
<head>
<title>baba</title>
</head>
<body>
<a href='http://example.org'>mini<a href='http://example.org'>monkey</a></a>
</body>
</html>

does not produce errors using tidy 5.0.0. However, most online validators give an error with the double a tag above. I cannot decipher the HTML specification as to which is correct.
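For illustration only, the nesting in the document above can be detected with a short script. This is a minimal sketch built on Python's standard `html.parser`; it is not how tidy or the online validators work, and the names `NestedAnchorFinder` and `find_nested_anchors` are made up for this example.

```python
from html.parser import HTMLParser


class NestedAnchorFinder(HTMLParser):
    """Record where an <a> start tag opens while another <a> is still open."""

    def __init__(self):
        super().__init__()
        self.open_anchors = 0  # number of <a> elements currently open
        self.nested = []       # (line, column) of each offending <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            if self.open_anchors > 0:
                # getpos() reports the position of the tag being handled
                self.nested.append(self.getpos())
            self.open_anchors += 1

    def handle_endtag(self, tag):
        if tag == "a" and self.open_anchors > 0:
            self.open_anchors -= 1


def find_nested_anchors(html):
    """Return the positions of all <a> tags that are descendants of an <a>."""
    finder = NestedAnchorFinder()
    finder.feed(html)
    finder.close()
    return finder.nested


sample = (
    "<!DOCTYPE html>\n<html>\n<head>\n<title>baba</title>\n</head>\n<body>\n"
    "<a href='http://example.org'>mini<a href='http://example.org'>monkey</a></a>\n"
    "</body>\n</html>\n"
)

# One entry is reported for the inner <a> on line 7 of the sample.
print(find_nested_anchors(sample))
```

Because the sketch only counts open anchors, it also reports an `<a>` that sits deeper inside another `<a>`, for example wrapped in a `<div>`.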

@geoffmcl
Contributor

@benkasminbullock thanks for reporting... it certainly looks like a bug has crept in ;=((

Running something very close to your exact sample (my local-only input5\in_307.html) through tidy 5.1.22+ yields no warning or error! And the output has the same nested anchors...

Running it through tidy-cvs (circa 2009) will yield -

line 8 column 1 - Warning: missing </a> before <a>
line 8 column 76 - Warning: discarding unexpected </a>

And the output will be corrected to -

<a href='http://example1.org'>mini</a> <a href=
'http://example2.org'>monkey</a>

Note I added one space between mini and monkey, which I want preserved...
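The repair that the 2009 tidy-cvs output shows (close the open anchor before a new one starts, then discard the leftover end tag) can be approximated in a few lines. This is only a sketch in Python using the standard `html.parser`, not tidy's actual C code; `AnchorUnnester` and `unnest_anchors` are hypothetical names, and the warning strings are simply copied from the messages above.

```python
from html.parser import HTMLParser


class AnchorUnnester(HTMLParser):
    """Re-emit markup, closing an open <a> before another <a> starts."""

    def __init__(self):
        # Keep character references as written so they can be re-emitted.
        super().__init__(convert_charrefs=False)
        self.in_anchor = False
        self.out = []
        self.warnings = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            if self.in_anchor:
                self.warnings.append("missing </a> before <a>")
                self.out.append("</a>")
            self.in_anchor = True
        self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> pass through unchanged.
        self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag == "a":
            if not self.in_anchor:
                self.warnings.append("discarding unexpected </a>")
                return
            self.in_anchor = False
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")

    def handle_comment(self, data):
        self.out.append(f"<!--{data}-->")

    def handle_decl(self, decl):
        self.out.append(f"<!{decl}>")


def unnest_anchors(html):
    """Return (repaired markup, list of warnings)."""
    parser = AnchorUnnester()
    parser.feed(html)
    parser.close()
    return "".join(parser.out), parser.warnings


fixed, warnings = unnest_anchors(
    "<a href='http://example1.org'>mini <a href='http://example2.org'>monkey</a></a>"
)
print(fixed)
print(warnings)
```

Unlike the tidy output above, this sketch leaves the space inside the first anchor rather than moving it between the two; the space is preserved either way.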

I too have trouble deciphering W3C documents and fully understanding them, so will let others comment on that. But I also note that the W3C validator flags an error! So for now I am marking it as a bug!

In fact I now note we have an existing regression test for this!

Test 427827. However, it does not flag a problem because someone (probably me!) has put the nested-anchor output in testbase. This bug was reported way back in 2001, and fixed in July of that year.

There are some clues in the text of the above bug report, and I hope to use them to fix 2015 tidy again, unless someone else beats me to it with a PR or patch, or cites W3C documentation that specifically allows nested anchors.

@geoffmcl geoffmcl added the Bug label Nov 18, 2015
@geoffmcl geoffmcl added this to the 5.1 milestone Nov 18, 2015
@geoffmcl
Contributor

@benkasminbullock have found a fix for this...

Essentially, the 23 Aug 00 fix for 427827 (bugs/28) is copied from ParseInline(), which used to handle anchors (and still does if the document has a legacy HTML4-or-earlier doctype), to ParseBlock(), which now handles anchors in HTML5 mode; but will hold off pushing a few more days in case someone can cite W3C documentation that specifically allows nested anchors.

But all my W3C reading so far on anchors indicates they represent a link in the document, and thus nested anchors would make no sense. Browsers tend to display it correctly, but the W3C validator spits out an error.

This fix also includes changing the testbase files out_427827.html and msg_427827.txt to match the current output. I also found a second case, 431874 (bugs/53, 2001-06-10), marked as a duplicate, with similar nested anchors; its files are also now fixed.

The current full diff can be found at http://geoffair.org/tmp/issue307.diff if you want to try it in the meantime...

@geoffmcl geoffmcl self-assigned this Nov 21, 2015
@benkasminbullock
Contributor Author

> but will hold off pushing a few more days in case someone can cite W3C documentation that specifically allows nested anchors.

Well it says right at the bottom of the link you've given here:

http://sourceforge.net/p/tidy/bugs/53/

> Dave Raggett responded on 03 Jan 2001 : I need to ensure that nested anchors are detected as these aren't legal in HTML or XHTML. This involves a special check which I suspect is missing. I will add this to the list of things to look at for the next release.

Since Dave Raggett actually wrote one of the HTML specifications, I don't think this is really worth worrying about.

@geoffmcl
Contributor

@benkasminbullock hmmm, and assuming the spec has not changed in nearly 15 years since that post ;=))

The reason this bug re-arrived was an HTML5 change in the permitted content of an anchor.

Previously an anchor only allowed inline (phrasing) content, but it can now contain block (flow) content. See #167 and #169 and the commits for that...

But I have now found, in a relatively recent document, a.html, a clear statement: "The interactive element a must not appear as a descendant of the a element." As said before, that makes sense.

So have pushed this change to master, and bumped the version to 5.1.25.

Hope you get a chance to pull, test, and close this if fixed... thanks...

@geoffmcl
Contributor

@benkasminbullock no further comments in nearly 3 weeks, so assume this can be closed?

Please feel free to re-open, or open a new issue... thanks.
