Scan more than the head of the tag stack for tags closing other tags #39

Closed
fb55 opened this issue Apr 7, 2013 · 5 comments

@fb55
Owner

fb55 commented Apr 7, 2013

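As described here, <p><a>a<p>b is currently handled as <p><a>a<p>b</p></a></p>. The second <p> should close the first one, but only the head of the tag stack (here the <a>) is checked for tags that an opening tag implicitly closes.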
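
To make the difference concrete, here is a minimal sketch of the two strategies. It is a simplified model and not htmlparser2's actual code: the names `impliedClose` and `emitTags` are made up for this illustration, and text nodes are left out.

```javascript
// Hypothetical, simplified model of implied end tags (not htmlparser2's code).
// Maps an opening tag name to the set of open tags it implicitly closes.
const impliedClose = { p: new Set(["p"]) };

// Takes a list of opening tag names and returns the tags a parser would emit.
function emitTags(names, scanWholeStack) {
  const stack = [];
  let out = "";
  for (const name of names) {
    const closes = impliedClose[name];
    if (closes && scanWholeStack) {
      // Look for the nearest closable element anywhere in the stack.
      let idx = -1;
      for (let i = stack.length - 1; i >= 0; i--) {
        if (closes.has(stack[i])) { idx = i; break; }
      }
      // Close that element and everything opened after it.
      if (idx !== -1) {
        while (stack.length > idx) out += "</" + stack.pop() + ">";
      }
    } else if (closes) {
      // Head-only check: stops as soon as the top of the stack does not match.
      while (stack.length > 0 && closes.has(stack[stack.length - 1])) {
        out += "</" + stack.pop() + ">";
      }
    }
    stack.push(name);
    out += "<" + name + ">";
  }
  // Close whatever is still open at the end of the input.
  while (stack.length > 0) out += "</" + stack.pop() + ">";
  return out;
}

console.log(emitTags(["p", "a", "p"], false)); // <p><a><p></p></a></p>
console.log(emitTags(["p", "a", "p"], true));  // <p><a></a></p><p></p>
```

With the head-only check, the <a> on top of the stack hides the open <p>, which gives the nesting described above. Scanning the whole stack closes both, but note that it does not reopen the <a> afterwards.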

@nik0kin

nik0kin commented Mar 27, 2014

this bug makes me sad

@fb55
Owner Author

fb55 commented Mar 28, 2014

PRs welcome :)

@sparecycles

To fix this specific issue, just make p close anchor tags by changing this line
https://github.com/fb55/htmlparser2/blob/100d86e/lib/Parser.js#L42
to

   p: { p: true, a: true }

But this may not be the correct thing to do, since in both Chrome and Firefox, data:text/html,<html><body><p><a href="%23">a<p>b makes two links.

They close the anchor along with the first paragraph, then duplicate it inside the second one, like so:

<html>
  <head></head>
  <body>
    <p><a href="#">a</a></p>
    <p><a href="#">b</a></p>
  </body>
</html>

@fb55
Owner Author

fb55 commented May 3, 2014

The HTML spec calls this reconstructing the active formatting elements. It's another thing this parser doesn't support, along with attribute propagation, foster parenting, head tag insertion, implicit opening tags, etc.
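
For illustration, a rough sketch of the idea: when an implied close runs through formatting elements, remember them and reopen them inside the new element. This is a heavily simplified assumption-laden model. The spec's algorithm works on a separate list of active formatting elements and has many more rules, and the names `FORMATTING`, `IMPLIES_CLOSE`, `openTag` and `parseSketch` are invented here.

```javascript
// Simplified illustration only; the algorithm in the HTML spec is far more involved.
const FORMATTING = new Set(["a", "b", "i", "em", "strong"]); // partial, assumed list
const IMPLIES_CLOSE = { p: new Set(["p"]) };

// Serializes an opening tag with its attributes (no escaping, for brevity).
function openTag(el) {
  const attrs = Object.entries(el.attrs || {})
    .map(([key, value]) => ` ${key}="${value}"`)
    .join("");
  return `<${el.name}${attrs}>`;
}

// Tokens are either strings (text) or { name, attrs } objects (opening tags).
function parseSketch(tokens) {
  const stack = [];
  let out = "";
  for (const tok of tokens) {
    if (typeof tok === "string") { out += tok; continue; }
    const closes = IMPLIES_CLOSE[tok.name];
    const reopen = [];
    if (closes) {
      // Scan the whole stack for the nearest element this tag closes.
      let idx = -1;
      for (let i = stack.length - 1; i >= 0; i--) {
        if (closes.has(stack[i].name)) { idx = i; break; }
      }
      if (idx !== -1) {
        while (stack.length > idx) {
          const el = stack.pop();
          out += `</${el.name}>`;
          // Remember formatting elements, keeping outer-to-inner order.
          if (FORMATTING.has(el.name)) reopen.unshift(el);
        }
      }
    }
    stack.push(tok);
    out += openTag(tok);
    // Reopen the remembered formatting elements inside the new element.
    for (const el of reopen) {
      stack.push(el);
      out += openTag(el);
    }
  }
  while (stack.length > 0) out += `</${stack.pop().name}>`;
  return out;
}

console.log(parseSketch([
  { name: "p" }, { name: "a", attrs: { href: "#" } }, "a", { name: "p" }, "b",
]));
// <p><a href="#">a</a></p><p><a href="#">b</a></p>
```

For this one input the sketch produces the same two links as the browsers, but it should not be taken as spec-compliant behaviour in general.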

@fb55
Owner Author

fb55 commented Jan 20, 2018

Please refer to inikulin/parse5 if you need a spec-compliant parser.
