Skip to content
This repository has been archived by the owner on Feb 1, 2022. It is now read-only.

[Fixes #249] Use parse5 for standard-compliant parsing #251

Closed
wants to merge 1 commit into from

Conversation

kkirsche
Copy link
Contributor

[Fixes #249] Use parse5 for standard-compliant parsing

[Fixes twbs#249] Use parse5 for standard-compliant parsing

New shrinkwrap file for NPM

Fix variable names
@coveralls
Copy link

Coverage Status

Coverage remained the same at 99.55% when pulling 0a42498 on kkirsche:use_parse5 into f454029 on twbs:master.

@@ -59,7 +59,7 @@ exports.bootlint = {
[],
'should not complain when the normal simple HTML5 doctype is used.');
test.deepEqual(lintHtml(utf8Fixture('doctype/html5-legacy.html')),
[],
['Document declares a non-HTML5 DOCTYPE'],
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't need to change this. Either E001's implementation needs tweaking or presumably parse5 has a minor bug.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then parae5 has a bug.

@cvrebert
Copy link
Collaborator

cvrebert commented Mar 1, 2015

So, the reason I hesitated to make a PR myself (was hoping to have some discussion) is that I have some concerns about switching to Whacko. Mind you, I'm not sure where currently I stand, but I think the decision merits discussion either way.

Pros:

  • Completely spec-compliant HTML parsing. Cheerio's htmlparser2 has a few rare edge cases where it's not compliant/correct; see its issue tracker.

Cons:

  • Whacko seems to be maintained by a single guy
  • Cheerio is popular and has some level of an ecosystem around it. By contrast, there only seems to be one popular third-party package currently depending on Whacko (namely, vulcanize), so by comparison it has virtually no ecosystem.
  • It's unclear how up-to-date Whacko is with respect to upstream Cheerio ("This branch is 36 commits ahead, 294 commits behind cheeriojs:master")
  • htmlparser2's edge cases haven't really impacted us so far.

Mitigating factors:

  • The only maintenance Whacko really needs is merging in upstream Cheerio changes and keeping up with any parse5 updates. The former are fairly infrequent and the latter shouldn't be an issue as it's by the same author.
  • It's not like we're currently hitting any gaps that necessitate frequent Cheerio updates anyway.
  • Whacko is used in the TestCafé commercial product, which provides an incentive to keep it maintained.
  • Whacko is very highly compatible with mainline Cheerio, so if it became necessary to switch back to Cheerio later, it shouldn't be that painful to do so.

CC: @hnrch02, as it's ultimately his decision to make as maintainer.

@kkirsche
Copy link
Contributor Author

kkirsche commented Mar 1, 2015

Thanks. Honestly for me this is an opportunity to try to learn more than anything else so I really appreciate your explanation about the pros and cons.

@kkirsche
Copy link
Contributor Author

kkirsche commented Mar 1, 2015

X-Ref: inikulin/parse5#45

@cvrebert
Copy link
Collaborator

cvrebert commented Mar 1, 2015

Hah, I filed inikulin/whacko#13

@kkirsche
Copy link
Contributor Author

kkirsche commented Mar 1, 2015

haha oops. Sorry about that! Didn't mean to create any overlay, just wanted to make sure it was reported 😃 Want me to close mine on parse5?

@hnrch02
Copy link
Collaborator

hnrch02 commented Mar 1, 2015

I'd rather stay with Cheerio.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Consider using parse5 for fully standard-compliant HTML parsing
4 participants