I get one new html5lib test failure with this branch:
This is as reported by … Not sure which revision of html5lib-tests I should be using for this branch, though.
Just tried using html5lib/html5lib-tests@80fc271 (current … Also noticed a new warning, FWIW:
…for lexing, to ease debugging non-printable characters.
…or. (The whole error handling really needs to be redone; it's not very helpful to users.)
This merges in the tag_set fixes.
…make sure their input mechanisms can accept this without relying on strlen.
I've updated this pull request to fix the failing test case and the compiler warning. I'll let it sit for a bit, but I plan to merge fairly soon because I'd like to get it into master before merging @vmg's and @kevinhendricks' changes into a v0.10 branch. Hopefully somebody can get around to reviewing it shortly.
Looks good to me!
As far as I can tell from the spec [1], tokenizing a CDATA section just emits the text contained within the section and never produces a "CDATA node". So it doesn't seem like …
@aroben
I see, so …
@aroben: yes, and nicely, only if you care to use that extra information, so it can't hurt, so to speak.
The discussion was in bug #266. GUMBO_NODE_CDATA already existed in gumbo.h but was never actually set. Given that gumbo's primary audience is tool writers, and that GUMBO_NODE_WHITESPACE already exists, we felt the extra node type would give client code the most useful information.
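To make the tool-writer argument concrete, here is a minimal C sketch of why a distinct node type can help a round-tripping tool. It deliberately does not use gumbo's API: `DemoNodeType`, `DemoTextNode`, `append` and `demo_serialize` are hypothetical names made up for this illustration, loosely mirroring GUMBO_NODE_TEXT, GUMBO_NODE_CDATA and GUMBO_NODE_WHITESPACE. The assumption is that such a serializer wants to restore a CDATA section verbatim but escape `<` and `&` in ordinary text.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for illustration only; they mirror the idea of
 * GUMBO_NODE_TEXT / GUMBO_NODE_CDATA / GUMBO_NODE_WHITESPACE but are NOT
 * gumbo.h's definitions. */
typedef enum { DEMO_TEXT, DEMO_CDATA, DEMO_WHITESPACE } DemoNodeType;

typedef struct {
  DemoNodeType type;
  const char* text; /* NUL-terminated node contents */
} DemoTextNode;

/* Appends s to buf (capacity cap, current length *used).
 * Returns false if it would not fit; keeps buf NUL-terminated. */
static bool append(char* buf, size_t cap, size_t* used, const char* s) {
  size_t n = strlen(s);
  if (*used + n + 1 > cap) return false;
  memcpy(buf + *used, s, n + 1); /* includes the terminating NUL */
  *used += n;
  return true;
}

/* Re-emits one text-like node. Because CDATA arrives as its own node
 * type, the section can be restored verbatim instead of being escaped
 * like ordinary text. The caller starts with buf[0] = '\0', *used = 0. */
bool demo_serialize(const DemoTextNode* node, char* buf, size_t cap,
                    size_t* used) {
  assert(node != NULL && buf != NULL && used != NULL);
  switch (node->type) {
    case DEMO_CDATA:
      return append(buf, cap, used, "<![CDATA[") &&
             append(buf, cap, used, node->text) &&
             append(buf, cap, used, "]]>");
    case DEMO_WHITESPACE:
      return append(buf, cap, used, node->text);
    case DEMO_TEXT:
      for (const char* p = node->text; *p != '\0'; ++p) {
        char one[2] = {*p, '\0'};
        /* Escape only the characters that matter in this sketch. */
        const char* piece =
            (*p == '<') ? "&lt;" : (*p == '&') ? "&amp;" : one;
        if (!append(buf, cap, used, piece)) return false;
      }
      return true;
  }
  return false;
}
```

Under these assumptions, serializing a CDATA node with text `x<y` gives `<![CDATA[x<y]]>`, while a text node with the same contents gives `x&lt;y`. If both arrived as plain text nodes, the tool could not tell which form to emit.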
This changes the tokenizer to plumb CDATA sections through to the parser, so the parser actually produces GUMBO_NODE_CDATA nodes. Fixes #266. @craigbarnes, would you like to review?
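A rough sketch of the tokenizer side, in C: a CDATA section still yields just the enclosed text (consistent with the spec reading earlier in the thread), but the token is tagged so a parser could tell it apart from ordinary character data. This is a toy scanner, not gumbo's tokenizer; `ToyTokenKind`, `ToyToken` and `toy_next_token` are hypothetical, and it skips the spec's rule that CDATA sections are only recognized in foreign (SVG/MathML) content.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical token kinds for this sketch only; gumbo's tokenizer has
 * its own token types and is far more involved. */
typedef enum { TOY_CHARACTERS, TOY_CDATA } ToyTokenKind;

typedef struct {
  ToyTokenKind kind;
  const char* start; /* points into the caller's buffer, not NUL-terminated */
  size_t length;
} ToyToken;

/* Scans one token from the first `len` bytes of `input` and returns the
 * number of bytes consumed (0 only when len == 0).
 * A "<![CDATA[" prefix yields a TOY_CDATA token covering only the text up
 * to "]]>" (or to end of input if unterminated). Anything else yields a
 * TOY_CHARACTERS run up to the next '<'.
 * Simplification: ignores the foreign-content-only rule for CDATA. */
size_t toy_next_token(const char* input, size_t len, ToyToken* out) {
  static const char kOpen[] = "<![CDATA[";
  static const char kClose[] = "]]>";
  const size_t open_len = sizeof kOpen - 1;
  const size_t close_len = sizeof kClose - 1;
  assert(input != NULL && out != NULL);
  if (len == 0) return 0;

  if (len >= open_len && memcmp(input, kOpen, open_len) == 0) {
    const char* body = input + open_len;
    const size_t body_max = len - open_len;
    size_t i = 0;
    /* Advance until "]]>" starts at body + i, or too few bytes remain. */
    while (i + close_len <= body_max &&
           memcmp(body + i, kClose, close_len) != 0) {
      ++i;
    }
    const int terminated = i + close_len <= body_max;
    out->kind = TOY_CDATA;
    out->start = body;
    out->length = terminated ? i : body_max;
    return open_len + (terminated ? i + close_len : body_max);
  }

  /* Plain character data: always consume at least one byte. */
  size_t i = 1;
  while (i < len && input[i] != '<') ++i;
  out->kind = TOY_CHARACTERS;
  out->start = input;
  out->length = i;
  return i;
}
```

For example, scanning `<![CDATA[a<b]]>rest` consumes 15 bytes and yields a TOY_CDATA token for `a<b` (the `<` stays literal), and the following call yields a TOY_CHARACTERS token for `rest`.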