Matching on local names instead of (ns, name) pair causes assertion from close_current_cell when dealing with math:td #278
Comments
This comes about as the matching is done on the |
Yeah, I noticed this when fixing up the insertion machinery for #274. That work does not fix it, however: the changes would be too invasive, since the signature of (node_)tag_i[ns] would have to change and those are used all over the place. It may be time to bite the bullet and do it, but it probably should be in a different branch. The template tag stuff is already really complicated, and it would become nearly impossible to review with this change too. I'd probably also want to do another round of profiling & optimization before possibly changing the signatures of these. It's always annoyed me that they work by looping over a varargs list, which my intuition tells me ought to be really slow. The round of profiling I did while at Google indicated that the time was dwarfed by UTF-8 decoding and entity references, but now that both of those have been sped up significantly, I wonder how much the tag matching functions contribute to runtime.
I wonder if GumboElement.tag should be set based on the (ns, name) pair instead, which avoids that problem? In my mind, the field should categorise elements into known elements ( |
#279 turns out to be the same bug, except that it occurs while resetting the insertion mode rather than while popping elements from the stack.
If anyone is interested, please see the most recent pull request (number 5?), which I think is meant to address this very issue.
More bugs from afl!
The following input:
Results in:
When processing the </tr> tag in the parser, the tree looks as follows:
The </tr> starts off being processed "in foreign content" before falling back to the current HTML insertion mode ("in cell"). This then calls "close the cell", where Gumbo fails.
This appears to be the common HTML parser bug of matching on local name alone (it should be true that there is never both an html:td and an html:th in table scope at the same time). I haven't looked at whether #274 fixes this, but it is certainly the case that html5lib-tests has tests for cases similar to this nowadays (though it probably doesn't in the 0.95 version that Gumbo currently uses), so #3 is somewhat relevant here.
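To make the difference between the two matching strategies concrete, here is a minimal sketch in C. The types and function names (Namespace, Tag, Element, matches_local_name, matches_qualified, has_html_cell_in_table_scope) are hypothetical stand-ins invented for illustration, not Gumbo's actual definitions or signatures, and the scope walk is simplified to stop only at an HTML table element.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the parser's types; not Gumbo's real definitions. */
typedef enum { NS_HTML, NS_SVG, NS_MATHML } Namespace;
typedef enum { TAG_TD, TAG_TH, TAG_TABLE, TAG_MATH, TAG_OTHER } Tag;

typedef struct {
    Namespace ns;
    Tag tag;
} Element;

/* Local-name-only match: a math:td element matches as if it were html:td. */
static bool matches_local_name(const Element *e, Tag tag) {
    return e->tag == tag;
}

/* Match on the full (namespace, local name) pair. */
static bool matches_qualified(const Element *e, Namespace ns, Tag tag) {
    return e->ns == ns && e->tag == tag;
}

/*
 * Walk the stack of open elements from the top down and report whether an
 * HTML td or th is reached before the HTML table that bounds the scope.
 * Simplification (assumption): the real "table scope" has further boundary
 * elements, which are omitted here.
 */
static bool has_html_cell_in_table_scope(const Element *stack, size_t len) {
    for (size_t i = len; i-- > 0;) {
        const Element *e = &stack[i];
        if (matches_qualified(e, NS_HTML, TAG_TD) ||
            matches_qualified(e, NS_HTML, TAG_TH)) {
            return true;
        }
        if (matches_qualified(e, NS_HTML, TAG_TABLE)) {
            return false;
        }
    }
    return false;
}
```

With a stack of html:table, html:th, math:math, math:td, the qualified walk skips the MathML td and finds the HTML th, whereas a local-name-only check would see both a td and a th, which is the situation the "close the cell" step assumes cannot happen.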