fix: escape foreign style tag content when serializing HTML5
Normally, a `style` tag is considered to be a raw text element,
meaning `<` is parsed as part of a possible "tag start" token, and is
serialized literally (and not rendered as an escaped character
reference `&lt;`).

However, when appearing in either SVG or MathML foreign content, a
`style` tag should *not* be considered a raw text element, and its
content should be escaped when serialized. libgumbo parses this case
correctly, but our HTML5 serialization code does not escape the content.
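
To make the two output modes concrete, here is a minimal sketch in C. It is
not Nokogiri's serializer: the function name `write_text` and its `raw` flag
are hypothetical, and only `&`, `<`, and `>` are escaped (the non-breaking
space that HTML text serialization also escapes is left out for brevity).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

// Hypothetical helper, for illustration only.
// Copies `text` into `out` either literally (raw == true, as for an HTML
// <style> element) or with &, <, > replaced by character references
// (raw == false, as for <style> inside <svg> or <math>).
// Returns false if `out` is too small to hold the result.
static bool
write_text(char *out, size_t out_size, const char *text, bool raw)
{
  assert(out != NULL && text != NULL);
  if (out_size == 0) {
    return false;
  }

  size_t used = 0;
  for (const char *p = text; *p != '\0'; ++p) {
    char single[2] = { *p, '\0' };
    const char *piece = single;

    if (!raw) {
      if (*p == '&') {
        piece = "&amp;";
      } else if (*p == '<') {
        piece = "&lt;";
      } else if (*p == '>') {
        piece = "&gt;";
      }
    }

    size_t len = strlen(piece);
    if (used + len + 1 > out_size) { // +1 keeps room for the terminator
      return false;
    }
    memcpy(out + used, piece, len);
    used += len;
  }

  out[used] = '\0';
  return true;
}
```

With `raw` set, the text `<img src>` is copied through unchanged; without it,
the same text comes out as `&lt;img src&gt;`, which is the form expected for
`style` content under an SVG or MathML parent.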

This commit updates the static `is_one_of()` C function to consider
the namespace of the node it is passed (the parent of the text being
serialized) as well as that node's local name when deciding whether
the tag matches the list of HTML elements, so that a `style` tag in
foreign content will *not* match, but a `style` tag in HTML content
will match.
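
The matching rule can be sketched in isolation as follows. This is an
illustration under stated assumptions, not the code in `xml_node.c`: the
`sketch_node` struct is a hypothetical stand-in for the two libxml2 node
fields involved, and `sketch_is_one_of` is a hypothetical name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

// Hypothetical stand-in for the parts of a libxml2 node used here.
// A NULL `ns` models an element in HTML content; a non-NULL `ns` models
// an element in foreign (SVG or MathML) content.
typedef struct {
  const char *name; // local name; NULL for fragments
  const char *ns;   // namespace identifier, or NULL
} sketch_node;

// Returns true only when the node is an HTML element whose local name
// appears in `tagnames`.
static bool
sketch_is_one_of(const sketch_node *node, const char *const *tagnames, size_t num_tagnames)
{
  assert(node != NULL && tagnames != NULL);

  if (node->name == NULL) { // fragments don't have a name
    return false;
  }

  if (node->ns != NULL) { // namespaced, so foreign content: never a match
    return false;
  }

  for (size_t idx = 0; idx < num_tagnames; ++idx) {
    if (strcmp(node->name, tagnames[idx]) == 0) {
      return true;
    }
  }
  return false;
}
```

Given a tag list containing `style`, a node named `style` with no namespace
matches, while a node named `style` carrying the SVG namespace does not, so
the caller falls through to the escaping path for the latter.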

(cherry picked from commit 44e3a74aff2c93873c82d55db8f08912f4e69d59)
flavorjones committed Nov 30, 2024
1 parent d8d6ba3 commit 02572e8
Showing 2 changed files with 23 additions and 1 deletion.
8 changes: 7 additions & 1 deletion ext/nokogiri/xml_node.c
@@ -1849,13 +1849,19 @@ is_one_of(xmlNodePtr node, char const *const *tagnames, size_t num_tagnames)
   if (name == NULL) { // fragments don't have a name
     return false;
   }
+
+  if (node->ns != NULL) {
+    // if the node has a namespace, it's in a foreign context and is not one of the HTML tags we're
+    // matching against.
+    return false;
+  }
+
   for (size_t idx = 0; idx < num_tagnames; ++idx) {
     if (!strcmp(name, tagnames[idx])) {
       return true;
     }
   }
   return false;
-
 }

 static void
16 changes: 16 additions & 0 deletions test/html5/test_serialize.rb
@@ -553,4 +553,20 @@ def test_serializing_html5_fragment
     refute(fragment.send(:prepend_newline?))
     assert_equal("<div>hello</div>goodbye", fragment.to_html)
   end
+
+  describe "foreign content style tag serialization is escaped" do
+    it "with svg parent" do
+      input = %{<svg><style>&lt;img src>}
+      expected = %{<svg><style>&lt;img src&gt;</style></svg>}
+
+      assert_equal(expected, Nokogiri::HTML5.fragment(input).to_html)
+    end
+
+    it "with math parent" do
+      input = %{<math><style>&lt;img src>}
+      expected = %{<math><style>&lt;img src&gt;</style></math>}
+
+      assert_equal(expected, Nokogiri::HTML5.fragment(input).to_html)
+    end
+  end
 end if Nokogiri.uses_gumbo?
