Skip to content

Commit

Permalink
tidy: prepare to remove in-context recovery hack needed for libxml < …
Browse files Browse the repository at this point in the history
…2.13 (#3256)

**What problem is this PR intended to solve?**

This hack is not necessary with libxml 2.13 which improves fragment
recovery behavior.

- add a TODO to remind me to remove the hack once we no longer support
libxml 2.13 (system libs)
- add a test that asserts the correct behavior when using libxml >= 2.13

Closes #2092


**Have you included adequate test coverage?**

Yes. 


**Does this change affect the behavior of either the C or the Java
implementations?**

Sadly, the Java implementation still does not handle in-context fragment
parsing correctly, but that's out of scope for this improvement.
  • Loading branch information
flavorjones authored Jun 24, 2024
2 parents af98b73 + 966d5ca commit 75ab97d
Show file tree
Hide file tree
Showing 3 changed files with 18 additions and 0 deletions.
1 change: 1 addition & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -27,6 +27,7 @@ Nokogiri follows [Semantic Versioning](https://semver.org/), please see the [REA
* [CRuby] When compiling packaged libraries from source, allow users' `AR` and `LD` environment variables to set the archiver and linker commands, respectively. This augments the existing `CC` environment variable to set the compiler command. [#3165] @ziggythehamster
* [CRuby] The HTML5 parse methods accept a `:parse_noscript_content_as_text` keyword argument which will emulate the parsing behavior of a browser which has scripting enabled. [#3178, #3231] @stevecheckoway
* [CRuby] `HTML5::DocumentFragment.parse` and `.new` accept a `:context` keyword argument that is the parse context node or element name. Previously this could only be passed in as a positional argument to `.new` and not at all to `.parse`. @flavorjones
* [CRuby] The update to libxml v2.13 improves "in context" fragment parsing recovery. We removed our hacky workaround for recovery that led to silently-degraded functionality when parsing fragments with parse errors. Specifically, malformed XML fragments that used implicit namespace prefixes will now "link up" to the namespaces in the parent document or node, where previously they did not. [#2092] @flavorjones


### Fixed
Expand Down
3 changes: 3 additions & 0 deletions lib/nokogiri/xml/node.rb
Original file line number Diff line number Diff line change
Expand Up @@ -1086,9 +1086,11 @@ def parse(string_or_io, options = nil)

error_count = document.errors.length
node_set = in_context(contents, options.to_i)

if document.errors.length > error_count
raise document.errors[error_count] unless options.recover?

# TODO: remove this block when libxml2 < 2.13 is no longer supported
if node_set.empty?
# libxml2 < 2.13 does not obey the +recover+ option after encountering errors during
# +in_context+ parsing, and so this horrible hack is here to try to emulate recovery
Expand All @@ -1115,6 +1117,7 @@ def parse(string_or_io, options = nil)
node_set = fragment.children
end
end

node_set
end

Expand Down
14 changes: 14 additions & 0 deletions test/xml/test_document_fragment.rb
Original file line number Diff line number Diff line change
Expand Up @@ -309,6 +309,20 @@ def test_dup_creates_mutable_tree
refute_nil(duplicate.at_css("b"))
end

def test_in_context_fragment_parsing_recovery
skip("This tests behavior in libxml 2.13") unless Nokogiri.uses_libxml?(">= 2.13.0")

# https://github.com/sparklemotion/nokogiri/issues/2092
context_xml = "<root xmlns:n='https://example.com/foo'></root>"
context_doc = Nokogiri::XML::Document.parse(context_xml)
invalid_xml_fragment = "<n:a><b></n:a>" # note missing closing tag for `b`
fragment = context_doc.root.parse(invalid_xml_fragment)

assert_equal("a", fragment.first.name)
assert_equal("n", fragment.first.namespace.prefix)
assert_equal("https://example.com/foo", fragment.first.namespace.href)
end

def test_for_libxml_in_context_fragment_parsing_bug_workaround
skip_unless_libxml2("valgrind tests should only run with libxml2")

Expand Down

0 comments on commit 75ab97d

Please sign in to comment.