Skip to content

Commit

Permalink
move prefix and ns into XPathVisitor (#3225)
Browse files Browse the repository at this point in the history
**What problem is this PR intended to solve?**

Refactoring the CSS parser internals a bit to separate concerns between
the parser and the xpath visitor.

- move prefix into the XPathVisitor (from the AST xpath_for method)
- move namespaces into the XPathVisitor (from the Parser constructor)
- update the CSS selector caching mechanism
- CSS.xpath_for now prefers keyword arguments, and warns on options hash


**Have you included adequate test coverage?**

This is mostly a refactor of internals, existing test coverage is mostly
sufficient, and I added some tests for edge cases and new attributes.


**Does this change affect the behavior of either the C or the Java
implementations?**

N/A
  • Loading branch information
flavorjones authored Jun 11, 2024
2 parents be379f4 + 99be08f commit 396fa3f
Show file tree
Hide file tree
Showing 11 changed files with 180 additions and 68 deletions.
2 changes: 2 additions & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -24,11 +24,13 @@ Nokogiri follows [Semantic Versioning](https://semver.org/), please see the [REA
### Changed

* [CRuby] `Nokogiri::XML::CData.new` no longer accepts `nil` as the content argument, making `CData` behave like other character data classes (like `Comment` and `Text`). This change was necessitated by behavioral changes in the upcoming libxml 2.13.0 release. If you wish to create an empty CDATA node, pass an empty string. [#3156] @flavorjones
* The internal `CSS::XPathVisitor` class now accepts the xpath prefix and the context namespaces as constructor arguments. The `prefix:` and `ns:` keyword arguments to `CSS.xpath_for` cannot be specified if the `visitor:` keyword argument is also used. `CSS::XPathVisitor` now exposes `#builtins`, `#doctype`, `#prefix`, and `#namespaces` attributes.


### Deprecated

* The undocumented and unused method `Nokogiri::CSS.parse` is now deprecated and will generate a warning. The AST returned by this method is private and subject to change and removal in future versions of Nokogiri. This method will be removed in a future version of Nokogiri.
* Passing an options hash to `CSS.xpath_for` is now deprecated and will generate a warning. Use keyword arguments instead. This will become an error in a future version of Nokogiri.


## v1.16.5
Expand Down
30 changes: 23 additions & 7 deletions lib/nokogiri/css.rb
Original file line number Diff line number Diff line change
Expand Up @@ -70,17 +70,33 @@ def parse(selector) # :nodoc:
#
# Nokogiri::CSS.xpath_for("h1, h2, h3") # => ["//h1", "//h2", "//h3"]
#
def xpath_for(selector, options = {})
raise TypeError, "no implicit conversion of #{selector.inspect} to String" unless selector.respond_to?(:to_str)
def xpath_for(
selector, options = nil,
prefix: options&.delete(:prefix), visitor: options&.delete(:visitor), ns: options&.delete(:ns)
)
unless options.nil?
warn("Passing options as an explicit hash is deprecated. Use keyword arguments instead. This will become an error in a future release.", uplevel: 1, category: :deprecated)
end

raise(TypeError, "no implicit conversion of #{selector.inspect} to String") unless selector.respond_to?(:to_str)

selector = selector.to_str
raise Nokogiri::CSS::SyntaxError, "empty CSS selector" if selector.empty?
raise(Nokogiri::CSS::SyntaxError, "empty CSS selector") if selector.empty?

if visitor
raise ArgumentError, "cannot provide both :prefix and :visitor" if prefix
raise ArgumentError, "cannot provide both :ns and :visitor" if ns
end

visitor ||= begin
visitor_kw = {}
visitor_kw[:prefix] = prefix if prefix
visitor_kw[:namespaces] = ns if ns

prefix = options.fetch(:prefix, Nokogiri::XML::XPath::GLOBAL_SEARCH_PREFIX)
visitor = options.fetch(:visitor) { Nokogiri::CSS::XPathVisitor.new }
ns = options.fetch(:ns, {})
Nokogiri::CSS::XPathVisitor.new(**visitor_kw)
end

Parser.new(ns).xpath_for(selector, prefix, visitor)
Parser.new.xpath_for(selector, visitor)
end
end
end
Expand Down
8 changes: 6 additions & 2 deletions lib/nokogiri/css/node.rb
Original file line number Diff line number Diff line change
Expand Up @@ -23,8 +23,12 @@ def accept(visitor)

###
# Convert this CSS node to xpath with +prefix+ using +visitor+
def to_xpath(prefix, visitor)
prefix = "." if ALLOW_COMBINATOR_ON_SELF.include?(type) && value.first.nil?
def to_xpath(visitor)
prefix = if ALLOW_COMBINATOR_ON_SELF.include?(type) && value.first.nil?
"."
else
visitor.prefix
end
prefix + visitor.accept(self)
end

Expand Down
6 changes: 3 additions & 3 deletions lib/nokogiri/css/parser.rb
Original file line number Diff line number Diff line change
@@ -1,7 +1,7 @@
# frozen_string_literal: true
#
# DO NOT MODIFY!!!!
# This file is automatically generated by Racc 1.7.3
# This file is automatically generated by Racc 1.8.0
# from Racc grammar file "parser.y".
#

Expand Down Expand Up @@ -470,12 +470,12 @@ def _reduce_23(val, _values, result)
end

def _reduce_24(val, _values, result)
result = Node.new(:ELEMENT_NAME, [[val[0], val[2]].compact.join(':')])
result = Node.new(:ELEMENT_NAME, [val[0], val[2]])
result
end

def _reduce_25(val, _values, result)
name = @namespaces.key?('xmlns') ? "xmlns:#{val[0]}" : val[0]
name = val[0]
result = Node.new(:ELEMENT_NAME, [name])

result
Expand Down
4 changes: 2 additions & 2 deletions lib/nokogiri/css/parser.y
Original file line number Diff line number Diff line change
Expand Up @@ -64,9 +64,9 @@ rule
;

namespaced_ident:
namespace '|' IDENT { result = Node.new(:ELEMENT_NAME, [[val[0], val[2]].compact.join(':')]) }
namespace '|' IDENT { result = Node.new(:ELEMENT_NAME, [val[0], val[2]]) }
| IDENT {
name = @namespaces.key?('xmlns') ? "xmlns:#{val[0]}" : val[0]
name = val[0]
result = Node.new(:ELEMENT_NAME, [name])
}
;
Expand Down
16 changes: 7 additions & 9 deletions lib/nokogiri/css/parser_extras.rb
Original file line number Diff line number Diff line change
Expand Up @@ -56,11 +56,9 @@ def without_cache(&block)
end
end

# Create a new CSS parser with respect to +namespaces+
def initialize(namespaces = {})
def initialize
@tokenizer = Tokenizer.new
@namespaces = namespaces
super()
super
end

def parse(string)
Expand All @@ -73,10 +71,10 @@ def next_token
end

# Get the xpath for +string+ using +options+
def xpath_for(string, prefix, visitor)
key = cache_key(string, prefix, visitor)
def xpath_for(string, visitor)
key = cache_key(string, visitor)
self.class[key] ||= parse(string).map do |ast|
ast.to_xpath(prefix, visitor)
ast.to_xpath(visitor)
end
end

Expand All @@ -86,9 +84,9 @@ def on_error(error_token_id, error_value, value_stack)
raise SyntaxError, "unexpected '#{error_value}' after '#{after}'"
end

def cache_key(query, prefix, visitor)
def cache_key(query, visitor)
if self.class.cache_on?
[query, prefix, @namespaces, visitor.config]
[query, visitor.config]
end
end
end
Expand Down
39 changes: 33 additions & 6 deletions lib/nokogiri/css/xpath_visitor.rb
Original file line number Diff line number Diff line change
Expand Up @@ -44,6 +44,18 @@ module DoctypeConfig
VALUES = [XML, HTML4, HTML5]
end

# The visitor configuration set via the +builtins:+ keyword argument to XPathVisitor.new.
attr_reader :builtins

# The visitor configuration set via the +doctype:+ keyword argument to XPathVisitor.new.
attr_reader :doctype

# The visitor configuration set via the +prefix:+ keyword argument to XPathVisitor.new.
attr_reader :prefix

# The visitor configuration set via the +namespaces:+ keyword argument to XPathVisitor.new.
attr_reader :namespaces

# :call-seq:
# new() → XPathVisitor
# new(builtins:, doctype:) → XPathVisitor
Expand All @@ -54,7 +66,12 @@ module DoctypeConfig
#
# [Returns] XPathVisitor
#
def initialize(builtins: BuiltinsConfig::NEVER, doctype: DoctypeConfig::XML)
def initialize(
builtins: BuiltinsConfig::NEVER,
doctype: DoctypeConfig::XML,
prefix: Nokogiri::XML::XPath::GLOBAL_SEARCH_PREFIX,
namespaces: nil
)
unless BuiltinsConfig::VALUES.include?(builtins)
raise(ArgumentError, "Invalid values #{builtins.inspect} for builtins: keyword parameter")
end
Expand All @@ -64,6 +81,8 @@ def initialize(builtins: BuiltinsConfig::NEVER, doctype: DoctypeConfig::XML)

@builtins = builtins
@doctype = doctype
@prefix = prefix
@namespaces = namespaces
end

# :call-seq: config() → Hash
Expand All @@ -72,7 +91,7 @@ def initialize(builtins: BuiltinsConfig::NEVER, doctype: DoctypeConfig::XML)
# a Hash representing the configuration of the XPathVisitor, suitable for use as
# part of the CSS cache key.
def config
{ builtins: @builtins, doctype: @doctype }
{ builtins: @builtins, doctype: @doctype, prefix: @prefix, namespaces: @namespaces }
end

# :stopdoc:
Expand Down Expand Up @@ -258,6 +277,14 @@ def visit_element_name(node)
else
"*[local-name()='#{node.value.first}']"
end
elsif node.value.length == 2 # has a namespace prefix
if node.value.first.nil? # namespace prefix is empty
node.value.last
else
node.value.join(":")
end
elsif @namespaces&.key?("xmlns") # apply the default namespace if it's declared
"xmlns:#{node.value.first}"
else
node.value.first
end
Expand All @@ -280,10 +307,10 @@ def validate_xpath_function_name(name)
end

def html5_element_name_needs_namespace_handling(node)
# if this is the wildcard selector "*", use it as normal
node.value.first != "*" &&
# if there is already a namespace (i.e., it is a prefixed QName), use it as normal
!node.value.first.include?(":")
# if there is already a namespace (i.e., it is a prefixed QName), use it as normal
node.value.length == 1 &&
# if this is the wildcard selector "*", use it as normal
node.value.first != "*"
end

def nth(node, options = {})
Expand Down
14 changes: 6 additions & 8 deletions lib/nokogiri/xml/searchable.rb
Original file line number Diff line number Diff line change
Expand Up @@ -243,16 +243,14 @@ def css_rules_to_xpath(rules, ns)
end

def xpath_query_from_css_rule(rule, ns)
visitor = Nokogiri::CSS::XPathVisitor.new(
builtins: Nokogiri::CSS::XPathVisitor::BuiltinsConfig::OPTIMAL,
doctype: document.xpath_doctype,
)
self.class::IMPLIED_XPATH_CONTEXTS.map do |implied_xpath_context|
CSS.xpath_for(rule.to_s, {
visitor = Nokogiri::CSS::XPathVisitor.new(
builtins: Nokogiri::CSS::XPathVisitor::BuiltinsConfig::OPTIMAL,
doctype: document.xpath_doctype,
prefix: implied_xpath_context,
ns: ns,
visitor: visitor,
})
namespaces: ns,
)
CSS.xpath_for(rule.to_s, visitor: visitor)
end.join(" | ")
end

Expand Down
30 changes: 25 additions & 5 deletions test/css/test_css.rb
Original file line number Diff line number Diff line change
Expand Up @@ -12,6 +12,7 @@
Nokogiri::CSS::Parser.without_cache do
mock_visitor = Minitest::Mock.new
mock_visitor.expect(:accept, "injected-value", [Nokogiri::CSS::Node])
mock_visitor.expect(:prefix, "//")

result = Nokogiri::CSS.xpath_for("foo", visitor: mock_visitor)

Expand All @@ -21,17 +22,36 @@
end

it "accepts an options hash" do
assert_equal(
["./foo"],
Nokogiri::CSS.xpath_for("foo", { prefix: "./", visitor: Nokogiri::CSS::XPathVisitor.new }),
)
assert_output(nil, /Passing options as an explicit hash is deprecated/) do
assert_equal(
["./foo"],
Nokogiri::CSS.xpath_for("foo", { prefix: "./" }),
)
end

assert_output(nil, /Passing options as an explicit hash is deprecated/) do
assert_equal(
["./foo"],
Nokogiri::CSS.xpath_for("foo", { visitor: Nokogiri::CSS::XPathVisitor.new(prefix: "./") }),
)
end
end

it "accepts keyword arguments" do
assert_equal(
["./foo"],
Nokogiri::CSS.xpath_for("foo", prefix: "./", visitor: Nokogiri::CSS::XPathVisitor.new),
Nokogiri::CSS.xpath_for("foo", prefix: "./"),
)
assert_equal(
["./foo"],
Nokogiri::CSS.xpath_for("foo", visitor: Nokogiri::CSS::XPathVisitor.new(prefix: "./")),
)
end

it "does not accept both prefix and visitor" do
assert_raises(ArgumentError) do
Nokogiri::CSS.xpath_for("foo", prefix: "./", visitor: Nokogiri::CSS::XPathVisitor.new)
end
end

describe "error handling" do
Expand Down
12 changes: 6 additions & 6 deletions test/css/test_parser_cache.rb
Original file line number Diff line number Diff line change
Expand Up @@ -40,8 +40,8 @@ def teardown

Nokogiri::CSS.xpath_for(@css)
Nokogiri::CSS.xpath_for(@css)
Nokogiri::CSS::Parser.new.xpath_for(@css, "//", Nokogiri::CSS::XPathVisitor.new)
Nokogiri::CSS::Parser.new.xpath_for(@css, "//", Nokogiri::CSS::XPathVisitor.new)
Nokogiri::CSS::Parser.new.xpath_for(@css, Nokogiri::CSS::XPathVisitor.new(prefix: "//"))
Nokogiri::CSS::Parser.new.xpath_for(@css, Nokogiri::CSS::XPathVisitor.new(prefix: "//"))

if cache_setting
assert_equal(1, Nokogiri::CSS::Parser.class_eval { @cache.count })
Expand Down Expand Up @@ -134,19 +134,19 @@ def test_cache_key_on_ns_prefix_and_visitor_config
Nokogiri::CSS.xpath_for("foo", prefix: ".//", ns: { "example" => "http://example.com/" })
Nokogiri::CSS.xpath_for(
"foo",
prefix: ".//",
ns: { "example" => "http://example.com/" },
visitor: Nokogiri::CSS::XPathVisitor.new(
builtins: Nokogiri::CSS::XPathVisitor::BuiltinsConfig::ALWAYS,
prefix: ".//",
namespaces: { "example" => "http://example.com/" },
),
)
Nokogiri::CSS.xpath_for(
"foo",
prefix: ".//",
ns: { "example" => "http://example.com/" },
visitor: Nokogiri::CSS::XPathVisitor.new(
builtins: Nokogiri::CSS::XPathVisitor::BuiltinsConfig::ALWAYS,
doctype: Nokogiri::CSS::XPathVisitor::DoctypeConfig::HTML5,
prefix: ".//",
namespaces: { "example" => "http://example.com/" },
),
)
assert_equal(5, cache.length)
Expand Down
Loading

0 comments on commit 396fa3f

Please sign in to comment.