Always traverse top-down; don't skip siblings added during traversal.
This removes the :transformers_breadth and :transformers_depth configs, which
were poorly named. All transformers now perform top-down traversal.

This also fixes a bug where sibling nodes added by a transformer during
traversal were never traversed.

Fixes #90
Fixes #91
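
The sibling bug described above can be reproduced outside the library. The following is a minimal sketch, not the library's code: `Node`, `traverse_snapshot`, `traverse_live`, and `visit_order` are hypothetical names, and it assumes the old traversal iterated over a snapshot of each node's children (modeled here with `children.dup`).

```ruby
# Minimal stand-in for a DOM node. Hypothetical: it only mimics the handful of
# methods the traversal relies on and is not the parser's real node class.
class Node
  attr_reader :name, :children
  attr_accessor :parent

  def initialize(name)
    @name = name
    @children = []
  end

  def add_child(node)
    node.parent = self
    @children << node
    node
  end

  # Inserts +node+ directly after self in the parent's child list.
  def add_next_sibling(node)
    node.parent = parent
    parent.children.insert(parent.children.index(self) + 1, node)
    node
  end

  def child
    @children.first
  end

  def next_sibling
    parent && parent.children[parent.children.index(self) + 1]
  end

  def previous_sibling
    return nil unless parent
    index = parent.children.index(self)
    index > 0 ? parent.children[index - 1] : nil
  end
end

# Old behavior (simplified, assumed): iterate over a snapshot of the children,
# so siblings inserted while traversing are never visited.
def traverse_snapshot(node, &block)
  block.call(node)
  node.children.dup.each { |child| traverse_snapshot(child, &block) }
end

# New behavior, mirroring the loop in this commit: look up the next sibling
# only after the current child has been processed.
def traverse_live(node, &block)
  block.call(node)
  child = node.child

  while child
    prev = child.previous_sibling
    traverse_live(child, &block)

    if child.parent != node
      # Child was unlinked or reparented; resume from the previous sibling.
      child = prev ? prev.next_sibling : node.child
    else
      child = child.next_sibling
    end
  end
end

# Builds root -> [a, b], then records the visit order while a stand-in
# "transformer" inserts a new sibling (a2) after a.
def visit_order(traversal)
  root = Node.new('root')
  root.add_child(Node.new('a'))
  root.add_child(Node.new('b'))

  visited = []
  send(traversal, root) do |node|
    visited << node.name
    node.add_next_sibling(Node.new('a2')) if node.name == 'a'
  end
  visited
end

p visit_order(:traverse_snapshot) # => ["root", "a", "b"]
p visit_order(:traverse_live)     # => ["root", "a", "a2", "b"]
```

The snapshot version never sees `a2` because its list of children was fixed before the insertion, while the live version asks each child for its next sibling only after the block has run.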
rgrove committed May 18, 2014
1 parent 0ea5ba7 commit 6208727
Showing 2 changed files with 49 additions and 43 deletions.
41 changes: 22 additions & 19 deletions HISTORY.md
@@ -9,34 +9,37 @@ versioning standard. This release contains API and output changes that are
 incompatible with previous releases, as indicated by the major version
 increment.

-Backwards-incompatible changes are prefixed with `[!]`.
+[semver]:http://semver.org/

-* [!] HTML is now parsed using Google's Gumbo HTML5 parser, which adheres to the
-  HTML5 parsing spec and behaves much more like modern browser parsers. As a
-  result, HTML output may differ in some ways from previous versions of
-  Sanitize.
+### Backwards-incompatible changes

-* [!] The `clean!` and `clean_document!` methods were removed, since they
-  weren't useful and tended to confuse people.
+* HTML is now parsed using Google's Gumbo HTML5 parser, which adheres to the
+  HTML5 parsing spec and behaves much more like modern browser parsers than the
+  previous libxml2-based parser. As a result, HTML output may differ from that
+  of previous versions of Sanitize.

-* [!] The `clean` method was renamed to `fragment` to more clearly indicate that
-  its intended use is to sanitize an HTML fragment.
+* All transformers now traverse the document from the top down, starting with
+  the first node, then its first child, and so on. The `:transformers_breadth`
+  config has been removed, and old bottom-up transformers (the previous default)
+  may need to be rewritten.

-* [!] The `clean_document` method was renamed to `document`.
+* The `clean!` and `clean_document!` methods were removed, since they weren't
+  useful and tended to confuse people.

-* [!] The `clean_node!` method was renamed to `node!`.
+* The `clean` method was renamed to `fragment` to more clearly indicate that its
+  intended use is to sanitize an HTML fragment.

-* [!] The `document` method now raises a `Sanitize::Error` if the `<html>`
-  element isn't whitelisted, rather than a `RuntimeError`. This error is also
-  now raised regardless of the `:remove_contents` config setting.
+* The `clean_document` method was renamed to `document`.

-* [!] The `:output` config has been removed. Output is now always HTML, not
-  XHTML.
+* The `clean_node!` method was renamed to `node!`.

-* [!] The `:output_encoding` config has been removed. Output is now always
-  UTF-8.
+* The `document` method now raises a `Sanitize::Error` if the `<html>` element
+  isn't whitelisted, rather than a `RuntimeError`. This error is also now raised
+  regardless of the `:remove_contents` config setting.

-[semver]:http://semver.org/
+* The `:output` config has been removed. Output is now always HTML, not XHTML.
+
+* The `:output_encoding` config has been removed. Output is now always UTF-8.


 Version 2.2.0 (git)
51 changes: 27 additions & 24 deletions lib/sanitize.rb
@@ -69,16 +69,13 @@ def self.node!(node, config = {})
   def initialize(config = {})
     @config = Config::DEFAULT.merge(config)

-    @transformers = {
-      :breadth => Array(@config[:transformers_breadth].dup),
-      :depth => Array(@config[:transformers]) + Array(@config[:transformers_depth])
-    }
+    @transformers = Array(@config[:transformers].dup)

-    # Default depth transformers. These always run at the end of the chain,
-    # after any custom transformers.
-    @transformers[:depth] << Transformers::CleanComment unless @config[:allow_comments]
+    # Default transformers always run at the end of the chain, after any custom
+    # transformers.
+    @transformers << Transformers::CleanComment unless @config[:allow_comments]

-    @transformers[:depth] <<
+    @transformers <<
       Transformers::CleanCDATA <<
       Transformers::CleanElement.new(@config)
   end
@@ -133,11 +130,10 @@ def node!(node)

     node_whitelist = Set.new

-    unless @transformers[:breadth].empty?
-      traverse_breadth(node) {|n| transform_node!(n, node_whitelist, :breadth) }
+    traverse(node) do |n|
+      transform_node!(n, node_whitelist)
     end

-    traverse_depth(node) {|n| transform_node!(n, node_whitelist, :depth) }
     node
   end

@@ -150,15 +146,14 @@ def to_html(node)
     )
   end

-  def transform_node!(node, node_whitelist, mode)
-    @transformers[mode].each do |transformer|
+  def transform_node!(node, node_whitelist)
+    @transformers.each do |transformer|
       result = transformer.call({
         :config => @config,
         :is_whitelisted => node_whitelist.include?(node),
         :node => node,
         :node_name => node.name.downcase,
         :node_whitelist => node_whitelist,
-        :traversal_mode => mode
       })

       if result.is_a?(Hash) && result[:node_whitelist].respond_to?(:each)
@@ -169,18 +164,26 @@ def transform_node!(node, node_whitelist, mode)
     node
   end

-  # Performs breadth-first traversal, operating first on the root node, then
-  # traversing downwards.
-  def traverse_breadth(node, &block)
+  # Performs top-down traversal of the given node, operating first on the node
+  # itself, then traversing each child (if any) in order.
+  def traverse(node, &block)
     block.call(node)
-    node.children.each {|child| traverse_breadth(child, &block) }
-  end

-  # Performs depth-first traversal, operating first on the deepest nodes in the
-  # document, then traversing upwards to the root.
-  def traverse_depth(node, &block)
-    node.children.each {|child| traverse_depth(child, &block) }
-    block.call(node)
+    child = node.child
+
+    while child do
+      prev = child.previous_sibling
+      traverse(child, &block)
+
+      if child.parent != node
+        # The child was unlinked or reparented, so traverse the previous node's
+        # next sibling, or the parent's first child if there is no previous
+        # node.
+        child = prev ? prev.next_sibling : node.child
+      else
+        child = child.next_sibling
+      end
+    end
   end

   class Error < StandardError; end
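
For transformer authors, the practical effect of the `transform_node!` change is that the env Hash no longer carries `:traversal_mode`. The sketch below shows a transformer written against the remaining keys. It is an illustration under assumptions, not the library's implementation: `FakeNode`, `ALLOW_BOLD`, `RECORD_KEYS`, `SEEN_KEYS`, and `run_transformers` are hypothetical names, and the whitelist merge inside the `if` is assumed because the diff elides that part of the method.

```ruby
require 'set'

# Hypothetical stand-in for a parsed node; only `name` is needed here.
class FakeNode
  attr_reader :name

  def initialize(name)
    @name = name
  end
end

# A transformer is any object that responds to `call` and accepts an env Hash.
# This one asks for <b> elements to be added to the node whitelist.
ALLOW_BOLD = lambda do |env|
  { :node_whitelist => [env[:node]] } if env[:node_name] == 'b'
end

# Records the env keys it was called with, for inspection.
SEEN_KEYS = []
RECORD_KEYS = lambda do |env|
  SEEN_KEYS.replace(env.keys)
  nil
end

# Simplified version of the transform_node! loop shown in the diff. The merge
# step is an assumption; the diff does not show the body of the `if`.
def run_transformers(transformers, node, node_whitelist, config = {})
  transformers.each do |transformer|
    result = transformer.call({
      :config         => config,
      :is_whitelisted => node_whitelist.include?(node),
      :node           => node,
      :node_name      => node.name.downcase,
      :node_whitelist => node_whitelist
    })

    if result.is_a?(Hash) && result[:node_whitelist].respond_to?(:each)
      node_whitelist.merge(result[:node_whitelist])
    end
  end

  node
end

whitelist = Set.new
bold = FakeNode.new('B')
run_transformers([ALLOW_BOLD, RECORD_KEYS], bold, whitelist)

p whitelist.include?(bold)            # => true
p SEEN_KEYS.include?(:traversal_mode) # => false
```

A transformer that previously branched on `env[:traversal_mode]` would see `nil` for that key after this commit, so any such branch needs to be removed or rewritten for top-down order.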
