[RubyConf] Create scrubber for replacing double breakpoints into paragraph nodes #284

Open. Wants to merge 4 commits into base: main
2 changes: 2 additions & 0 deletions README.md
@@ -31,6 +31,7 @@ Active Record extensions for HTML sanitization are available in the [`loofah-act
* Add the _nofollow_ attribute to all hyperlinks.
* Add the _target=\_blank_ attribute to all hyperlinks.
* Remove _unprintable_ characters from text nodes.
* Split paragraphs at _double line breaks_ (two consecutive `<br>` tags) into separate paragraph nodes.
* Format markup as plain text, with (or without) sensible whitespace handling around block elements.
* Replace Rails's `strip_tags` and `sanitize` view helper methods.

@@ -235,6 +236,7 @@ doc.scrub!(:noopener) # adds rel="noopener" attribute to links
doc.scrub!(:noreferrer) # adds rel="noreferrer" attribute to links
doc.scrub!(:unprintable) # removes unprintable characters from text nodes
doc.scrub!(:targetblank) # adds target="_blank" attribute to links
doc.scrub!(:double_breakpoint) # replaces double <br> tags with paragraph breaks
```

See `Loofah::Scrubbers` for more details and example usage.
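
Review note: the transformation these README lines describe can be illustrated without Loofah. The sketch below is a string-level approximation using only Ruby's standard library. `split_paragraph_on_double_breaks` and `DOUBLE_BREAK` are hypothetical names introduced here, and the regex approach is not how the PR's scrubber works (the scrubber operates on Nokogiri nodes). It assumes the input is a single `<p>...</p>` string with no nested paragraphs.

```ruby
# Hypothetical helper, not part of Loofah: a string-level illustration only.
# Matches two <br> tags (optionally self-closed) separated by optional whitespace.
DOUBLE_BREAK = %r{<br\s*/?>\s*<br\s*/?>}i

def split_paragraph_on_double_breaks(paragraph_html)
  # Only handle a string that is exactly one <p>...</p> element.
  match = paragraph_html.match(%r{\A<p>(.*)</p>\z}m)
  return paragraph_html unless match

  match[1]
    .split(DOUBLE_BREAK)                       # cut the content at each double break
    .map(&:strip)
    .reject(&:empty?)                          # avoid emitting empty paragraphs
    .map { |segment| "<p>#{segment}</p>" }     # re-wrap each piece
    .join
end

input = "<p>Some text here in a logical paragraph.<br><br>" \
        "Some more text, apparently a second paragraph.<br><br>Et cetera...</p>"
puts split_paragraph_on_double_breaks(input)
# prints "<p>Some text here in a logical paragraph.</p><p>Some more text, apparently a second paragraph.</p><p>Et cetera...</p>"
```

A single `<br>` is left untouched, so `"<p>a<br>b</p>"` comes back unchanged.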
52 changes: 52 additions & 0 deletions lib/loofah/scrubbers.rb
@@ -348,6 +348,57 @@ def scrub(node)
      end
    end

    #
    #  === scrub!(:double_breakpoint)
    #
    #  +:double_breakpoint+ replaces two consecutive <br> tags inside a paragraph with closing/opening paragraph tags.
    #
    #     double_breakpoint_markup = "<p>Some text here in a logical paragraph.<br><br>Some more text, apparently a second paragraph.</p>"
    #     Loofah.html5_fragment(double_breakpoint_markup).scrub!(:double_breakpoint)
    #     => "<p>Some text here in a logical paragraph.</p><p>Some more text, apparently a second paragraph.</p>"
    #
    class DoubleBreakpoint < Scrubber
      def initialize # rubocop:disable Lint/MissingSuper
        @direction = :top_down
      end

      def scrub(node)
        return CONTINUE unless (node.type == Nokogiri::XML::Node::ELEMENT_NODE) && (node.name == "p")

        paragraph_with_break_point_nodes = node.xpath("self::p[br[following-sibling::br]]")

        paragraph_with_break_point_nodes.each do |paragraph_node|
          new_paragraph = paragraph_node.add_previous_sibling("<p>").first

          paragraph_node.children.each do |child|
            remove_blank_text_nodes(child)
          end

          paragraph_node.children.each do |child|
            # already unlinked
            next if child.parent.nil?

            if child.name == "br" && child.next_sibling&.name == "br"
              new_paragraph = paragraph_node.add_previous_sibling("<p>").first
              child.next_sibling.unlink
              child.unlink
            else
              child.parent = new_paragraph
            end
          end

          paragraph_node.unlink
        end

        CONTINUE
      end

      private

      def remove_blank_text_nodes(node)
        node.unlink if node.text? && node.blank?
      end
    end
    #
    #  A hash that maps a symbol (like +:prune+) to the appropriate Scrubber (Loofah::Scrubbers::Prune).
    #
@@ -362,6 +413,7 @@ def scrub(node)
      targetblank: TargetBlank,
      newline_block_elements: NewlineBlockElements,
      unprintable: Unprintable,
      double_breakpoint: DoubleBreakpoint,
    }

    class << self
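
Review note: the core of `DoubleBreakpoint#scrub` is a walk over a paragraph's children that starts a new paragraph whenever it meets two adjacent `<br>` nodes. The sketch below models that grouping with plain Ruby arrays so the edge cases are easy to check: `:br` stands for a `<br>` element and strings stand for text nodes. `group_by_double_break` is a hypothetical name introduced here, not part of Loofah or of this PR. Unlike the PR's loop, it also drops empty groups, which is an assumption about the desired behavior for leading or trailing double breaks.

```ruby
# Hypothetical model of the grouping step, for illustration only.
# children: array where :br represents a <br> element and strings represent text.
# Returns an array of groups; each group would become one <p> element.
def group_by_double_break(children)
  groups = [[]]
  index = 0

  while index < children.length
    if children[index] == :br && children[index + 1] == :br
      groups << [] # two adjacent breaks start a new paragraph
      index += 2   # consume both breaks
    else
      groups.last << children[index] # everything else stays in the current paragraph
      index += 1
    end
  end

  # Assumption: paragraphs left empty by leading/trailing pairs are not wanted.
  groups.reject(&:empty?)
end

p group_by_double_break(["first", :br, :br, "second", :br, "still second"])
# prints [["first"], ["second", :br, "still second"]]
```

Indexing past the end of the array returns `nil`, so the last child never needs a special case. In the node-based code the equivalent guard is the nil check on `next_sibling`.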
13 changes: 13 additions & 0 deletions test/integration/test_scrubbers.rb
@@ -47,6 +47,9 @@ class IntegrationTestScrubbers < Loofah::TestCase
ENTITY_HACK_ATTACK_TEXT_SCRUB = "Hack attack!&lt;script&gt;alert('evil')&lt;/script&gt;"
ENTITY_HACK_ATTACK_TEXT_SCRUB_UNESC = "Hack attack!<script>alert('evil')</script>"

BREAKPOINT_FRAGMENT = "<p>Some text here in a logical paragraph.<br><br>Some more text, apparently a second paragraph.<br><br>Et cetera...</p>"
BREAKPOINT_RESULT = "<p>Some text here in a logical paragraph.</p><p>Some more text, apparently a second paragraph.</p><p>Et cetera...</p>"

context "scrubbing shortcuts" do
context "#scrub_document" do
it "is a shortcut for parse-and-scrub" do
@@ -225,6 +228,16 @@ def html5?
assert_equal doc, result
end
end

context ":double_breakpoint" do
it "replaces double line breaks with paragraph tags" do
doc = klass.parse("<html><body>#{BREAKPOINT_FRAGMENT}</body></html>")
result = doc.scrub!(:double_breakpoint)

assert_equal BREAKPOINT_RESULT, doc.xpath("/html/body").inner_html
assert_equal doc, result
end
end
end

context "#text" do