
Version 3.0.0 (2014-06-21)

@rgrove released this 21 Jun 23:16

As of this version, Sanitize adheres strictly to the SemVer 2.0.0 versioning standard. This release contains API and output changes that are incompatible with previous releases, as indicated by the major version increment.

Backwards-incompatible changes

  • HTML is now parsed using Google's Gumbo HTML5 parser, which adheres to the HTML5 parsing spec and behaves much more like modern browser parsers than the previous libxml2-based parser. As a result, HTML output may differ from that of previous versions of Sanitize.
  • All transformers now traverse the document from the top down, starting with the first node, then its first child, and so on. The :transformers_breadth config has been removed, and old bottom-up transformers (the previous default) may need to be rewritten.
  • Sanitize's built-in configs are now deeply frozen to prevent people from modifying them (either accidentally or maliciously). To customize a built-in config, create a new copy using Sanitize::Config.merge(), like so:
      Sanitize.fragment(html, Sanitize::Config.merge(Sanitize::Config::BASIC,
        :elements        => Sanitize::Config::BASIC[:elements] + ['div', 'table'],
        :remove_contents => true
      ))
  • The clean! and clean_document! methods were removed, since they weren't useful and tended to confuse people.
  • The clean method was renamed to fragment to more clearly indicate that its intended use is to sanitize an HTML fragment.
  • The clean_document method was renamed to document.
  • The clean_node! method was renamed to node!.
  • The document method now raises a Sanitize::Error if the <html> element isn't whitelisted, rather than a RuntimeError. This error is also now raised regardless of the :remove_contents config setting.
  • The :output config has been removed. Output is now always HTML, not XHTML.
  • The :output_encoding config has been removed. Output is now always UTF-8.
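
For anyone rewriting an old bottom-up transformer, it may help to see the new visiting order spelled out. The following is only a rough sketch of a top-down (parent before children) traversal in plain Ruby. The `Node` struct and the `traverse` method are hypothetical stand-ins invented for this illustration. Sanitize itself works on Nokogiri nodes, and its real traversal code is not shown here.

```ruby
# Hypothetical minimal node type, used only for this illustration.
Node = Struct.new(:name, :children) do
  def initialize(name, children = [])
    super(name, children)
  end
end

# Visit the node itself first, then each of its children in document order.
def traverse(node, &visit)
  visit.call(node)
  node.children.each { |child| traverse(child, &visit) }
end

tree = Node.new('div', [
  Node.new('p',  [Node.new('b'), Node.new('i')]),
  Node.new('ul', [Node.new('li')])
])

order = []
traverse(tree) { |node| order << node.name }
puts order.join(' ')  # prints "div p b i ul li"
```

In this order a transformer sees a parent before any of its descendants, so logic that used to rely on children having been processed first (as in a bottom-up pass) probably needs to be restructured.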

Other changes

  • Added advanced CSS sanitization support using Crass, which is fully compliant with the CSS Syntax Module Level 3 parsing spec. The contents of whitelisted <style> elements and style attributes in HTML will be sanitized as CSS, or you can use the Sanitize::CSS class to manually sanitize CSS stylesheets or properties.
  • Added an :allow_doctype setting. When true, well-formed doctype definitions will be allowed in documents. When false (the default), doctype definitions will be removed from documents. Doctype definitions are never allowed in fragments, regardless of this setting.
  • Added the following elements to the relaxed config, in addition to various attributes: article, aside, body, data, div, footer, head, header, html, main, nav, section, span, style, title.
  • The :whitespace_elements config is now a Hash, and allows you to specify the text that should be inserted before and after these elements when they're removed. The old-style Array-based config value is still supported for backwards compatibility. @alperkokmen - #94
  • Unsuitable Unicode characters are now removed from HTML before it's parsed. #106
  • Fixed: Non-tag brackets in input like "1 > 2 and 2 < 1" are now parsed and escaped correctly in accordance with the HTML5 spec, becoming "1 &gt; 2 and 2 &lt; 1". #83
  • Fixed: Siblings added after the current node during traversal are now also traversed. In previous versions they were simply skipped. #91
  • Fixed: Nokogiri has been smacked and instructed to stop adding newlines after certain elements, because if people wanted newlines there they'd have put them there, dammit. #103
  • Fixed: Added a workaround for a libxml2 bug that caused an undesired content-type meta tag to be added to all documents with <head> elements. Nokogiri #1008
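
To illustrate the :whitespace_elements change described above, here is a sketch of the old Array form next to the new Hash form. The `:before` and `:after` key names, and the element names chosen, are assumptions based on a recollection of the Sanitize README, so check them against the documentation for your version. The `old_style` and `new_style` variable names are purely illustrative.

```ruby
# Old style (still supported): an Array of element names.
old_style = {
  :whitespace_elements => ['br', 'div', 'p']
}

# New style: a Hash mapping each element name to the text inserted
# before and after it when the element is removed.
# The :before / :after keys are an assumption; verify against the README.
new_style = {
  :whitespace_elements => {
    'br'  => { :before => "\n", :after => '' },
    'div' => { :before => "\n", :after => "\n" },
    'p'   => { :before => "\n", :after => "\n" }
  }
}

# Either hash would then be passed as the config argument, for example:
#   Sanitize.fragment(html, new_style)
```

The Hash form is only needed when the default replacement text is not what you want. Existing Array-based configs should keep working unchanged.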