# Changelog

All notable changes to Nokogumbo will be documented in this file.

The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/)
and this project adheres to [Semantic Versioning](http://semver.org/spec/v2.0.0.html).

## [Unreleased]
### Added
### Changed
### Deprecated
### Removed
### Fixed
### Security

## [2.0.5] - 2021-03-19
### Fixed
- Support Mageia distros when libxml2/libxslt system libraries are installed. #165 (Thank you, @pterjan!)

### Added
- Forward-looking support for a version of Nokogiri that will provide HTML5 parsing. #171

### Improved
- Update extconf.rb to use Nokogiri v1.11's CPPFLAGS for more reliable installation. #163

## [2.0.4] - 2020-11-27
### Fixed
- Fixed a bug where `Nokogiri::HTML5.fragment(nil)` would raise an error. Now it returns an empty `DocumentFragment` like it did in v2.0.2.
- Fixed assertion failure when a tag immediately followed the UTF-8 BOM.

## [2.0.3] - 2020-11-21
### Added
- Limit enforced on number of attributes per element, defaulting to 400 and configurable with the `:max_attributes` argument.

### Fixed
- Ignore UTF-8 byte order mark at the beginning of the input.
- Fix content sniffing for Unicode strings.
- Fixed crash where Ruby objects constructed in C can be garbage collected.

## [2.0.2] - 2019-11-19
### Added
- Support Ruby 2.6

### Fixed
- Fix assertion failures with nonstandard HTML tags.
- Fix the handling of mis-nested formatting tags (the adoption agency algorithm).
- Fix crash with zero-length HTML tags.

### Security
- Prevent 1-byte buffer over-read when constructing an error message about an unexpected EOF.

## [2.0.1] - 2018-11-11
### Fixed
- Fix line numbers on elements from `#line`.

## [2.0.0] - 2018-10-04
### Added
- Experimental support for errors (it was supported in 1.5.0 but undocumented).
- Added proper HTML5 serialization.
- Added option `:max_errors` to control the maximum number of errors reported by `#errors`.
- Added option `:max_tree_depth` to control the maximum parse tree depth.
- Line number support via `Nokogiri::XML::Node#line` as long as Nokogumbo has been compiled with libxml2 support.

### Changed
- Integrated [Gumbo parser](https://github.com/google/gumbo-parser) into Nokogumbo. A system version will not be used.
- The undocumented (but publicly mentioned) `:max_parse_errors` has been renamed to `:max_errors`; `:max_parse_errors` is deprecated and will go away.
- The various `#parse` and `#fragment` (and `Nokogiri.HTML5`) methods return `Nokogiri::HTML5::Document` and `Nokogiri::HTML5::DocumentFragment` classes rather than `Nokogiri::HTML::Document` and `Nokogiri::HTML::DocumentFragment`.
- Changed the top-level API to more closely match Nokogiri's while maintaining backwards compatibility. The new APIs are
  * `Nokogiri::HTML5(html, url = nil, encoding = nil, **options, &block)`
  * `Nokogiri::HTML5.parse(html, url = nil, encoding = nil, **options, &block)`
  * `Nokogiri::HTML5::Document.parse(html, url = nil, encoding = nil, **options, &block)`
  * `Nokogiri::HTML5.fragment(html, encoding = nil, **options)`
  * `Nokogiri::HTML5::DocumentFragment.parse(html, encoding = nil, **options)`
  * `Nokogiri::HTML5::DocumentFragment.new(document, html = nil, ctx = nil)`
  * `Nokogiri::HTML5::Document#fragment(html = nil)`
  * `Nokogiri::XML::Node#fragment(html = nil)`

  In all cases, `html` can be a string or an `IO` object (something that responds to `#read`). The `url` parameter is entirely for error reporting, as in Nokogiri. The `encoding` parameter only signals what encoding `html` should have on input; the output `Document` or `DocumentFragment` will be in UTF-8. Currently, the only option supported is `:max_errors`, which controls the maximum number of errors reported by `#errors`.
- Minimum supported version of Ruby changed to 2.1.
- Minimum supported version of Nokogiri changed to 1.8.0.
- `Nokogiri::HTML5::DocumentFragment#errors` returns errors for the document fragment itself, not the underlying document.
- The five XML namespaces described in the HTML spec, MathML, SVG, XLink, XML, and XMLNS, are now supported. Thus `<svg>` will create an `svg` element in the SVG namespace and `<math>` will create a `math` element in the MathML namespace. An attribute `xml:lang=en`, for example, will create a `lang` attribute in the XML namespace, **but only in foreign elements (i.e., those in the SVG or MathML namespaces)**. On HTML elements, this creates an attribute with the name `xml:lang`. This changes the `#xpath` and related APIs.
- Calling `#to_xml` on a `Nokogiri::HTML5::Document` will produce XML output rather than HTML.

### Deprecated
- `:max_parse_errors`; use `:max_errors`

### Fixed
- Fixed documents failing to serialize (via `to_html`) if they contain certain `meta` elements that set the `charset`.
- Documents are now properly marked as UTF-8 after parsing.
- Fixed `Nokogiri::HTML5.fragment` reporting an error due to a missing `<!DOCTYPE html>`.
- Fixed crash when input contains U+0000 NULL bytes and error reporting is enabled.

### Security
- The most recent, released version of Gumbo has a [potential security issue](https://github.com/google/gumbo-parser/pull/375) that could result in a cross-site scripting vulnerability. This has been fixed by integrating Gumbo into Nokogumbo.
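
Callers affected by the 2.0.0 deprecation of `:max_parse_errors` may find it convenient to translate the old key before passing options to the parser. The following is a minimal sketch of such a shim in plain Ruby. The helper name `migrate_html5_options` is hypothetical and is not part of Nokogumbo; it only illustrates the rename described above.

```ruby
# Hypothetical helper, not part of Nokogumbo: rewrites the deprecated
# :max_parse_errors key to :max_errors before the options reach the parser.
def migrate_html5_options(options)
  migrated = options.dup
  if migrated.key?(:max_parse_errors)
    deprecated_value = migrated.delete(:max_parse_errors)
    # An explicitly supplied :max_errors takes precedence over the old key.
    migrated[:max_errors] = deprecated_value unless migrated.key?(:max_errors)
    warn ':max_parse_errors is deprecated; use :max_errors'
  end
  migrated
end

# The result can be splatted into a call such as
# Nokogiri::HTML5.parse(html, **options).
options = migrate_html5_options(max_parse_errors: 10)
```

The helper copies the hash rather than mutating it, so the caller's original options are left unchanged.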