This is a fork of html5lib to parse XML documents. For a pure HTML5 parser, please use html5lib instead.
Basically, html5plus is almost exactly the same as html5lib, except that it can also parse simple XML documents:
- As in XML, self-closing tags such as `<div/>` are handled as leaf nodes (this is the only reason this fork exists).
  For example,

  ```html
  <div/>
  <div>foo</div>
  ```

  is interpreted as follows by html5plus:

  ```html
  <div></div>
  <div>foo</div>
  ```

  whereas html5lib and many browsers interpret it as:

  ```html
  <div>
    <div>foo</div>
  </div>
  ```
- Supports processing instructions (a pull request has been sent to html5lib).
- `HtmlParser` has an additional flag, `cdataOK`, that controls whether CDATA sections are always accepted, including in the `http://www.w3.org/1999/xhtml` namespace. (A standard HTML5 parser accepts CDATA sections only in foreign content such as SVG and MathML.)
- Supports line number information (`Node.lineNumber`).
  - Note that it is not available on `Text` nodes, and that it breaks compatibility with `dart:html`.
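The self-closing rule above can be illustrated with a toy tree builder. This is a hypothetical, self-contained Dart sketch, not html5plus's implementation: `buildTree`, `serialize`, `ToyNode` and the `xmlSelfClosing` switch are invented for illustration, and only plain start tags, end tags and text are recognized. With `xmlSelfClosing: true`, `<div/>` becomes a leaf; with `false`, the trailing slash is ignored on non-void elements, which is what an HTML5 parser does. It assumes a current Dart SDK with null safety.

```dart
// Toy tree builder: a hypothetical sketch, NOT html5plus's implementation.
// It recognizes only start tags, end tags, self-closing tags and text
// (no attributes, comments or HTML5 error recovery).

/// A subset of the HTML5 void elements, which never take children.
const voidElements = {'br', 'hr', 'img', 'input', 'link', 'meta'};

class ToyNode {
  final String? tag; // element name; null for text nodes and the root
  final String data; // character data; empty for elements
  final bool isText;
  final List<ToyNode> children = [];

  ToyNode.element(this.tag)
      : data = '',
        isText = false;
  ToyNode.text(this.data)
      : tag = null,
        isText = true;
}

// Group 1: "/" for an end tag, 2: tag name, 3: "/" for a self-closing tag,
// 4: a run of character data.
final _tokenPattern = RegExp(r'<(/?)([A-Za-z][\w-]*)\s*(/?)>|([^<]+)');

/// Builds a tree under a synthetic root element.
///
/// With [xmlSelfClosing] set, `<x/>` becomes a leaf node (XML behavior).
/// Otherwise the trailing slash is ignored on non-void elements (HTML5
/// behavior), so `<x/>` stays open and swallows the following siblings.
ToyNode buildTree(String input, {required bool xmlSelfClosing}) {
  final root = ToyNode.element(null);
  final stack = <ToyNode>[root]; // open elements, innermost last
  for (final m in _tokenPattern.allMatches(input)) {
    final text = m.group(4);
    if (text != null) {
      // Drop whitespace-only text so the output stays compact.
      if (text.trim().isNotEmpty) stack.last.children.add(ToyNode.text(text));
      continue;
    }
    final name = m.group(2)!;
    if (m.group(1) == '/') {
      // End tag: close the nearest open element with this name, if any.
      final i = stack.lastIndexWhere((n) => n.tag == name);
      if (i > 0) stack.removeRange(i, stack.length);
      continue;
    }
    final element = ToyNode.element(name);
    stack.last.children.add(element);
    final selfClosing = m.group(3) == '/';
    final isLeaf =
        voidElements.contains(name) || (xmlSelfClosing && selfClosing);
    if (!isLeaf) stack.add(element); // stays open for following content
  }
  return root; // anything still open is closed implicitly
}

/// Serializes a tree back to markup, writing every element explicitly.
String serialize(ToyNode node) {
  if (node.isText) return node.data;
  final inner = node.children.map(serialize).join();
  final tag = node.tag;
  if (tag == null) return inner; // synthetic root: children only
  if (voidElements.contains(tag)) return '<$tag>';
  return '<$tag>$inner</$tag>';
}
```

For the input `<div/><div>foo</div>`, `serialize(buildTree(input, xmlSelfClosing: true))` yields `<div></div><div>foo</div>`, while `xmlSelfClosing: false` yields `<div><div>foo</div></div>`, matching the two interpretations described above. Void elements such as `<br/>` are leaves in both modes.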
Add this to your `pubspec.yaml` (or create it):

```yaml
dependencies:
  html5plus: any
```
### Parsing HTML is easy!

```dart
import 'package:html5plus/parser.dart' show parse;
import 'package:html5plus/dom.dart';

main() {
  var document = parse(
      '<body>Hello world! <a href="www.html5rocks.com">HTML5 rocks!');
  print(document.outerHtml);
}
```
### Parsing XML

```dart
import 'package:html5plus/parser.dart' show HtmlParser;
import 'package:html5plus/dom.dart';

main() {
  // Keep element and attribute names as written, and accept CDATA
  // sections even in the XHTML namespace.
  var document = new HtmlParser(lowercaseElementName: false,
      lowercaseAttrName: false, cdataOK: true)
    .parse("""
<?process this?>
<foo>Hello world! <important>XML rocks!</important>
<![CDATA[ here & there ]]>
</foo>
""");
  // Print each top-level node of the parsed document.
  for (final node in document.nodes) {
    print("$node");
  }
}
```
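To show what a parser has to recognize in markup like the XML example, here is a hypothetical, self-contained Dart sketch. It is not html5plus's tokenizer and not part of its API; `tokenize`, `Token` and `Kind` are invented for illustration. It uses regular expressions to pick out processing instructions (`<?target data?>`), CDATA sections (`<![CDATA[ ... ]]>`), comments, tags and text, and records the line each construct starts on (counting from 1, an assumption of this sketch), in the spirit of `Node.lineNumber`. It assumes a current Dart SDK with null safety.

```dart
// Hypothetical sketch, NOT html5plus's tokenizer: classifies markup
// constructs with regular expressions and records the line (counting
// from 1) on which each one starts. A real HTML5 tokenizer is a state
// machine with error recovery; this is not.

enum Kind { processingInstruction, cdata, comment, startTag, endTag, text }

class Token {
  final Kind kind;
  final String value;
  final int line;
  const Token(this.kind, this.value, this.line);

  @override
  String toString() => '$line ${kind.name}: $value';
}

// Alternatives are tried left to right at each position.
final _constructPattern = RegExp(
  r'<\?([\s\S]*?)\?>' // 1: processing instruction <?target data?>
  r'|<!\[CDATA\[([\s\S]*?)\]\]>' // 2: CDATA section
  r'|<!--([\s\S]*?)-->' // 3: comment
  r'|</([A-Za-z][\w:-]*)\s*>' // 4: end tag
  r'|<([A-Za-z][\w:-]*)[^>]*>' // 5: start tag (attributes are skipped)
  r'|([^<]+)', // 6: character data
);

List<Token> tokenize(String input) {
  final tokens = <Token>[];
  var line = 1;
  var counted = 0; // newlines before this offset are already in `line`
  for (final m in _constructPattern.allMatches(input)) {
    // Advance the line counter to the start of this match.
    line += '\n'.allMatches(input.substring(counted, m.start)).length;
    counted = m.start;
    final Token token;
    if (m.group(1) != null) {
      token = Token(Kind.processingInstruction, m.group(1)!, line);
    } else if (m.group(2) != null) {
      token = Token(Kind.cdata, m.group(2)!, line);
    } else if (m.group(3) != null) {
      token = Token(Kind.comment, m.group(3)!, line);
    } else if (m.group(4) != null) {
      token = Token(Kind.endTag, m.group(4)!, line);
    } else if (m.group(5) != null) {
      token = Token(Kind.startTag, m.group(5)!, line);
    } else {
      final text = m.group(6)!;
      if (text.trim().isEmpty) continue; // skip whitespace between constructs
      token = Token(Kind.text, text, line);
    }
    tokens.add(token);
  }
  return tokens;
}
```

For input with a processing instruction on its first line and a `<![CDATA[ ... ]]>` section on its third, the corresponding tokens come back with `line` set to 1 and 3. The CDATA content is kept verbatim, so `&` is not treated as the start of a character reference.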