A friendly and forgiving HTML5/XML5 parser that supports React JSX and uses zero-copy techniques to parse large files efficiently.
Most markup parsers convert a string of markup into a nested map (hash/dict) of keys and values, with each of these allocated as a separate variable in memory. This means that a 10MB XML file may balloon to 100MB of memory.
A different technique is to retain the original string and generate an index of string offsets. Because these offsets are just numbers, they can be packed more efficiently (a tutorial on zero-copy approaches).
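To make the offset-index idea concrete, here is a minimal sketch. It is not the xml-zero-lexer implementation or its API: the function names `indexMarkup` and `tokenText`, the token type constants, and the triple layout are all assumptions made for illustration. It scans the string once and stores only numbers, packed into a `Uint32Array`.

```javascript
// Hypothetical sketch: NOT the xml-zero-lexer implementation or API.
// Token types are small integers so each token is just three numbers.
const TEXT = 0;
const OPEN_TAG = 1;
const CLOSE_TAG = 2;

const LT = 60; // char code of "<"
const SLASH = 47; // char code of "/"

// Scan the markup once and return a flat Uint32Array of
// [type, start, end, type, start, end, ...] triples.
// For tags, start/end span everything between "<" (or "</") and ">".
// No substrings are created while scanning.
function indexMarkup(source) {
  const triples = [];
  let i = 0;
  while (i < source.length) {
    if (source.charCodeAt(i) === LT) {
      const close = source.indexOf(">", i);
      // Forgiving: an unterminated tag simply runs to the end of the input.
      const end = close === -1 ? source.length : close;
      const isClosing = source.charCodeAt(i + 1) === SLASH;
      triples.push(isClosing ? CLOSE_TAG : OPEN_TAG, i + (isClosing ? 2 : 1), end);
      i = end + 1;
    } else {
      const next = source.indexOf("<", i);
      const end = next === -1 ? source.length : next;
      triples.push(TEXT, i, end);
      i = end;
    }
  }
  return Uint32Array.from(triples);
}

// Materialise the text of token n only when it is actually needed.
function tokenText(source, index, n) {
  return source.slice(index[n * 3 + 1], index[n * 3 + 2]);
}

const source = "<p>Hello <b>world</b></p>";
const index = indexMarkup(source);
console.log(index.length / 3, tokenText(source, index, 3));
```

Each token costs 12 bytes in the typed array regardless of how long its text is, and the original string is never copied or modified. A real lexer would also need to handle attributes, comments, CDATA, and JSX, which this sketch ignores.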
This software is in beta and doesn't work yet.
- Fault tolerant like HTML5/XML5.
- Valueless attributes as in HTML5/XML5, e.g. <input multiple type=file>
- Attribute values may be quoted (e.g. <tag "some key"=false/>) or not.
- React JSX in attributes and in text (not executed, of course, but parsed as distinct node types).
- Multiple root nodes. Doesn't care about well-formedness. GIGO.
- Minimising memory use through Zero-Copy techniques.
- Tiny, no dependencies, and can run in Web Workers (e.g. doesn't use DOM APIs).
- Safer by removing SGML cruft. No support for external DTD resolution, or nested entity expansion. Only default entities in XML, NCRs, and HTML5 named entities are supported.
- Lots of tests.
- No complete W3C DOM (at least for now), although we will follow their API naming conventions where reasonable.
- No HTML5 implied tags (e.g. it won't automatically create tags such as <html>, <head>, <tbody>, etc.).
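The restricted entity handling mentioned in the list above (default XML entities and NCRs, no nested expansion) could be sketched roughly as follows. This is an illustration only, not the library's code: `decodeEntities` and `XML_ENTITIES` are hypothetical names, and the table holds just the five XML defaults rather than the full HTML5 named-entity set.

```javascript
// Hypothetical sketch: NOT the library's code. Only the five default XML
// entities are listed; a full HTML5 named-entity table is much larger.
const XML_ENTITIES = { amp: "&", lt: "<", gt: ">", quot: '"', apos: "'" };

const ENTITY_PATTERN = /&(#x[0-9a-fA-F]+|#[0-9]+|[A-Za-z][A-Za-z0-9]*);/g;

function decodeEntities(text) {
  // A single pass over the input. Replacement output is never re-scanned,
  // so one entity cannot expand into further entities.
  return text.replace(ENTITY_PATTERN, (match, body) => {
    if (body[0] === "#") {
      // Numeric character reference, decimal (&#65;) or hex (&#x41;).
      const isHex = body[1] === "x";
      const codePoint = parseInt(body.slice(isHex ? 2 : 1), isHex ? 16 : 10);
      // Out-of-range references are left as-is instead of throwing.
      return codePoint <= 0x10ffff ? String.fromCodePoint(codePoint) : match;
    }
    // Unknown names are left untouched: forgiving, and no DTD lookup.
    return Object.prototype.hasOwnProperty.call(XML_ENTITIES, body)
      ? XML_ENTITIES[body]
      : match;
  });
}

console.log(decodeEntities("a &lt; b &amp;&amp; c"));
```

Because the lookup table is fixed and the output is not decoded again, inputs built to expand exponentially (the "billion laughs" pattern) have nothing to expand into.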
npm install xml-zero-lexer
npm install xml-zero-beautify
npm install whats-the-damage
(More packages to come; I'm making it modular.)
- Lexer (2.6KB no dependencies, minified and gzipped)
- Beautifier (4KB including all dependencies, minified and gzipped)
- What's The Damage benchmarker that measures time/memory/CPU of scripts
- A W3C DOM-like API
- Editable XML (by way of making new strings and leaving the original untouched, so it's still immutable)