v7.0.0
Welcome to parse5@7.0.0! ✨ This is a huge release with many changes, features and fixes.
From an organisational perspective, the most important change is that parse5 is now maintained by a team, consisting of James (@43081j), Titus (@wooorm) and me (@fb55). We come from three projects that rely on parse5 — namely Cheerio, rehype, and Lit.
We need your support to continue the project! If you care about parse5, please support us financially on OpenCollective.
Headlining features of this release are ES Modules, TypeScript, and performance improvements: 7.0.0 is 45% faster than 6.0.1 with default options, and 167% faster with location information enabled (for the bench/perf benchmark, on an M1 Mac).
Version 7.0.0 is a revamp of every part of the library. There are too many changes to list them all here, so here is a high-level overview:
Breaking: ESM
All of parse5’s packages are now ECMAScript Modules. We are providing dual packages for parse5 and parse5-htmlparser2-tree-adapter for now (see #418 and #496).
To migrate, please read this Gist on how to update. Note that private internals are no longer available; instead, everything that you need should be imported from the main package.
Implemented by @43081j in #351
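As a rough sketch of what the migration can look like (assuming parse5@7 is installed; the HTML strings are made up for illustration), a CommonJS require call becomes a named import from the main package:

```typescript
// Before (v6, CommonJS): const parse5 = require('parse5');
// After (v7, ESM): named imports from the main package only.
import { parse, parseFragment, serialize } from 'parse5';

// Parse a full document; missing <html>, <head> and <body> are implied.
const document = parse('<!DOCTYPE html><p>Hello</p>');

// Parse a fragment, as when setting innerHTML on an element.
const fragment = parseFragment('<li>One</li><li>Two</li>');

// Serialize the fragment's children back to an HTML string.
const listItems: string = serialize(fragment);
console.log(listItems);
```

Deep imports into the package internals are not part of this sketch on purpose, since those paths are no longer available.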
Breaking: TypeScript
The codebase has been ported to TypeScript. This helped uncover a number of subtle logic bugs, such as dc4e269, b4b5d4a, or a0aff95. TypeScript also helps us refactor with confidence and a lot of the changes in this release would have been much harder to do without it.
To migrate, please remove @types/parse5* as we now ship our own types.
Potentially breaking changes
- parse5 was caught up with the HTML spec, and parsing results might differ in edge-cases (#442, #451)
- The parse5-serializer-stream package was removed #481
  - To migrate, use the serialize function exported by parse5.
- The rewriting stream now splits very long text sections (#434) and doesn’t escape text in special tags anymore (#434). If you worked around these issues before, you might have to update your code.
- The htmlparser2 adapter now uses domhandler’s node interface (#327 by @TrySound)
  - The format of the tree nodes has changed slightly; e.g. some previous properties are now getters and setters, and vice versa.
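For code that used parse5-serializer-stream, a minimal sketch of the replacement might look like the following. It assumes parse5@7 is installed and that serializing to a single string is acceptable for your use case; the input markup is invented for the example.

```typescript
import { parse, serialize } from 'parse5';

// Build a tree from some (deliberately incomplete) markup.
const doc = parse('<!DOCTYPE html><title>Hi</title><p>Hello');

// serialize() returns the whole tree as one string instead of a stream.
const html: string = serialize(doc);

// The string can then be written to any destination you like.
console.log(html);
```

If you previously piped the serializer stream into another stream, you can write the resulting string to that destination instead.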
If you are using deep imports for any parts of the codebase, you will likely encounter some breakages:
- The tokenizer now uses the state machine pattern from htmlparser2 5d7a780 (#362)
- The token queue was replaced with callbacks (#404, #405, #419)
- The OpenElementStack now uses callbacks #429
- Mixins were removed (as part of #362)
- Location tracking now has a substantially lower overhead #402
- getNextToken was removed #461
- The parser’s _bootstrap method was removed #384
- We now drop chunks from the tokenizer right after they are emitted #432
- The serializer is no longer a class; instead, different serializer functions call on each other #383
- parse5 now uses the entities module for encoding and decoding entities, sharing maintenance & optimisation work with projects such as htmlparser2 (2b92054 (#362), #486)
  - entities adopted a variant of parse5’s approach of decoding entities. As a result, decoding performance is equivalent, while memory consumption is slightly lower.
Other changes
- minor add hooks for stack events to tree adapter interface #385
- minor add support for fragments in parse5-parser-stream #487
- minor add serializeOuter (like .outerHTML), scriptingEnabled option #383
- patch fix parsing of << in comments parsed wrongly as <! (#326)
- patch fix position of endTag for mixed-case foreign elements (#353)
- patch fix end position of html, body (#436)
- docs: parse5 has a new documentation website at parse5.js.org #443
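To illustrate the new serializeOuter function next to serialize, here is a small sketch (assuming parse5@7 and the default tree adapter; the markup is invented for the example):

```typescript
import { parseFragment, serialize, serializeOuter } from 'parse5';

const fragment = parseFragment('<div><span>hi</span></div>');
const [div] = fragment.childNodes;

// serializeOuter() includes the node itself, like .outerHTML.
const outer: string = serializeOuter(div);

// serialize() emits only the children, like .innerHTML. It expects a
// parent node, so narrow the type before calling it.
const inner: string = 'childNodes' in div ? serialize(div) : '';

console.log(outer);
console.log(inner);
```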
New Contributors
Thanks @anko, @TrySound, @samouri, @alan-agius4, and @pmdartus!
Full Changelog: v6.0.1...v7.0.0