Don't canonicalize during parsing #93

juliapath · 2021-08-20T11:27:59Z

My use case is editing documents written by humans, so I want to preserve the structure of the document as much as possible. When parsing with withCanonicalize no, CDATA sections should be preserved; however, currently that is only the case if a CDATA section is the only child of a node, as otherwise mergeTextNodes will delete it. As far as I can tell, the only thing mergeTextNodes does during parsing is convert CDATA sections and character references to normal text nodes. Both of these conversions should happen only in canonicalization. A legitimate purpose mergeTextNodes might fulfill here would be merging two consecutive actual text nodes, but I don't think the parser will create such a situation in the first place.
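To make the distinction concrete, here is a toy model in Python. It is only a sketch and not the library's code: the Node type, the node kinds, and both function names are hypothetical and were chosen for illustration. It contrasts a parse-time step that merges only adjacent plain text nodes with an opt-in canonicalization step that also converts CDATA sections and character references to text.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Node:
    # Hypothetical node model: kind is "text", "cdata", or "charref".
    kind: str
    # Text content, or the code point as a decimal string for "charref".
    value: str


def merge_adjacent_text(children: List[Node]) -> List[Node]:
    """Merge runs of consecutive plain text nodes; leave all other nodes alone."""
    out: List[Node] = []
    for node in children:
        if node.kind == "text" and out and out[-1].kind == "text":
            out[-1] = Node("text", out[-1].value + node.value)
        else:
            out.append(node)
    return out


def canonicalize_text(children: List[Node]) -> List[Node]:
    """Convert CDATA sections and character references to text, then merge."""
    as_text: List[Node] = []
    for node in children:
        if node.kind == "cdata":
            as_text.append(Node("text", node.value))
        elif node.kind == "charref":
            as_text.append(Node("text", chr(int(node.value))))
        else:
            as_text.append(node)
    return merge_adjacent_text(as_text)


# Children of <p>a<![CDATA[<b>]]>&#38;c</p>
children = [
    Node("text", "a"),
    Node("cdata", "<b>"),
    Node("charref", "38"),
    Node("text", "c"),
]

preserved = merge_adjacent_text(children)  # structure-preserving parse step
canonical = canonicalize_text(children)    # opt-in canonicalization step
```

In this model, preserved keeps all four nodes because no two plain text nodes are adjacent, while canonical collapses them into a single text node. That is the split argued for here: the second behaviour belongs to canonicalization, not to parsing.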

If this is accepted, the same change should probably also be made for htmlContent.

parsing should preserve CDATAs and character references. These should only be converted to text in canonicalization if desired.

Don't canonicalize during parsing

12d5d47

