Don't canonicalize during parsing #93

Open. Wants to merge 1 commit into master.

juliapath

My use case is editing documents written by humans, so I want to preserve the structure of the document as much as possible. When using withCanonicalize no, CDATA sections should be preserved. Currently, however, that is only the case when a CDATA section is the only child of a node; otherwise mergeTextNodes will delete it. As far as I can tell, the only thing mergeTextNodes does during parsing is convert CDATA sections and character references to normal text nodes, and both conversions should happen only in canonicalization. A legitimate purpose mergeTextNodes might fulfill here would be merging two consecutive actual text nodes, but I don't think the parser creates such a situation in the first place.
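
To illustrate the behaviour described above, here is a minimal sketch on a toy node type, written in Haskell on the assumption that this matches the project's language. It does not use the library's real types or its actual mergeTextNodes: Node, textOf, mergeEager, mergePlainText, and canonicalize are hypothetical names introduced only for this example, and mergeEager merely imitates the symptom reported here (a lone CDATA section survives, one with text siblings does not).

```haskell
import Data.Char (chr)

-- Toy model (assumption, not the library's real tree type): a node is plain
-- text, a CDATA section, a numeric character reference, or an element.
data Node
  = Text String
  | CData String
  | CharRef Int
  | Elem String [Node]
  deriving (Eq, Show)

-- The string a text-like node stands for; Nothing for elements.
textOf :: Node -> Maybe String
textOf (Text s)    = Just s
textOf (CData s)   = Just s
textOf (CharRef c) = Just [chr c]
textOf (Elem _ _)  = Nothing

-- Hypothetical eager merge: any two adjacent text-like siblings collapse
-- into a single Text node. A lone CDATA or CharRef is left untouched.
mergeEager :: [Node] -> [Node]
mergeEager (a : b : rest)
  | Just s <- textOf a, Just t <- textOf b = mergeEager (Text (s ++ t) : rest)
mergeEager (n : rest) = n : mergeEager rest
mergeEager []         = []

-- Structure-preserving merge: only adjacent plain Text nodes are joined.
mergePlainText :: [Node] -> [Node]
mergePlainText (Text s : Text t : rest) = mergePlainText (Text (s ++ t) : rest)
mergePlainText (n : rest)               = n : mergePlainText rest
mergePlainText []                       = []

-- Separate canonicalization step for one sibling list (it does not recurse
-- into elements): turn every text-like node into Text, then join neighbours.
canonicalize :: [Node] -> [Node]
canonicalize = mergePlainText . map toText
  where
    toText n = maybe n Text (textOf n)
```

In this sketch, a parser that applies only mergePlainText keeps CDATA sections and character references intact, and a caller who wants them flattened can still run canonicalize afterwards.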

If this is accepted, the same change should probably be made for htmlContent.

Parsing should preserve CDATA sections and character references. These should only be converted to text during canonicalization, if desired.