Writer::new_with_indent() inserts white space into <![CDATA[ content tags #197

Closed
klausi opened this issue Feb 27, 2020 · 5 comments · Fixed by #254

Comments

@klausi
Contributor

klausi commented Feb 27, 2020

Problem: the pretty-print functionality should not insert spaces or newlines before and after <![CDATA[ sections.

Example input:

<?xml version="1.0" encoding="UTF-8"?><jobs><job><jobid><![CDATA[00d46e4494e1]]></jobid></job></jobs>

Output with Writer::new_with_indent(out_file, b" "[0], 2);

<?xml version="1.0" encoding="UTF-8"?>
<jobs>
  <job>
    <jobid>
      <![CDATA[00d46e4494e1]]>
    </jobid>
  </job>
</jobs>

This has now changed the content of the jobid tag, which is bad: its text now includes the added newlines and indentation.

Correct output, as produced by xmllint --format:

<?xml version="1.0" encoding="UTF-8"?>
<jobs>
  <job>
    <jobid><![CDATA[00d46e4494e1]]></jobid>
  </job>
</jobs>

This has correctly preserved the content of the jobid tag.
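
To make the difference concrete, here is a minimal sketch that parses both outputs quoted above and compares the text of jobid. It uses Python's xml.etree.ElementTree only as a neutral parser; the helper name jobid_text is made up for this illustration, and the XML declaration is left out for brevity.

```python
import xml.etree.ElementTree as ET

# Output of the indenting writer, as quoted in the report above.
indented = """<jobs>
  <job>
    <jobid>
      <![CDATA[00d46e4494e1]]>
    </jobid>
  </job>
</jobs>"""

# Output of xmllint --format, as quoted in the report above.
inline = """<jobs>
  <job>
    <jobid><![CDATA[00d46e4494e1]]></jobid>
  </job>
</jobs>"""


def jobid_text(document: str) -> str:
    """Return the character data of the first jobid element (hypothetical helper)."""
    return ET.fromstring(document).find("./job/jobid").text


indented_text = jobid_text(indented)
inline_text = jobid_text(inline)

# repr() makes the extra newlines and spaces visible.
print(repr(indented_text))
print(repr(inline_text))
print("content preserved:", indented_text == inline_text)
```

A consumer that reads jobid from the indented output gets the ID wrapped in whitespace and has to strip it, while the xmllint-style output yields the ID unchanged.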

@tafia
Owner

tafia commented Mar 7, 2020

Thanks for opening the issue.

Do you have a specification to refer to? It is not obvious to me that the current behavior is wrong.

@klausi
Contributor Author

klausi commented Mar 9, 2020

Could not find anything concrete in the XML specification, only http://xml.silmaril.ie/whitespace.html which talks about significant and insignificant whitespace.

A similar bug was reported to xml-js (nashwaan/xml-js#14), where they also decided not to indent CDATA sections.

@klausi
Contributor Author

klausi commented Mar 9, 2020

Another data point is the PHP DOMDocument output, which is formatted the same way as xmllint's and does not indent CDATA sections: https://3v4l.org/DviGR

@klausi
Contributor Author

klausi commented Mar 9, 2020

Same with Python:

import xml.dom.minidom

dom = xml.dom.minidom.parseString('<?xml version="1.0" encoding="UTF-8"?><jobs><job><jobid><![CDATA[00d46e4494e1]]></jobid></job></jobs>')
print(dom.toprettyxml())

Output:

<?xml version="1.0" ?>
<jobs>
    <job>
        <jobid><![CDATA[00d46e4494e1]]></jobid>
    </job>
</jobs>

Not sure if Python, PHP and xmllint all rely on the same XML parsing library, but I think these are pretty good indicators that quick-xml should do the same.
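
One way to turn that observation into a check is a round-trip test: pretty-print, parse again, and compare the jobid content. The sketch below does this with xml.dom.minidom, the same module as above. The helper jobid_data is hypothetical, and the result reflects the minidom behavior shown in this thread, which may differ on older Python versions.

```python
import xml.dom.minidom

source = (
    '<?xml version="1.0" encoding="UTF-8"?>'
    "<jobs><job><jobid><![CDATA[00d46e4494e1]]></jobid></job></jobs>"
)


def jobid_data(document: str) -> str:
    """Join the data of all text/CDATA children of the first jobid element (hypothetical helper)."""
    jobid = xml.dom.minidom.parseString(document).getElementsByTagName("jobid")[0]
    return "".join(child.data for child in jobid.childNodes)


# Pretty-print with a two-space indent, then parse the result again.
pretty = xml.dom.minidom.parseString(source).toprettyxml(indent="  ")

# True when pretty-printing did not add whitespace around the CDATA section.
round_trip_ok = jobid_data(pretty) == jobid_data(source)
print(round_trip_ok)
```

The same kind of round-trip assertion could serve as a regression test for an indenting writer, with the library under test substituted for the minidom calls.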

@cdwilson

FWIW, Altova XMLSpy also reformats this input as @klausi suggested:

[Screenshot of the XMLSpy output]


3 participants