yq does not preserve XML doctype and declaration #1344

Closed
landure opened this issue Sep 16, 2022 · 4 comments
landure commented Sep 16, 2022

When yq is used to edit an XML file, the headers (XML declaration and doctype) are not preserved in the output.

Version of yq: 4.27.5
Operating system: Ubuntu 22.04 Impish Indri
Installed via: binary release (yq_linux_amd64.tar.gz)

Input Xml

/etc/iwatch/iwatch.xml:

<?xml version="1.0" ?>
<!DOCTYPE config SYSTEM "/etc/iwatch/iwatch.dtd" >

<config charset="utf-8">
  <guard email="iwatch@localhost" name="iWatch"/>
  <watchlist>
    <title>Operating System</title>
  </watchlist>
</config>

Command

contact_email="name@domain.com" sudo --preserve-env \
  yq --inplace --input-format='xml' --output-format='xml' 'eval' \
  'with(.config.watchlist ;
    . |= ( select(tag == "!!seq") // [ select( length > 0 ) ] )
  ) | with( .config.watchlist
            | select( any_c( ."title" == "Zimbra" ) | not) ;
      . |= . + [ {
          "title": "Zimbra",
          "contactpoint": { "+email": strenv(contact_email) }
        } ]
    )' '/etc/iwatch/iwatch.xml'

Actual behavior

<config charset="utf-8">
  <guard email="iwatch@localhost" name="iWatch"></guard>
  <watchlist>
    <title>Operating System</title>
  </watchlist>
  <watchlist>
    <title>Zimbra</title>
    <contactpoint email="name@domain.com"></contactpoint>
  </watchlist>
</config>

Expected behavior

<?xml version="1.0" ?>
<!DOCTYPE config SYSTEM "/etc/iwatch/iwatch.dtd" >

<config charset="utf-8">
  <guard email="iwatch@localhost" name="iWatch"></guard>
  <watchlist>
    <title>Operating System</title>
  </watchlist>
  <watchlist>
    <title>Zimbra</title>
    <contactpoint email="name@domain.com"></contactpoint>
  </watchlist>
</config>

Additional context
The doctype adds validation information to XML documents, so it is an important feature. The XML declaration is recommended by the XML 1.0 standard and mandatory in XML 1.1.
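
For yq versions that drop the prolog, one possible workaround is to save the leading declaration and doctype lines, run the edit, and print them back in front of the result. The sketch below is not part of yq: `with_xml_prolog` is a hypothetical helper name, and it assumes the declaration and doctype each sit on their own line before the root element, as in the input above.

```shell
#!/bin/sh
# Hypothetical helper (not a yq feature): run a command that rewrites an XML
# file to stdout, then re-attach the prolog lines the command dropped.
# Assumption: the XML declaration and DOCTYPE are single lines at the top.
with_xml_prolog() {
  file=$1
  shift
  # Collect leading "<?...?>" and "<!DOCTYPE ...>" lines, skip blank lines,
  # and stop at the first line that is neither (the root element).
  prolog=$(awk '/^[ \t]*(<[?]|<!DOCTYPE)/ { print; next }
                /^[ \t]*$/ { next }
                { exit }' "$file")
  # Run the given command with the file appended as its last argument.
  body=$("$@" "$file") || return 1
  if [ -n "$prolog" ]; then
    printf '%s\n\n%s\n' "$prolog" "$body"
  else
    printf '%s\n' "$body"
  fi
}
```

It would be called as `with_xml_prolog /etc/iwatch/iwatch.xml yq --input-format=xml --output-format=xml '<expression>'`, with the output redirected to a new file. With a yq build that already keeps the prolog, the helper would print those lines twice, so it only makes sense for affected versions.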

@joshcangit

Yes, I think yq should preserve the prolog and doctype by default.

yq should remove these elements only when given an option such as --xml-no-prolog.

@mikefarah
Owner

FYI - working on this

@mikefarah
Owner

This will be updated in the next release, v4.29.x, with the following new flags:

--xml-proc-inst-prefix string   prefix for xml processing instructions (e.g. <?xml version="1"?>) (default "+p_")
--xml-directive-name string     name for xml directives (e.g. <!DOCTYPE thing cat>) (default "+directive")
--xml-skip-directives           skip over directives (e.g. <!DOCTYPE thing cat>)
--xml-skip-proc-inst            skip over process instructions (e.g. <?xml version="1"?>)
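
As a rough illustration of how these flags might be used, the sketch below round-trips a small file and then repeats the round trip with both skip flags. It assumes a yq build that already knows the flags (checked via `yq --help`); the temporary directory and `sample.xml` are made up for the example, and no output is shown because it depends on the installed version.

```shell
#!/bin/sh
# Illustrative only: the sample document mirrors the one from the issue.
workdir=$(mktemp -d)
sample="$workdir/sample.xml"
cat > "$sample" <<'EOF'
<?xml version="1.0" ?>
<!DOCTYPE config SYSTEM "/etc/iwatch/iwatch.dtd" >
<config charset="utf-8">
  <guard email="iwatch@localhost" name="iWatch"/>
</config>
EOF

# Only run yq if it is installed and advertises the new flags.
if command -v yq >/dev/null 2>&1 && yq --help 2>&1 | grep -q -e '--xml-skip-directives'; then
  # Plain round trip: the declaration and doctype are expected to survive.
  yq --input-format=xml --output-format=xml '.' "$sample"

  # Opt out: ask yq to skip processing instructions and directives.
  yq --input-format=xml --output-format=xml \
    --xml-skip-proc-inst --xml-skip-directives '.' "$sample"
else
  echo "no yq build with the 4.29.x XML flags found; skipping" >&2
fi
```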

Note that I've also realised there's a potential naming conflict with the current default attribute prefix of +. For example, an XML attribute named 'content' becomes '+content', which is ambiguous when encoding back to XML: it could be the XML content, or an attribute named content.

From v4.30.x the new default attribute prefix will be +@. Hope that doesn't cause too much inconvenience. As part of the v4.29 release I'll update the docs and CLI with a warning about the upcoming change.
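
Scripts that address attributes as "+name", like the command in the original report, could pin the prefix explicitly so they do not depend on the default. The sketch below assumes the `--xml-attribute-prefix` flag is available in the installed yq (checked via `yq --help`); the sample file and the simplified expression are illustrative, not the reporter's full command.

```shell
#!/bin/sh
# Sketch: pin the attribute prefix so "+email" keeps meaning "attribute email"
# whatever the yq default is. The sample document is illustrative.
workdir=$(mktemp -d)
sample="$workdir/iwatch.xml"
cat > "$sample" <<'EOF'
<config charset="utf-8">
  <guard email="iwatch@localhost" name="iWatch"/>
</config>
EOF

# Only run yq if it is installed and advertises the prefix flag.
if command -v yq >/dev/null 2>&1 && yq --help 2>&1 | grep -q -e '--xml-attribute-prefix'; then
  contact_email="name@domain.com" yq \
    --input-format=xml --output-format=xml \
    --xml-attribute-prefix='+' \
    '.config.guard."+email" = strenv(contact_email)' "$sample"
else
  echo "no yq build with --xml-attribute-prefix found; skipping" >&2
fi
```

The alternative is to leave the flag out and rename the keys in the expression to match whatever prefix the installed version uses.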

@mikefarah
Owner

Fixed in 4.29.1
