Support for prefixed names in writing xml nodes #438

Open
eblondel opened this issue Jan 28, 2024 · 9 comments

@eblondel
eblondel commented Jan 28, 2024

Dear all, I'm in the process of migrating a series of packages I maintain from XML to xml2. For packages dealing with a single namespace the task is relatively easy (I started with geosapi). I'm now looking at packages whose XML schemas include several namespaces, starting with atom4R and then moving on to more complex ones like geometa.

In this context, the practice is to use prefixed element names, rather than element names made of the local part only coupled with repeated xmlns attributes.

I couldn't find a way to do that in xml2, although it's possible in XML with xmlOutputDOM and its nameSpace argument. I tried xml2::xml_set_namespace, but it doesn't seem to have any effect.

Is it a missing feature? If not, could you give me an example?

Thanks a lot
Emmanuel

@eblondel
Author

OK, I think I've figured it out. I was misled by the output of the default print method for xml2 nodes, which does not show the complete QName (prefix + local element name) after calling xml2::xml_set_namespace. The prefix is nevertheless set: it appears when coercing to character or when saving the XML to a file.

It would be good to have the print method show the prefixed name.

Reproducible example:

require(xml2)
require(atom4R)
rootNamespaces <- sapply(atom4R::getAtomNamespaces(), function(x){x$getDefinition()})
names(rootNamespaces) = paste0("xmlns:", names(rootNamespaces))
rootXML = do.call("xml_new_root", c("entry", rootNamespaces))

#--> OUTPUT as print
#{xml_document}
#<entry xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:dcterms="http://purl.org/dc/terms/" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">

#--> OUTPUT as character
as(rootXML, "character")
#[1] "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<entry xmlns:atom=\"http://www.w3.org/2005/Atom\" xmlns:dc=\"http://purl.org/dc/elements/1.1/\" xmlns:dcterms=\"http://purl.org/dc/terms/\" xmlns:xlink=\"http://www.w3.org/1999/xlink\" xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\"/>\n"

#set node namespace
rootXML = xml2::xml_set_namespace(rootXML, "atom")

#--> OUTPUT as print (doesn't show the prefixed name)
#{xml_document}
#<entry xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:dcterms="http://purl.org/dc/terms/" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">

#--> OUTPUT as character
as(rootXML, "character")
#[1] "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<atom:entry xmlns:atom=\"http://www.w3.org/2005/Atom\" xmlns:dc=\"http://purl.org/dc/elements/1.1/\" xmlns:dcterms=\"http://purl.org/dc/terms/\" xmlns:xlink=\"http://www.w3.org/1999/xlink\" xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\"/>\n"

#validation

xsd <- atom4R::getAtomSchemas()
xml2::xml_validate(rootXML, schema = xsd)

@eblondel
Author

The xml2::xml_set_namespace call above seems to be needed for all descendant nodes (to make the document valid). However, this method requires the XML namespaces to be declared on the node already. See the example:

#root node with prefix name
rootNamespaces <- sapply(atom4R::getAtomNamespaces(), function(x){x$getDefinition()})
names(rootNamespaces) = paste0("xmlns:", names(rootNamespaces))
rootXML = do.call("xml_new_root", c("entry", rootNamespaces))
rootXML = xml2::xml_set_namespace(rootXML, "atom")

#add child node with prefix name
wrapperNode = do.call("xml_new_root", c("id", list("xmlns:atom" = "http://www.w3.org/2005/Atom")))
wrapperNode = xml2::xml_set_namespace(wrapperNode, prefix = "atom")
xml2::xml_text(wrapperNode) <- "my id"
xml2::xml_add_child(rootXML, wrapperNode)

xml2::xml_set_namespace is thus the equivalent of setting the nameSpace prefix when building an xmlOutputDOM in the XML package. In xml2, however, it has a side effect: because it requires the namespace to be declared on the node, the namespace attribute is propagated to all nodes, making the XML more verbose and carrying redundant namespace URIs (redundant because the URI is already declared with the prefix on the root). See the output:

<?xml version="1.0" encoding="UTF-8"?>
<atom:entry xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:dcterms="http://purl.org/dc/terms/" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <atom:id xmlns:atom="http://www.w3.org/2005/Atom">my id</atom:id>
</atom:entry>

I would like to produce this output instead:

<?xml version="1.0" encoding="UTF-8"?>
<atom:entry xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:dcterms="http://purl.org/dc/terms/" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <atom:id>my id</atom:id>
</atom:entry>

With the XML package this is controlled by the xmlOutputDOM nsURI argument. If it is supplied, the namespace URI attribute is set on the node. If we omit it while still passing nameSpace, we can add nodes with prefixed names without adding the redundant xmlns URI.

It seems that xml2::xml_set_namespace tightly couples prefix and URI handling for the node, whereas in XML these are decoupled through the nameSpace and nsURI arguments of xmlOutputDOM.

What would be the way in xml2 to avoid adding redundant xmlns namespace attributes?

@eblondel
Author

eblondel commented Feb 1, 2024

I see that manually prefixing the child element tag apparently works. However, I'd like to know if there is a better practice with xml2. @jeroen, would you have any advice on this? Thanks
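
For readers following along, a minimal sketch of the manual-prefix approach described above. The element name and text are illustrative, and it assumes the "atom" prefix is declared on the root before any prefixed child is added:

```r
library(xml2)

# Root element with the "atom" namespace declared once.
root <- xml_new_root("entry", "xmlns:atom" = "http://www.w3.org/2005/Atom")
# Attach the declared prefix to the root element itself,
# as in the reproducible example earlier in this thread.
root <- xml_set_namespace(root, prefix = "atom")

# Add a child by writing the prefix directly in the element name;
# the third argument is the text content of the new element.
xml_add_child(root, "atom:id", "my id")

# Serialize to inspect the result (print() does not show the prefix).
out <- as.character(root)
cat(out)
```

Serializing with as.character is used here rather than print, since the print method does not display the prefixed name.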

@jeroen
Member

jeroen commented Feb 2, 2024

You're asking a lot of things, so it is difficult to follow :) Could you try to reduce the question to the most minimal example code/xml of what you're trying to accomplish?

@eblondel
Author

eblondel commented Feb 2, 2024

In xml2, xml2::xml_set_namespace sets the namespace of an XML node. With it we build a complete XML QName made of prefix + local name, but it also automatically sets the full namespace declaration. That is fine for the root, but not for child elements, where we do want the prefix but don't need to repeat the xmlns declaration.

<?xml version="1.0" encoding="UTF-8"?>
<atom:entry xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:dcterms="http://purl.org/dc/terms/" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <atom:id xmlns:atom="http://www.w3.org/2005/Atom">my id</atom:id>
</atom:entry>

I would like to produce this output instead:

<?xml version="1.0" encoding="UTF-8"?>
<atom:entry xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:dcterms="http://purl.org/dc/terms/" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <atom:id>my id</atom:id>
</atom:entry>

In XML we have this flexibility, through the nameSpace and nsURI arguments.
I can't find the equivalent in xml2.

In other words, we should be able to skip adding the xmlns attribute for child elements. As long as we use the element prefix, we only need one xmlns declaration, on the XML root.

@jeroen
Member

jeroen commented Feb 2, 2024

Does the last section in this vignette maybe help? https://cran.r-project.org/web/packages/xml2/vignettes/modification.html

@eblondel
Author

eblondel commented Feb 6, 2024

Thanks @jeroen, it helps for simple fields (cf. the SLD example in the vignette). For simple elements it does skip the xmlns attribute, but my complex elements are generated as XML nodes before being added as children of the root. That way, I end up with this kind of output (see the inline XML comments):

<?xml version="1.0" encoding="UTF-8"?>
<atom:entry xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:dcterms="http://purl.org/dc/terms/" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<!-- These are simple elements: the xmlns attribute is correctly skipped (generated in the same way as the xml2 SLD example) -->
  <atom:id>my-atom-entry</atom:id>
  <atom:updated>2024-02-06T19:11:01</atom:updated>
  <atom:title type="text">My Atom feed entry</atom:title>
  <atom:summary type="text">My Atom feed entry very comprehensive abstract</atom:summary>
<!-- These are complex elements, generated as XML nodes (setting the namespace to inherit the prefix); when they are added as children, the xmlns attribute is not skipped -->
  <atom:author xmlns:atom="http://www.w3.org/2005/Atom">
    <atom:name>John Doe</atom:name>
    <atom:uri>http://www.atomxml.com/johndoe</atom:uri>
    <atom:email>johndoe@atom4R.com</atom:email>
  </atom:author>
  <atom:author xmlns:atom="http://www.w3.org/2005/Atom">
    <atom:name>John Doe's sister</atom:name>
    <atom:uri>http://www.atomxml.com/johndoesister</atom:uri>
    <atom:email>johndoesister@atom4R.com</atom:email>
  </atom:author>
  <atom:contributor xmlns:atom="http://www.w3.org/2005/Atom">
    <atom:name>Contrib1</atom:name>
    <atom:uri>http://www.atomxml.com/contrib1</atom:uri>
    <atom:email>contrib1@atom4R.com</atom:email>
  </atom:contributor>
  <atom:contributor xmlns:atom="http://www.w3.org/2005/Atom">
    <atom:name>Contrib2</atom:name>
    <atom:uri>http://www.atomxml.com/contrib2</atom:uri>
    <atom:email>contrib2@atom4R.com</atom:email>
  </atom:contributor>
</atom:entry>

Somehow, when adding an XML node as a child, there should be a way (even internally) to avoid repeating the xmlns declaration when the element has a namespace prefix. With XML I can control that, because setting the namespace (with nameSpace) and emitting or omitting the xmlns attribute (with nsURI) are decoupled.

This example is a relatively flat XML document, but with ISO 19115 and geometa we want to avoid repeating the namespace URIs everywhere in the DOM.
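
A sketch of one possible workaround, not confirmed anywhere in this thread: create each complex element directly under its parent with xml_add_child and a prefixed name, then fill it in place, instead of building a standalone node that carries its own xmlns declaration. It assumes the prefix is resolved through the parent chain, as it is for the simple elements above. The element names and values are taken from the output above.

```r
library(xml2)

# Root element with the "atom" namespace declared once.
root <- xml_new_root("entry", "xmlns:atom" = "http://www.w3.org/2005/Atom")
root <- xml_set_namespace(root, prefix = "atom")

# xml_add_child returns the newly added node, so the complex element
# can be populated after it has been attached to the root.
author <- xml_add_child(root, "atom:author")
xml_add_child(author, "atom:name", "John Doe")
xml_add_child(author, "atom:uri", "http://www.atomxml.com/johndoe")

# Serialize to check whether <atom:author> carries an xmlns attribute.
out <- as.character(root)
cat(out)
```

This changes the construction order (parent first, then children), which may not suit code that builds nodes independently of the document they end up in.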

Thanks in advance

@eblondel
Author

Hello @jeroen, any hint on how to avoid propagating namespace declarations to XML child nodes (added with xml2::xml_add_child) when the namespace declarations are handled at root level only? Or is this a missing feature in xml2?

@jeroen
Member

jeroen commented Mar 27, 2024

I'm sorry I don't know. Maybe you can ask Ivan Krylov ikrylov@disroot.org, the new maintainer of XML.
