XML elements out of order after serializing #213

ansFourtyTwo · 2020-07-13T15:00:06Z

After generating dataclasses for the ReqIF schema with

xsdata https://www.omg.org/spec/ReqIF/20110401/reqif.xsd --package reqif.models --ns-struct

I tried to parse an example ReqIF file and serialize it back to XML. It seems that elements are stored internally within lists, and serializing does not preserve the original order of the elements.

This snippet is from the original document:

<reqif-xhtml:div>
<reqif-xhtml:strong>Titel</reqif-xhtml:strong>: 				Some title<reqif-xhtml:br/>
<reqif-xhtml:strong>Bearbeiter</reqif-xhtml:strong>: 			Some name<reqif-xhtml:br/>
...
</reqif-xhtml:div>

This is the corresponding section from the serialized document:

<ns1:div xml:space="preserve">
<ns1:br xml:space="preserve"/><ns1:br xml:space="preserve"/><ns1:br xml:space="preserve"/><ns1:br xml:space="preserve"/><ns1:br xml:space="preserve"/><ns1:br xml:space="preserve"/><ns1:br xml:space="preserve"/><ns1:br xml:space="preserve"/><ns1:br xml:space="preserve"/>
<ns1:strong xml:space="preserve">Titel</ns1:strong>: 				Some title
<ns1:strong xml:space="preserve">Bearbeiter</ns1:strong>: 			Some name
...
</ns1:div>

This is the code I used for this back and forth parsing and serializing:

from pathlib import Path

from xsdata.formats.dataclass.parsers.xml import XmlParser
from xsdata.formats.dataclass.serializers.xml import XmlSerializer
from xsdata.formats.dataclass.parsers.config import ParserConfig
from reqif.models.omg_org_spec_req_if_20110401_reqif import ReqIf

# Fail on any element or attribute the generated models don't recognize
config = ParserConfig(fail_on_unknown_properties=True)
parser = XmlParser(config=config)
reqif = parser.from_path(Path('./KuFu_Testfunktion_00270cf8.reqif'), ReqIf)

# Serialize the parsed object tree back to an XML string and write it out
serializer = XmlSerializer(pretty_print=True, encoding='ascii')

with open('./KuFu_Testfunktion_out.reqif.reqif', 'w') as f:
    f.write(serializer.render(reqif))


ansFourtyTwo · 2020-07-13T15:01:37Z

Another issue is that the xml namespace was never declared, neither in the original document nor in the serialized one. Thus, xml:space="preserve" is not recognized as a valid attribute. But this might be a separate issue.

ansFourtyTwo · 2020-07-13T15:02:21Z

If you need a real *.reqif file for testing, let me know and I'll prepare one. The one I use contains some sensitive data.

tefra · 2020-07-13T15:18:50Z

Yes, please, a sample XML would be great. It seems like something with the sequential flag is maybe not being generated properly.

ansFourtyTwo · 2020-07-13T15:34:26Z

Here is an example of such a *.reqif file:
example.reqif.txt

Please note that I added the ".txt" extension, as otherwise GitHub wouldn't have let me upload the file.

tefra · 2020-07-13T17:13:27Z

To make the output easier to compare, you can grab the namespace registry from the parser and pass it to the serializer.render method so that the same prefixes are used:

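from pathlib import Path
from xsdata.formats.dataclass.parsers.xml import XmlParser
from xsdata.formats.dataclass.serializers.xml import XmlSerializer
from reqif.models.omg_org_spec_req_if_20110401_reqif import ReqIf

parser = XmlParser()
obj = parser.from_path(Path("example.reqif.txt"), ReqIf)
serializer = XmlSerializer(pretty_print=True, encoding="ascii")
# Reuse the namespace prefixes the parser collected from the input
output = serializer.render(obj, parser.namespaces)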

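I believe the xml namespace is implied (the xml prefix is reserved and bound by definition, so it never needs to be declared) and lxml makes sure it's omitted. I still have to double-check with lxml, but I can see the namespace map being prepared correctly before it is passed to lxml.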
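
By the Namespaces in XML specification, the xml prefix is bound to http://www.w3.org/XML/1998/namespace without any declaration. A minimal check of that behaviour, using the standard library's xml.etree.ElementTree instead of lxml only to keep the sketch dependency-free:

```python
import xml.etree.ElementTree as ET

XML_NS = "http://www.w3.org/XML/1998/namespace"

# Parsing: xml:space is accepted without any xmlns:xml declaration,
# because the xml prefix is predeclared by the Namespaces in XML spec.
div = ET.fromstring('<div xml:space="preserve">text</div>')
print(div.attrib)  # {'{http://www.w3.org/XML/1998/namespace}space': 'preserve'}

# Serializing: the attribute is written back with the xml prefix,
# and no namespace declaration is emitted for it.
out = ET.tostring(div, encoding="unicode")
print(out)  # <div xml:space="preserve">text</div>
```

So a missing xmlns:xml declaration is expected and should not by itself make xml:space invalid.
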
All those xml:space="preserve" attributes are fixed-value attributes, and the serializer always renders them, no matter which namespace they belong to. Maybe we could make that configurable through an option in the future, but it's not important.
For the ordering, let's look at how, for example, DATATYPES is defined:

      <xsd:element maxOccurs="1" minOccurs="0" name="DATATYPES">
        <xsd:complexType>
          <xsd:choice maxOccurs="unbounded" minOccurs="0">
            <xsd:element name="DATATYPE-DEFINITION-BOOLEAN" type="REQIF:DATATYPE-DEFINITION-BOOLEAN"/>
            <xsd:element name="DATATYPE-DEFINITION-DATE" type="REQIF:DATATYPE-DEFINITION-DATE"/>
            <xsd:element name="DATATYPE-DEFINITION-ENUMERATION" type="REQIF:DATATYPE-DEFINITION-ENUMERATION"/>
            <xsd:element name="DATATYPE-DEFINITION-INTEGER" type="REQIF:DATATYPE-DEFINITION-INTEGER"/>
            <xsd:element name="DATATYPE-DEFINITION-REAL" type="REQIF:DATATYPE-DEFINITION-REAL"/>
            <xsd:element name="DATATYPE-DEFINITION-STRING" type="REQIF:DATATYPE-DEFINITION-STRING"/>
            <xsd:element name="DATATYPE-DEFINITION-XHTML" type="REQIF:DATATYPE-DEFINITION-XHTML"/>
          </xsd:choice>
        </xsd:complexType>
      </xsd:element>

Any of DATATYPE-DEFINITION-BOOLEAN, DATATYPE-DEFINITION-DATE, DATATYPE-DEFINITION-ENUMERATION, etc. can appear from 0 to unlimited times inside the DATATYPES element. The order is not restricted through an <xsd:sequence>, which implies that the order is not important. That's why the serializer uses the order of the fields as they are defined in the schema to build the XML tree.

Also, since these elements can appear more than once, they are generated as list fields. The serializer goes through all the values of one list before moving on to the next field.
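
A simplified sketch of why field-by-field serialization loses the original interleaving. The Datatypes class and its two list fields are hypothetical stand-ins for a generated model, not xsdata's actual generated code: each list keeps its own items in order, but the relative order between different element names is gone once they are split into separate lists.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


# Hypothetical model: a repeated <xsd:choice> of two element names,
# generated as one list field per element name.
@dataclass
class Datatypes:
    boolean: List[str] = field(default_factory=list)
    string: List[str] = field(default_factory=list)


def parse(children: List[Tuple[str, str]]) -> Datatypes:
    """Distribute (name, value) children into the matching list field."""
    obj = Datatypes()
    for name, value in children:
        getattr(obj, name).append(value)
    return obj


def serialize(obj: Datatypes) -> List[Tuple[str, str]]:
    """Emit every value of one field before moving on to the next field."""
    result = []
    for name in ("boolean", "string"):  # field (schema) order
        for value in getattr(obj, name):
            result.append((name, value))
    return result


original = [("string", "s1"), ("boolean", "b1"), ("string", "s2")]
print(serialize(parse(original)))
# [('boolean', 'b1'), ('string', 's1'), ('string', 's2')]
```

For DATATYPES this is harmless, since the repeated choice accepts the elements in any order; the same strategy applied to mixed content is what produces the reordering reported in this issue.
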

In a nutshell, that's the normal behavior. I also ran the output through XSD validation, and it passes without issues:

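from pathlib import Path

from lxml import etree
from xsdata.formats.dataclass.parsers.xml import XmlParser
from xsdata.formats.dataclass.serializers.xml import XmlSerializer
from reqif.models.omg_org_spec_req_if_20110401_reqif import ReqIf

parser = XmlParser()
obj = parser.from_path(Path("example.reqif.txt"), ReqIf)
serializer = XmlSerializer(pretty_print=True, encoding="ascii")
tree = serializer.render_tree(obj, parser.namespaces)
# Validate the serialized tree against the ReqIF schema
xmlschema = etree.XMLSchema(etree.parse("schemas/reqif.xsd"))
xmlschema.assertValid(tree)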

Are you seeing any specific errors on the server side that consumes these XML files?

ansFourtyTwo · 2020-07-14T08:08:18Z

@tefra
Thank you for the hint about passing parser.namespaces to serializer.render(). It makes the document much clearer.

Regarding ordering, I think we are talking about two different things. For the ordering of DATATYPES elements, I agree that the order is not important.

The snippets I posted in the initial issue description, however, refer to XHTML elements, i.e. <div>, <strong> and <br/> elements, so the result is (X)HTML-formatted text. Ordering is important there; otherwise the output is formatted differently.

The serialized document may still be valid, but it looks different when opened.
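
A small standard-library sketch of why order matters for mixed content: the text after an element (its tail) is tied to that element's position, so moving the <br/> elements changes the rendered text even though the markup stays well-formed. The render_text helper is a deliberately crude, hypothetical renderer, not how a ReqIF tool actually displays XHTML.

```python
import xml.etree.ElementTree as ET

SNIPPET = "<div><b>Titel</b>: Some title<br/><b>Bearbeiter</b>: Some name<br/></div>"


def render_text(div: ET.Element) -> str:
    """Crude rendering: <br/> becomes a newline, each tail follows its element."""
    parts = [div.text or ""]
    for child in div:
        parts.append("\n" if child.tag == "br" else "".join(child.itertext()))
        parts.append(child.tail or "")
    return "".join(parts)


original = ET.fromstring(SNIPPET)

# Simulate the bug: keep every child (with its tail) but group the <br/> elements first.
reordered = ET.Element("div")
reordered.text = original.text
children = list(original)
for child in [c for c in children if c.tag == "br"] + [c for c in children if c.tag != "br"]:
    reordered.append(child)

print(repr(render_text(original)))   # 'Titel: Some title\nBearbeiter: Some name\n'
print(repr(render_text(reordered)))  # '\n\nTitel: Some titleBearbeiter: Some name'
```
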

tefra · 2020-07-14T19:36:13Z

Oh, you mean this part from the sample, right?

<xhtml:b>Titel</xhtml:b> : Test Titel  <xhtml:br/>
 <xhtml:b>Bearbeiter</xhtml:b> : Test Bearbeiter  <xhtml:br/>
 <xhtml:b>Abt./OE</xhtml:b> : Test Abteilung  <xhtml:br/>
 <xhtml:b>Telefon</xhtml:b> : Test Telefon  <xhtml:br/>
 <xhtml:b>E-Mail:</xhtml:b>  some.body@example.com  <xhtml:br/>
 <xhtml:b>Erstausgabe:</xhtml:b>  09.03.2020  <xhtml:br/>
 <xhtml:b>Datum &#196;nderungsstand</xhtml:b> : 09.03.2020  <xhtml:br/>
 <xhtml:b>&#196;nderungsstand</xhtml:b> : V1.0

In the xsdata output, all the br elements are rendered first:

<ns1:br xml:space="preserve"/>
<ns1:br xml:space="preserve"/>
<ns1:br xml:space="preserve"/>
<ns1:br xml:space="preserve"/>
<ns1:br xml:space="preserve"/>
<ns1:br xml:space="preserve"/>
<ns1:br xml:space="preserve"/>
<ns1:b xml:space="preserve">Titel</ns1:b>: Test Titel
<ns1:b xml:space="preserve">Bearbeiter</ns1:b>: Test Bearbeiter
<ns1:b xml:space="preserve">Abt./OE</ns1:b>: Test Abteilung
<ns1:b xml:space="preserve">Telefon</ns1:b>: Test Telefon
<ns1:b xml:space="preserve">E-Mail:</ns1:b>some.body@example.com
<ns1:b xml:space="preserve">Erstausgabe:</ns1:b>09.03.2020
<ns1:b xml:space="preserve">Datum &#196;nderungsstand</ns1:b>: 09.03.2020
<ns1:b xml:space="preserve">&#196;nderungsstand</ns1:b>: V1.0

ansFourtyTwo · 2020-07-14T19:58:55Z

Yes, this is the part of concern.

tefra · 2020-07-18T17:37:30Z

The issue was that when a class has both a mixed content field and a field that exactly matches an element's qualified name, the exact match always had higher priority, which shouldn't happen.
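
To illustrate the priority problem, here is a simplified binding step. The functions bind_buggy and bind_fixed and the (name, text) child representation are hypothetical and only model the rule described above; the actual change in a0c4efe may be structured quite differently.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical representation of a parsed child: (qualified name, text)
Child = Tuple[str, str]


def bind_buggy(children: List[Child], named: Dict[str, list], mixed: Optional[list]) -> None:
    """Old priority: an exact name match always wins over the mixed content field."""
    for qname, text in children:
        if qname in named:
            named[qname].append(text)
        elif mixed is not None:
            mixed.append((qname, text))


def bind_fixed(children: List[Child], named: Dict[str, list], mixed: Optional[list]) -> None:
    """Corrected priority: a mixed content field keeps every child, in document order."""
    for qname, text in children:
        if mixed is not None:
            mixed.append((qname, text))
        elif qname in named:
            named[qname].append(text)


children = [("b", "Titel"), ("br", ""), ("b", "Bearbeiter"), ("br", "")]

named, mixed = {"br": []}, []
bind_buggy(children, named, mixed)
# The <br/> elements end up in their own field, separated from the <b> elements,
# so serializing field by field can no longer restore the interleaving.

named2, mixed2 = {"br": []}, []
bind_fixed(children, named2, mixed2)
# mixed2 keeps the original interleaving of <b> and <br/>.
```
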

I refactored a lot of the mixed content handling. You will need to regenerate your models, because a new metadata key named mixed was introduced.
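
The core idea, as described, is to keep mixed content in a single ordered list instead of distributing it across per-element fields. The sketch below models that with the standard library; the Div class, the content field name and the from_element/to_element helpers are hypothetical, only the mixed metadata key comes from the comment above, and regenerated xsdata models will look different.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Div:
    # Hypothetical model: text and child elements share one ordered list.
    # Only the "mixed" metadata key is taken from the discussion.
    content: List[Union[str, ET.Element]] = field(
        default_factory=list, metadata={"mixed": True}
    )


def from_element(root: ET.Element) -> Div:
    """Capture leading text, children and their tails in document order."""
    items: List[Union[str, ET.Element]] = []
    if root.text:
        items.append(root.text)
    for child in root:
        tail, child.tail = child.tail, None  # tails are stored as separate strings
        items.append(child)
        if tail:
            items.append(tail)
    return Div(content=items)


def to_element(div: Div, tag: str = "div") -> ET.Element:
    """Rebuild the element by walking the mixed list in order."""
    root = ET.Element(tag)
    last = None
    for item in div.content:
        if isinstance(item, str):
            if last is None:
                root.text = (root.text or "") + item
            else:
                last.tail = (last.tail or "") + item
        else:
            root.append(item)
            last = item
    return root


source = "<div><b>Titel</b>: Some title<br />next line</div>"
div = from_element(ET.fromstring(source))
result = ET.tostring(to_element(div), encoding="unicode")
print(result)  # <div><b>Titel</b>: Some title<br />next line</div>
```

Because text and elements live in the same list, a serializer that simply walks it cannot reorder <b> and <br/> relative to each other.
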

The sample you provided is now rendered correctly. Give it a try from master and let me know if it works as expected in other use cases as well.

              <THE-VALUE>
                <xhtml:p xml:space="preserve"><xhtml:b xml:space="preserve">Titel</xhtml:b>
                  : Test Titel
                  <xhtml:br xml:space="preserve"/>
                  <xhtml:b xml:space="preserve">Bearbeiter</xhtml:b>: Test Bearbeiter
                  <xhtml:br xml:space="preserve"/>
                  <xhtml:b xml:space="preserve">Abt./OE</xhtml:b>: Test Abteilung
                  <xhtml:br xml:space="preserve"/>
                  <xhtml:b xml:space="preserve">Telefon</xhtml:b>: Test Telefon
                  <xhtml:br xml:space="preserve"/>
                  <xhtml:b xml:space="preserve">E-Mail:</xhtml:b>some.body@example.com
                  <xhtml:br xml:space="preserve"/>
                  <xhtml:b xml:space="preserve">Erstausgabe:</xhtml:b>09.03.2020
                  <xhtml:br xml:space="preserve"/>
                  <xhtml:b xml:space="preserve">Datum &#196;nderungsstand</xhtml:b>:
                  09.03.2020
                  <xhtml:br xml:space="preserve"/>
                  <xhtml:b xml:space="preserve">&#196;nderungsstand</xhtml:b>: V1.0</xhtml:p>
              </THE-VALUE>

Thank you for reporting, @ansFourtyTwo; this issue helped improve mixed content handling a lot!

ansFourtyTwo · 2020-07-21T08:13:54Z

Hi @tefra

The file I am currently working with now looks good. Thank you very much. Once again, you are doing a great job. I decided to use xsdata for one of my projects, as we do a lot of XML parsing. I hope that at some point I can dig deeper into your code and contribute some code myself.

All the best.

tefra · 2020-07-21T08:39:09Z

Thank you, @ansFourtyTwo,

This library is something I wanted to create since the original suds SOAP client stopped being maintained. It gives me great pleasure to see other people use it!

tefra added the bug ("Something isn't working") and mixed content labels on Jul 14, 2020

tefra closed this as completed in a0c4efe on Jul 18, 2020

ansFourtyTwo mentioned this issue on Aug 28, 2020 in Incomplete XML serialization #238 (closed)
