XSData cant handle elements order inside a repeating choice #296

Closed
hcw70 opened this issue Oct 15, 2020 · 11 comments
Labels
enhancement New feature or request

Comments

@hcw70

hcw70 commented Oct 15, 2020

Similar to #262

XSD:

<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" elementFormDefault="qualified"
    xmlns="http://test.de/order" targetNamespace="http://test.de/order">

    <xs:element name="order" type="TOrder"/>


    <xs:complexType name="TOrder">
        <xs:sequence minOccurs="0" maxOccurs="unbounded">
            <xs:choice>
                <xs:element name="elem1" type="TElem1"/>
                <xs:element name="elem1base" type="TElem1Base"/>
                <xs:element name="elem2" type="TElem2"/>
                <xs:element name="elem2base" type="TElem2Base"/>
            </xs:choice>
        </xs:sequence>
    </xs:complexType>

    <xs:complexType name="TElem1Base">

    </xs:complexType>
    <xs:complexType name="TElem1">
        <xs:complexContent>
            <xs:extension base="TElem1Base"></xs:extension>
        </xs:complexContent>

    </xs:complexType>

    <xs:complexType name="TElem2Base"></xs:complexType>
    <xs:complexType name="TElem2">
        <xs:complexContent>
            <xs:extension base="TElem2Base"></xs:extension>
        </xs:complexContent>
    </xs:complexType>
</xs:schema>

Instance:

<?xml version="1.0" encoding="UTF-8"?>
<order xmlns="http://test.de/order"
 xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
 xsi:schemaLocation="http://test.de/order file:/home/hcw/tmp/xsdata-order/order.xsd">
  <elem1/>
  <elem2/>
  <elem1base/>
  <elem1/>
  <elem2/>
  <elem2base/>
</order>

gives:

("<?xml version='1.0' encoding='UTF-8'?>\n"
 '<ns0:order xmlns:ns0="http://test.de/order">\n'
 '  <ns0:elem1/>\n'
 '  <ns0:elem1base/>\n'
 '  <ns0:elem2/>\n'
 '  <ns0:elem2base/>\n'
 '  <ns0:elem1/>\n'
 '  <ns0:elem2/>\n'
 '</ns0:order>\n')
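
The regrouped output is consistent with a model that keeps one list per element name and writes the lists back interleaved by index. The sketch below only illustrates that assumption; it is not xsdata's actual serializer, and `interleave` and the `fields` layout are hypothetical names.

```python
from itertools import zip_longest

# Document order of the child elements in the instance above.
document_order = ["elem1", "elem2", "elem1base", "elem1", "elem2", "elem2base"]

# Assumed model: one list per element name, declared in schema order.
fields = {"elem1": [], "elem1base": [], "elem2": [], "elem2base": []}
for tag in document_order:
    fields[tag].append(tag)


def interleave(lists):
    """Emit the i-th item of every list before moving on to item i + 1."""
    missing = object()
    return [
        item
        for row in zip_longest(*lists, fillvalue=missing)
        for item in row
        if item is not missing
    ]


serialized = interleave(fields.values())
print(serialized)
```

Once the elements are split into separate lists, the relative order between different element names is gone, so no serialization strategy can recover the original document order from that model alone.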

@tefra
Owner

tefra commented Oct 15, 2020

Both instances, the original and xsdata's output, pass validation with lxml, xmlschema and a couple of online tools.

But I won't deny something is funky here:

  1. The empty complexTypes are generated as simple str fields, e.g. elem2base: List[str]

Should it be a wildcard? I don't know yet. I haven't seen that before; maybe I need to check the W3C test suite for something similar. Homework for me.

  2. Your instance is completely out of order but it passes validation.

An unbounded sequence wrapping a choice with maxOccurs=1 (the default) is still a sequence of elements,
so xsdata's output seems to be the more accurate one. Even if that statement is wrong and we effectively
don't have a sequence of elements, the correct output would be:

<ns0:order xmlns:ns0="http://test.de/order">
  <ns0:elem1/>
  <ns0:elem1/>
  <ns0:elem1base/>
  <ns0:elem2/>
  <ns0:elem2/>
  <ns0:elem2base/>
</ns0:order>

But it passes the validation, which means the order of the fields in this case doesn't matter...

Now I am super curious which library gave you that output, @hcw70.

@tefra
Owner

tefra commented Oct 15, 2020

I used xjc from java to cheat a bit

@XmlAccessorType(XmlAccessType.FIELD)
@XmlType(name = "TOrder", propOrder = {
    "elem1OrElem1BaseOrElem2"
})
public class TOrder {

    @XmlElements({
        @XmlElement(name = "elem1", type = TElem1.class),
        @XmlElement(name = "elem1base", type = TElem1Base.class),
        @XmlElement(name = "elem2", type = TElem2.class),
        @XmlElement(name = "elem2base", type = TElem2Base.class)
    })
    protected List<Object> elem1OrElem1BaseOrElem2;
}

How do we convert that to dataclasses? 😳 It has wrapped the whole sequence as a list of objects that can accept those 4 elements.

The closest thing in xsdata's implementation is the wildcard:

content: List[object] = field(
    default_factory=list,
    metadata=dict(
        type="Wildcard",
        namespace="##any",
        min_occurs=0,
        max_occurs=9223372036854775807,
    ),
)
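
In plain Python terms, the xjc model amounts to a single list typed as a union of the four classes. The following is a rough sketch with hypothetical class and field names and without any xsdata metadata, so it shows the shape of the data rather than a working binding.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class TElem1Base:
    pass


@dataclass
class TElem1(TElem1Base):
    pass


@dataclass
class TElem2Base:
    pass


@dataclass
class TElem2(TElem2Base):
    pass


@dataclass
class TOrder:
    # Hypothetical field name, mirroring xjc's elem1OrElem1BaseOrElem2.
    elem1_or_elem1base_or_elem2: List[
        Union[TElem1, TElem1Base, TElem2, TElem2Base]
    ] = field(default_factory=list)


# One list holds all children, so document order survives.
order = TOrder([TElem1(), TElem2(), TElem1Base(), TElem1(), TElem2(), TElem2Base()])
kinds = [type(item).__name__ for item in order.elem1_or_elem1base_or_elem2]
print(kinds)
```

Here every element has its own type, so the class of each item is enough to tell which element it came from.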

@tefra tefra changed the title XSData cant handle document order of elements with base types XSData cant handle elements order inside an unbounded sequence of choice Oct 15, 2020
@tefra
Owner

tefra commented Oct 15, 2020

One strategy would be to actually convert it to a wildcard and finally start working on matching defined classes in the target namespace (I have been postponing this for a long time).

edit

Or a wildcard with a predefined set of accepted classes like xjc

content: List[object] = field(
    default_factory=list,
    metadata=dict(
        type="Wildcard",
        elements={xxx, yyy, zzz},  # placeholders for the accepted classes
        namespace="##any",
        min_occurs=0,
        max_occurs=9223372036854775807,
    ),
)

@tefra tefra added the enhancement New feature or request label Oct 15, 2020
@tefra tefra changed the title XSData cant handle elements order inside an unbounded sequence of choice XSData cant handle elements order inside an unbounded choice Oct 16, 2020
@hcw70
Author

hcw70 commented Oct 19, 2020

The actual problem I am having is that the order of elements is not preserved when iterating over the in-memory representation.
Our XSD definition relies on the order of elements in several places.

I have simplified the type structure for this example, so the empty complex types may look weird.
Our actual XSD types are not empty, by the way, but the type structure is similar to the example.

If I omit the intermediate complex types (see #262) it works well (after you fixed it 8->).

@tefra
Owner

tefra commented Oct 19, 2020

Sequences were an easy fix; here we want to group fields with mixed types and different qualified names in the same list.

I have a prototype solution ready but I am facing various issues. The hardest to crack is having multiple elements that refer to the same type: there is a need for a generic wrapper that carries the tag name.

    dress_size_or_medium_dress_size_or_small_dress_size_or_smlx_size_or_xsmlx_size: List[Union[XsmlxsizeType, str, SmlxsizeType]] = field(
        default_factory=list,
        metadata={
            "name": "",
            "type": "Elements",
            "choices": (
                {
                    "name": "dressSize",
                    "type": Type[str],
                    "min_inclusive": "2",
                    "max_inclusive": "18",
                    "pattern": r"\d{1,2}",
                },
                {
                    "name": "mediumDressSize",
                    "type": Type[str],
                    "min_inclusive": "8",
                    "max_inclusive": "12",
                    "pattern": r"\d{1,2}",
                },
                {
                    "name": "smallDressSize",
                    "type": Type[str],
                    "min_inclusive": "2",
                    "max_inclusive": "6",
                    "pattern": r"\d{1}",
                },
                {
                    "name": "smlxSize",
                    "type": Type[SmlxsizeType],
                },

<sizes xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xsi:noNamespaceSchemaLocation="chapter09.xsd">
  <dressSize>06</dressSize>
  <mediumDressSize>12</mediumDressSize>
  <smallDressSize>6</smallDressSize>
  <smlxSize>extra
  large</smlxSize>
  <xsmlxSize>extra    small</xsmlxSize>
</sizes>

The resulting model is quite ugly, and the necessary logic in the parser/serializer is quite complex. I am also looking at smarter ways to maintain the order, but so far I haven't come up with a solution that doesn't break the number one rule of xsdata, which is to generate models with no dependencies.

I am working on it 😄, but it will take some time as this is affected by some old design decisions.

@hcw70
Author

hcw70 commented Oct 19, 2020

What about storing the document order in a special attribute ("documentOrder") and then sorting by that attribute on output?

Or keeping the elements in a linear list upon parsing?
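
The first suggestion can be sketched with the standard library alone. This is an illustration of the idea, not how xsdata works; the `document_order` key and the dict-based items are hypothetical stand-ins for an attribute on generated objects.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

XML = """<order xmlns="http://test.de/order">
  <elem1/><elem2/><elem1base/><elem1/><elem2/><elem2base/>
</order>"""

root = ET.fromstring(XML)

# Parse: group children per element name, remembering each one's position.
grouped = defaultdict(list)
for position, child in enumerate(root):
    local_name = child.tag.split("}", 1)[-1]  # drop the "{namespace}" prefix
    grouped[local_name].append({"document_order": position, "name": local_name})

# Serialize: flatten all groups and sort by the recorded position.
restored = sorted(
    (item for items in grouped.values() for item in items),
    key=lambda item: item["document_order"],
)
names = [item["name"] for item in restored]
print(names)
```

The cost of this design is that every item has to carry an extra bookkeeping value, and any code that inserts or reorders items by hand has to keep those values consistent.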

@tefra tefra changed the title XSData cant handle elements order inside an unbounded choice XSData cant handle elements order inside a repeating choice Oct 25, 2020
@tefra
Owner

tefra commented Oct 25, 2020

Hey @hcw70

I introduced a new compound field type, Elements, to handle repeating xs:choice. Give it a try: #299

The new field type is disabled by default; use the CLI option --compound-fields true to enable it.

Let me know what you think.

@tefra tefra closed this as completed in 6afe163 Oct 27, 2020
@hcw70
Author

hcw70 commented Oct 29, 2020

Hmm. I have now had time to try this out.

The approach of lumping the different elements into a single property looks OK.

What puzzles me, however, is that the well-defined XSD names like "datatypes" / "packets" / "networks" are no longer present;
instead I have a single "datatypesOrpacketsOrnetworks" property which has nothing to do with my XSD anymore.

So from that I would conclude that one actually needs both approaches:

  • Iterate over the sub-elements in order (which is fulfilled by iterating over "datatypesOrpacketsOrnetworks".value)
  • Have direct access to the first datatype (for example via datatypes[0]), which is no longer possible.

The name of the synthetic attribute "datatypesOrpacketsOrnetworks" does not matter much (I would have called it maybe
orderedContent or content), but it could also be an xsdata-internal detail which is only used if I use an xsdata function to iterate through a subtree.

So maybe a "magic" attribute which is called datatype[] and returns the first instance of type "TDataType" from datatypesOrpacketsOrnetworks when called via datatype[0]?
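
The lookup described here can be approximated outside the generated model with a small helper. This is a sketch under assumptions: `Model`, `content`, `TPacket` and `items_of` are hypothetical names, and filtering by class only works when every choice element has its own type.

```python
from dataclasses import dataclass, field
from typing import List, Type, TypeVar, Union

T = TypeVar("T")


@dataclass
class TDataType:
    name: str


@dataclass
class TPacket:
    name: str


@dataclass
class Model:
    # Hypothetical compound field holding the mixed, ordered content.
    content: List[Union[TDataType, TPacket]] = field(default_factory=list)


def items_of(items: List[object], cls: Type[T]) -> List[T]:
    """Return the items that are instances of cls, keeping document order."""
    return [item for item in items if isinstance(item, cls)]


model = Model([TPacket("p1"), TDataType("d1"), TDataType("d2")])
first_datatype = items_of(model.content, TDataType)[0]
print(first_datatype)
```

Keeping the helper as a free function leaves the dataclass itself free of behaviour.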

@tefra
Owner

tefra commented Oct 29, 2020

I want xsdata, and particularly the binding models, to remain as simple as possible: simple dataclasses with strict types and some metadata, with no dependencies, not even xsdata itself. A lot of inspiration comes from JAXB.

That's why I had to go with the compound field solution (a list). We can't have both, I am afraid; it's either order or separate fields (see the JAXB reference).

Since it's a list of mixed-type objects, we needed a wrapper to maintain the association between element and value, because in a repeating choice more than one element can have the same type.

<xs:choice maxOccurs="10">
   <xs:element name="a" type="xs:string" />
   <xs:element name="b" type="xs:string" />
</xs:choice>
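
To make the ambiguity concrete, here is a minimal sketch of such a wrapper. `Tagged` and its field names are hypothetical and chosen for illustration; they are not the class xsdata generates.

```python
from dataclasses import dataclass
from typing import Generic, List, TypeVar

T = TypeVar("T")


@dataclass
class Tagged(Generic[T]):
    """Hypothetical wrapper pairing an element name with its value."""

    qname: str
    value: T


# Both "a" and "b" are xs:string, so bare values would be ambiguous.
bare: List[str] = ["x", "y", "z"]

# With the wrapper, each value keeps the name of the element it came from.
wrapped: List[Tagged[str]] = [Tagged("a", "x"), Tagged("b", "y"), Tagged("a", "z")]

xml = "".join(f"<{item.qname}>{item.value}</{item.qname}>" for item in wrapped)
print(xml)
```

With only the `bare` list there is no way to decide whether "y" should be written as an `a` or a `b` element.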

I still want to work on that aspect and use the wrapper only when it's really necessary rather than every time; that improvement is already in progress. So in your case, if your choice elements all have unique types, you won't have to deal with the wrapper. #307

About the name: yes, the generator will concatenate the names if the number of choices is up to three (a_or_b_or_c); if there are more it will use the name choice. You can also try to set an alias in your config.

names = []
choices = []
min_occurs = []
max_occurs = []
for attr in attrs:
    target.attrs.remove(attr)
    names.append(attr.local_name)
    min_occurs.append(attr.restrictions.min_occurs)
    max_occurs.append(attr.restrictions.max_occurs)
    choices.append(self.build_attr_choice(attr))
name = "choice" if len(names) > 3 else "_Or_".join(names)
target.attrs.insert(

About the "magic" attribute: I am afraid it is out of the question, it doesn't fit the philosophy of simple binding dataclasses.

@hcw70
Author

hcw70 commented Oct 30, 2020

OK, agreed, but maybe we can agree to use the name "choice" (or similar) by default (or add a config option for it), so I can inject my "magic" attribute and always use the same container attribute name to look up my type?

With the current solution of concatenating several names, I get API changes if I add another option to an existing choice, which IMHO is not necessary, since from the XSD's point of view that would be an upward-compatible change.
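
The effect can be reproduced with the naming rule from the snippet quoted in the previous comment, isolated here as a standalone function. `compound_name` is a hypothetical name, "signals" is an invented fourth element, and the generator may apply further name normalization afterwards.

```python
from typing import List


def compound_name(names: List[str]) -> str:
    """Naming rule as quoted above: join up to three names, else 'choice'."""
    return "choice" if len(names) > 3 else "_Or_".join(names)


before = compound_name(["datatypes", "packets", "networks"])
after = compound_name(["datatypes", "packets", "networks", "signals"])
print(before)
print(after)
```

Adding one element to the choice is enough to rename the field, which is the API change being described.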

@tefra
Owner

tefra commented Oct 31, 2020

The name will default to choice if the number of elements is more than 3; otherwise it will stick to the a_or_b_or_c pattern. It's what JAXB does as well and I kind of like it: in the few use cases I examined, it made sense for the whole repeating choice design.

You can always add an alias in your config. (Note to self: maybe the aliases should be case insensitive.)

<Config xmlns="http://pypi.org/project/xsdata">
   ...
    <Aliases>
        <FieldName source="name1_Or_name2_Or_name3" target="content" />
    </Aliases>
</Config>

If the API changes, it's normal to expect the models to change as well.

Personally, I try not to add logic to domain models: integrations are easier to maintain, and with mypy inspections you can always quickly detect issues or breaking changes.
