XML Attributes and Element values #379

Closed
TobiasNx opened this issue Jul 8, 2021 · 9 comments · Fixed by #394 or #406
@TobiasNx
Contributor

TobiasNx commented Jul 8, 2021

As the example in #377 (comment) also shows, XML attributes and elements are both reconstructed as subfields. This is not documented. Moreover, when the data is encoded as XML again, the "new" structure is kept and the attributes remain subfields.

This specific handling of XML should be documented.
Also:
Is there any way to reconstruct the original structure correctly, or at least to build XML with attributes in Metafacture?

In:

<roleTerm authority="marcrelator" type="text">Author</roleTerm>

FLUX

infile
| open-file
| decode-xml
| handle-generic-xml
| encode-xml
| write(FLUX_DIR + "result.xml")
;

[Same if you use a morph with _elseNested]

Out:

<roleTerm>
    <authority>marcrelator</authority>
    <type>text</type>
    <value>Author</value>
</roleTerm>
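
To make the effect easier to follow, here is a minimal Python sketch of the behaviour described above. It is not Metafacture code: `decode` and `encode` are hypothetical stand-ins for the handler and the encoder, and the only assumption is that attributes and element text end up as plain name/value literals with nothing marking which was which.

```python
import xml.etree.ElementTree as ET

def decode(element):
    """Flatten attributes and text into one list of (name, value) literals.

    Hypothetical stand-in for the handler: attributes keep their name,
    the element text is emitted under the name "value".
    """
    literals = list(element.attrib.items())
    if element.text and element.text.strip():
        literals.append(("value", element.text.strip()))
    return element.tag, literals

def encode(tag, literals):
    """Naive encoder: it cannot tell attributes apart, so every literal
    becomes a child element."""
    entity = ET.Element(tag)
    for name, value in literals:
        ET.SubElement(entity, name).text = value
    return ET.tostring(entity, encoding="unicode")

source = '<roleTerm authority="marcrelator" type="text">Author</roleTerm>'
tag, literals = decode(ET.fromstring(source))
result = encode(tag, literals)
print(result)
```

The printed result has the same structure as the output shown above (without the indentation): the attribute information is gone after the first step, so the encoder has no way to restore it.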
@blackwinter
Member

Probably the same underlying issue as #336 (part 2). XML attributes are decoded into literals, so they can't be distinguished from actual elements downstream.

The XML decoder and encoder would have to agree on a way to preserve this information (similar to what JSON decoder and encoder do for array fields). Maybe @<name> (which would require escaping in the morph) or <name>@ or something like that (and ideally configurable).
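
A rough Python sketch of the marker idea, for illustration only. The function names, the `attribute_marker` and `value_name` parameters and the choice of `@` as the default are assumptions made for this example, not the Metafacture API.

```python
import xml.etree.ElementTree as ET

def decode(element, attribute_marker="@"):
    """Emit attributes as literals whose names carry the marker prefix."""
    literals = [(attribute_marker + name, value)
                for name, value in element.attrib.items()]
    if element.text and element.text.strip():
        literals.append(("value", element.text.strip()))
    return literals

def encode(tag, literals, attribute_marker="@", value_name="value"):
    """Turn marked literals back into attributes; the value literal becomes
    the element text; anything else becomes a child element."""
    element = ET.Element(tag)
    for name, value in literals:
        if name.startswith(attribute_marker):
            element.set(name[len(attribute_marker):], value)
        elif name == value_name:
            element.text = value
        else:
            ET.SubElement(element, name).text = value
    return ET.tostring(element, encoding="unicode")

source = '<roleTerm authority="marcrelator" type="text">Author</roleTerm>'
literals = decode(ET.fromstring(source))
round_trip = encode("roleTerm", literals)
print(literals)
print(round_trip)
```

Because both sides agree on the marker, the round trip reproduces the input element. The cost is that downstream steps see names such as `@authority`, which is why escaping in the morph would be needed.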

@TobiasNx
Contributor Author

The usual convention is that the text between the opening and closing tags of an XML element is always turned into a literal named value, and each attribute becomes a literal with the same name as the attribute. Attributes and value are combined in one entity.

The two problems here are:

  1. There is no documentation of this transformation of values and attributes. This should be a quick fix.

  2. The xml-encoder can't reconstruct the "old" structure. It should understand - at least optionally - that literals called "value" are the element values and that all other fields in the same entity are attributes. @blackwinter isn't this the convention you are looking for? Catmandu does something similar, but uses the field name content instead of value.

<roleTerm authority="marcrelator" type="text">Author</roleTerm>

->

roleTerm:
   authority: marcrelator
   type: text
   value: Author

Some transformation then changes value from Author to Creator and authority from marcrelator to greatVocab:

->

roleTerm:
   authority: greatVocab
   type: text
   value: Creator

The encoder then should be able to transform to:

<roleTerm authority="greatVocab" type="text">Creator</roleTerm>
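
The implicit convention proposed here could look roughly like the following Python sketch. `encode_implicit` and its `value_name` parameter are hypothetical names for this example; passing `value_name="content"` would mimic the Catmandu naming mentioned above.

```python
import xml.etree.ElementTree as ET

def encode_implicit(tag, fields, value_name="value"):
    """Treat the field named `value_name` as element text and every other
    field of the entity as an attribute."""
    element = ET.Element(tag)
    for name, value in fields.items():
        if name == value_name:
            element.text = value
        else:
            element.set(name, value)
    return ET.tostring(element, encoding="unicode")

# The entity as it looks after decoding.
role_term = {"authority": "marcrelator", "type": "text", "value": "Author"}

# The transformation step from the example.
role_term["authority"] = "greatVocab"
role_term["value"] = "Creator"

result = encode_implicit("roleTerm", role_term)
print(result)
```

Note that this convention cannot express child elements inside such an entity: everything that is not the value is assumed to be an attribute.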

@blackwinter
Member

The xml-encoder can't reconstruct the "old" structure. It should understand - at least optionally - that literals called "value" are the element values and that all other fields in the same entity are attributes.

Yes, I guess this implicit attribute handling should be possible as well: Treat all literals as attributes, except those named value (ideally, the "value" literal name would be configurable).

But SimpleXmlEncoder already has the concept of an attribute marker (~), it's just that GenericXmlHandler doesn't emit it (and it's hard-coded).

@blackwinter
Member

This also mainly (only?) applies to streams produced by GenericXmlHandler. I'm not sure about the other XML handlers, or about non-XML input streams that are to be encoded as XML output.

@dr0i
Member

dr0i commented Sep 28, 2021

Reopened and assigned @TobiasNx for functional review.

@TobiasNx
Contributor Author

TobiasNx commented Oct 7, 2021

Also, there seems to be a severe (?) API break in the new handling of attributes and values when no option is set at all!
The value tags are lost by default, and some other handling seems to have changed as well:

TobiasNx/notWorkingFlux@9fdffea?branch=9fdffea8fdc4dc7a8bc23ec4d8843690d978d33e&diff=split

Shouldn't the default settings stay the same?

My initial request still stands: the handling of XML in Metafacture, and the fact that it decodes/handles attributes and values as "fields", needs to be documented.

Also, I did not find any documentation of the attributeMarker apart from the test cases. Am I missing something?

@TobiasNx TobiasNx assigned blackwinter and dr0i and unassigned TobiasNx Oct 7, 2021
@blackwinter
Member

Shouldn't the default settings stay the same?

Yes indeed, this is a side effect of d6e68ff. @dr0i: Was this intentional? I certainly missed it :( (Initially SimpleXmlEncoder.DEFAULT_VALUE_TAG = "", now DefaultXmlPipe.DEFAULT_VALUE_TAG = "value")
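
Why the changed default matters can be sketched in Python as follows. This assumes, without having checked it against the SimpleXmlEncoder source, that a literal whose name equals the value tag is written as element text instead of a child element; `encode` and `value_tag` are hypothetical names for this example.

```python
import xml.etree.ElementTree as ET

def encode(tag, literals, value_tag=""):
    """Write a literal as element text if its name equals `value_tag`,
    otherwise as a child element (assumed behaviour, see the note above)."""
    element = ET.Element(tag)
    for name, value in literals:
        if name == value_tag:
            element.text = value
        else:
            ET.SubElement(element, name).text = value
    return ET.tostring(element, encoding="unicode")

literals = [("authority", "marcrelator"), ("value", "Author")]

old_default = encode("roleTerm", literals, value_tag="")       # former default
new_default = encode("roleTerm", literals, value_tag="value")  # default after d6e68ff
print(old_default)
print(new_default)
```

With the empty value tag, the `<value>` element is kept. With `"value"` as the default, it silently turns into element text, which would match the report that the value tags are lost when no option is set.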

TobiasNx added a commit that referenced this issue Oct 7, 2021
This sets the default value tag in the encoder to its former behaviour.
Reverts one change from: d6e68ff
@TobiasNx
Contributor Author

TobiasNx commented Oct 7, 2021

Pascal and I teamed up and fixed this:

#406

@blackwinter blackwinter assigned TobiasNx and unassigned blackwinter Oct 7, 2021
@katauber katauber linked a pull request Oct 12, 2021 that will close this issue
@katauber
Member

Closed with #406
