Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

XML to dictionary decoding #36

Open
dojeda opened this issue Feb 26, 2019 · 0 comments
Open

XML to dictionary decoding #36

dojeda opened this issue Feb 26, 2019 · 0 comments

Comments

@dojeda
Copy link

dojeda commented Feb 26, 2019

I would like to propose a change on the way XML data is decoded in pyxdf.

The current way that XML is decoded in pyxdf is to create defaultdict with a default factory constructor that creates a list. Attributes are not handled at all (which can be ok, but may restrict users) but the specification does not mention any restriction concerning the XML content.
More importantly, using a defaultdict with a list as a default creates an unnecessary deeper structure that has an important impact on how the 'info' header is handled. For example, my user code, and the pyxdf code has many of the following idiomatic examples:

self.nchns = int(xml['info']['channel_count'][0])
self.srate = round(float(xml['info']['nominal_srate'][0]))
logger.debug('  found stream ' + hdr['info']['name'][0])

In these examples, one has two systematically add [0] to get the element. This can get messy with deeper XML contents. For example, I have something like:

<info>
       <desc>
                <channels>
                        <channel>
                                <label>L1</label>
                        </channel>
                        <channel>
                                <label>L2</label>
                        </channel>
                        <channel>
                                <label>L3</label>
                        </channel>
                </channels>
        </desc>
</info>

To get the name of the first channel I need to do:

streams[1]['info']['desc'][0]['channels'][0]['channel'][0]

The main problem of converting XML to a dictionary is that this is a complicated task and it would ideally need a schema. Converters from XML to JSON exist, which can be useful here, but there is an ambiguity on the attributes because JSON does not support them. There are several "standard" conventions to solve this problem, with different benefits/disadvantages (see http://wiki.open311.org/JSON_and_XML_Conversion)

What I propose is to two ideas:

  • Let the user provide a function to convert XML to whatever format they choose. The user may have some knowledge on what to expect on their streams, but currently, they cannot handle the XML directly.
  • Change the current XML parsing in favor of another library that handles this well. XML parsing is hard and it would probably be sensible to let someone else do that. I propose to replace the current _load_xml function for one that adopts the Parker convention as follows:
from xmljson import parker  # https://github.com/sanand0/xmljson


def _load_xml(t):
    """Convert an attribute-less etree.Element into a dict."""
    return parker.data(t, preserve_root=True)

In this proposal, it would be easier to traverse the information dictionary. The complicated example mentioned above becomes:

streams[1]['info']['desc']['channels']['channel'][0] 

One point that needs discussion in my proposal is that the temporary per-stream data object (amongst other cases) would need to be updated; it would not need to index [0] everywhere to get the contents:

    class StreamData:
        """Temporary per-stream data."""
        def __init__(self, xml):
            """Init a new StreamData object from a stream header."""
            ...
            # number of channels
            self.nchns = int(xml['info']['channel_count'])
            # nominal sampling rate in Hz
            self.srate = round(float(xml['info']['nominal_srate']))
            # format string (int8, int16, int32, float32, double64, string)
            self.fmt = xml['info']['channel_format']

More importantly, if the user provides a function, it might become their responsibility to provide this information object, which complicates the usage. Maybe we could leave the current _load_xml but fill the "info" element of each stream with the provided user function (which defaults to the current one).

dojeda added a commit to dojeda/xdf that referenced this issue Feb 26, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant