Would it be possible to get the attributes of the root element returned in the dict as well? #319

Open
mathijssh opened this issue Feb 2, 2023 · 2 comments


mathijssh commented Feb 2, 2023

I'm currently parsing an XML document in the format of

<root_element>
   <node id="1">
      ...
   </node>
   <node id="2">
      ...
   </node>
   ...
</root_element>

and stream-parsing it with a callback in the form prescribed by the docs:

def handle_node(event, element):
    ...

I'm interested in all the nodes' data, including their attributes. When parsing this with xmltodict.parse(..., item_depth=2, item_callback=handle_node), I could not find the id attribute in the element dict. After some "debugging" I found the attribute hidden in the event tuple. Is this intended/desired behavior? I would personally prefer to have those attributes as keys in the element dict. Curious to hear any thoughts :)


mpf82 commented Feb 2, 2023

If the item contained the parent's (root) attributes as keys, this would break whenever a child node has the same name as one of the parent's attributes.

import xmltodict

xml = """
<root_element>
   <node id="1"><id x="foo"/></node>
   <node id="2"><id x="bar"/></node>
</root_element>
""".strip()

def handle_item(path, item):
    print("handle_item", path, "/", item)
    return True

xmltodict.parse(xml, item_depth=2, item_callback=handle_item)

Output:

handle_item [('root_element', None), ('node', {'id': '1'})] / {'id': {'@x': 'foo'}}
handle_item [('root_element', None), ('node', {'id': '2'})] / {'id': {'@x': 'bar'}}

Let's look at the first printed line: handle_item [('root_element', None), ('node', {'id': '1'})] / {'id': {'@x': 'foo'}}

Should {'id': {'@x': 'foo'}} be {'id': '1'} instead?

Accessing the last item in the path list (the (name, attributes) tuple of the current node) should be easy enough, and it will not break anything.
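For anyone who wants the attributes next to the item data anyway, here is a minimal sketch of reading them from the path inside the callback. The helper names node_attributes and with_attributes are made up for this example and are not part of xmltodict, and the "@" prefix and "#text" key are assumptions chosen so that an attribute named id cannot overwrite a child element named id. The path and item values are copied from the first line of the output above, so the snippet runs without parsing anything.

```python
def node_attributes(path, index=-1):
    """Return the attributes stored at one position of the callback path.

    `path` is the list of (name, attrs) tuples passed as the first argument
    of item_callback. attrs is None for an element without attributes, as
    for root_element in the output above. index=-1 is the current node,
    index=0 is the root element.
    """
    _name, attrs = path[index]
    return dict(attrs or {})


def with_attributes(path, item, prefix="@"):
    """Return a new dict holding the node's attributes plus the item data.

    Hypothetical helper: attributes are stored under prefixed keys so they
    cannot collide with child elements of the same name.
    """
    merged = {prefix + key: value for key, value in node_attributes(path).items()}
    if isinstance(item, dict):
        merged.update(item)
    elif item is not None:
        # Assumption: keep plain text content under a "#text" key.
        merged["#text"] = item
    return merged


# Values copied from the first printed line in the output above.
path = [("root_element", None), ("node", {"id": "1"})]
item = {"id": {"@x": "foo"}}

print(node_attributes(path))        # attributes of the current <node>
print(node_attributes(path, 0))     # attributes of <root_element>, none here
print(with_attributes(path, item))  # attributes and children side by side
```

A callback passed as item_callback could call with_attributes(path, item) before handing the data on and then return True to continue parsing, exactly like handle_item above. The same lookup with index 0 yields the root element's attributes, which is what the issue title asks about.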

@mathijssh
Author

Thank you, it indeed makes sense to prevent such a collision! As long as the attribute data remains available to the user via the path tuple, that's great. Would it be an idea to point users to this in the docs?

Thanks again for the swift and clear response!
