Streaming elements not directly below the root #32

lpil · 2018-01-31T13:28:11Z

Hello! First, thank you for this library. I'm very happy with the performance improvement over xmerl.

I have a large XML document I wish to stream. It looks like this:

<?xml version="1.0" encoding="UTF-8"?>
<tns:Envelope xmlns:tns="urn:Service:Integration">
  <tns:Body>
    <tns:RangeResponse>
      <tns:RangeResult>
        <tns:Detail> ...more content here... </tns:Detail>
        <tns:Detail> ...more content here... </tns:Detail>
        <tns:Detail> ...more content here... </tns:Detail>
        <tns:Detail> ...more content here... </tns:Detail>
        <tns:Detail> ...more content here... </tns:Detail>
        <tns:Detail> ...more content here... </tns:Detail>

        ... and so on...

      </tns:RangeResult>
    </tns:RangeResponse>
  </tns:Body>
</tns:Envelope>

I wish to process each Detail element in turn, getting messages like so:

{'$gen_event', {xmlstreamelement, {xmlel, "Detail", [], ...}}}

However, when I stream this document by passing each line to fxml_stream:parse/2, I get a single message containing the Body element, which means the Detail elements are parsed and processed eagerly rather than one at a time.

Is it possible to instruct fast_xml to use RangeResult as the root?
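To make the request concrete, here is a minimal sketch of the desired behaviour, written in Python with the standard library rather than fast_xml (it is not fast_xml's API, and `stream_children` is a hypothetical name). It feeds the document chunk by chunk and yields each completed direct child of a chosen element, dropping the child afterwards so that memory stays bounded:

```python
import xml.etree.ElementTree as ET

# Namespace taken from the sample document above.
NS = "{urn:Service:Integration}"


def stream_children(chunks, root_tag):
    """Yield each completed direct child of any element named root_tag.

    `chunks` is an iterable of str or bytes pieces of one XML document,
    for example the lines of a file.
    """
    parser = ET.XMLPullParser(events=("start", "end"))
    stack = []  # currently open elements, outermost first
    for chunk in chunks:
        parser.feed(chunk)
        for event, elem in parser.read_events():
            if event == "start":
                stack.append(elem)
                continue
            stack.pop()  # elem has just been closed
            if stack and stack[-1].tag == root_tag:
                yield elem
                # Detach the child once the consumer is done with it,
                # so the chosen root never accumulates its children.
                stack[-1].remove(elem)


lines = [
    '<tns:Envelope xmlns:tns="urn:Service:Integration">\n',
    "  <tns:Body><tns:RangeResponse><tns:RangeResult>\n",
    "    <tns:Detail>one</tns:Detail>\n",
    "    <tns:Detail>two</tns:Detail>\n",
    "  </tns:RangeResult></tns:RangeResponse></tns:Body>\n",
    "</tns:Envelope>\n",
]
details = [(d.tag, d.text) for d in stream_children(lines, NS + "RangeResult")]
```

Each Detail element is handed over as soon as its closing tag has been fed, which is the per-element delivery asked for above.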

Thanks,
Louis

prefiks · 2018-01-31T14:31:35Z

No, this is not configurable at the moment.

lpil · 2018-01-31T14:49:49Z

Thank you

thbar · 2024-12-07T12:27:15Z

Hello! I didn't fully realise this at first, but it means that for any slightly nested XML document, like the one below where the actual data sits under the dataObjects node, the whole node is held in memory. In this case it can be > 100MB (I've replaced the lengthy content with a fake ManyElementsHere element):

<?xml version="1.0" encoding="UTF-8"?>
<PublicationDelivery xmlns="http://www.netex.org.uk/netex" version="1.04:FR1-NETEX-1.6-1.8">
  <PublicationTimestamp>2024-12-06T07:27:50Z</PublicationTimestamp>
  <ParticipantRef>FR1_OFFRE</ParticipantRef>
  <dataObjects>
    <GeneralFrame id="FR1:GeneralFrame:NETEX_CALENDRIER-20241206T072750Z:LOC" version="1.8" dataSourceRef="FR1-OFFRE_AUTO">
      <ManyElementsHere>
        <Element></Element>
        <Element></Element>
        <Element></Element>
        <Element></Element>
        <Element></Element>
        <Element></Element>
      </ManyElementsHere>
    </GeneralFrame>
  </dataObjects>
</PublicationDelivery>

In my current use case (XML files > 100MB) this ends up being closer to DOM parsing than stream parsing, and as a result it puts a hard limit on the maximum file size I can process...

I wonder how complicated it would be to let the user decide programmatically (via a function) whether a message must be emitted, based on business logic? In some cases I'm happy with the top 2 or 3 levels of nodes; in others, diving into some nodes but not into others is what will make the parsing scalable.
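As a rough illustration of that idea, here is a minimal sketch in Python using only the standard library. It is not fast_xml code, and the names `stream_selected` and `should_emit` are hypothetical. A user-supplied function receives the path of tag names and the completed element, and decides whether the element is emitted:

```python
import io
import xml.etree.ElementTree as ET


def stream_selected(source, should_emit):
    """Yield (path, element) for each completed element the caller selects.

    `should_emit(path, element)` is the user-supplied decision function;
    `path` is a tuple of tag names from the document root to the element.
    """
    path = []     # tag names of the currently open elements
    parents = []  # the open elements themselves, outermost first
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            path.append(elem.tag)
            parents.append(elem)
            continue
        if should_emit(tuple(path), elem):
            yield tuple(path), elem
            if len(parents) > 1:
                # Detach the emitted element from its parent to free memory.
                parents[-2].remove(elem)
        path.pop()
        parents.pop()


sample = io.BytesIO(
    b"<PublicationDelivery><ParticipantRef>FR1_OFFRE</ParticipantRef>"
    b"<dataObjects><GeneralFrame><ManyElementsHere>"
    b"<Element>a</Element><Element>b</Element>"
    b"</ManyElementsHere></GeneralFrame></dataObjects></PublicationDelivery>"
)
wanted = ("ParticipantRef", "Element")
selected = [
    (path[-1], elem.text)
    for path, elem in stream_selected(sample, lambda p, e: p[-1] in wanted)
]
```

In this sketch, elements that are never emitted stay attached to their parent, so the decision function has to select the large repeated elements for memory use to stay bounded. An emitted ancestor will also no longer contain descendants that were emitted before it.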

Thanks!

mremond added the enhancement label Jan 17, 2019
