I would like to write huge trees but don't retain the entire tree in memory #1031

skinkie · 2024-05-07T11:34:45Z

Ideally I would like to write out a tree where the data is added just in time. The proposal in #1030 shows increasing memory usage, which suggests that the tree is still being built completely in memory. I wanted to add some evidence. Please ignore the timing.

Using the generator method:

Materializing into a list first:

Ideally, memory consumption wouldn't increase at all, and the data would just be written out as it is provided. But I guess the graphs do give a clear view of where we can make some improvements when writing out huge documents.
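To make the "written out as it is provided" idea concrete, here is a minimal sketch using only the standard library's `xml.sax.saxutils.XMLGenerator`. It is not xsdata code: `make_items` and `write_items` are hypothetical names, and the record layout is invented for illustration. Each record is written as soon as the generator yields it, so no list of records is ever held.

```python
import io
from typing import Iterable, Iterator
from xml.sax.saxutils import XMLGenerator


def make_items(n: int) -> Iterator[dict]:
    """Hypothetical data source: yields one record at a time."""
    for i in range(n):
        yield {"id": str(i), "name": f"item-{i}"}


def write_items(out, items: Iterable[dict]) -> int:
    """Write each record to `out` as soon as it is produced."""
    gen = XMLGenerator(out, encoding="utf-8")
    gen.startDocument()
    gen.startElement("items", {})
    count = 0
    for item in items:
        # Only the current record is alive here; nothing is accumulated.
        gen.startElement("item", {"id": item["id"]})
        gen.characters(item["name"])
        gen.endElement("item")
        count += 1
    gen.endElement("items")
    gen.endDocument()
    return count


buffer = io.StringIO()  # stand-in for a file opened for writing
written = write_items(buffer, make_items(3))
print(written)
print(buffer.getvalue())
```

With a real file instead of `io.StringIO`, the output also leaves memory as the file buffer is flushed, which is the behaviour asked for above.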

tefra · 2024-05-07T16:32:45Z

We need to fully support the Iterable type annotation for infinite generators in both the data models and the serializers.

The PR is a good first attempt, @skinkie, but it needs some more work.
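As an illustration of what an `Iterable` annotation buys on the model side, here is a small sketch with plain dataclasses. `Stop`, `Timetable` and `generate_stops` are hypothetical names, and this is not what xsdata generates; it only shows that a field typed as `Iterable` accepts a generator, and that a generator can be consumed once.

```python
from dataclasses import dataclass, field
from typing import Iterable, Iterator


@dataclass
class Stop:
    code: str


@dataclass
class Timetable:
    # Iterable rather than List, so a type checker accepts a generator here.
    stops: Iterable[Stop] = field(default_factory=list)


def generate_stops(n: int) -> Iterator[Stop]:
    """Hypothetical source that creates each Stop only when asked for it."""
    for i in range(n):
        yield Stop(code=f"S{i}")


timetable = Timetable(stops=generate_stops(3))

# A serializer would walk the field exactly like this, one object at a time.
codes = [stop.code for stop in timetable.stops]

# A generator is single-pass: a second walk over the same field yields nothing.
leftover = list(timetable.stops)

print(codes)
print(leftover)
```

The single-pass behaviour is the main design consequence: any code that needs to visit the field twice (validation followed by serialization, for example) has to be written with that in mind.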

skinkie · 2024-05-10T23:11:55Z

Writing a 3.4GB file using generators takes ~12GB of memory with LxmlEventWriter. XmlEventWriter takes essentially no memory while writing to disk, and it does so in a streaming fashion. I think this must be investigated, especially if LxmlEventWriter is the default. I rewrote my whole project to split things up because I was under the impression I couldn't fit it in memory.
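The gap described here can be reproduced in miniature with the standard library alone. The sketch below does not exercise xsdata's writers; it is an analogue that compares a writer that builds a whole tree before serializing with one that emits events straight to the output. Whether LxmlEventWriter behaves like the former is the hypothesis under discussion, not something this sketch proves. All function names are hypothetical.

```python
import os
import tracemalloc
import xml.etree.ElementTree as ET
from xml.sax.saxutils import XMLGenerator


def names(n):
    """Lazy input, identical for both writers."""
    for i in range(n):
        yield f"item-{i}"


def write_via_tree(out, items):
    """Build the complete tree first, then serialize it."""
    root = ET.Element("items")
    for name in items:
        ET.SubElement(root, "item").text = name
    ET.ElementTree(root).write(out, encoding="unicode")


def write_streaming(out, items):
    """Emit each element to the output as soon as it is available."""
    gen = XMLGenerator(out, encoding="utf-8")
    gen.startDocument()
    gen.startElement("items", {})
    for name in items:
        gen.startElement("item", {})
        gen.characters(name)
        gen.endElement("item")
    gen.endElement("items")
    gen.endDocument()


def peak_bytes(writer, n):
    """Peak Python-level allocation while `writer` handles n items."""
    with open(os.devnull, "w", encoding="utf-8") as out:
        tracemalloc.start()
        try:
            writer(out, names(n))
            _, peak = tracemalloc.get_traced_memory()
        finally:
            tracemalloc.stop()
    return peak


tree_peak = peak_bytes(write_via_tree, 20_000)
stream_peak = peak_bytes(write_streaming, 20_000)
print(f"tree-building writer peak: {tree_peak} bytes")
print(f"streaming writer peak:     {stream_peak} bytes")
```

Even though both writers receive a generator, the tree-building one grows with the number of items while the streaming one stays roughly flat. Note that `tracemalloc` only sees allocations made through Python's allocators, so memory allocated inside a C library may need a process-level measurement instead.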

tefra · 2024-05-11T06:36:39Z

It's mentioned in a few places in the docs:

https://xsdata.readthedocs.io/en/latest/data_binding/xml_serializing/#alternative-writers
https://xsdata.readthedocs.io/en/latest/api/formats/dataclass/serializers/writers/lxml/#xsdata.formats.dataclass.serializers.writers.lxml.LxmlEventWriter
https://xsdata.readthedocs.io/en/latest/api/formats/dataclass/serializers/writers/native/#xsdata.formats.dataclass.serializers.writers.native.XmlEventWriter

For normal use cases, the lxml writer is always faster; a 3.4GB XML file is not very common 😄

skinkie · 2024-05-11T08:48:31Z

@tefra the docs mention that there are alternatives, but not what the characteristics of the two are.

tefra · 2024-10-20T16:11:53Z

Hey @skinkie give this pr #1082 a try.

skinkie · 2024-10-20T16:24:50Z

> Hey @skinkie give this pr #1082 a try.

Is there a reason that #1082 would specifically address this issue? Obviously gonna test the Iterable stuff ;-)

tefra · 2024-10-20T17:19:30Z

It will allow you to use generators without mypy errors. What else did you have in mind?

skinkie · 2024-10-20T18:56:39Z

@tefra the generators will resolve the memory consumption, but not the peak afterwards. I think this is the difference between LxmlEventWriter and XmlEventWriter.

tefra · 2024-10-21T16:55:10Z

Have you pinpointed where the peak happens? Can you share your benchmark script?
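One possible way to locate the peak, sketched with the standard library only: wrap the input generator in a probe that samples traced memory while items are being consumed, then compare the last sample with the overall peak. `probe` and `tree_then_serialize` are hypothetical names, and the writer here is a stand-in built on `xml.etree.ElementTree`, not one of the xsdata writers.

```python
import tracemalloc
import xml.etree.ElementTree as ET
from typing import Iterable, Iterator, List, Tuple

Sample = Tuple[str, int, int]  # (phase, items seen, bytes currently traced)


def probe(items: Iterable[str], samples: List[Sample], every: int = 1000) -> Iterator[str]:
    """Pass items through unchanged while sampling traced memory."""
    seen = 0
    for item in items:
        if seen % every == 0:
            samples.append(("iterating", seen, tracemalloc.get_traced_memory()[0]))
        seen += 1
        yield item
    # Reached when the consumer asks for more and the input is exhausted.
    samples.append(("exhausted", seen, tracemalloc.get_traced_memory()[0]))


def tree_then_serialize(items: Iterable[str]) -> bytes:
    """Stand-in writer: build the whole tree, then serialize it in one go."""
    root = ET.Element("items")
    for name in items:
        ET.SubElement(root, "item").text = name
    return ET.tostring(root, encoding="utf-8")


samples: List[Sample] = []
tracemalloc.start()
try:
    source = (f"item-{i}" for i in range(20_000))
    payload = tree_then_serialize(probe(source, samples))
    peak = tracemalloc.get_traced_memory()[1]
finally:
    tracemalloc.stop()

at_exhaustion = samples[-1][2]
print(f"traced when input ran out: {at_exhaustion} bytes")
print(f"overall peak:              {peak} bytes")
print(f"allocated after the input was consumed: about {peak - at_exhaustion} bytes")
```

If the samples climb steadily during iteration, the writer is retaining data as it goes; if most of the growth shows up only after the "exhausted" sample, the cost sits in the final serialization step. Swapping the stand-in writer for the real serializer call would be the next step for an actual benchmark.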

tefra added the enhancement label on May 7, 2024

tefra mentioned this issue on Oct 20, 2024 in PR #1082, "Add cli config to use generic collections" (merged)
