I would like to write huge trees without retaining the entire tree in memory #1031
We need to fully support this. The PR is a good first attempt @skinkie, but it needs some more work.
Writing a 3.4GB file using generators takes ~12GB of memory with LxmlEventWriter. XmlEventWriter takes essentially no memory while writing to disk and does it in a streaming fashion. I think this must be investigated, especially if LxmlEventWriter is the default. I rewrote my whole project to split things up because I was under the impression I couldn't hold the document in memory.
It's mentioned in a few places in the docs: https://xsdata.readthedocs.io/en/latest/data_binding/xml_serializing/#alternative-writers For normal use cases the lxml writer is always faster; a 3.4GB xml file is not very common 😄
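For reference, switching the writer explicitly looks roughly like this; a minimal sketch following the linked docs page (`Root` is a stand-in model invented for illustration, not from this issue):

```python
from dataclasses import dataclass, field

from xsdata.formats.dataclass.serializers import XmlSerializer
from xsdata.formats.dataclass.serializers.writers import XmlEventWriter


@dataclass
class Root:  # minimal stand-in model for illustration
    value: str = field(default="hello", metadata={"type": "Element"})


# Passing the writer class overrides the default
# (LxmlEventWriter when lxml is installed).
serializer = XmlSerializer(writer=XmlEventWriter)

with open("out.xml", "w") as fh:
    serializer.write(fh, Root())
```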
@tefra the docs mention that there are alternative writers, but not the characteristics of the two.
It will allow you to use generators without mypy errors; what else did you have in mind?
@tefra the generators will resolve the memory consumption on the producing side, not the peak afterwards. I think this is the difference between LxmlEventWriter and XmlEventWriter.
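To make that concrete, here is a hedged sketch of the producing side with a generator-backed list field (`Record`, `Catalog`, and `records` are invented names; the `type: ignore` marks the mypy friction that #1030 is meant to remove):

```python
from dataclasses import dataclass, field
from typing import Iterator, List


@dataclass
class Record:
    value: str = field(default="", metadata={"type": "Element"})


@dataclass
class Catalog:
    record: List[Record] = field(default_factory=list, metadata={"type": "Element"})


def records() -> Iterator[Record]:
    # The producer stays flat in memory: each Record is created on demand.
    for i in range(10_000_000):
        yield Record(value=str(i))


# A generator keeps the *input* lazy, but whether the *output* is streamed
# or buffered is decided by the writer, not by this assignment.
doc = Catalog(record=records())  # type: ignore[arg-type]
```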
Have you pinpointed where the peak happens? Can you share your benchmark script?
Ideally I would like to write out a tree where the data is added just in time. The proposal in #1030 shows increasing memory usage, which suggests that the tree is still being built completely in memory. I wanted to add some evidence; please ignore the timing.
Using the generator method: [memory usage graph]
Materializing into a list first: [memory usage graph]
Ideally, the memory consumption wouldn't increase at all and the data would just be written out as it is provided. But I guess the graphs do give a clear view of where we can make improvements when writing out huge documents.
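For anyone wanting to reproduce this kind of measurement, below is a sketch of a benchmark, not the actual script behind the graphs above. Each writer runs in a fresh process so one run's peak RSS cannot mask the other's. It assumes a Unix platform (the stdlib `resource` module); the model names are invented for illustration:

```python
import multiprocessing as mp
import resource
from dataclasses import dataclass, field
from typing import Iterator, List

from xsdata.formats.dataclass.serializers import XmlSerializer
from xsdata.formats.dataclass.serializers.writers import LxmlEventWriter, XmlEventWriter


@dataclass
class Record:
    value: str = field(default="", metadata={"type": "Element"})


@dataclass
class Catalog:
    record: List[Record] = field(default_factory=list, metadata={"type": "Element"})


def build_document() -> Catalog:
    def gen() -> Iterator[Record]:
        for i in range(1_000_000):
            yield Record(value=str(i))

    return Catalog(record=gen())  # type: ignore[arg-type]


def measure(writer, queue) -> None:
    serializer = XmlSerializer(writer=writer)
    with open("out.xml", "w") as fh:
        serializer.write(fh, build_document())
    # ru_maxrss is reported in kilobytes on Linux, bytes on macOS.
    queue.put((writer.__name__, resource.getrusage(resource.RUSAGE_SELF).ru_maxrss))


if __name__ == "__main__":
    # A fresh process per writer, so peak RSS readings stay independent.
    for writer in (XmlEventWriter, LxmlEventWriter):
        queue: mp.Queue = mp.Queue()
        proc = mp.Process(target=measure, args=(writer, queue))
        proc.start()
        name, peak = queue.get()
        proc.join()
        print(f"{name}: peak RSS {peak} (platform-dependent units)")
```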