Parsing big XML files with "lxml.objectify.fromstring" returns an error #87
So I have found a workaround for this by first writing the XML to a file and then reading it back in. That way the raw XML string and the parsed tree don't both have to be in memory at the same time. Is this an acceptable solution?
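For illustration, that workaround might look roughly like this (a minimal sketch, not the actual code from the thread; the function and path names are hypothetical):

```python
import lxml.objectify

def parse_via_file(xml_bytes, path="payload.xml"):
    # Spill the raw payload to disk first, then let lxml parse
    # straight from the file, so the full document string and the
    # parsed tree never have to coexist in memory.
    with open(path, "wb") as f:
        f.write(xml_bytes)
    return lxml.objectify.parse(path).getroot()
```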
Hey @antonhagg, I think you are right; we need to do something to limit our resource requirements. I'll take a look at your code, thanks!
Maybe we could try to create a StringIO object and, if we see that the file is really big, write it to disk. Then we parse with …
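A standard-library way to get exactly that spill-to-disk behaviour is `tempfile.SpooledTemporaryFile`, which buffers in memory until a size threshold is crossed. A sketch, assuming the XML arrives as an iterable of byte chunks (the threshold is an arbitrary example):

```python
import tempfile
import lxml.objectify

def parse_spooled(chunks, max_size=50 * 1024 * 1024):
    # The buffer stays in memory until it grows past max_size, at
    # which point SpooledTemporaryFile transparently rolls it over
    # to a real temporary file on disk. lxml parses from the file
    # object the same way in either case.
    with tempfile.SpooledTemporaryFile(max_size=max_size) as buf:
        for chunk in chunks:
            buf.write(chunk)
        buf.seek(0)
        return lxml.objectify.parse(buf).getroot()
```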
Sounds like a good idea; I won't have time to do anything until August, so if anyone else is up for the job, feel free. =)
@antonhagg I had a go at it; will you please test whether the current code in master works for you now?
Since "folder download" is not in the 0.5.1 release, I will have to add that first. |
@antonhagg The code has not been released yet. Are you able to install from the git head? That is, with …
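Installing from the git head usually means something like the following, where the repository URL is a placeholder since it is not spelled out in the thread:

```
pip install git+https://github.com/<owner>/<repo>.git
```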
This is mainly related to #78, where an XML file can grow quite big (in my case it's around 500 MB and describes 779917 files and 90361 folders), but I guess this could happen in other situations too.
Anyway, there is an option to use a custom parser with the `huge_tree` option (http://stackoverflow.com/questions/11850345/using-python-lxml-etree-for-huge-xml-files). Would this be an option, or is there another way of parsing large XML files, for example in chunks?
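For reference, the `huge_tree` flag from the linked answer can be passed through lxml's objectify parser factory; a minimal sketch (the sample document is made up):

```python
import lxml.objectify

# huge_tree lifts libxml2's built-in safety limits on things like
# text node size and tree depth, which is what very large documents
# tend to run into.
parser = lxml.objectify.makeparser(huge_tree=True)
root = lxml.objectify.fromstring(b"<files><f>a.txt</f></files>", parser)
print(root.f)  # -> a.txt
```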