Improve performance #394
Comments
I work with XML files constantly and ran into this exact issue earlier this year as well. xml2 takes roughly a minute to extract data from a ~350 KB to 1.5 MB XML file into a data frame. For comparison, I can process 600 files in the same amount of time by reading the file as a single column table with |
I use the paws package to work with S3, e.g. to list objects in a bucket. As this took quite a lot of time, I did some profiling and noticed that most of the time is spent parsing the XML response (it uses/used as_list()). I created a PR (paws-r/paws#621) that improves the performance quite a bit, but it is still really slow (about 90% of the time is spent in parsing).
To improve the performance further without using/abusing XPath even more, it is probably easier to improve the performance of xml2 in general.
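To illustrate the two approaches discussed above, here is a minimal sketch that extracts the same values once via as_list() and once via XPath. The XML document is a made-up, simplified stand-in for an S3 listing (no namespace, two entries), and the variable names are hypothetical; it is not the code from paws or from the PR.

```r
library(xml2)

# Hypothetical, simplified S3-style listing (made up for illustration).
doc <- read_xml(paste0(
  "<ListBucketResult>",
  "<Contents><Key>a.txt</Key><Size>1</Size></Contents>",
  "<Contents><Key>b.txt</Key><Size>2</Size></Contents>",
  "</ListBucketResult>"
))

# Approach 1: convert the whole document to a nested R list, then walk it.
# Every node becomes a list element, which gets expensive on large responses.
lst <- as_list(doc)
root <- lst$ListBucketResult
contents <- root[names(root) == "Contents"]
keys_list <- vapply(
  contents,
  function(x) x$Key[[1]],
  character(1),
  USE.NAMES = FALSE
)

# Approach 2: select only the nodes of interest with XPath and read their text.
keys_xpath <- xml_text(xml_find_all(doc, ".//Key"))
```

Both vectors should hold the same keys. Real S3 responses declare a default namespace, so the XPath variant would likely need a namespace prefix or a call to xml_ns_strip() first; that step is left out here.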