Improve performance #394

Open
mgirlich opened this issue Jun 1, 2023 · 1 comment

Comments

@mgirlich
Contributor

mgirlich commented Jun 1, 2023

I use the paws package to work with S3, e.g. to list the objects in a bucket. Because this took quite a lot of time, I did some profiling and noticed that most of the time is spent parsing the XML response (it uses/used as_list()). I created a PR (paws-r/paws#621) that improves the performance quite a bit, but it is still really slow (around 90% of the time is spent parsing).
To improve the performance further without using/abusing XPath even more, it is probably easier to improve the performance of xml2 in general.
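
For readers unfamiliar with the two access styles mentioned above, here is a minimal sketch that contrasts converting a whole document with as_list() against pulling individual fields with XPath. The XML snippet is a hypothetical, trimmed-down imitation of an S3 listing (real responses carry a default namespace and many more fields), and this is not the code from the paws PR. It only illustrates the two styles and makes no claim about their relative speed.

```r
library(xml2)

# Hypothetical sample input, not a real S3 response.
sample_xml <- '<ListBucketResult>
  <Name>example-bucket</Name>
  <Contents><Key>a.csv</Key><Size>10</Size></Contents>
  <Contents><Key>b.csv</Key><Size>20</Size></Contents>
</ListBucketResult>'

doc <- read_xml(sample_xml)

# Style 1: convert the whole tree to nested R lists, then pick fields out.
lst <- as_list(doc)
root <- lst$ListBucketResult
contents <- root[names(root) == "Contents"]
keys_via_list <- unname(vapply(contents, function(x) x$Key[[1]], character(1)))

# Style 2: ask for only the nodes of interest with XPath.
keys_via_xpath <- xml_text(xml_find_all(doc, "//Contents/Key"))
sizes_via_xpath <- as.numeric(xml_text(xml_find_all(doc, "//Contents/Size")))

# Both styles should yield the same keys for this sample.
stopifnot(identical(keys_via_list, keys_via_xpath))
```

With a namespaced response, the XPath expressions would need a namespace prefix, or the document could be passed through xml_ns_strip() first.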

@hadley hadley added the upkeep maintenance, infrastructure, and similar label Oct 30, 2023
@D3SL

D3SL commented Nov 8, 2023

I work with XML files constantly and ran into this exact issue earlier this year as well. xml2 takes roughly a minute to extract data from a single XML file of roughly 350 KB to 1.5 MB into a data frame. For comparison, I can process 600 files in the same amount of time by reading each file as a single-column table with fread(), reformatting each row with stringr, flattening the table to a JSON string, converting it to JSON and then back to a table, and then going through a series of unnest_wider and unnest_longer operations to populate parent data to child nodes.

@hadley hadley removed the upkeep maintenance, infrastructure, and similar label Oct 7, 2024