Improve speed of JSON nested list reader #157

Closed
alamb opened this issue Apr 26, 2021 · 1 comment
Labels
arrow Changes to the arrow crate

Comments

@alamb
Contributor

alamb commented Apr 26, 2021

Note: migrated from original JIRA: https://issues.apache.org/jira/browse/ARROW-11002

The code that reads nested lists in rust/arrow/src/json/reader.rs makes an extra copy (via Vec::clone), which caused a 20% slowdown in a benchmark compared to not cloning.

The goal of this ticket is to improve the performance of reading JSON in this case, likely by avoiding the clone.

More details can be found here:

apache/arrow#8938 (review)

As [~nevi_me] says:
{quote}
I suspect the main perf loss is from having to peek into JSON values in order to make the nesting work.
By this, I mean that if we have {"a": [_, _, _]}, we extract a's values into a Vec<Value>, i.e. [_, _, _].
By extracting values, we are able to then use the reader to read &[Value] without caring about its key (a).
The downside of this approach is that we have to clone values to get a Vec<Value>, as I couldn't find an alternative
{quote}
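
To make the trade-off in the quote concrete, here is a minimal sketch of the two ways of pulling the values of a key out of each row. Everything in it is an assumption for illustration: the `Value` enum is a small stand-in rather than the type the reader actually uses, and `row`, `extract_cloned` and `extract_borrowed` are hypothetical helpers, not functions from rust/arrow/src/json/reader.rs. It is not meant to describe how the issue was eventually resolved.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for a parsed JSON value (illustration only).
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Null,
    Number(f64),
    Array(Vec<Value>),
    Object(BTreeMap<String, Value>),
}

/// Hypothetical helper: builds the row {key: [items...]}.
fn row(key: &str, items: Vec<Value>) -> Value {
    let mut map = BTreeMap::new();
    map.insert(key.to_string(), Value::Array(items));
    Value::Object(map)
}

/// Cloning approach, as described in the quote: the elements of every
/// row's `key` array are deep-copied into one owned Vec<Value>, which can
/// then be handed to code that expects a `&[Value]`.
fn extract_cloned(rows: &[Value], key: &str) -> Vec<Value> {
    let mut out = Vec::new();
    for r in rows {
        if let Value::Object(map) = r {
            if let Some(Value::Array(items)) = map.get(key) {
                // Each element (and anything nested inside it) is copied here.
                out.extend(items.iter().cloned());
            }
        }
    }
    out
}

/// Borrowing approach: collects references into the original rows, so no
/// Value is copied. The catch is that downstream code must then accept
/// `&[&Value]` (or an iterator of references) instead of `&[Value]`.
fn extract_borrowed<'a>(rows: &'a [Value], key: &str) -> Vec<&'a Value> {
    rows.iter()
        .filter_map(|r| match r {
            Value::Object(map) => match map.get(key) {
                Some(Value::Array(items)) => Some(items.iter()),
                _ => None,
            },
            _ => None,
        })
        .flatten()
        .collect()
}
```

Both functions return the same elements in the same order, skipping rows where the key is missing or is not an array. Only the second avoids the copy, at the cost of changing the signature that the nested reader has to accept.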

alamb added the "arrow" (Changes to the arrow crate) label Apr 26, 2021
@tustvold
Contributor

Closed by #3479
