A better way to resize the buffer for the snappy encode/decode #6276
Labels: `enhancement` (any new improvement worthy of an entry in the changelog), `parquet` (changes to the parquet crate)
Is your feature request related to a problem or challenge? Please describe what you are trying to do.
In my use case, reading data from a large Parquet file on disk, Snappy decoding consumes too much CPU in my flamegraph. Digging into the codebase, I found that `resize` is used to initialize the destination buffer for decoding, but `resize` performs badly when the expected size is large. Here is the code location of the `resize` call:

arrow-rs/parquet/src/compression.rs, lines 211 to 226 in 25d39c1

And here is an example showing the poor performance of `resize` on a vector: https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=5321aaeda859ea9f664ae7952ade2fb6
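For readers who don't follow the playground link, a minimal sketch of the same comparison is below. This is not the exact playground code; the sizes, iteration count, and printed labels are illustrative assumptions. The point it demonstrates is that `resize` must write a fill value into every new slot, while `set_len` on a vector with sufficient capacity writes nothing.

```rust
use std::time::Instant;

const SIZE: usize = 10_000_000;
const ITERS: usize = 50;

fn main() {
    // 1) `resize`: grows the vector and zero-fills every new byte.
    let t = Instant::now();
    for _ in 0..ITERS {
        let mut v: Vec<u8> = Vec::new();
        v.resize(SIZE, 0);
        std::hint::black_box(&v);
    }
    println!("resize:  {:?}", t.elapsed());

    // 2) `vec!` macro: may hit the allocator's zeroed-memory fast path.
    let t = Instant::now();
    for _ in 0..ITERS {
        let v: Vec<u8> = vec![0u8; SIZE];
        std::hint::black_box(&v);
    }
    println!("vec!:    {:?}", t.elapsed());

    // 3) `set_len` after `with_capacity`: no writes at all. The contents
    //    are uninitialized, which is only sound if the caller overwrites
    //    them before reading (as a decompressor does).
    let t = Instant::now();
    for _ in 0..ITERS {
        let mut v: Vec<u8> = Vec::with_capacity(SIZE);
        // SAFETY: capacity >= SIZE; in the real decode path every byte
        // is written before it is read.
        unsafe { v.set_len(SIZE) };
        std::hint::black_box(&v);
    }
    println!("set_len: {:?}", t.elapsed());
}
```

Exact timings depend on the allocator and OS, but `set_len` avoids the memset entirely, which is the gap the flamegraph shows.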
The example's output shows that resizing a vector to 10000 costs far more time than the `vec!` macro or `set_len`.

Describe the solution you'd like
In the example above, `set_len` shows good performance, so I guess `resize` can be replaced with `set_len` when the capacity is sufficient:
An `unsafe` block is introduced here, but it is actually safe, considering:

Describe alternatives you've considered
Additional context