Commit
Preserve dictionary encoding when decoding parquet into Arrow arrays, 60x perf improvement (#171) (#1180)

* Preserve dictionary encoding from parquet (#171)
* Use OffsetBuffer::into_array for dictionary
* Fix and test handling of empty dictionaries
  Don't panic if missing dictionary page
* Use ArrayRef instead of Arc<ArrayData>
* Update doc comments
* Add integration test
  Tweak RecordReader buffering logic
* Add benchmark
* Set write batch size in parquet fuzz tests
  Fix bug in column writer with small page sizes
* Fix test_dictionary_preservation
* Add batch_size comment
Showing 14 changed files with 1,615 additions and 258 deletions.