This repository has been archived by the owner on Feb 18, 2024. It is now read-only.

Exposed parquet indexed page filtering to FileReader #1216

Merged: jorgecarleitao merged 5 commits into main from filter_index on Aug 15, 2022


@jorgecarleitao (Owner) commented Aug 8, 2022

This PR adds a new, optional parameter to FileReader::new that enables page-level filter pushdown.
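Page-level filter pushdown relies on per-page statistics from a Parquet page index, such as each page's min/max values. The reader keeps only the pages whose value range could satisfy the predicate. The sketch below shows that selection for a single `i64` column under an inclusive range predicate `[lo, hi]`. `PageStats` and `select_pages` are hypothetical names used for illustration. They are not arrow2's API, and this sketch does not show the actual type that `FileReader::new` accepts.

```rust
/// Hypothetical per-page statistics (min/max as a page index would store them).
/// Not arrow2's actual type. `None` means the statistics are unavailable.
#[derive(Debug, Clone, Copy, PartialEq)]
struct PageStats {
    min: Option<i64>,
    max: Option<i64>,
}

/// Returns, for each page, whether it may contain a value in `[lo, hi]`.
/// A page is skipped only when its statistics prove that no value can match.
fn select_pages(pages: &[PageStats], lo: i64, hi: i64) -> Vec<bool> {
    pages
        .iter()
        .map(|p| match (p.min, p.max) {
            // Keep the page when [min, max] overlaps [lo, hi].
            (Some(min), Some(max)) => min <= hi && max >= lo,
            // Without statistics nothing can be ruled out, so keep the page.
            _ => true,
        })
        .collect()
}

fn main() {
    let pages = [
        PageStats { min: Some(0), max: Some(99) },
        PageStats { min: Some(100), max: Some(199) },
        PageStats { min: None, max: None },
        PageStats { min: Some(200), max: Some(299) },
    ];
    // Predicate: 150 <= value <= 180
    let selected = select_pages(&pages, 150, 180);
    println!("{:?}", selected); // [false, true, true, false]
}
```

The selection is deliberately conservative. A page is dropped only when its statistics rule it out, so pushdown can never change the result of the filter that is applied afterwards.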

By skipping the pages that the filter excludes, these filters reduce CPU-bound work during deserialization.
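The CPU saving comes from rows that are never decoded. Given the first row index of each page (a Parquet offset index records this), a per-page selection mask can be turned into the row intervals that the deserializer still has to visit. Every other row is skipped. `PageLocation`, `selected_intervals`, and the example numbers below are hypothetical. They only illustrate the bookkeeping and are not arrow2's implementation.

```rust
/// Hypothetical page location: index of the page's first row within the row group.
#[derive(Debug, Clone, Copy)]
struct PageLocation {
    first_row_index: usize,
}

/// Converts a per-page selection mask into `(start, length)` row intervals
/// that must still be deserialized. Contiguous selected pages are merged.
fn selected_intervals(pages: &[PageLocation], selected: &[bool], num_rows: usize) -> Vec<(usize, usize)> {
    assert_eq!(pages.len(), selected.len());
    let mut intervals: Vec<(usize, usize)> = Vec::new();
    for (i, (page, &keep)) in pages.iter().zip(selected).enumerate() {
        if !keep {
            continue; // this page's rows are never decoded
        }
        let start = page.first_row_index;
        // A page ends where the next one starts, or at the end of the row group.
        let end = pages.get(i + 1).map_or(num_rows, |next| next.first_row_index);
        // Extend the previous interval when this page directly follows it.
        if let Some((s, len)) = intervals.last_mut() {
            if *s + *len == start {
                *len += end - start;
                continue;
            }
        }
        intervals.push((start, end - start));
    }
    intervals
}

fn main() {
    let pages = [
        PageLocation { first_row_index: 0 },
        PageLocation { first_row_index: 100 },
        PageLocation { first_row_index: 200 },
        PageLocation { first_row_index: 300 },
    ];
    let selected = [false, true, true, false];
    let intervals = selected_intervals(&pages, &selected, 400);
    let rows_to_decode: usize = intervals.iter().map(|&(_, len)| len).sum();
    // prints "[(100, 200)], 200 of 400 rows decoded"
    println!("{:?}, {} of 400 rows decoded", intervals, rows_to_decode);
}
```

In this example, only half of the row group's rows reach the decoder. The saving scales with how many pages the statistics can rule out.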

Note that page-level filtering is not yet supported for nested types.

codecov bot commented Aug 9, 2022

Codecov Report

Merging #1216 (8070ff8) into main (2a12d17) will decrease coverage by 0.07%.
The diff coverage is 83.43%.

❗ Current head 8070ff8 differs from pull request most recent head a17d981. Consider uploading reports for the commit a17d981 to get more accurate results.

@@            Coverage Diff             @@
##             main    #1216      +/-   ##
==========================================
- Coverage   83.21%   83.14%   -0.08%     
==========================================
  Files         358      358              
  Lines       37224    37471     +247     
==========================================
+ Hits        30976    31154     +178     
- Misses       6248     6317      +69     
Impacted Files                                    Coverage Δ
src/io/parquet/read/mod.rs                        100.00% <ø> (ø)
src/io/parquet/read/statistics/dictionary.rs      44.73% <ø> (ø)
src/io/parquet/read/indexes/primitive.rs          33.52% <28.57%> (-1.19%) ⬇️
src/io/parquet/read/indexes/mod.rs                81.12% <80.32%> (+3.70%) ⬆️
src/io/parquet/read/row_group.rs                  96.55% <91.30%> (-3.45%) ⬇️
src/io/parquet/read/file.rs                       92.37% <100.00%> (+1.74%) ⬆️
src/io/parquet/read/indexes/binary.rs             88.00% <100.00%> (ø)
src/io/parquet/read/indexes/boolean.rs            100.00% <100.00%> (ø)
src/io/parquet/read/indexes/fixed_len_binary.rs   100.00% <100.00%> (ø)
src/io/parquet/read/deserialize/binary/nested.rs  77.31% <0.00%> (-5.10%) ⬇️
... and 23 more


@jorgecarleitao force-pushed the filter_index branch 3 times, most recently from 8070ff8 to b5f2067, on August 10, 2022 17:03
@jorgecarleitao merged commit 3b29c82 into main on Aug 15, 2022
@jorgecarleitao deleted the filter_index branch on August 15, 2022 19:42
1 participant