This repository has been archived by the owner on Feb 18, 2024. It is now read-only.

Made reading of IPC dictionaries lazy #971

Merged 1 commit on Apr 30, 2022

jorgecarleitao
Owner

This PR moves the reading of IPC dictionaries to when they are needed.

If a file has dictionaries, we currently read them into memory when reading the file's metadata. This makes reading the metadata quite expensive whenever dictionaries exist.

With this PR, dictionaries are read when the first IPC record batch is requested (via .next for the sync reader and poll for the async one), so users who only want the metadata no longer have to read the dictionaries.
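The lazy behavior described above can be sketched as follows. This is a minimal illustration and not the code in this PR: `FileMetadata`, `LazyReader`, the `dictionary_reads` counter, and the fake dictionary decoding are all hypothetical stand-ins, and no real arrow2 types or functions are used.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for decoded dictionaries (dictionary id -> values).
type Dictionaries = HashMap<i64, Vec<String>>;

/// Hypothetical file metadata: cheap to obtain, it only records which
/// dictionaries exist and how many record batches the file holds.
struct FileMetadata {
    dictionary_ids: Vec<i64>,
    num_batches: usize,
}

/// Hypothetical reader that defers dictionary decoding until the first
/// record batch is requested.
struct LazyReader {
    metadata: FileMetadata,
    /// `None` until the first call to `next` that yields a batch.
    dictionaries: Option<Dictionaries>,
    current_batch: usize,
    /// Counts how often the (expensive) dictionary read happened.
    dictionary_reads: usize,
}

impl LazyReader {
    /// Creating the reader touches only the metadata, never the dictionaries.
    fn new(metadata: FileMetadata) -> Self {
        Self {
            metadata,
            dictionaries: None,
            current_batch: 0,
            dictionary_reads: 0,
        }
    }

    /// Metadata-only users stop here and pay nothing for dictionaries.
    fn metadata(&self) -> &FileMetadata {
        &self.metadata
    }
}

impl Iterator for LazyReader {
    /// For illustration, a "batch" is just its index.
    type Item = usize;

    fn next(&mut self) -> Option<usize> {
        if self.current_batch >= self.metadata.num_batches {
            return None;
        }
        // Decode the dictionaries once, on the first requested batch.
        if self.dictionaries.is_none() {
            let decoded: Dictionaries = self
                .metadata
                .dictionary_ids
                .iter()
                .map(|id| (*id, vec![format!("value-{}", id)]))
                .collect();
            self.dictionaries = Some(decoded);
            self.dictionary_reads += 1;
        }
        let batch = self.current_batch;
        self.current_batch += 1;
        Some(batch)
    }
}
```

In this sketch, constructing a `LazyReader` and calling `metadata()` leaves `dictionaries` as `None`; only the first call to `next` that returns a batch populates it, and later calls reuse the cached value.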

The other advantage of this approach is that it allows projection pushdown to be applied to dictionaries, so that dictionaries that are not needed can be skipped (follow-up PR).

@jorgecarleitao jorgecarleitao added the enhancement An improvement to an existing feature label Apr 30, 2022
codecov bot commented Apr 30, 2022

Codecov Report

Merging #971 (224872f) into main (3a3e41b) will decrease coverage by 0.02%.
The diff coverage is 75.30%.

@@            Coverage Diff             @@
##             main     #971      +/-   ##
==========================================
- Coverage   71.53%   71.51%   -0.03%     
==========================================
  Files         355      355              
  Lines       19607    19626      +19     
==========================================
+ Hits        14026    14035       +9     
- Misses       5581     5591      +10     
Impacted Files Coverage Δ
src/io/flight/mod.rs 0.00% <0.00%> (ø)
src/io/ipc/read/file_async.rs 57.02% <73.33%> (-3.85%) ⬇️
src/io/ipc/read/reader.rs 75.79% <80.00%> (-1.29%) ⬇️
src/compute/arithmetics/time.rs 26.60% <0.00%> (+0.91%) ⬆️
src/bitmap/utils/slice_iterator.rs 87.93% <0.00%> (+1.72%) ⬆️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@jorgecarleitao jorgecarleitao force-pushed the lazy_dict branch 2 times, most recently from 48902a5 to 1f44563 Compare April 30, 2022 07:52
@jorgecarleitao jorgecarleitao merged commit 53d246d into main Apr 30, 2022
@jorgecarleitao jorgecarleitao deleted the lazy_dict branch April 30, 2022 10:41