Fixed using lower limit than size of first parquet row group #1046
Thanks a lot for the PR!
I agree that there is an issue here. I left a comment as I think we should avoid advancing the iterator on `try_new`. Is there any way around that?
src/io/parquet/read/file.rs
Outdated
reader,
schema,
groups_filter,
metadata.row_groups.clone(),
chunk_size,
limit,
);
let current_row_group = row_groups.next().transpose()?;
I think we should consider something different here - this causes `try_new` to be O(N) since it advances the iterator.
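To illustrate the concern, here is a minimal sketch of a constructor that stores the row-group iterator without advancing it and fetches the first row group only on the first call to `next`. `LazyReader` and the modelling of a row group as a `Vec<u32>` are hypothetical and only for illustration; this is not the crate's actual `FileReader`.

```rust
/// Hypothetical reader over row groups, each modelled as a `Vec<u32>` of rows.
struct LazyReader<I: Iterator<Item = Vec<u32>>> {
    row_groups: I,
    current_row_group: Option<std::vec::IntoIter<u32>>,
}

impl<I: Iterator<Item = Vec<u32>>> LazyReader<I> {
    /// Stores the iterator without advancing it, so construction reads nothing.
    fn try_new(row_groups: I) -> Result<Self, String> {
        Ok(Self {
            row_groups,
            current_row_group: None,
        })
    }
}

impl<I: Iterator<Item = Vec<u32>>> Iterator for LazyReader<I> {
    type Item = u32;

    fn next(&mut self) -> Option<u32> {
        loop {
            if let Some(group) = self.current_row_group.as_mut() {
                if let Some(row) = group.next() {
                    return Some(row);
                }
            }
            // First call, or the current row group is exhausted:
            // fetch the next row group, or stop if there is none.
            self.current_row_group = Some(self.row_groups.next()?.into_iter());
        }
    }
}
```

With this shape, the cost of reading the first row group is paid by the first `next` call rather than by the constructor.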
Codecov Report
@@            Coverage Diff             @@
##             main    #1046      +/-   ##
==========================================
- Coverage   81.36%   81.29%   -0.07%
==========================================
  Files         360      363       +3
  Lines       34386    34651     +265
==========================================
+ Hits        27978    28170     +192
- Misses       6408     6481      +73
@jorgecarleitao This alternative instead looks at `current_row_group`: if it is `None`, the number of remaining rows is not updated, since no rows will be read on that iteration. This still works when given an empty row group, but most importantly it is `None` on initialization, before the first row group is read.
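The accounting described above can be sketched roughly as follows. This is a simplified model under stated assumptions, not the code in the PR: `LimitedReader`, `RowGroup`, and `advance_row_group` are hypothetical names, row groups are reduced to their row counts, and each yielded item is just the number of rows in a chunk.

```rust
/// Hypothetical stand-in for a row group: only tracks row counts.
struct RowGroup {
    num_rows: usize,
    unread: usize,
}

/// Hypothetical reader that yields chunk sizes while honouring a row limit.
struct LimitedReader {
    pending: std::vec::IntoIter<usize>,
    current_row_group: Option<RowGroup>,
    chunk_size: usize,
    remaining_rows: usize,
}

impl LimitedReader {
    fn new(row_group_sizes: Vec<usize>, chunk_size: usize, limit: usize) -> Self {
        Self {
            pending: row_group_sizes.into_iter(),
            current_row_group: None, // nothing is read at construction
            chunk_size,
            remaining_rows: limit,
        }
    }

    fn advance_row_group(&mut self) {
        // Only a row group that was actually read counts against the limit.
        // On the first call `current_row_group` is `None`, so nothing is subtracted.
        if let Some(finished) = &self.current_row_group {
            self.remaining_rows = self.remaining_rows.saturating_sub(finished.num_rows);
        }
        self.current_row_group = self
            .pending
            .next()
            .map(|n| RowGroup { num_rows: n, unread: n });
    }
}

impl Iterator for LimitedReader {
    type Item = usize; // number of rows in the yielded chunk

    fn next(&mut self) -> Option<usize> {
        loop {
            if self.remaining_rows == 0 {
                return None;
            }
            if let Some(group) = self.current_row_group.as_mut() {
                if group.unread > 0 {
                    // Rows still allowed within this row group under the limit.
                    let already_read = group.num_rows - group.unread;
                    let budget = self.remaining_rows.saturating_sub(already_read);
                    if budget == 0 {
                        return None;
                    }
                    let n = self.chunk_size.min(group.unread).min(budget);
                    group.unread -= n;
                    return Some(n);
                }
            }
            // No current row group yet, or it is exhausted (possibly empty).
            self.advance_row_group();
            if self.current_row_group.is_none() {
                return None;
            }
        }
    }
}
```

In this model, `LimitedReader::new(vec![100, 50], 10, 20)` yields two chunks of 10 rows even though the first row group is larger than the limit, and an empty row group subtracts nothing and is skipped.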
Great solution to this! Thanks a lot again.
(minor fmt error - let me know if you would like me to fix it)
Fixed :)
If the first row group of the Parquet data had more rows than the limit, no data was returned, even when a full chunk fit within both the limit and the first row group.