-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[feat] Support using offset index in ParquetRecordBatchStream when pu… #3616
Conversation
Codecov Report
@@ Coverage Diff @@
## master #3616 +/- ##
==========================================
- Coverage 86.03% 86.02% -0.01%
==========================================
Files 300 300
Lines 56253 56456 +203
==========================================
+ Hits 48395 48564 +169
- Misses 7858 7892 +34
📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more |
Thanks @Ted-Jiang. I think it may be better to just pass the option to |
Thanks for your advice! @thinkharderdev , IMOP, if we want to read
the reason why i put the index in If i am wrong or misunderstand plz correct me 😊? |
Correct, the advantage of what |
@thinkharderdev Thanks! If am right, we should make the read index api into
I prefer keep read page_index in
because it can reduce the code change , |
I think we may be talking about different things :). I'm saying the code to fetch the indexes already exists in arrow-rs so we don't need to duplicate the code in datafusion. You can just construct the
|
oh! i miss this part 😂 , using the order version arrow-rs |
…shing down RowFilter. Signed-off-by: yangjiang <yangjiang@ebay.com>
232429f
to
c849a30
Compare
@thinkharderdev Sorry for the mistake, i think its ready. |
This is modifying submodules, is this intentional? |
update the submodules for using the new added test parquet files |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me, only a minor suggestion.
Thank you for working on this, do you intend to work on hooking up the column index as well?
Co-authored-by: Raphael Taylor-Davies <1781103+tustvold@users.noreply.github.com>
yes, i plan to do using |
Benchmark runs are scheduled for baseline = 451e441 and contender = 87faf86. 87faf86 is a master commit associated with this PR. Results will be available as each benchmark for each run completes. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is great -- thank you @Ted-Jiang and @tustvold
…shing down RowFilter.
Signed-off-by: yangjiang yangjiang@ebay.com
Which issue does this PR close?
Closes #3456
Rationale for this change
enable read
page index
inParquetScanOptions
If true, the reader will read pageIndex, If exit:
RowSelector
before read the file (like row-group pruning avoid I/O)RowSelector
.What changes are included in this PR?
update submodule
testing
andparquet-testing
Are there any user-facing changes?