Enable column_page_reader read specific row ranges record #1792

Closed
Tracked by #1749
Ted-Jiang opened this issue Jun 5, 2022 · 5 comments
Labels
enhancement Any new improvement worthy of a entry in the changelog

Comments

@Ted-Jiang
Member

Is your feature request related to a problem or challenge? Please describe what you are trying to do.
A clear and concise description of what the problem is. Ex. I'm always frustrated when [...]
(This section helps Arrow developers understand the context and why for this feature, in addition to the what)

Describe the solution you'd like
A clear and concise description of what you want to happen.

Describe alternatives you've considered
A clear and concise description of any alternative solutions or features you've considered.

Additional context
Add any other context or screenshots about the feature request here.

@Ted-Jiang Ted-Jiang added the enhancement Any new improvement worthy of a entry in the changelog label Jun 5, 2022
@tustvold
Contributor

tustvold commented Jun 6, 2022

I would recommend the approach I described in #1791 (review). Critically, a page does not know what constitutes a row. This highly non-trivial logic is handled in RecordReader, which will therefore need to drive the interpretation of any row-based filtering.
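
To illustrate why a page cannot identify rows on its own, here is a minimal, self-contained sketch. It is not arrow-rs code: the function names `count_record_starts` and `starts_mid_record` and the page split are hypothetical, chosen only for illustration. It relies on one property of Parquet's encoding of nested columns: a repetition level of 0 marks the first value of a new record, and any higher level continues the current record. For such columns the number of values in a page differs from the number of rows, and a page may begin partway through a record.

```rust
/// Count how many records *start* within a slice of repetition levels.
/// A repetition level of 0 marks the first value of a new record;
/// any level above 0 continues the record already in progress.
fn count_record_starts(rep_levels: &[i16]) -> usize {
    rep_levels.iter().filter(|&&level| level == 0).count()
}

/// Returns true if the first value continues a record that began
/// before this slice, i.e. the slice does not start on a row boundary.
fn starts_mid_record(rep_levels: &[i16]) -> bool {
    matches!(rep_levels.first(), Some(&level) if level > 0)
}

fn main() {
    // Repetition levels for three records of a list column:
    // [a, b], [c], [d, e, f]
    let column: [i16; 6] = [0, 1, 0, 0, 1, 1];

    // Hypothetical page boundary after the fourth value.
    let (page_1, page_2) = column.split_at(4);

    // page_1 holds 4 values but 3 record starts; the third record is incomplete.
    println!(
        "page 1: {} values, {} record starts",
        page_1.len(),
        count_record_starts(page_1)
    );

    // page_2 holds 2 values and no record starts: it only continues
    // the record that began in page_1.
    println!(
        "page 2: {} values, {} record starts, starts mid-record: {}",
        page_2.len(),
        count_record_starts(page_2),
        starts_mid_record(page_2)
    );
}
```

Mapping a requested row range onto values therefore means tracking record boundaries across pages, which is the kind of state a record-level reader keeps and a single page does not.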

@Ted-Jiang Ted-Jiang changed the title Enable column_page_reader read specific row ranges record Add function for row alignment with page mask Jun 6, 2022
@Ted-Jiang Ted-Jiang changed the title Add function for row alignment with page mask Enable column_page_reader read specific row ranges record Jun 6, 2022
@Ted-Jiang
Member Author

I would recommend the approach I described in #1791 (review). Critically, a page does not know what constitutes a row. This highly non-trivial logic is handled in RecordReader, which will therefore need to drive the interpretation of any row-based filtering.

Thanks ❤️

tustvold added a commit that referenced this issue Jul 7, 2022
* Stub API for parquet record skipping

* Update parquet/src/arrow/record_reader/mod.rs

Co-authored-by: Yang Jiang <jiangyang381@163.com>

* Remove empty google.protobuf.rs

* Replace todo with nyi_err

* Update doc comment

Co-authored-by: Yang Jiang <jiangyang381@163.com>
@jeffbski-sketch

I was just wondering what the status and ETA of this item is. I see maybe there is a non-public partial implementation already in the code. What's left to do for this? Thanks!

@tustvold
Contributor

I intend to polish up the remaining bits and pieces in time for the next release - #2382

@jeffbski-sketch

That's fantastic! Thanks for all you do.

tustvold added a commit to tustvold/arrow-rs that referenced this issue Aug 16, 2022
tustvold added a commit that referenced this issue Aug 17, 2022
* Make filter APIs public (#1792)

* Update parquet/src/arrow/arrow_reader/mod.rs

Co-authored-by: Liang-Chi Hsieh <viirya@gmail.com>

Co-authored-by: Liang-Chi Hsieh <viirya@gmail.com>