Does Arrow support the Parquet column index feature? #12851

Closed
FANNG1 opened this issue Apr 11, 2022 · 2 comments

Comments

FANNG1 commented Apr 11, 2022

Parquet has a ColumnIndex to support page skipping (https://github.com/apache/parquet-format/blob/master/PageIndex.md). Does Arrow support it, and if not, is there any plan to support it?

@westonpace
Member

Regarding the C++ implementation (and, by extension, the Python, R, and Ruby bindings): parquet-C++, the Parquet library that is part of (and used by) arrow-c++, does have some support for serializing and deserializing these structures.

However, Arrow's Parquet readers and writers do not (to the best of my knowledge) use these indices for filter pushdown, and they do not support writing indices.
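To make "using these indices for filter pushdown" concrete, here is a minimal sketch of the page-skipping idea in plain Python. It is not Arrow or parquet-C++ code: `PageStats` and `pages_to_read` are hypothetical names, and the fields are only loosely modelled on the per-page min/max values, null-page flags, and first row index described in PageIndex.md.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PageStats:
    """Hypothetical per-page entry, loosely modelled on ColumnIndex + OffsetIndex."""
    min_value: int
    max_value: int
    first_row_index: int        # row offset of the page within its row group
    is_null_page: bool = False  # True if the page holds only nulls


def pages_to_read(pages: List[PageStats], lo: int, hi: int) -> List[int]:
    """Return indices of pages whose [min, max] range overlaps [lo, hi]."""
    selected = []
    for i, page in enumerate(pages):
        if page.is_null_page:
            continue  # an all-null page cannot match a value predicate
        if page.max_value < lo or page.min_value > hi:
            continue  # statistics prove no row in this page can match
        selected.append(i)
    return selected


# Made-up index for one column chunk: three data pages and one all-null page.
pages = [
    PageStats(0, 99, 0),
    PageStats(100, 199, 100),
    PageStats(200, 299, 200),
    PageStats(0, 0, 300, is_null_page=True),
]

# Predicate 150 <= x <= 220 overlaps only pages 1 and 2.
print(pages_to_read(pages, 150, 220))
```

A reader with this kind of support would decode only the selected pages and use each page's first row index to line the surviving rows up with the other columns.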

Arrow is an open source project and so "any plan to support it" usually boils down to whether there is someone motivated enough with enough time to tackle the feature. It is something I think would be a great addition.

Adding the feature to the C++ implementation is tracked in PARQUET-1404 and ARROW-10158. Those JIRA tickets reference an attempt to implement it but, unfortunately, it appears that work may have been abandoned. There is a related mailing list discussion here.

Adding the feature to the Rust implementation is tracked in apache/datafusion#847.

Author

FANNG1 commented Apr 13, 2022

We're doing a technical investigation of Parquet and were surprised to see that Arrow had some support for it. Thanks for your detailed and useful reply.

FANNG1 closed this as completed Apr 13, 2022