Why Parquet is a part of Arrow? #1715
Comments
Parquet is a part of Apache Arrow?
@alamb Looking forward to your opinion.
My understanding is that the parquet project is a separate top-level ASF project: https://projects.apache.org/committee.html?arrow and https://projects.apache.org/committee.html?parquet
Yes, that would be fine -- right now they are in the same repo because the same people maintain them, and it lowers the maintenance burden to have them in the same repo. I would personally not be opposed to separating them. I think it is a similar setup to the C++ implementation https://github.com/apache/arrow/tree/master/cpp which has arrow and parquet in the same format.
I don't disagree -- the reason I am helping with both is that we need both in our project. Also, I think fast conversion between
@alamb I also have a question: is parquet-rs only schema-compatible with parquet-mr, or does it offer the same full functional support?
We intend to be fully functionally compatible with parquet-mr; please do file feature requests if you find any areas where we aren't. I think we're mostly there, aside from page index support.

As for performance comparison, I would be disappointed if we aren't significantly faster reading to arrow, but I do not have any benchmarks to verify this. Again, I would be very interested in areas where we are slower. FWIW, reading to arrow is likely faster than the row-level APIs, especially for byte arrays, where columnar decoding makes a huge difference. I have not spent much time optimising the write path, and there is likely a lot of low-hanging fruit.

As for why this is part of arrow, the best argument I can give is that most users want an interface to parquet that is performant and easy to understand. Arrow provides this. Whilst some users may want to integrate at a lower level and deal with all the complexities of Dremel, encodings, etc., most users will want this to be handled for them.
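To make the byte-array point concrete, here is a minimal, std-only sketch. It is not the parquet crate's actual decoder. It assumes input in Parquet's PLAIN encoding for BYTE_ARRAY (a 4-byte little-endian length followed by the bytes) and compares a row-style decode, which allocates one `Vec<u8>` per value, with a columnar decode into a single values buffer plus offsets, in the spirit of Arrow's variable-size binary layout. The names `ByteColumn`, `read_value`, `decode_rows`, `decode_columnar`, and `encode_plain` are hypothetical and exist only for this illustration.

```rust
/// Columnar byte-array storage modelled loosely on Arrow's variable-size
/// binary layout: one contiguous values buffer plus n + 1 offsets.
/// (Hypothetical type for illustration, not an arrow or parquet crate type.)
#[derive(Debug)]
struct ByteColumn {
    offsets: Vec<i32>,
    values: Vec<u8>,
}

impl ByteColumn {
    /// Number of values stored in the column.
    fn len(&self) -> usize {
        self.offsets.len().saturating_sub(1)
    }

    /// Borrow value `i` without copying it.
    fn get(&self, i: usize) -> &[u8] {
        let start = self.offsets[i] as usize;
        let end = self.offsets[i + 1] as usize;
        &self.values[start..end]
    }
}

/// Read one length-prefixed value starting at `pos`.
/// Returns the value slice and the position of the next value,
/// or `None` if the buffer is truncated.
fn read_value(buf: &[u8], pos: usize) -> Option<(&[u8], usize)> {
    let b = buf.get(pos..pos + 4)?;
    let len = u32::from_le_bytes([b[0], b[1], b[2], b[3]]) as usize;
    let start = pos + 4;
    let value = buf.get(start..start + len)?;
    Some((value, start + len))
}

/// Row-style decode: one heap allocation per value.
fn decode_rows(buf: &[u8]) -> Option<Vec<Vec<u8>>> {
    let mut rows = Vec::new();
    let mut pos = 0;
    while pos < buf.len() {
        let (value, next) = read_value(buf, pos)?;
        rows.push(value.to_vec());
        pos = next;
    }
    Some(rows)
}

/// Columnar decode: every value is appended to one shared buffer,
/// and only its end offset is recorded.
fn decode_columnar(buf: &[u8]) -> Option<ByteColumn> {
    let mut col = ByteColumn { offsets: vec![0], values: Vec::new() };
    let mut pos = 0;
    while pos < buf.len() {
        let (value, next) = read_value(buf, pos)?;
        col.values.extend_from_slice(value);
        col.offsets.push(col.values.len() as i32);
        pos = next;
    }
    Some(col)
}

/// Helper that builds test input: length-prefixed values back to back.
fn encode_plain(values: &[&str]) -> Vec<u8> {
    let mut out = Vec::new();
    for v in values {
        out.extend_from_slice(&(v.len() as u32).to_le_bytes());
        out.extend_from_slice(v.as_bytes());
    }
    out
}
```

Calling `decode_columnar(&encode_plain(&["arrow", "", "parquet"]))` yields a column whose values can be borrowed with `get(i)`, while `decode_rows` on the same input produces one owned `Vec<u8>` per value. The sketch only shows the difference in memory layout and allocation count; it makes no claim about measured performance.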
We've been using I think that #1718 will blow
I think the question has been answered, so I am closing this. Feel free to reopen if I'm mistaken.
Which part is this question about
I found this description in the README of the parquet crate:
Describe your question
Apache/parquet-rs
?https://github.com/jorgecarleitao/parquet2
Additional context
No.