[Proposal] Donation to Apache Arrow or Moving Build to Rust Parquet #10
Comments
The reason this repository exists is to package the thrift-compiled files for parquet-format, so we can bundle the parquet-format source without requiring users to build the format themselves. The pain point was not just compiling the parquet format but also compiling thrift: Rust code generation was still in an unreleased branch. To reduce complexity and improve the user experience with the parquet crate, we opted for a separate repository that would contain the generated code. You have to take into account other repositories that are not published on crates.io. For example, I use this repository for my own projects, where I have been working on improvements to Parquet reads and writes. You technically don't need to keep track of parquet versions: changes are cumulative with each version and backward compatible. You could simply inline the code, so you would not have to deal with a separate repository. Every time you want to update the format, you just update the file, recompile the code, check in the code, and add a note to the README saying that the current format version is 2.x.0. The problem with the arrow-cpp/parquet-cpp approach used to be that it pushed the thrift gen step onto the user; not sure if this has changed.
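For reference, the "update the file, recompile the code, check in the code" step could be sketched roughly as below. This is only an illustration, not what this repository or parquet actually does: it assumes a `thrift` compiler with the Rust generator is on `PATH`, and the helper names `thrift_gen_args` and `regenerate` are made up for the example.

```rust
use std::path::Path;
use std::process::Command;

/// Build the argument list for generating Rust code from a Thrift IDL file.
/// Mirrors the invocation `thrift -out <out_dir> --gen rs <idl>`.
/// (Hypothetical helper, named for this sketch only.)
fn thrift_gen_args(idl: &Path, out_dir: &Path) -> Vec<String> {
    vec![
        "-out".to_string(),
        out_dir.display().to_string(),
        "--gen".to_string(),
        "rs".to_string(),
        idl.display().to_string(),
    ]
}

/// Run the thrift compiler and report whether it exited successfully.
/// Returns an `Err` if the `thrift` binary cannot be started at all.
/// (Hypothetical helper; assumes `thrift` is installed and on PATH.)
fn regenerate(idl: &Path, out_dir: &Path) -> std::io::Result<bool> {
    let status = Command::new("thrift")
        .args(thrift_gen_args(idl, out_dir))
        .status()?;
    Ok(status.success())
}
```

A maintainer would run something like `regenerate(Path::new("parquet.thrift"), Path::new("src"))` once per format update and commit the generated file, so users of the crate never need thrift installed.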
Like you mentioned above, IMO the versioning could be a problem, since Arrow enforces a single version for all crates, which conflicts with what we are using right now. I'm not sure how that would work. That being said, do you have a list of issues caused by Arrow being out of sync with this crate? This would be useful for reference. I remember ARROW-6255, which is mainly because the semver used by Arrow is incorrect.
@nevi-me @andygrove for now I've added both of you to the repo so you can help merge PRs in case they get ignored.
Thanks, that should help. Do we also need access to push new releases on crates.io?
Yes, good point @nevi-me. I also gave permission to you and @andygrove there.
We found a resolution |
I would like to propose donating this repository and crate to Apache Arrow, mainly so that it can live next to parquet-rs.

What I'm unsure of is how versioning could continue, as we have 2 options:
- parquet-format versions (it's also why I opened Format 2.7.0 #8 and Format 2.8.0 #9) separately
- parquet and arrow versions, with a note on the README as to which parquet-format version is supported. I don't know if cargo would allow yanking all the 2.x.0 versions and replacing them with a 1.0.0 (Arrow's likely next release in a few months).

An alternative is to import parquet.thrift and start building it within parquet, effectively abandoning this crate. I looked at crates.io to see which other crates depend on this crate, and parquet is the only user. An alternative parquet crate (amadeus-parquet) added apache/parquet-format as a git submodule, so it builds the format file internally.

Arrow CPP also takes this alternative approach (apache/arrow@1a3b17b).
Any thoughts @sunchao @sadikovi @liurenjie1024 @andygrove @paddyhoran?
Relevant Apache Arrow JIRA: https://issues.apache.org/jira/browse/ARROW-6256