[Proposal] Donation to Apache Arrow or Moving Build to Rust Parquet #10

Closed
nevi-me opened this issue Jan 27, 2020 · 6 comments

Comments

@nevi-me
Collaborator

nevi-me commented Jan 27, 2020

I would like to propose donating this repository and crate to Apache Arrow, mainly so that it can live next to parquet-rs.

What I'm unsure of is how versioning could continue, as we have 2 options:

  1. Continue tracking the parquet-format versions separately (which is also why I opened Format 2.7.0 #8 and Format 2.8.0 #9)
  2. Track parquet and arrow versions, with a note in the README stating which parquet-format version is supported. I don't know if cargo would allow yanking all the 2.x.0 versions and replacing them with a 1.0.0 (likely Arrow's next release, in a few months).
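To make the concern in option 2 concrete: under Cargo's default (caret) requirement semantics, a dependency declared as `"2.6"` accepts versions `>=2.6.0, <3.0.0`, so a re-numbered `1.0.0` release would never be picked up by existing users. The following is a minimal sketch of that rule, not Cargo's actual implementation; `parse_version` and `caret_matches` are hypothetical helpers written for illustration, and pre-release tags and `0.x` versions are deliberately left out.

```rust
/// Parse a plain "MAJOR.MINOR.PATCH" string into a tuple.
/// Tuples compare lexicographically, which matches semver ordering
/// for versions without pre-release or build metadata.
/// (Hypothetical helper; anything that is not exactly three
/// numeric components yields None.)
fn parse_version(v: &str) -> Option<(u64, u64, u64)> {
    let mut parts = v.split('.').map(|p| p.parse::<u64>().ok());
    let major = parts.next()??;
    let minor = parts.next()??;
    let patch = parts.next()??;
    if parts.next().is_some() {
        return None; // more than three components
    }
    Some((major, minor, patch))
}

/// Simplified caret matching for requirements with a non-zero major
/// version: the candidate must share the major version and be at
/// least as new as the requirement. Requirements with major version 0
/// follow different rules and are rejected here (returns false).
fn caret_matches(req: (u64, u64, u64), candidate: (u64, u64, u64)) -> bool {
    req.0 > 0 && candidate.0 == req.0 && candidate >= req
}
```

With these helpers, a requirement of `2.6.0` matches `2.8.0` but matches neither `1.0.0` nor `3.0.0`, which is why switching the crate to Arrow's numbering would amount to a break for anyone depending on the 2.x line.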

An alternative is to import parquet.thrift and start building it within parquet, effectively abandoning this crate. I looked at crates.io to see which other crates depend on this crate, and parquet is the only user. Another parquet crate (amadeus-parquet) added apache/parquet-format as a git submodule, so it builds the format file internally.

Arrow CPP also takes this alternative approach (apache/arrow@1a3b17b).

Any thoughts @sunchao @sadikovi @liurenjie1024 @andygrove @paddyhoran?

Relevant Apache Arrow JIRA: https://issues.apache.org/jira/browse/ARROW-6256

@sadikovi
Collaborator

This repository exists to package the Thrift-compiled files for parquet-format, so that we can bundle the parquet-format source without requiring users to build the format themselves. The pain point was not just compiling the Parquet format but also compiling Thrift: Rust code generation was still in an unreleased branch. To reduce complexity and improve the user experience of the parquet crate, we opted for a separate repository containing the generated code.

You have to take into account other repositories that are not published on crates.io. For example, I use this repository for my own projects, where I have been working on improvements to Parquet reads and writes.

You technically don't need to keep track of parquet versions: changes are cumulative with each version and backward compatible. You could simply inline the code, so you would not have to deal with a separate repository. Every time you want to update the format, you just update the file, regenerate the code, check it in, and add a note to the README saying that the current format version is 2.x.0.
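A minimal sketch of the regeneration step in that workflow, assuming the Apache Thrift compiler is installed as `thrift` on `PATH` and that its Rust generator (`--gen rs`) is available. The helper names `thrift_args` and `regenerate` and any paths passed to them are hypothetical; nothing here is taken from this repository's actual tooling.

```rust
use std::ffi::OsString;
use std::path::Path;
use std::process::Command;

/// Build the argument list for the Thrift compiler:
/// `--gen rs` selects the Rust generator and `-out` sets the
/// directory that receives the generated file.
/// (Hypothetical helper for illustration.)
fn thrift_args(thrift_file: &Path, out_dir: &Path) -> Vec<OsString> {
    vec![
        OsString::from("--gen"),
        OsString::from("rs"),
        OsString::from("-out"),
        out_dir.as_os_str().to_os_string(),
        thrift_file.as_os_str().to_os_string(),
    ]
}

/// Run the compiler and report whether it exited successfully.
/// Assumes a `thrift` binary is on PATH; returns an I/O error
/// if the process cannot be started at all.
fn regenerate(thrift_file: &Path, out_dir: &Path) -> std::io::Result<bool> {
    let status = Command::new("thrift")
        .args(thrift_args(thrift_file, out_dir))
        .status()?;
    Ok(status.success())
}
```

A maintainer would run something like `regenerate(Path::new("parquet.thrift"), Path::new("src"))` by hand after updating the format file and then commit the output, so users of the crate never need the Thrift compiler themselves.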

The problem with the arrow-cpp/parquet-cpp approach used to be that it pushed the Thrift generation step onto the user; I'm not sure if this has changed.

@sunchao
Owner

sunchao commented Jan 27, 2020

As you mentioned above, IMO the versioning could be a problem, since Arrow enforces a single version for all of its crates, which conflicts with what we are using right now. I'm not sure how that would work.

That being said, do you have a list of issues caused by Arrow being out of sync with this crate? It would be useful for reference. I remember ARROW-6255, which happened mainly because the semver used by Arrow is incorrect.

@sunchao
Owner

sunchao commented Jan 4, 2021

@nevi-me @andygrove for now I've added both of you to the repo so you can help merge PRs in case they get ignored.

@nevi-me
Collaborator Author

nevi-me commented Jan 5, 2021

Thanks, that should help. Do we also need access to push new releases on crates.io?

@sunchao
Owner

sunchao commented Jan 5, 2021

Yes, good point @nevi-me. I've also given you and @andygrove permission there.

@nevi-me
Collaborator Author

nevi-me commented Jan 8, 2021

We found a resolution
