Replace parquet-tools with parquet-avro #55

Merged 1 commit into civitaspo:master on Feb 14, 2023

sekikn
Contributor

@sekikn sekikn commented Nov 28, 2022

Newer versions of parquet-tools are not available on Maven Central because the module was deprecated in Parquet 1.12.0 and removed from the source tree in 1.12.3 (PARQUET-2020). This PR replaces it with parquet-avro so that we can later catch up with recent versions of Parquet.

@civitaspo
Owner

I'm so sorry for the sooooooo late response. I understand the reason for replacing 'parquet-tools', but I don't understand why the 'parquet-avro' dependency is required. Please explain in more detail.

@sekikn
Contributor Author

sekikn commented Jan 23, 2023

Thank you for the comment.

> I don't understand why the 'parquet-avro' dependency is required. Please explain in more detail.

Parquet itself doesn't provide a native in-memory object model, except for the org.apache.parquet.example.data and org.apache.parquet.tools.read packages. The former is not intended for production use, as its name suggests, and the latter has been deprecated since v1.12.0, as described above.

So when handling Parquet records as Java objects programmatically, we generally have three options: parquet-avro, parquet-protobuf, or parquet-thrift. (Of course, we could also define a new object model ourselves, but I'd like to avoid that because of the maintenance effort.)

I chose parquet-avro here simply because I'm more familiar with it than the others. If you prefer Protobuf or Thrift, I can revise the PR to use either of them instead of Avro.

@civitaspo
Owner

I see. Thanks for the explanation. I prefer Protobuf, but the current use case is for tests, so I don't mind.

Thank you for your contribution.

@civitaspo civitaspo merged commit 55c60de into civitaspo:master Feb 14, 2023
@sekikn sekikn deleted the replace-parquet-tools branch February 15, 2023 05:08
@civitaspo civitaspo mentioned this pull request Jun 28, 2024