
feat(python): expose compression in write_deltalake and optimize #1812


ion-elgreco
Collaborator

Description

I have not yet exposed the compression level to either `pyarrow.dataset.write_dataset` or optimize. For now, I have only aligned the default compression levels between PyArrow and parquet-rs. Let me know if I should expose the compression levels as well.

This also needs to wait for parquet crate v48, so that parsing via the `FromStr` trait works.
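To illustrate what a string-based compression option involves, here is a minimal Python sketch that parses a spec such as `"ZSTD(3)"` into a codec name and an optional level. The `CODEC(level)` format, the function name `parse_compression`, and the level ranges are assumptions for illustration only; they are not the deltalake or parquet-rs API, and parquet-rs may accept slightly different ranges.

```python
from typing import Optional, Tuple

# Hypothetical: codecs that accept a level, using the usual ranges of the
# underlying libraries (zlib 0-9, zstd 1-22, brotli 0-11). parquet-rs may differ.
LEVEL_RANGES = {
    "GZIP": (0, 9),
    "ZSTD": (1, 22),
    "BROTLI": (0, 11),
}

# Hypothetical: codecs that take no level at all.
LEVELLESS = {"UNCOMPRESSED", "SNAPPY", "LZ4", "LZ4_RAW", "LZO"}


def parse_compression(spec: str) -> Tuple[str, Optional[int]]:
    """Parse 'CODEC' or 'CODEC(level)' into (codec, level).

    level is None when omitted, meaning "use the writer's default".
    """
    spec = spec.strip()
    if spec.endswith(")") and "(" in spec:
        name, _, rest = spec.partition("(")
        level_text = rest[:-1].strip()  # drop the closing parenthesis
        codec = name.strip().upper()
        try:
            level: Optional[int] = int(level_text)
        except ValueError:
            raise ValueError(f"invalid compression level: {level_text!r}") from None
    else:
        codec, level = spec.upper(), None

    if codec in LEVELLESS:
        if level is not None:
            raise ValueError(f"{codec} does not accept a compression level")
        return codec, None
    if codec not in LEVEL_RANGES:
        raise ValueError(f"unknown compression codec: {codec!r}")
    if level is not None:
        low, high = LEVEL_RANGES[codec]
        if not low <= level <= high:
            raise ValueError(f"{codec} level must be in [{low}, {high}], got {level}")
    return codec, level
```

Returning `None` for an omitted level leaves the default to whichever engine (PyArrow or parquet-rs) consumes the option, which is exactly the point where the two engines' defaults need to agree.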

Related Issue(s)

@github-actions github-actions bot added the binding/python Issues for the Python package label Nov 7, 2023
@djouallah

Does this fix this issue? #1772

@ion-elgreco
Collaborator Author

@djouallah No, that's a different topic.

@ion-elgreco ion-elgreco marked this pull request as ready for review November 19, 2023 08:19
Collaborator

@roeap roeap left a comment


Generally speaking, as we go further into optimizing things, would it make sense to expose the Parquet options as a dict or a typed structure?

On the other hand, I was expecting something like that to be available in PyArrow, but could not find anything at first glance.

@ion-elgreco
Collaborator Author

> Generally speaking, as we go further into optimizing things, would it make sense to expose the Parquet options as a dict or a typed structure?
>
> On the other hand, I was expecting something like that to be available in PyArrow, but could not find anything at first glance.

Yeah, I was planning to do this in a later refactor, because writer properties are used in merge, optimize, and write, but not consistently.

Do you think it's better to do this in one go? Then I'll refactor those integrations in this PR.
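As a rough illustration of the "typed structure" option, here is a minimal Python sketch of a single options object that write, optimize, and merge could all accept. The class name `ParquetWriterOptions` and its fields are hypothetical and are not taken from deltalake; unset fields are `None` so that each engine's own defaults still apply.

```python
from dataclasses import asdict, dataclass, fields
from typing import Any, Dict, Optional


@dataclass(frozen=True)
class ParquetWriterOptions:
    """Hypothetical typed bundle of Parquet writer settings.

    One instance could be passed unchanged to write, optimize, and merge,
    instead of each operation accepting its own subset of keyword arguments.
    """

    compression: Optional[str] = None
    compression_level: Optional[int] = None
    max_row_group_size: Optional[int] = None
    write_statistics: Optional[bool] = None

    def to_dict(self) -> Dict[str, Any]:
        # Drop unset (None) fields so the engine's own defaults apply.
        return {k: v for k, v in asdict(self).items() if v is not None}

    @classmethod
    def from_dict(cls, raw: Dict[str, Any]) -> "ParquetWriterOptions":
        # Reject unknown keys so a typo fails loudly instead of being ignored.
        known = {f.name for f in fields(cls)}
        unknown = set(raw) - known
        if unknown:
            raise ValueError(f"unknown writer options: {sorted(unknown)}")
        return cls(**raw)
```

The typed form catches misspelled keys (e.g. `"compresion"`) at construction time, while `to_dict()` still yields a plain dict for engines that take keyword arguments, so both styles discussed above can coexist.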

@roeap
Collaborator

roeap commented Nov 19, 2023

> Do you think it's better to do this in one go? Then I'll refactor those integrations in this PR.

While we are still fairly liberal about evolving our APIs (especially in Rust) 😆, if we agree that this is the right way forward, I would vote for doing it now rather than having an intermediate API that we know we will break anyhow...

@ion-elgreco
Collaborator Author

@roeap Ok, putting it back in draft then :P Will try to harmonize these APIs sometime this week.

@ion-elgreco ion-elgreco marked this pull request as draft November 19, 2023 21:29
@ion-elgreco
Collaborator Author

@roeap Here is the feat PR to harmonize these APIs: #1980

Labels
binding/python Issues for the Python package
Development

Successfully merging this pull request may close these issues.

Expose compression options in optimize and write_deltalake
3 participants