feat(python): expose compression in write_deltalake and optimize #1812
Conversation
does this fix this issue?
@djouallah no, that's a different topic
Generally speaking, as we move further into optimizing things, would it make sense to expose parquet options as a dict or typed structure?
On the other hand, I was expecting something like that to be available in pyarrow, but could not find anything at first glance.
Yeah, this was my plan for a later refactor, because writer properties are used in merge, optimize, and write, and not consistently. Do you think it's better to do this in one go? Then I'll refactor those integrations in this PR.
While we are still fairly liberal with evolving our APIs (especially in Rust) - 😆 - if we agree that this is the right way forward, I would vote for doing it now rather than having an intermediate API that we know we will break anyhow...
@roeap Ok, putting it back in draft then :P Will try to harmonize these APIs somewhere this week
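For illustration, here is a minimal sketch of what such a typed parquet-options structure could look like on the Python side. The class name `WriterProperties`, its fields, and the `to_dict` helper are assumptions for the sake of the example, not the project's actual API:

```python
from dataclasses import dataclass, asdict
from typing import Optional

# Hypothetical sketch of a typed parquet-options structure, as floated in the
# discussion above. The name and fields are assumptions, not delta-rs's API.
@dataclass
class WriterProperties:
    compression: Optional[str] = None        # e.g. "SNAPPY", "ZSTD", "GZIP"
    compression_level: Optional[int] = None  # codec-specific level, where supported

    def to_dict(self) -> dict:
        """Drop unset fields so write, merge, and optimize can share one config."""
        return {k: v for k, v in asdict(self).items() if v is not None}

# Usage sketch:
props = WriterProperties(compression="ZSTD", compression_level=4)
print(props.to_dict())  # {'compression': 'ZSTD', 'compression_level': 4}
```

A single structure like this is what would let write, merge, and optimize accept the same configuration instead of each growing its own ad-hoc parameters.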
Description
I have currently not exposed the compression level to either pyarrow.dataset.write_dataset or optimize. For now, I just aligned the default compression settings between PyArrow and parquet-rs. Let me know if I should expose the compression levels as well.
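As a rough usage sketch, the exposed option could be passed along these lines. The keyword name `compression` here is an illustrative assumption, not necessarily the parameter this PR ends up merging:

```python
import pyarrow as pa
from deltalake import write_deltalake

tbl = pa.table({"id": [1, 2, 3]})

# Hypothetical call shape: the `compression` keyword is an assumption used
# for illustration only, not confirmed as the final API of this PR.
write_deltalake("path/to/table", tbl, compression="ZSTD")
```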
We also need to wait on parquet crate v48 for the FromStr trait to work.
Related Issue(s)