Python write_deltalake fails if pyarrow table contains binary columns #1167

Merged
merged 3 commits into from
Mar 6, 2023

Conversation

@rbushri rbushri commented Feb 20, 2023

Description

Python write_deltalake fails if pyarrow table contains binary columns ending with 0x5c

Related Issue(s)

Python write_deltalake fails if pyarrow table contains binary columns ending with 0x5c #1146
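For readers unfamiliar with why a trailing 0x5c (a backslash) is special, here is a minimal sketch using only the standard library. It assumes the failure arises when binary min/max statistics are decoded with Python's `unicode_escape` codec before being written as JSON; that mechanism is an assumption, not something stated in this description. The helper names `decode_strict` and `decode_tolerant` are hypothetical and are not part of deltalake.

```python
import json


def decode_strict(value: bytes) -> str:
    # Hypothetical helper: decode with the default "strict" error handler.
    # A lone trailing backslash is an incomplete escape sequence, so this raises.
    return value.decode("unicode_escape")


def decode_tolerant(value: bytes) -> str:
    # Hypothetical helper: same codec, but bytes that cannot be decoded are
    # replaced with a backslash escape instead of raising.
    return value.decode("unicode_escape", "backslashreplace")


sample = b"abc\x5c"  # binary value whose last byte is 0x5c

try:
    decode_strict(sample)
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

tolerant = decode_tolerant(sample)

# The tolerant result is an ordinary str, so it can be placed in a JSON document.
# The "minValues" key and "col" column name are illustrative only.
encoded = json.dumps({"minValues": {"col": tolerant}})
```

The tolerant variant trades exact round-tripping of the bytes for a write that does not fail, which seems acceptable if the value is only used as a statistic.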

@github-actions github-actions bot added the binding/python Issues for the Python package label Feb 20, 2023

@wjones127 wjones127 left a comment

Also, I tested the same data in PySpark, and it doesn't write stats for binary columns. So we could also disable statistics collection for binary columns in the first place; I suppose not many people use inequality filters on binary data. But this change should be sufficient for now.

python/deltalake/writer.py (outdated, resolved)
@wjones127

Also, could you add a quick unit test based on your example in python/tests/test_writer.py?

@wjones127

@rbushri are you willing to add that unit test?

@wjones127 wjones127 self-requested a review March 6, 2023 05:23

@wjones127 wjones127 left a comment

Thanks for opening this @rbushri!

@wjones127 wjones127 merged commit 901292c into delta-io:main Mar 6, 2023
chitralverma pushed a commit to chitralverma/delta-rs that referenced this pull request Mar 17, 2023
…delta-io#1167)

# Description
Python write_deltalake fails if pyarrow table contains binary columns
ending with 0x5c

# Related Issue(s)
Python write_deltalake fails if pyarrow table contains binary columns
ending with 0x5c delta-io#1146

---------

Co-authored-by: rbushrian <rbushrian@akamai.com>
Co-authored-by: Will Jones <willjones127@gmail.com>