Python write_deltalake fails if pyarrow table contains binary columns ending with 0x5c #1146

Closed
sfilimonov-exos opened this issue Feb 13, 2023 · 0 comments
Labels
bug Something isn't working

Environment

Delta-rs version: 0.7.0

Environment:

  • OS: OS X or Ubuntu 22.04

Bug

What happened: write_deltalake fails while adding a file to the transaction. The JSON encoder used for the file statistics cannot encode binary values that end with a backslash (byte 0x5c).

Exception ignored in: 'pyarrow._dataset._filesystemdataset_write_visitor'
Traceback (most recent call last):
  File "/Users/abc/repos/study_python/venv/lib/python3.9/site-packages/deltalake/writer.py", line 229, in visitor
    json.dumps(stats, cls=DeltaJSONEncoder),
  File "/usr/local/homebrew/Cellar/python@3.9/3.9.16/Frameworks/Python.framework/Versions/3.9/lib/python3.9/json/__init__.py", line 234, in dumps
    return cls(
  File "/usr/local/homebrew/Cellar/python@3.9/3.9.16/Frameworks/Python.framework/Versions/3.9/lib/python3.9/json/encoder.py", line 199, in encode
    chunks = self.iterencode(o, _one_shot=True)
  File "/usr/local/homebrew/Cellar/python@3.9/3.9.16/Frameworks/Python.framework/Versions/3.9/lib/python3.9/json/encoder.py", line 257, in iterencode
    return _iterencode(o, 0)
  File "/Users/abc/repos/study_python/venv/lib/python3.9/site-packages/deltalake/writer.py", line 315, in default
    return obj.decode("unicode_escape")
UnicodeDecodeError: 'unicodeescape' codec can't decode byte 0x5c in position 1: \ at end of string
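
The failure can be isolated from deltalake and pyarrow. As a minimal sketch, the snippet below makes the same decode("unicode_escape") call shown in the last frame of the traceback, on a value ending in a backslash. The variable names are illustrative only.

```python
# Bytes ending in a lone backslash (0x5c), as in the failing column value.
value = b"\x00\\"

error = None
try:
    # Same call as the last frame in the traceback above.
    value.decode("unicode_escape")
except UnicodeDecodeError as exc:
    # The codec reads the trailing backslash as an unfinished escape sequence.
    error = exc

print(repr(error))
```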

How to reproduce it:

import pyarrow

from deltalake.writer import write_deltalake

value = b'\x00\\'
pa_table = pyarrow.Table.from_pydict({'field_one': [value]})

write_deltalake("some_table_1234", pa_table)
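
One possible way to avoid the crash is to pass an error handler to the decode call. This is sketched here as an assumption, not as the fix the project adopted. TolerantBytesEncoder is a hypothetical name and is not part of deltalake. It uses the standard "backslashreplace" handler so that undecodable bytes become escape text instead of raising.

```python
import json
from typing import Any


class TolerantBytesEncoder(json.JSONEncoder):
    """Hypothetical encoder; not the deltalake DeltaJSONEncoder."""

    def default(self, obj: Any) -> Any:
        if isinstance(obj, bytes):
            # "backslashreplace" substitutes escape text for bytes the codec
            # cannot decode, such as a trailing 0x5c, instead of raising.
            return obj.decode("unicode_escape", "backslashreplace")
        return super().default(obj)


# Illustrative stats payload; the real structure in writer.py may differ.
stats = {"minValues": {"field_one": b"\x00\\"}}
encoded = json.dumps(stats, cls=TolerantBytesEncoder)
print(encoded)
```

The resulting string is not a byte-exact encoding of the original value, so whether it is acceptable for file statistics is a separate design question.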

sfilimonov-exos added the bug label Feb 13, 2023
rbushri pushed a commit to rbushri/delta-rs that referenced this issue Feb 20, 2023
wjones127 pushed a commit to rbushri/delta-rs that referenced this issue Mar 6, 2023
wjones127 added a commit that referenced this issue Mar 6, 2023
…#1167)

# Description
Python write_deltalake fails if pyarrow table contains binary columns
ending with 0x5c

# Related Issue(s)
Python write_deltalake fails if pyarrow table contains binary columns
ending with 0x5c #1146

---------

Co-authored-by: rbushrian <rbushrian@akamai.com>
Co-authored-by: Will Jones <willjones127@gmail.com>
chitralverma pushed a commit to chitralverma/delta-rs that referenced this issue Mar 17, 2023