Commit

Python write_deltalake fails if pyarrow table contains binary columns (#1167)

# Description
Python write_deltalake fails if pyarrow table contains binary columns
ending with 0x5c

# Related Issue(s)
Python write_deltalake fails if pyarrow table contains binary columns
ending with 0x5c #1146

---------

Co-authored-by: rbushrian <rbushrian@akamai.com>
Co-authored-by: Will Jones <willjones127@gmail.com>
3 people authored Mar 6, 2023
1 parent 860e272 commit 901292c
Showing 2 changed files with 11 additions and 1 deletion.
2 changes: 1 addition & 1 deletion python/deltalake/writer.py
@@ -333,7 +333,7 @@ def __enforce_append_only(
 class DeltaJSONEncoder(json.JSONEncoder):
     def default(self, obj: Any) -> Any:
         if isinstance(obj, bytes):
-            return obj.decode("unicode_escape")
+            return obj.decode("unicode_escape", "backslashreplace")
         elif isinstance(obj, date):
             return obj.isoformat()
         elif isinstance(obj, datetime):
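
The one-line change above can be illustrated in isolation. The following is a minimal sketch, not part of the commit: it uses only the standard library and the same byte string as the new test to show why strict "unicode_escape" decoding fails on a value ending in 0x5c, and that passing the "backslashreplace" error handler avoids the exception. The variable names are made up for illustration.

```python
# Illustrative sketch only; not part of the commit.
# Same value as the new test: a NUL byte followed by a single backslash (0x5c).
raw = b"\x00\\"

# Strict decoding (the behaviour before the change): a trailing backslash is an
# incomplete escape sequence, so the codec raises UnicodeDecodeError.
try:
    raw.decode("unicode_escape")
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

# With the "backslashreplace" error handler (the behaviour after the change),
# the undecodable byte is replaced by an escape sequence instead of raising.
decoded = raw.decode("unicode_escape", "backslashreplace")

print(strict_failed)
print(repr(decoded))
```

Note that the decoded string is not guaranteed to be a lossless representation of the original bytes; the handler only ensures that decoding does not raise.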
10 changes: 10 additions & 0 deletions python/tests/test_writer.py
@@ -725,3 +725,13 @@ def test_partition_overwrite_with_wrong_partition(
             mode="overwrite",
             partition_filters=[("p999", "=", "1")],
         )
+
+
+def test_handles_binary_data(tmp_path: pathlib.Path):
+    value = b"\x00\\"
+    table = pa.Table.from_pydict({"field_one": [value]})
+    write_deltalake(tmp_path, table)
+
+    dt = DeltaTable(tmp_path)
+    out = dt.to_pyarrow_table()
+    assert table == out
