Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(python): add schema conversion of FixedSizeBinaryArray and FixedSizeListType #2005

Merged
merged 6 commits into from
Jan 7, 2024

Conversation

balbok0
Copy link
Contributor

@balbok0 balbok0 commented Dec 31, 2023

Description

Map FixedSizeBinaryType to BinaryArray, since Delta does not support fixed arrays.

Related Issue(s)

None.

Documentation

N/A

Minimal Example

I've noticed this error when doing subsequent calls to like so:

import deltalake as dl
import pyarrow as pa

schema = pa.schema([
    ("field_a", pa.binary(4)),
    # To simulate fix, switch this line to: ("field_a", pa.binary()),
])
table = pa.Table.from_pylist(
    [
        {"field_a": val.to_bytes(4, "little")}
        for val in range(0, 100)
    ],
    schema=schema
)

# This works
dl.write_deltalake(
    "bad_table",
    data=table,
    mode="append",
)

# This fails
dl.write_deltalake(
    "bad_table",
    data=table,
    mode="append",
)

with error:

ValueError: Schema of data does not match table schema
Data schema:
field_a: fixed_size_binary[4]
Table Schema:
field_a: binary

@github-actions github-actions bot added the binding/python Issues for the Python package label Dec 31, 2023
Copy link

ACTION NEEDED

delta-rs follows the Conventional Commits specification for release automation.

The PR title and description are used as the merge commit message. Please update your PR title and description to match the specification.

@balbok0 balbok0 changed the title Fix schema conversion of FixedSizeBinaryArray feat(python): Fix schema conversion of FixedSizeBinaryArray Dec 31, 2023
@balbok0 balbok0 changed the title feat(python): Fix schema conversion of FixedSizeBinaryArray fix(python): Fix schema conversion of FixedSizeBinaryArray Dec 31, 2023
@balbok0 balbok0 changed the title fix(python): Fix schema conversion of FixedSizeBinaryArray fix(python): fix schema conversion of FixedSizeBinaryArray Dec 31, 2023
@ion-elgreco
Copy link
Collaborator

@balbok0 can you extend the tests also to include this fixedSizedBinary

@balbok0
Copy link
Contributor Author

balbok0 commented Jan 1, 2024

I changed one test case to include a check for fixed binary. I think it should be enough (unit test-wise), since it's a really small change, but let me know if you would like more checks, just in case.

For more-integration like, is there a test case with repeated write to a tmp_path or existing_table that I can reference? I tried looking for it, but couldn't find any.

@balbok0
Copy link
Contributor Author

balbok0 commented Jan 1, 2024

Actually, I just found the same exact issue with pa.FixedSizeListType, so bundled a fix with this PR.

@balbok0 balbok0 changed the title fix(python): fix schema conversion of FixedSizeBinaryArray fix(python): fix schema conversion of FixedSizeBinaryArray and FixedSizeListType Jan 1, 2024
@ion-elgreco ion-elgreco changed the title fix(python): fix schema conversion of FixedSizeBinaryArray and FixedSizeListType feat(python): add schema conversion of FixedSizeBinaryArray and FixedSizeListType Jan 1, 2024
@ion-elgreco
Copy link
Collaborator

You need to set the type hints properly in the functions and also define the stubs

@balbok0
Copy link
Contributor Author

balbok0 commented Jan 3, 2024

Hi, sorry for being slow on resolving issues. I've fixed a typo on one of the tests I added. Just to be sure:

  • I've noticed a lot of currently existing tests failing/erroring. I assume these are not related?
  • Is it ok to skip the integration tests in this case?

@ion-elgreco
Copy link
Collaborator

@balbok0 no the tests are related, I suggest you execute it locally to debug. Decimal is hit as a false positive while doing the schema change, so the logic needs fixing

@balbok0
Copy link
Contributor Author

balbok0 commented Jan 7, 2024

Ok, switched it to type instead of isinstance, which will not include DecimalType from pyarrow. As far as I tested locally tests should pass.

Copy link
Collaborator

@ion-elgreco ion-elgreco left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! @balbok0

@ion-elgreco ion-elgreco merged commit a86cf66 into delta-io:main Jan 7, 2024
23 checks passed
r3stl355 pushed a commit to r3stl355/delta-rs that referenced this pull request Jan 10, 2024
…SizeListType (delta-io#2005)

# Description
Map `FixedSizeBinaryType` to `BinaryArray`, since Delta does not support
fixed arrays.

# Related Issue(s)
None.

# Documentation
N/A

# Minimal Example
I've noticed this error when doing subsequent calls to like so:
```
import deltalake as dl
import pyarrow as pa

schema = pa.schema([
    ("field_a", pa.binary(4)),
    # To simulate fix, switch this line to: ("field_a", pa.binary()),
])
table = pa.Table.from_pylist(
    [
        {"field_a": val.to_bytes(4, "little")}
        for val in range(0, 100)
    ],
    schema=schema
)

# This works
dl.write_deltalake(
    "bad_table",
    data=table,
    mode="append",
)

# This fails
dl.write_deltalake(
    "bad_table",
    data=table,
    mode="append",
)
```
with error:
```
ValueError: Schema of data does not match table schema
Data schema:
field_a: fixed_size_binary[4]
Table Schema:
field_a: binary
```

---------

Co-authored-by: Jakub Filipek <jakub@overland.ai>
Co-authored-by: Jakub Filipek <jfilipek@proton.me>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
binding/python Issues for the Python package
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants