write_delta merge issue: Generic DeltaTable error: Unable to convert expression to string #20597

Closed
2 tasks done
aldder opened this issue Jan 7, 2025 · 9 comments · Fixed by delta-io/delta-rs#3130
Labels
bug (Something isn't working), needs triage (Awaiting prioritization by a maintainer), python (Related to Python Polars)

Comments

@aldder

aldder commented Jan 7, 2025

Checks

  • I have checked that this issue has not already been reported.
  • I have confirmed this bug exists on the latest version of Polars.

Reproducible example

import polars as pl
import deltalake

print(pl.__version__)
print(deltalake.__version__)


pl.DataFrame({
    'id': ['a', 'b', 'c'],
    'val': [4, 5, 6]
}).write_delta(
    'tmp',
    mode='overwrite'
)

print('create done')

pl.DataFrame({
    'id': ['a', 'b', 'c', 'd'],
    'val': [4.1, 5, 6.1, 7]
}).write_delta(
    'tmp',
    mode='merge',
    delta_merge_options={
        'predicate': 'tgt.id = src.id',
        'source_alias': 'src',
        'target_alias': 'tgt'
    }
).when_matched_update_all(
).when_not_matched_insert_all(
).execute()

print('merge done')

Log output

1.19.0
0.23.1
create done
Traceback (most recent call last):
  File "c:\Users\xxxxxx\test_pl_dl.py", line 31, in <module>
    ).execute()
      ^^^^^^^^^
  File "C:\Users\xxx\AppData\Local\Programs\Python\Python312\Lib\site-packages\deltalake\table.py", line 1838, in execute
    metrics = self._table.merge_execute(self._builder)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
_internal.DeltaError: Generic DeltaTable error: Unable to convert expression to string

Issue description

The error message speaks for itself.
I don't know whether this is useful, but the error seems to occur with every data type in the source dataframe (int, float, datetime, etc.).

Expected behavior

The merge operation completes successfully.

Installed versions

--------Version info---------
Polars:              1.19.0
Index type:          UInt32
Platform:            Windows-11-10.0.22631-SP0
Python:              3.12.8 (tags/v3.12.8:2dc476b, Dec  3 2024, 19:30:04) [MSC v.1942 64 bit (AMD64)]
LTS CPU:             False

----Optional dependencies----
adbc_driver_manager  <not installed>
altair               5.5.0
azure.identity       <not installed>
boto3                1.35.90
cloudpickle          3.1.0
connectorx           <not installed>
deltalake            0.23.1
fastexcel            <not installed>
fsspec               2024.12.0
gevent               <not installed>
google.auth          <not installed>
great_tables         <not installed>
matplotlib           3.10.0
nest_asyncio         1.6.0
numpy                2.2.1
openpyxl             3.1.5
pandas               2.2.3
pyarrow              18.1.0
pydantic             2.10.4
pyiceberg            <not installed>
sqlalchemy           2.0.36
torch                <not installed>
xlsx2csv             <not installed>
xlsxwriter           <not installed>
None
@aldder added the bug, needs triage, and python labels on Jan 7, 2025
@toby01234

toby01234 commented Jan 8, 2025

Same issue for me. It started happening today (2025-01-08) after I upgraded my environment; reverting the environment made the merge work again. The working environment is Python 3.11 with a requirements.txt containing:

adlfs==2024.7.0
aiohappyeyeballs==2.3.5
aiohttp==3.10.3
aiosignal==1.3.1
annotated-types==0.7.0
anyio==4.6.2.post1
async-timeout==4.0.3
attrs==24.2.0
azure-core==1.30.2
azure-datalake-store==0.0.53
azure-identity==1.17.1
azure-storage-blob==12.21.0
azure-storage-file-datalake==12.16.0
boto3==1.34.151
botocore==1.34.151
build==1.2.1
certifi==2024.7.4
cffi==1.16.0
charset-normalizer==3.3.2
click==8.1.7
cryptography==43.0.0
deltalake==0.21.0
duckdb==1.0.0
fastapi==0.115.3
frozenlist==1.4.1
fsspec==2024.6.1
greenlet==3.0.3
h11==0.14.0
idna==3.7
iniconfig==2.0.0
isodate==0.6.1
Jinja2==3.1.4
jmespath==1.0.1
MarkupSafe==2.1.5
msal==1.30.0
msal-extensions==1.2.0
multidict==6.0.5
numpy==2.0.1
packaging==24.1
pandas==2.2.2
pip-tools==7.4.1
pluggy==1.5.0
polars==1.14.0
portalocker==2.10.1
pyarrow==17.0.0
pycparser==2.22
pydantic==2.9.2
pydantic_core==2.23.4
PyJWT==2.9.0
pyodbc==5.1.0
pyproject_hooks==1.1.0
pytest==8.1.1
python-dateutil==2.9.0.post0
pytz==2024.1
PyYAML==6.0.1
requests==2.32.3
s3transfer==0.10.2
six==1.16.0
sniffio==1.3.1
SQLAlchemy==2.0.31
sqlparams==6.0.1
starlette==0.41.0
typing_extensions==4.12.2
tzdata==2024.1
urllib3==2.2.2
uvicorn==0.32.0
yarl==1.9.4

@ion-elgreco
Contributor

This is likely related to the interaction with view types.

I will look into this over the weekend and make a PR for delta-rs.

In the meantime, you can use DeltaTable.merge and pass it df.to_arrow().
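The workaround suggested above can be sketched roughly as follows. This is an illustration only, not the maintainer's own example: the helper names `merge_options` and `merge_via_arrow` are hypothetical, and the sketch assumes that `DeltaTable.merge` accepts a pyarrow table as `source` together with the same `predicate`, `source_alias`, and `target_alias` values that were passed via `delta_merge_options`. It has not been verified against the affected deltalake versions.

```python
def merge_options(predicate, source_alias="src", target_alias="tgt"):
    """Build the keyword arguments used for the merge (hypothetical helper)."""
    return {
        "predicate": predicate,
        "source_alias": source_alias,
        "target_alias": target_alias,
    }


def merge_via_arrow(df, table_path, predicate, storage_options=None):
    """Upsert a Polars DataFrame by calling DeltaTable.merge directly.

    Assumption: bypassing pl.DataFrame.write_delta and handing deltalake
    a pyarrow table avoids the failing code path described in this issue.
    """
    # Imported lazily so the helpers can be loaded without deltalake installed.
    from deltalake import DeltaTable

    table = DeltaTable(table_path, storage_options=storage_options)
    return (
        table.merge(source=df.to_arrow(), **merge_options(predicate))
        .when_matched_update_all()
        .when_not_matched_insert_all()
        .execute()
    )
```

For the reproducible example in this issue, the call would look like `merge_via_arrow(df, 'tmp', 'tgt.id = src.id')`, where `df` is the second DataFrame.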

@HectorPascual

In our experience, the issue depends on the deltalake (delta-rs) version.

Polars 1.19.0 worked with deltalake 0.22.3 but failed with the same error as above with deltalake 0.23.0.

@toby01234

toby01234 commented Jan 14, 2025

> This is likely related to the interaction with view types.
>
> I will look into this over the weekend and make a PR for delta-rs.
>
> In the meantime, you can use DeltaTable.merge and do df.to_arrow

@ion-elgreco - Could you give a simple example of that proposed workaround for a merge? My current merge is like so:

df.write_delta(
    tgt_table_path
  , storage_options=storage_options
  , mode='merge'
  , delta_merge_options={
      "predicate"    : "s.TxnDate = t.TxnDate and s.fdTxnKey = t.fdTxnKey"
    , "source_alias" : "s"
    , "target_alias" : "t"
  }
  , delta_write_options={
        "partition_by" : 'TxnDate'
  },
).when_matched_update_all().when_not_matched_insert_all().execute()

I've tried @HectorPascual's combination of polars and deltalake versions and still get that generic decoding error.

Also, polars rocks.

@ion-elgreco
Contributor

@Bidek56
Contributor

Bidek56 commented Jan 14, 2025

This works fine.

pl.DataFrame({
    'id': ['a', 'b', 'c', 'd'],
    'val': [41, 51, 61, 71]
}).write_delta(
    tgt_table_path,
    mode='merge',
    delta_merge_options={
        "predicate": "src.id == tgt.id",
        "source_alias": "src",
        "target_alias": "tgt"
    }
).when_matched_update_all(
).when_not_matched_insert_all(
).when_not_matched_by_source_delete(
).execute()

@Bidek56
Contributor

Bidek56 commented Jan 14, 2025

This works fine.

pl.DataFrame({
    'id': ['a', 'b', 'c', 'd'],
    'val': [41, 51, 61, 71]
}).write_delta(
    tgt_table_path,
    mode='merge',
    delta_merge_options={
        "predicate": "src.id = tgt.id",
        "source_alias": "src",
        "target_alias": "tgt"
    }
).when_not_matched_insert_all(
).when_matched_update_all(
).when_not_matched_by_source_update(updates = {"y": "0"}
).execute()

@ion-elgreco
Contributor

Fix should be in the next release for deltalake :)

@aldder
Author

aldder commented Jan 16, 2025

I can confirm it is working now, with deltalake==0.24.0 🎉
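For anyone who cannot upgrade right away, a small guard like the one below could flag the deltalake versions reported as failing in this thread. It is a heuristic sketch: the function names are hypothetical, and treating the whole 0.23.x series as affected is an assumption based only on the reports here (0.23.0 and 0.23.1 failing; 0.21.0, 0.22.3, and 0.24.0 working). One commenter still saw the error with a combination that worked for someone else, so the check is not conclusive.

```python
import re


def parse_version(text):
    """Return the leading numeric components, e.g. '0.23.1' -> (0, 23, 1)."""
    return tuple(int(n) for n in re.findall(r"\d+", text)[:3])


def merge_reported_broken(deltalake_version):
    """True if the version is in the series reported as failing in this thread.

    Assumption: every 0.23.x release is affected; only 0.23.0 and 0.23.1
    were actually reported.
    """
    return parse_version(deltalake_version)[:2] == (0, 23)


def installed_deltalake_is_affected():
    """Check the deltalake version installed in the current environment."""
    # Imported lazily; raises PackageNotFoundError if deltalake is missing.
    from importlib.metadata import version

    return merge_reported_broken(version("deltalake"))
```

If `installed_deltalake_is_affected()` returns True, upgrading to deltalake 0.24.0 or later is the fix confirmed in this thread.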

@aldder closed this as completed on Jan 16, 2025
5 participants