Overwrite creates new file #960

Closed
0xdarkman opened this issue Nov 27, 2022 · 2 comments
Labels
bug Something isn't working

0xdarkman commented Nov 27, 2022

Version: delta v0.6.3
Environment: cloud, Azure

Write mode is overwrite; no partition-by option is used.

A new file is created after each write:
1-...
2-...
...
6-...

I would expect the same file to be overwritten each time, so that only one file exists:

1-...

import pyarrow as pa
from deltalake import DeltaTable
from deltalake.writer import write_deltalake

# account_name, account_key and df (a pandas DataFrame) are defined elsewhere
storage_options = {
    "AZURE_STORAGE_ACCOUNT_NAME": account_name,
    "AZURE_STORAGE_ACCOUNT_KEY": account_key,
}

table_path = "abfss://CONTAINERNAME@STORAGEACCOUNT.dfs.core.windows.net/TABLE_NAME"

dt = DeltaTable(table_path, storage_options=storage_options)

tb = pa.Table.from_pandas(df, preserve_index=False)

write_deltalake(table_or_uri=dt, data=tb, mode="overwrite")

0xdarkman added the bug label Nov 27, 2022
0xdarkman changed the title to "Overwrite creates new file" Nov 27, 2022
0xdarkman (Author) commented

delta_table.vacuum(retention_hours=0, enforce_retention_duration=False, dry_run=False)

In the end I have to run vacuum to delete the files that do not belong to the latest version.

0xdarkman reopened this Nov 28, 2022
wjones127 (Collaborator) commented

Hi @0xdarkman, this is intentional and part of the Delta protocol. ACID guarantees and time travel would not work without it.
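
For illustration only, here is a toy model of a transaction log in plain Python. It is not the deltalake implementation: the class `ToyDeltaTable`, its methods, and the file naming are all hypothetical. It sketches the idea that an overwrite commits a new version which adds a new file and only logically removes the old ones, so earlier versions stay readable until a vacuum physically deletes the unreferenced files.

```python
from dataclasses import dataclass, field


@dataclass
class ToyDeltaTable:
    # file name -> rows; stands in for the data files in object storage
    files: dict = field(default_factory=dict)
    # one entry per commit: {"add": [...], "remove": [...]}
    log: list = field(default_factory=list)

    def live_files(self, version=None):
        """Replay the log up to `version` to find the files of that snapshot."""
        if version is None:
            version = len(self.log) - 1
        live = set()
        for commit in self.log[: version + 1]:
            live -= set(commit["remove"])
            live |= set(commit["add"])
        return live

    def overwrite(self, rows):
        """Commit a new version: write one new file, logically remove live ones."""
        version = len(self.log)
        name = f"{version}-part.parquet"  # hypothetical naming scheme
        self.files[name] = list(rows)
        self.log.append({"add": [name], "remove": sorted(self.live_files())})
        return version

    def read(self, version=None):
        """Read the rows of a snapshot (latest by default)."""
        names = sorted(self.live_files(version))
        return [row for name in names for row in self.files[name]]

    def vacuum(self):
        """Physically delete files not referenced by the latest version."""
        stale = sorted(set(self.files) - self.live_files())
        for name in stale:
            del self.files[name]
        return stale


table = ToyDeltaTable()
table.overwrite([1, 2])
table.overwrite([3])
table.overwrite([4, 5])

print(sorted(table.files))    # three files exist, one per overwrite
print(table.read())           # latest snapshot: [4, 5]
print(table.read(version=0))  # time travel still works: [1, 2]
print(table.vacuum())         # files no longer referenced by the latest version
```

In this sketch, calling `table.read(version=0)` after `vacuum()` raises a `KeyError`, because the file behind that version is gone. That is the trade-off of vacuuming with a zero retention period: storage is reclaimed, but older versions can no longer be read.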
