Support deletionVector field in transaction log #928

Closed
craustin opened this issue Nov 9, 2022 · 3 comments
Labels
bug Something isn't working

Comments

craustin commented Nov 9, 2022

Environment

Delta-rs version: v0.6.3

Binding: Python

Environment:

  • Cloud provider: None
  • OS: Debian v10
  • Other:

Bug

What happened:
I get 100+ warnings about unsupported field deletionVector when initializing a DeltaTable:

[2022-11-09T21:46:45Z WARN  deltalake::action::parquet_read] Unexpected field name `deletionVector` for remove action: Row { fields: [("path", Str("message_ts_date=2022-11-04/part-00002-ac7d1020-bc3b-4309-a866-0da09fdc1f84.c000.snappy.parquet")), ("deletionTimestamp", Long(1667640938528)), ("dataChange", Bool(false)), ("extendedFileMetadata", Bool(true)), ("partitionValues", MapInternal(Map { entries: [(Str("message_ts_date"), Str("2022-11-04"))] })), ("size", Long(35050718)), ("tags", MapInternal(Map { entries: [(Str("MAX_INSERTION_TIME"), Str("1667640789000002")), (Str("INSERTION_TIME"), Str("1667640789000002")), (Str("MIN_INSERTION_TIME"), Str("1667640789000002")), (Str("OPTIMIZE_TARGET_SIZE"), Str("268435456"))] })), ("deletionVector", Null)] }

What you expected to happen:
No warnings.

How to reproduce it:
I produced the Delta table w/ Azure Databricks. Looks like deletionVector is a recent addition to the spec:
delta-io/delta@16dad5a

[In my case, deletionVector is always Null, so this is likely not indicative of a real issue, just warning spam.]
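One way to check whether `deletionVector` really is always null is to scan the table's log directly. The following is a minimal sketch, not part of delta-rs: the helper name `count_deletion_vectors` is hypothetical, and it assumes the JSON commit files sit in the table's `_delta_log` directory with one action per line. It does not read Parquet checkpoints, which is where the warning above was raised, so a zero count here is only partial evidence.

```python
import json
from pathlib import Path


def count_deletion_vectors(log_dir):
    """Count add/remove actions in JSON commits with a non-null deletionVector.

    Hypothetical helper for illustration; inspects only *.json commit files.
    """
    counts = {"add": 0, "remove": 0}
    for commit in sorted(Path(log_dir).glob("*.json")):
        with commit.open() as fh:
            for line in fh:
                line = line.strip()
                if not line:
                    continue
                action = json.loads(line)  # one action object per line
                for kind in counts:
                    body = action.get(kind)
                    if body is not None and body.get("deletionVector") is not None:
                        counts[kind] += 1
    return counts
```

Calling `count_deletion_vectors("/path/to/table/_delta_log")` (the path is a placeholder) returns a dict with the two counts; if both are zero, the JSON commits carry no deletion vectors.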

craustin added the bug label Nov 9, 2022
roeap (Collaborator) commented Nov 10, 2022

I do remember something similar coming up in the past. Essentially, we emit warnings whenever we encounter unknown fields in the Delta log. This definitely needs to be fixed.
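To illustrate one possible approach, here is a sketch in Python of a more tolerant reader for `remove` actions. It is not the delta-rs implementation (which is Rust), and `parse_remove`, `KNOWN_REMOVE_FIELDS`, and the logger name are hypothetical. The known field names are taken from the log line quoted in the report. The assumed policy is to skip unknown fields silently when their value is null, and to warn only once per unknown field name otherwise.

```python
import logging

logger = logging.getLogger("delta_log_sketch")  # hypothetical logger name

# Field names as they appear in the warning quoted in this issue.
KNOWN_REMOVE_FIELDS = {
    "path", "deletionTimestamp", "dataChange", "extendedFileMetadata",
    "partitionValues", "size", "tags",
}


def parse_remove(fields, warned=None):
    """Build a dict from (name, value) pairs, tolerating unknown fields.

    `warned` is a set shared across calls so that each unknown field name
    is reported at most once instead of once per row.
    """
    warned = set() if warned is None else warned
    parsed = {}
    for name, value in fields:
        if name in KNOWN_REMOVE_FIELDS:
            parsed[name] = value
        elif value is None:
            continue  # unknown but null: carries no information, stay quiet
        elif name not in warned:
            warned.add(name)
            logger.warning("Unexpected field name `%s` for remove action", name)
    return parsed
```

Sharing one `warned` set across a whole checkpoint would turn 100+ identical warnings into at most one per field name, while still surfacing fields that carry real data.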

dennyglee (Collaborator) commented

FYI, I just added #929 so that, as a first step, we can at least check the protocol and report it back (i.e., return a message denoting an error). For more information about deletion vectors, please refer to #1367.
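A protocol check of this kind could look roughly like the sketch below. It is an illustration only and does not reflect what #929 proposes in detail: `check_reader_protocol` and `UnsupportedProtocolError` are hypothetical names, and the `minReaderVersion` and `readerFeatures` keys follow my reading of the Delta protocol action, so they should be verified against the specification.

```python
class UnsupportedProtocolError(Exception):
    """Raised when a table requires more than this reader supports (hypothetical)."""


def check_reader_protocol(protocol, max_reader_version, supported_features=()):
    """Fail fast if the table's protocol action exceeds the reader's abilities.

    `protocol` is the body of a protocol action as a dict; the key names are
    assumptions based on the Delta protocol document.
    """
    required = protocol.get("minReaderVersion", 1)
    if required > max_reader_version:
        raise UnsupportedProtocolError(
            f"table requires reader version {required}, "
            f"but only up to {max_reader_version} is supported"
        )
    # Assumed: feature names the reader must understand are listed here.
    missing = set(protocol.get("readerFeatures") or []) - set(supported_features)
    if missing:
        raise UnsupportedProtocolError(
            f"unsupported reader features: {sorted(missing)}"
        )
```

Raising a single clear error at load time, rather than logging per-row warnings, tells the user up front that the table uses features the reader does not implement.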

george-zubrienko (Contributor) commented Jun 8, 2023

This problem is back when upgrading from delta 2.3 to delta 2.4 with delta-rs (python) 0.8.1.

UPD. It appears that only tables with specific features have this problem. Currently we only see these warnings for tables with automatic schema merge enabled (and which therefore perform merges).

UPD2. We noticed this only happens when a table written by the Delta 2.3 jar is updated by the 2.4 jar with either a merge or a schema/data override. Re-creating the table in question from a Spark runtime using the 2.4 jar resolves the issue :/
