Java and Python have a different approach here. I don't have all the historical context, but prior to Iceberg V2 tables, there was no such thing as operations:
I think this is a good thing to validate against.
This should happen in the `_commit` method of the `_SnapshotProducer`. Similar to Java:
We refresh the table so that we have the latest snapshots, then check whether any snapshots were added between the `startingSnapshotId` and the `current-snapshot-id`. If so, we want to call `_validate()` to check for conflicts:
- When doing an `Append` (adding new data):
  - All okay: concurrent `Append`, `Replace`, `Overwrite`, and `Delete` commits don't affect the operation, so we can just append.
- When doing a `Replace` (replacing existing data, e.g. compaction):
  - Ok: `Append`
  - Not ok: `Replace`, `Overwrite`, `Delete`. We should fail for now; later we can check whether there is any actual overlap (e.g. compare whether they touch the same partitions).
- When doing an `Overwrite` (adding and deleting data):
  - Not ok: `Append`, `Replace`, `Overwrite`, `Delete`. We should fail for now; later we can check whether there is any actual overlap (e.g. compare whether they touch the same partitions).
- When doing a `Delete`:
  - Not ok: `Append`, `Replace`, `Overwrite`, `Delete`. We should fail for now; later we can check whether there is any actual overlap (e.g. compare whether they touch the same partitions/predicate). We should also take into account the difference between merge-on-read (MoR) and copy-on-write (CoW).
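
To make the rules above concrete, here is a minimal, standalone sketch of the "very simple cases" check. It does not use the PyIceberg API: `Operation`, `SnapshotInfo`, `CommitConflict`, `ALLOWED_CONCURRENT`, `snapshots_since`, and `validate` are all hypothetical names made up for illustration, and the snapshot lineage is modelled as a plain dict of parent pointers. The real `_validate()` would presumably work on the refreshed table metadata instead.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, FrozenSet, List, Optional


class Operation(Enum):
    APPEND = "append"
    REPLACE = "replace"
    OVERWRITE = "overwrite"
    DELETE = "delete"


# Key: the operation we are trying to commit.
# Value: concurrent operations that do not conflict with it (simple cases only,
# no partition/predicate overlap checks yet).
ALLOWED_CONCURRENT: Dict[Operation, FrozenSet[Operation]] = {
    Operation.APPEND: frozenset(Operation),
    Operation.REPLACE: frozenset({Operation.APPEND}),
    Operation.OVERWRITE: frozenset(),
    Operation.DELETE: frozenset(),
}


@dataclass(frozen=True)
class SnapshotInfo:
    """Hypothetical stand-in for a table snapshot."""
    snapshot_id: int
    parent_id: Optional[int]
    operation: Operation


class CommitConflict(Exception):
    """Raised when a concurrently committed snapshot conflicts with ours."""


def snapshots_since(
    snapshots: Dict[int, SnapshotInfo],
    starting_id: Optional[int],
    current_id: Optional[int],
) -> List[SnapshotInfo]:
    """Walk parent pointers from the current snapshot back to the starting one (exclusive)."""
    added: List[SnapshotInfo] = []
    cursor = current_id
    while cursor is not None and cursor != starting_id:
        snapshot = snapshots[cursor]
        added.append(snapshot)
        cursor = snapshot.parent_id
    return added


def validate(
    ours: Operation,
    snapshots: Dict[int, SnapshotInfo],
    starting_id: Optional[int],
    current_id: Optional[int],
) -> None:
    """Fail if any snapshot added since starting_id conflicts with our operation."""
    for snapshot in snapshots_since(snapshots, starting_id, current_id):
        if snapshot.operation not in ALLOWED_CONCURRENT[ours]:
            raise CommitConflict(
                f"{ours.value} conflicts with concurrent {snapshot.operation.value} "
                f"(snapshot {snapshot.snapshot_id})"
            )
```

Keeping the matrix in a single table means a later PR can relax one entry at a time (for example, allowing a concurrent `Delete` during a `Replace` when the partitions don't overlap) without touching the snapshot walk.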
Let's only do the very simple cases at first, so we can add the others one by one and keep the PR within a reasonable size.
Once we have this in place, we can also do automatic retries: #269
There's also a small section on conflict resolution.