Protocol compliant commits for Delta operations #593

Closed
roeap opened this issue May 2, 2022 · 0 comments
Labels
enhancement New feature or request

roeap commented May 2, 2022

Description

When committing transactions to the Delta log, make sure we never end up in an inconsistent or corrupt state.

Use Case

When performing mutating operations against a Delta table we have to make sure we never corrupt the table / data.

@roeap roeap added the enhancement New feature or request label May 2, 2022
roeap added a commit that referenced this issue Apr 7, 2023
# Description

This PR adds a `ConflictChecker` struct for conflict resolution in cases
of concurrent commit failures. The implementation is heavily inspired by
the [reference
implementation](https://github.com/delta-io/delta/blob/fe36a53f3c70c5f9c9b5052c12cd1703f495da97/core/src/main/scala/org/apache/spark/sql/delta/ConflictChecker.scala).
So far, most of the Spark tests that specifically target conflict
resolution are covered.
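
To make the idea concrete, here is a minimal sketch of the kind of check such a struct performs when another writer wins the race for a version. It is not the API added in this PR: `TxnSummary`, `WinningCommit`, `Conflict`, and `check_conflicts` are hypothetical names, and the rules are heavily simplified (no partition predicates, metadata, or protocol changes). The variant names loosely follow the conflict classes used in the reference implementation.

```rust
use std::collections::HashSet;

/// Hypothetical, simplified summary of what the losing transaction did.
#[derive(Debug, Default)]
struct TxnSummary {
    /// Paths of data files the transaction read.
    read_files: HashSet<String>,
    /// Paths of data files the transaction wants to remove.
    removed_files: HashSet<String>,
    /// True if the transaction scanned the whole table.
    read_whole_table: bool,
}

/// Hypothetical summary of a commit that was written concurrently and won.
#[derive(Debug, Default)]
struct WinningCommit {
    added_files: HashSet<String>,
    removed_files: HashSet<String>,
}

/// Reasons why the transaction cannot simply be retried on a newer version.
#[derive(Debug, PartialEq)]
enum Conflict {
    ConcurrentAppend,
    ConcurrentDeleteRead,
    ConcurrentDeleteDelete,
}

/// Returns Ok(()) if the transaction can be retried on top of `winner`.
fn check_conflicts(txn: &TxnSummary, winner: &WinningCommit) -> Result<(), Conflict> {
    // New files appeared that a full-table read would have had to see.
    if txn.read_whole_table && !winner.added_files.is_empty() {
        return Err(Conflict::ConcurrentAppend);
    }
    // A file we read was removed by the winning commit.
    if !txn.read_files.is_disjoint(&winner.removed_files) {
        return Err(Conflict::ConcurrentDeleteRead);
    }
    // Both commits try to remove the same file.
    if !txn.removed_files.is_disjoint(&winner.removed_files) {
        return Err(Conflict::ConcurrentDeleteDelete);
    }
    Ok(())
}
```

A blind append (nothing read, nothing removed) passes every check in this sketch, which is why such commits can usually be retried on the next version without further work.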

Working on this I thought a bit about what we may consider going
forward, as we move through the protocol versions :). In the end we
could end up with three main structs that are involved in validating a
commit.

* The existing `DataChecker`, which validates and potentially mutates
data when writing data files to disk. (Currently supports invariants)
* The upcoming `ConflictChecker`, which checks if a commit can be
retried in case of commit conflicts.
* A new `CommitChecker`, which does a priori validation of the commit
itself (e.g. append-only and other rules covered by tests in
[spark](https://github.com/delta-io/delta/blob/master/core/src/test/scala/org/apache/spark/sql/delta/OptimisticTransactionSuite.scala))
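
As an illustration of the a priori validation mentioned in the last bullet, the sketch below checks the append-only rule before a commit is attempted. No `CommitChecker` exists yet, so every name here (`Action`, `TableConfig`, `check_commit`) is hypothetical and the action model is reduced to the bare minimum. The rule itself follows the Delta protocol: on an append-only table, commits may rearrange data (removes with `dataChange = false`) but must not delete it.

```rust
/// Hypothetical, minimal model of the actions contained in a commit.
#[derive(Debug)]
enum Action {
    Add { path: String },
    Remove { path: String, data_change: bool },
}

/// Hypothetical subset of table configuration relevant to this check.
struct TableConfig {
    /// Mirrors the `delta.appendOnly` table property.
    append_only: bool,
}

/// Validates a commit against table rules before it is written to the log.
fn check_commit(config: &TableConfig, actions: &[Action]) -> Result<(), String> {
    if config.append_only {
        for action in actions {
            // Removes that only rearrange data (data_change == false) are allowed.
            if let Action::Remove { path, data_change: true } = action {
                return Err(format!(
                    "table is append-only, cannot remove data file {path}"
                ));
            }
        }
    }
    Ok(())
}
```

With this shape, a compaction that rewrites files with `data_change: false` passes, while a delete is rejected before anything reaches the log.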

My hope is to get this PR merged right after we release `0.8.0`, so
there is some time to fill some holes and fully leverage the new feature
for `0.9.0`.

If folks agree, I would open some issues and start work on some
follow-ups.

## Follow-ups
* Extend `ConflictChecker` to support conflict resolution for streaming
transactions.
* Implement `CommitChecker`.
* Deprecate the old commit function.
* Extend `DataChecker`.
* Consolidate record batch writer implementations.

# Related Issue(s)

part of #593 


---------

Co-authored-by: Will Jones <willjones127@gmail.com>
@rtyler rtyler closed this as completed Oct 25, 2023