feat: introduce schema evolution on RecordBatchWriter #2024

Merged
merged 1 commit into from
Feb 5, 2024

@rtyler rtyler commented Jan 3, 2024

This commit introduces the WriteMode enum and the ability to specify writes which should enable schema evolution.

The result is a new metaData action, added to the transaction log with the write, which reflects the updated schema.

There are some caveats, however; for example, all writes must include the non-nullable columns.
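
To make the behaviour described above concrete, here is a minimal, standalone sketch of the schema-merge decision. It is not the delta-rs implementation: `Field`, `resolve_schema`, and `evolve_example` are hypothetical names, the `WriteMode` variant names are assumptions, and types are plain strings for brevity. It only illustrates the idea that a merging write can add new columns (which would then be recorded via a new metaData action), while a write that omits a non-nullable column is rejected.

```rust
/// Simplified stand-in for a column definition (hypothetical, not the delta-rs type).
#[derive(Debug, Clone, PartialEq)]
pub struct Field {
    pub name: String,
    pub data_type: String,
    pub nullable: bool,
}

impl Field {
    pub fn new(name: &str, data_type: &str, nullable: bool) -> Self {
        Field { name: name.to_string(), data_type: data_type.to_string(), nullable }
    }
}

/// Models the idea of the `WriteMode` enum named in this PR; variant names are assumed.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum WriteMode {
    /// Reject writes whose schema does not match the table.
    Default,
    /// Allow the write to add columns to the table schema.
    MergeSchema,
}

/// Returns `Ok(None)` when the table schema is unchanged, or `Ok(Some(schema))`
/// when the write evolves it (a real writer would then emit a metaData action).
pub fn resolve_schema(
    table: &[Field],
    incoming: &[Field],
    mode: WriteMode,
) -> Result<Option<Vec<Field>>, String> {
    // Caveat from the description: non-nullable columns must be present in every write.
    for f in table.iter().filter(|f| !f.nullable) {
        if !incoming.iter().any(|i| i.name == f.name) {
            return Err(format!("missing non-nullable column `{}`", f.name));
        }
    }

    let mut merged = table.to_vec();
    let mut changed = false;
    for inc in incoming {
        match table.iter().find(|f| f.name == inc.name) {
            Some(existing) if existing.data_type != inc.data_type => {
                return Err(format!("type mismatch for column `{}`", inc.name));
            }
            Some(_) => {} // column already known, nothing to do
            None => {
                if mode == WriteMode::Default {
                    return Err(format!("unexpected column `{}`", inc.name));
                }
                // Assumption: new columns are added as nullable, since existing
                // data files hold no values for them.
                merged.push(Field { nullable: true, ..inc.clone() });
                changed = true;
            }
        }
    }
    Ok(if changed { Some(merged) } else { None })
}

/// A table with only `id` receives a write that also carries `name`.
pub fn evolve_example() -> Result<Option<Vec<Field>>, String> {
    let table = vec![Field::new("id", "integer", false)];
    let incoming = vec![
        Field::new("id", "integer", false),
        Field::new("name", "string", true),
    ];
    resolve_schema(&table, &incoming, WriteMode::MergeSchema)
}
```

In this sketch the same write fails under `WriteMode::Default`, which is the reason for making schema evolution an explicit opt-in rather than the default behaviour.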

Fixes #1386

Sponsored-by: Raft, LLC.

@github-actions github-actions bot added the binding/rust and crate/core labels Jan 3, 2024

github-actions bot commented Jan 3, 2024

ACTION NEEDED

delta-rs follows the Conventional Commits specification for release automation.

The PR title and description are used as the merge commit message. Please update your PR title and description to match the specification.

@rtyler rtyler changed the title Introduce schema evolution on RecordBatchWriter feat: introduce schema evolution on RecordBatchWriter Jan 3, 2024
@rtyler rtyler marked this pull request as ready for review January 3, 2024 20:56
@rtyler rtyler added this to the Rust v0.17 milestone Jan 3, 2024
@michaelsauget

Very excited to see this feature merged into the lib, this is the last blocker to adopting delta-rs for processing small to medium datasets outside of Spark! :)

This commit introduces the `WriteMode` enum and the ability to specify
writes which should enable [schema
evolution](https://delta.io/blog/2023-02-08-delta-lake-schema-evolution/).

The result is a new `metaData` action, added to the transaction log
with the write, which reflects the updated schema.

There are some caveats, however; for example, all writes must include the non-nullable columns.

This change does not modify the Write operation, which has a datafusion
dependency. Unfortunately we have some redundancy in API surface insofar
as the writer in src/operations/ just performs parquet writes. The
Write operation, however, requires datafusion and will actually effect
transaction log writes.

Fixes delta-io#1386

Sponsored-by: Raft, LLC.
let second_data = serde_json::json!(
    {
        "id" : 1,
        "name" : "Ion"

Hehe : )

@ion-elgreco ion-elgreco left a comment


Great work Tyler! :) We likely need to port some of this over to crates/core/src/operations/writer so that it becomes available to the operations directly as well.

@rtyler rtyler merged commit 105fb5d into delta-io:main Feb 5, 2024
20 checks passed
Labels
binding/rust Issues for the Rust crate
Development

Successfully merging this pull request may close these issues.

Schema evolution mergeSchema support
3 participants