feat: introduce schema evolution on RecordBatchWriter #2024
ACTION NEEDED: delta-rs follows the Conventional Commits specification for release automation. The PR title and description are used as the merge commit message. Please update your PR title and description to match the specification.
Force-pushed: 72babcc to 1dc1cd2
Very excited to see this feature merged into the lib, this is the last blocker to adopting delta-rs for processing small to medium datasets outside of Spark! :)
This commit introduces the `WriteMode` enum and the ability to specify writes which should enable [schema evolution](https://delta.io/blog/2023-02-08-delta-lake-schema-evolution/).

The result of this is a new `metaData` action, added to the transaction log with the write, which reflects the updated schema. There are some caveats, however: for example, all writes must include the non-nullable columns.

This change does not modify the Write operation, which has a datafusion dependency. Unfortunately we have some redundancy in API surface, insofar as the writer in src/operations/ just performs parquet writes. The Write operation, however, requires datafusion and will actually effect transaction log writes.

Fixes delta-io#1386

Sponsored-by: Raft, LLC.
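To make the caveat above concrete, here is a minimal, standard-library-only sketch of the decision a schema-evolving write has to make. It is not the code in this PR and does not use the delta-rs API: the `Field` struct, the `plan_write` function, the `WriteMode` variant names, and the rule that newly added columns become nullable are all assumptions made for illustration.

```rust
/// Hypothetical, simplified column description (not the delta-rs schema type).
#[derive(Debug, Clone, PartialEq)]
struct Field {
    name: String,
    data_type: String,
    nullable: bool,
}

impl Field {
    fn new(name: &str, data_type: &str, nullable: bool) -> Self {
        Field {
            name: name.to_string(),
            data_type: data_type.to_string(),
            nullable,
        }
    }
}

/// Illustrative stand-in for the enum named in the PR; variant names are assumed.
#[derive(Debug, Clone, Copy, PartialEq)]
enum WriteMode {
    /// Reject writes whose schema adds columns to the table schema.
    Default,
    /// Merge new columns from the write into the table schema.
    MergeSchema,
}

/// Returns `Ok(Some(schema))` when the write would need a new `metaData`
/// action carrying the merged schema, `Ok(None)` when the table schema is
/// unchanged, and `Err` when the write cannot be accepted.
fn plan_write(
    table: &[Field],
    incoming: &[Field],
    mode: WriteMode,
) -> Result<Option<Vec<Field>>, String> {
    // Every existing column must either be present with the same type,
    // or be nullable so that it can be absent from this write.
    for existing in table {
        match incoming.iter().find(|f| f.name == existing.name) {
            Some(f) if f.data_type != existing.data_type => {
                return Err(format!("type mismatch for column `{}`", existing.name));
            }
            Some(_) => {}
            None if !existing.nullable => {
                return Err(format!(
                    "write is missing non-nullable column `{}`",
                    existing.name
                ));
            }
            None => {}
        }
    }

    // Append columns that only exist in the incoming data. They are marked
    // nullable here (an assumption) because older rows have no value for them.
    let mut merged = table.to_vec();
    for f in incoming {
        if !table.iter().any(|t| t.name == f.name) {
            merged.push(Field { nullable: true, ..f.clone() });
        }
    }

    // `merged` only ever grows, so equal length means nothing was added.
    if merged.len() == table.len() {
        return Ok(None);
    }
    match mode {
        WriteMode::MergeSchema => Ok(Some(merged)),
        WriteMode::Default => Err("incoming schema adds columns; use MergeSchema".to_string()),
    }
}
```

In this model, writing `id` and `name` to a table that only has a non-nullable `id` yields a merged schema under `MergeSchema` and an error under `Default`, while a write that omits `id` is rejected in either mode.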
Force-pushed: 1dc1cd2 to b603d59
let second_data = serde_json::json!(
    {
        "id" : 1,
        "name" : "Ion"
Hehe : )
Great work Tyler! :) We do likely need to port some stuff over to crates/core/src/operations/writer for it to become available to the operations directly as well.