[Protocol Change Request] Type Widening table feature #2624
Co-authored-by: Ryan Johnson <ryan.johnson@databricks.com>
Co-authored-by: Bart Samwel <bart.samwel@databricks.com>
You can add `[Protocol Change Request]` to the title. I added an issue template for this, but for a silly reason it's not kicking in. Can you update the description based on it? https://github.com/delta-io/delta/blob/master/.github/ISSUE_TEMPLATE/protocol-rfc.md
This LGTM.
LGTM
## Description
This change introduces the `typeWidening` Delta table feature, which allows users to widen the type of existing columns and fields in a Delta table using the `ALTER TABLE CHANGE COLUMN TYPE` or `ALTER TABLE REPLACE COLUMNS` commands.

The table feature is introduced as `typeWidening-dev` during implementation and is available in testing only.

For now, only byte -> short -> int are supported. Other changes will require support in the Spark parquet reader that will be introduced in Spark 4.0.

Type widening feature request: #2622
Type Widening protocol RFC: #2624

A new test suite `DeltaTypeWideningSuite` is created, containing:
- `DeltaTypeWideningAlterTableTests`: covers applying supported and unsupported type changes on partitioned columns, non-partitioned columns and nested fields.
- `DeltaTypeWideningTableFeatureTests`: covers adding the `typeWidening` table feature.

## This PR introduces the following *user-facing* changes
The table feature is available in testing only, so there are no user-facing changes as of now.

The type widening table feature will introduce the following changes:
- Adding the `typeWidening` table feature via a table property:
```
ALTER TABLE t SET TBLPROPERTIES ('delta.enableTypeWidening' = true)
```
- Applying a widening type change:
```
ALTER TABLE t CHANGE COLUMN int_col TYPE long
```
or
```
ALTER TABLE t REPLACE COLUMNS (int_col long)
```

Note: both ALTER TABLE commands reuse the existing syntax for setting a table property and applying a type change; no new SQL syntax is introduced by this feature.

Closes #2645
GitOrigin-RevId: 2ca0e6b22ec24b304241460553547d0d4c6026a2
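The supported widening chain stated in this PR (byte -> short -> int) can be modeled as a simple ordering check. This is an illustrative sketch, not Delta's implementation: the function name `is_supported_widening` and the type-name strings are hypothetical, and it assumes that skipping an intermediate step (byte -> int) is also allowed.

```python
# Hypothetical sketch of the widening rule stated in the PR description.
# Only the chain byte -> short -> int is modeled; anything else is rejected.
WIDENING_ORDER = ["byte", "short", "int"]


def is_supported_widening(from_type: str, to_type: str) -> bool:
    """Return True if from_type can be widened to to_type.

    A change counts as supported when both types are in the chain and
    to_type is strictly wider than from_type (assumed: byte -> int is allowed).
    """
    if from_type not in WIDENING_ORDER or to_type not in WIDENING_ORDER:
        return False
    return WIDENING_ORDER.index(from_type) < WIDENING_ORDER.index(to_type)


if __name__ == "__main__":
    print(is_supported_widening("byte", "int"))   # True
    print(is_supported_widening("int", "short"))  # False: narrowing
    print(is_supported_widening("int", "long"))   # False: not supported yet
```

Narrowing changes and changes outside the chain (such as int -> long, which needs Spark 4.0 reader support) are rejected by the same check.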
#### Which Delta project/connector is this regarding?
- [x] Spark
- [ ] Standalone
- [ ] Flink
- [ ] Kernel
- [ ] Other (fill in here)

This change is part of the type widening table feature.
Type widening feature request: #2622
Type Widening protocol RFC: #2624

It introduces metadata to record information about type changes that were applied using `ALTER TABLE`. This metadata is stored in the table schema, as specified in https://github.com/delta-io/delta/pull/2624/files#diff-114dec1ec600a6305fe7117bed7acb46e94180cdb1b8da63b47b12d6c40760b9R28

For example, changing a top-level column `a` from `short` to `int` at table version 1, and then from `int` to `long` at table version 5, updates the schema to include the following metadata:
```
{
  "name" : "a",
  "type" : "long",
  "nullable" : true,
  "metadata" : {
    "delta.typeChanges": [
      {
        "tableVersion": 1,
        "fromType": "short",
        "toType": "integer"
      },
      {
        "tableVersion": 5,
        "fromType": "integer",
        "toType": "long"
      }
    ]
  }
}
```

- A new test suite `DeltaTypeWideningMetadataSuite` is created to cover methods handling type widening metadata.
- Tests covering adding metadata to the schema when running `ALTER TABLE CHANGE COLUMN` are added to `DeltaTypeWideningSuite`.

Closes #2708
GitOrigin-RevId: cdbb7589f10a8355b66058e156bb7d1894268f4d
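The `delta.typeChanges` metadata accumulates one entry per type change applied to a field. The sketch below builds that layout for a top-level primitive column represented as a JSON-style dict; the helper name `record_type_change` is hypothetical, and nested fields (arrays, maps, structs) are deliberately not handled.

```python
import copy
import json

TYPE_CHANGES_KEY = "delta.typeChanges"


def record_type_change(field: dict, table_version: int, to_type: str) -> dict:
    """Hypothetical helper: return a copy of `field` widened to `to_type`.

    The previous type and the table version of the change are appended to the
    field's "delta.typeChanges" metadata list; the input dict is left untouched.
    """
    updated = copy.deepcopy(field)
    change = {
        "tableVersion": table_version,
        "fromType": field["type"],
        "toType": to_type,
    }
    metadata = updated.setdefault("metadata", {})
    metadata.setdefault(TYPE_CHANGES_KEY, []).append(change)
    updated["type"] = to_type
    return updated


if __name__ == "__main__":
    a = {"name": "a", "type": "short", "nullable": True, "metadata": {}}
    a = record_type_change(a, table_version=1, to_type="integer")
    a = record_type_change(a, table_version=5, to_type="long")
    print(json.dumps(a, indent=2))
```

Keeping every change as a list entry, rather than only the latest one, lets a reader tell which historical files may still hold an older, narrower type.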
This PR includes changes from #2708, which isn't merged yet. The changes related only to dropping the table feature are in commit e2601a6.

## Description
This change is part of the type widening table feature.
Type widening feature request: #2622
Type Widening protocol RFC: #2624

It adds the ability to remove the type widening table feature by running the `ALTER TABLE DROP FEATURE` command.
Before dropping the table feature, traces of it are removed from the current version of the table:
- Files that were written before the latest type change, and thus contain types that differ from the current table schema, are rewritten using an internal `REORG TABLE` operation.
- Metadata in the table schema recording previous type changes is removed.

## How was this patch tested?
- A new set of tests is added to `DeltaTypeWideningSuite` to cover dropping the table feature with tables in various states: with or without files to rewrite or metadata to remove.

## Does this PR introduce _any_ user-facing changes?
The table feature is available in testing only, so there are no user-facing changes as of now.

When the feature is available, this change enables the following user action:
- Drop the type widening table feature:
```
ALTER TABLE t DROP FEATURE typeWidening
```

This succeeds immediately if no version of the table contains traces of the table feature (that is, no type changes were applied in the available history of the table).

Otherwise, if the current version contains traces of the feature, these are removed: files are rewritten if needed and type widening metadata is removed from the table schema. Then, an error `DELTA_FEATURE_DROP_WAIT_FOR_RETENTION_PERIOD` is thrown, telling the user to retry once the retention period expires.

If only previous versions contain traces of the feature, no action is applied on the table, and an error `DELTA_FEATURE_DROP_HISTORICAL_VERSIONS_EXIST` is thrown, telling the user to retry once the retention period expires.
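The three possible outcomes of `ALTER TABLE t DROP FEATURE typeWidening`, and the selection of files to rewrite, can be summarized as a small decision sketch. Everything here is hypothetical: `DataFile`, its `written_at_version` attribute, `files_to_rewrite` and `drop_type_widening` are illustrative names, and real Delta tracks files and history through the transaction log rather than these simplified inputs.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DataFile:
    """Hypothetical stand-in for a data file entry in the Delta log."""
    path: str
    written_at_version: int  # assumed: table version whose commit added the file


def files_to_rewrite(files: List[DataFile], latest_type_change_version: int) -> List[DataFile]:
    """Files added before the latest type change may still store the narrower type."""
    return [f for f in files if f.written_at_version < latest_type_change_version]


def drop_type_widening(current_has_traces: bool, history_has_traces: bool) -> Optional[str]:
    """Return None if the drop succeeds immediately, else the error class raised."""
    if current_has_traces:
        # The current version is cleaned up first (files rewritten, metadata
        # removed); the user must then retry after the retention period.
        return "DELTA_FEATURE_DROP_WAIT_FOR_RETENTION_PERIOD"
    if history_has_traces:
        # Nothing to clean in the current version, but older versions still
        # contain traces until they fall out of the retention period.
        return "DELTA_FEATURE_DROP_HISTORICAL_VERSIONS_EXIST"
    return None


if __name__ == "__main__":
    files = [DataFile("part-0.parquet", 2), DataFile("part-1.parquet", 7)]
    print([f.path for f in files_to_rewrite(files, latest_type_change_version=5)])
    # ['part-0.parquet']
    print(drop_type_widening(current_has_traces=False, history_has_traces=True))
    # DELTA_FEATURE_DROP_HISTORICAL_VERSIONS_EXIST
```

The current version is checked first because cleaning it up is the only step the command can take itself; historical traces can only disappear once the retention period expires.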
#### Which Delta project/connector is this regarding?
- [x] Spark
- [ ] Standalone
- [ ] Flink
- [ ] Kernel
- [ ] Other (fill in here)

## Description
This change is part of the type widening table feature.
Type widening feature request: #2622
Type Widening protocol RFC: #2624

It adds automatic type widening as part of schema evolution in MERGE INTO:
- During resolution of the `DeltaMergeInto` plan, when merging the target and source schemas to compute the schema after evolution, we keep the wider source type when type widening is enabled on the table.
- When updating the table schema at the beginning of MERGE execution, metadata is added to the schema to record type changes.

## How was this patch tested?
- A new test suite `DeltaTypeWideningSchemaEvolutionSuite` is added to cover type evolution in MERGE.

## This PR introduces the following *user-facing* changes
The table feature is available in testing only, so there are no user-facing changes as of now.

When automatic schema evolution is enabled in MERGE and the source schema contains a type that is wider than the target schema:

With type widening disabled: the type in the target schema is not changed. The ingestion behavior follows the `storeAssignmentPolicy` configuration:
- LEGACY: source values that overflow the target type are stored as `null`.
- ANSI: a runtime check is injected to fail on source values that overflow the target type.
- STRICT: the MERGE operation fails during analysis.

With type widening enabled: the type in the target schema is updated to the wider source type. The MERGE operation always succeeds:
```
-- target: key int, value short
-- source: key int, value int
MERGE INTO target
USING source
ON target.key = source.key
WHEN MATCHED THEN UPDATE SET *
```
After the MERGE operation, the target schema is `key int, value int`.
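The behaviors listed above for an `int` source value written into a `short` target column can be simulated in plain Python. This sketch only mirrors the rules as stated in this description; the function `merge_int_into_short` and the two exception classes are hypothetical names standing in for Spark's analysis-time and runtime failures.

```python
from typing import List, Optional, Tuple

SHORT_MIN, SHORT_MAX = -(2**15), 2**15 - 1
POLICIES = {"LEGACY", "ANSI", "STRICT"}


class AnalysisFailure(Exception):
    """Stands in for MERGE failing during analysis, before any row is read."""


class OverflowFailure(Exception):
    """Stands in for the runtime overflow check injected under ANSI."""


def merge_int_into_short(
    values: List[int], policy: str, type_widening_enabled: bool
) -> Tuple[str, List[Optional[int]]]:
    """Return (resulting target column type, values as stored)."""
    if policy not in POLICIES:
        raise ValueError(f"unknown storeAssignmentPolicy: {policy}")
    if type_widening_enabled:
        # The target column is widened to int; every value is stored unchanged.
        return "int", list(values)
    if policy == "STRICT":
        raise AnalysisFailure("int -> short is not allowed under STRICT")
    stored: List[Optional[int]] = []
    for v in values:
        if SHORT_MIN <= v <= SHORT_MAX:
            stored.append(v)
        elif policy == "LEGACY":
            stored.append(None)  # overflowing value is stored as null
        else:  # ANSI
            raise OverflowFailure(f"{v} overflows short")
    return "short", stored


if __name__ == "__main__":
    print(merge_int_into_short([1, 40000], "LEGACY", type_widening_enabled=False))
    # ('short', [1, None])
    print(merge_int_into_short([1, 40000], "ANSI", type_widening_enabled=True))
    # ('int', [1, 40000])
```

Note that STRICT fails regardless of the actual values, while ANSI only fails when a value really overflows; with type widening enabled the policy no longer matters because no narrowing happens.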
#### Which Delta project/connector is this regarding?
- [X] Spark
- [ ] Standalone
- [ ] Flink
- [ ] Kernel
- [ ] Other (fill in here)

## Description
This change is part of the type widening table feature.
Type widening feature request: #2622
Type Widening protocol RFC: #2624

It adds automatic type widening as part of schema evolution in INSERT. During resolution, when schema evolution and type widening are enabled, type differences between the input query and the target table are handled as follows:
- If the type difference qualifies for automatic type evolution: the input type is left as is, the data is inserted with the new type, and the table schema is updated in `ImplicitMetadataOperation` (already implemented as part of MERGE support).
- If the type difference doesn't qualify for automatic type evolution: the current behavior is preserved, and a cast is added from the input type to the existing target type.

## How was this patch tested?
- Tests are added to `DeltaTypeWideningAutomaticSuite` to cover type evolution in INSERT.

## This PR introduces the following *user-facing* changes
The table feature is available in testing only, so there are no user-facing changes as of now.

When automatic schema evolution is enabled in INSERT and the source schema contains a type that is wider than the target schema:

With type widening disabled: the type in the target schema is not changed. A cast is added to the inserted data to match the expected target type.

With type widening enabled: the type in the target schema is updated to the wider source type.
```
-- target: key int, value short
-- source: key int, value int
INSERT INTO target SELECT * FROM source
```
After the INSERT operation, the target schema is `key int, value int`.
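The per-column decision made during INSERT resolution can be sketched as follows. This is an assumption-laden illustration, not the `DeltaAnalysis` code: the function `resolve_insert_column` is hypothetical, and the set `AUTOMATIC_WIDENINGS` is assumed to contain only the byte -> short -> int changes mentioned in the first PR of this series.

```python
from typing import Tuple

# Assumed set of (target_type, input_type) pairs eligible for automatic widening.
AUTOMATIC_WIDENINGS = {("byte", "short"), ("byte", "int"), ("short", "int")}


def resolve_insert_column(
    input_type: str, target_type: str, schema_evolution: bool, type_widening: bool
) -> Tuple[str, str]:
    """Hypothetical sketch: decide how one input column is written.

    Returns ("keep", input_type) when data is inserted with the input type
    (and the table schema is widened to it), or ("cast", target_type) when a
    cast to the existing target type is added, which is the prior behavior.
    """
    if input_type == target_type:
        return ("keep", input_type)
    eligible = (target_type, input_type) in AUTOMATIC_WIDENINGS
    if schema_evolution and type_widening and eligible:
        return ("keep", input_type)
    return ("cast", target_type)


if __name__ == "__main__":
    print(resolve_insert_column("int", "short", True, True))   # ('keep', 'int')
    print(resolve_insert_column("int", "short", True, False))  # ('cast', 'short')
```

Falling back to a cast for every non-qualifying difference keeps the existing INSERT semantics intact whenever widening is disabled or the change is not one of the supported widenings.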
## Protocol Change Request
### Description of the protocol change
This protocol change is part of the proposed Type Widening table feature; see the feature request: #2622
Protocol RFC issue: #2623
The type widening table feature covers changing the type of existing columns or nested fields in a Delta table without having to rewrite the data.
### Willingness to contribute
The Delta Lake Community encourages protocol innovations. Would you or another member of your organization be willing to contribute this feature to the Delta Lake code base?