[Feature Request] Widen the type of existing columns or fields without rewriting the table #2622
vkorukanti pushed a commit that referenced this issue on Feb 29, 2024:
## Description

This change introduces the `typeWidening` delta table feature, allowing the type of existing columns and fields in a delta table to be widened using the `ALTER TABLE CHANGE COLUMN TYPE` or `ALTER TABLE REPLACE COLUMNS` commands.

The table feature is introduced as `typeWidening-dev` during implementation and is available in testing only. For now, only `byte` -> `short` -> `int` is supported. Other changes will require support in the Spark parquet reader that will be introduced in Spark 4.0.

Type widening feature request: #2622
Type Widening protocol RFC: #2624

A new test suite `DeltaTypeWideningSuite` is created, containing:
- `DeltaTypeWideningAlterTableTests`: covers applying supported and unsupported type changes on partitioned columns, non-partitioned columns and nested fields
- `DeltaTypeWideningTableFeatureTests`: covers adding the `typeWidening` table feature

## This PR introduces the following *user-facing* changes

The table feature is available in testing only; there are no user-facing changes as of now. The type widening table feature will introduce the following changes:
- Adding the `typeWidening` table feature via a table property:
```
ALTER TABLE t SET TBLPROPERTIES ('delta.enableTypeWidening' = true)
```
- Applying a widening type change:
```
ALTER TABLE t CHANGE COLUMN int_col TYPE long
```
or
```
ALTER TABLE t REPLACE COLUMNS int_col TYPE long
```

Note: both ALTER TABLE commands reuse the existing syntax for setting a table property and applying a type change; no new SQL syntax is introduced by this feature.

Closes #2645

GitOrigin-RevId: 2ca0e6b22ec24b304241460553547d0d4c6026a2
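The feature's initial scope (only `byte` -> `short` -> `int` qualifies as a supported widening) can be sketched as a simple lookup. This is an illustrative Python sketch, not Delta's actual implementation; the function name and type-name strings are assumptions:

```python
# Hypothetical sketch of the initial type-widening support check.
# Only byte -> short -> int is in scope; wider changes (e.g. int -> long)
# wait on Spark 4.0 parquet reader support.
SUPPORTED_WIDENINGS = {
    ("byte", "short"),
    ("byte", "int"),
    ("short", "int"),
}

def is_supported_type_change(from_type: str, to_type: str) -> bool:
    """Return True if `from_type` can be widened to `to_type` in-place."""
    return (from_type, to_type) in SUPPORTED_WIDENINGS

print(is_supported_type_change("byte", "int"))   # True
print(is_supported_type_change("int", "long"))   # False (not yet supported)
```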
allisonport-db pushed a commit that referenced this issue on Mar 7, 2024:
#### Which Delta project/connector is this regarding?
- [x] Spark
- [ ] Standalone
- [ ] Flink
- [ ] Kernel
- [ ] Other (fill in here)

## Description

This change is part of the type widening table feature.
Type widening feature request: #2622
Type Widening protocol RFC: #2624

It introduces metadata to record information about type changes that were applied using `ALTER TABLE`. This metadata is stored in the table schema, as specified in https://github.com/delta-io/delta/pull/2624/files#diff-114dec1ec600a6305fe7117bed7acb46e94180cdb1b8da63b47b12d6c40760b9R28

For example, changing a top-level column `a` from `int` to `long` will update the schema to include metadata:
```
{
  "name" : "a",
  "type" : "long",
  "nullable" : true,
  "metadata" : {
    "delta.typeChanges": [
      {
        "tableVersion": 1,
        "fromType": "integer",
        "toType": "long"
      },
      {
        "tableVersion": 5,
        "fromType": "integer",
        "toType": "long"
      }
    ]
  }
}
```

## How was this patch tested?

- A new test suite `DeltaTypeWideningMetadataSuite` is created to cover methods handling type widening metadata.
- Tests covering adding metadata to the schema when running `ALTER TABLE CHANGE COLUMN` are added to `DeltaTypeWideningSuite`.

Closes #2708

GitOrigin-RevId: cdbb7589f10a8355b66058e156bb7d1894268f4d
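The bookkeeping described above (appending a `delta.typeChanges` entry to a field's schema metadata when its type is widened) can be sketched as follows. This is a hypothetical Python sketch using plain dicts shaped like the JSON example; it is not Delta's implementation, and the helper name is an assumption:

```python
# Hypothetical sketch: record a type change in a field's schema metadata.
# Field dicts mirror the JSON schema example above; not Delta internals.
def record_type_change(field: dict, table_version: int, to_type: str) -> dict:
    """Append a delta.typeChanges entry and update the field's logical type."""
    metadata = field.setdefault("metadata", {})
    changes = metadata.setdefault("delta.typeChanges", [])
    changes.append({
        "tableVersion": table_version,
        "fromType": field["type"],   # type before the change
        "toType": to_type,
    })
    field["type"] = to_type          # the schema now exposes the wider type
    return field

field = {"name": "a", "type": "integer", "nullable": True}
record_type_change(field, table_version=1, to_type="long")
# field now carries type "long" plus a typeChanges entry integer -> long
```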
vkorukanti pushed a commit that referenced this issue on Mar 15, 2024:
This PR includes changes from #2708 which isn't merged yet. The changes related only to dropping the table feature are in commit e2601a6.

## Description

This change is part of the type widening table feature.
Type widening feature request: #2622
Type Widening protocol RFC: #2624

It adds the ability to remove the type widening table feature by running the `ALTER TABLE DROP FEATURE` command. Before dropping the table feature, traces of it are removed from the current version of the table:
- Files that were written before the latest type change, and thus contain types that differ from the current table schema, are rewritten using an internal `REORG TABLE` operation.
- Metadata in the table schema recording previous type changes is removed.

## How was this patch tested?

- A new set of tests is added to `DeltaTypeWideningSuite` to cover dropping the table feature with tables in various states: with/without files to rewrite or metadata to remove.

## Does this PR introduce _any_ user-facing changes?

The table feature is available in testing only; there are no user-facing changes as of now. When the feature is available, this change enables the following user action:
- Drop the type widening table feature:
```
ALTER TABLE t DROP FEATURE typeWidening
```

This succeeds immediately if no version of the table contains traces of the table feature (i.e. no type changes were applied in the available history of the table). Otherwise, if the current version contains traces of the feature, these are removed: files are rewritten if needed and type widening metadata is removed from the table schema. Then, an error `DELTA_FEATURE_DROP_WAIT_FOR_RETENTION_PERIOD` is thrown, telling the user to retry once the retention period expires. If only previous versions contain traces of the feature, no action is applied on the table, and an error `DELTA_FEATURE_DROP_HISTORICAL_VERSIONS_EXIST` is thrown, telling the user to retry once the retention period expires.
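The file-selection step described above (rewriting only files written before the latest type change) can be sketched like this. The file-record shape and version comparison here are illustrative assumptions, not Delta's actual transaction-log representation:

```python
# Hypothetical sketch: before DROP FEATURE typeWidening, pick the files that
# still carry a pre-change physical type and must go through REORG TABLE.
# The {"path", "writeVersion"} shape is illustrative, not a Delta internal.
def files_to_rewrite(files: list[dict], latest_type_change_version: int) -> list[dict]:
    """Files written before the latest type change differ from the current
    schema and need rewriting; newer files already use the wider type."""
    return [f for f in files if f["writeVersion"] < latest_type_change_version]

files = [
    {"path": "part-0.parquet", "writeVersion": 1},  # written before the change
    {"path": "part-1.parquet", "writeVersion": 5},  # already uses the wider type
]
print(files_to_rewrite(files, latest_type_change_version=3))
```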
tdas pushed a commit that referenced this issue on Mar 22, 2024:
#### Which Delta project/connector is this regarding?
- [x] Spark
- [ ] Standalone
- [ ] Flink
- [ ] Kernel
- [ ] Other (fill in here)

## Description

This change is part of the type widening table feature.
Type widening feature request: #2622
Type Widening protocol RFC: #2624

It adds automatic type widening as part of schema evolution in MERGE INTO:
- During resolution of the `DeltaMergeInto` plan, when merging the target and source schemas to compute the schema after evolution, we keep the wider source type when type widening is enabled on the table.
- When updating the table schema at the beginning of MERGE execution, metadata is added to the schema to record type changes.

## How was this patch tested?

- A new test suite `DeltaTypeWideningSchemaEvolutionSuite` is added to cover type evolution in MERGE.

## This PR introduces the following *user-facing* changes

The table feature is available in testing only; there are no user-facing changes as of now.

When automatic schema evolution is enabled in MERGE and the source schema contains a type that is wider than the target schema:

With type widening disabled: the type in the target schema is not changed. The ingestion behavior follows the `storeAssignmentPolicy` configuration:
- LEGACY: source values that overflow the target type are stored as `null`.
- ANSI: a runtime check is injected to fail on source values that overflow the target type.
- STRICT: the MERGE operation fails during analysis.

With type widening enabled: the type in the target schema is updated to the wider source type. The MERGE operation always succeeds:
```
-- target: key int, value short
-- source: key int, value int
MERGE INTO target
USING source
ON target.key = source.key
WHEN MATCHED THEN UPDATE SET *
```
After the MERGE operation, the target schema is `key int, value int`.
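The schema-merging rule described above (keep the wider source type only when type widening is enabled, otherwise keep the target type) can be sketched as follows. This is an illustrative Python sketch restricted to the byte/short/int scope; the ordering table and function name are assumptions, not Delta's resolution code:

```python
# Hypothetical sketch of per-column type resolution during MERGE schema
# evolution. WIDTH orders the types in the feature's initial scope.
WIDTH = {"byte": 1, "short": 2, "int": 3}

def merged_type(target_type: str, source_type: str, type_widening_enabled: bool) -> str:
    """Pick the column type after schema evolution: the wider source type if
    widening is enabled and applicable, else the existing target type."""
    if (type_widening_enabled
            and target_type in WIDTH and source_type in WIDTH
            and WIDTH[source_type] > WIDTH[target_type]):
        return source_type
    return target_type

print(merged_type("short", "int", type_widening_enabled=True))   # int
print(merged_type("short", "int", type_widening_enabled=False))  # short
```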
tdas pushed a commit that referenced this issue on Mar 25, 2024:
#### Which Delta project/connector is this regarding?
- [x] Spark
- [ ] Standalone
- [ ] Flink
- [ ] Kernel
- [ ] Other (fill in here)

## Description

This change is part of the type widening table feature.
Type widening feature request: #2622
Type Widening protocol RFC: #2624

It adds automatic type widening as part of schema evolution in INSERT. During resolution, when schema evolution and type widening are enabled, type differences between the input query and the target table are handled as follows:
- If the type difference qualifies for automatic type evolution: the input type is left as is, the data will be inserted with the new type and the table schema will be updated in `ImplicitMetadataOperation` (already implemented as part of MERGE support).
- If the type difference doesn't qualify for automatic type evolution: the current behavior is preserved, and a cast is added from the input type to the existing target type.

## How was this patch tested?

- Tests are added to `DeltaTypeWideningAutomaticSuite` to cover type evolution in INSERT.

## This PR introduces the following *user-facing* changes

The table feature is available in testing only; there are no user-facing changes as of now.

When automatic schema evolution is enabled in INSERT and the source schema contains a type that is wider than the target schema:

With type widening disabled: the type in the target schema is not changed. A cast is added to the input to match the expected target type.

With type widening enabled: the type in the target schema is updated to the wider source type:
```
-- target: key int, value short
-- source: key int, value int
INSERT INTO target SELECT * FROM source
```
After the INSERT operation, the target schema is `key int, value int`.
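The two-way decision described above (widen the target column when the difference qualifies for automatic type evolution, otherwise cast the input down to the existing target type) can be sketched like this. This is an illustrative Python sketch; the function name, return shape, and the boolean predicate are assumptions, not Delta's resolution code:

```python
# Hypothetical sketch of per-column INSERT resolution when schema evolution
# and type widening are enabled. Returns the action and resulting type.
def resolve_insert_column(input_type: str, target_type: str,
                          qualifies_for_evolution: bool) -> tuple[str, str]:
    if qualifies_for_evolution:
        # Input type is kept as is; the table schema is widened to match
        # (handled later, e.g. by the metadata-update step).
        return ("widen", input_type)
    # Current behavior preserved: cast the input to the existing target type.
    return ("cast", target_type)

print(resolve_insert_column("int", "short", qualifies_for_evolution=True))   # widen to int
print(resolve_insert_column("int", "short", qualifies_for_evolution=False))  # cast to short
```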
tdas pushed a commit that referenced this issue on Apr 24, 2024:
## Description

Expose the type widening table feature outside of testing and set its preview user-facing name: `typeWidening-preview` (instead of `typeWidening-dev` used until now).

Feature description: #2622

The type changes that are supported for now are `byte` -> `short` -> `int`. Other types depend on Spark changes which are going to land in Spark 4.0 and will be available once Delta picks up that Spark version.

## How was this patch tested?

Extensive testing in `DeltaTypeWidening*Suite`.

## Does this PR introduce _any_ user-facing changes?

User-facing changes were already covered in PRs implementing this feature. In short, it allows:
- Adding the type widening table feature (using a table property):
```
ALTER TABLE t SET TBLPROPERTIES ('delta.enableTypeWidening' = true);
```
- Manual type changes:
```
ALTER TABLE t CHANGE COLUMN col TYPE INT;
```
- Automatic type changes via schema evolution:
```
CREATE TABLE target (id int, value short);
CREATE TABLE source (id int, value int);
SET spark.databricks.delta.schema.autoMerge.enabled = true;
INSERT INTO target SELECT * FROM source;
-- value now has type int in target
```
- Dropping the table feature, which rewrites data to make the table readable by all readers:
```
ALTER TABLE t DROP FEATURE 'typeWidening'
```
## Feature request

#### Which Delta project/connector is this regarding?

### Overview
Delta tables currently lack the ability to change the type of a column or nested field after the table was created. Changing a type currently requires copying the whole table over, whether by actually creating a copy of the table or by copying the data to the new type in place using e.g. column mapping.
This feature request specifically targets widening type changes. In the case of a widening change, we are guaranteed that all values present in files written before the type change can be promoted to the new, wider type without risk of overflow or precision loss.
In particular, widening changes such as `byte` -> `short` -> `int` can be supported; wider changes depend on support in the Spark parquet reader arriving in Spark 4.0.
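The no-overflow guarantee behind widening can be illustrated with a quick range-containment check. This is an illustrative Python sketch using standard two's-complement integer ranges; the function name is an assumption and this is not Delta code:

```python
# Illustrative check: a widening is lossless when every value representable
# in the narrower type is also representable in the wider one.
RANGES = {
    "byte":  (-2**7,  2**7  - 1),
    "short": (-2**15, 2**15 - 1),
    "int":   (-2**31, 2**31 - 1),
}

def is_lossless_widening(from_type: str, to_type: str) -> bool:
    (lo_f, hi_f) = RANGES[from_type]
    (lo_t, hi_t) = RANGES[to_type]
    return lo_t <= lo_f and hi_f <= hi_t

assert is_lossless_widening("byte", "short")
assert is_lossless_widening("short", "int")
assert not is_lossless_widening("int", "short")  # narrowing could overflow
```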
Motivation
The type of a column or field is mostly fixed once the table has been created: the only change currently allowed is making a column or field nullable.
The type of a column can become too narrow to store the required values over the lifetime of a table, for example:
The only way to handle these situations today is to manually rewrite the table: add a new column with the desired type and copy the data over. This is expensive for large tables, which must be fully rewritten, and conflicts with every concurrent operation.
Further details
Design Doc: https://docs.google.com/document/d/1KIqf6o6JMD7e8aMrGlUROSwTfzYeW4NCIZVAUMW_-Tc
Willingness to contribute
The Delta Lake Community encourages new feature contributions. Would you or another member of your organization be willing to contribute an implementation of this feature?