Update Delta Protocol for Identity column
The ability to have an auto-incrementing column that generates integer values is a highly requested feature. It is a well-established feature in existing data warehouses (such as Oracle, Redshift, ...). Not having this basic functionality makes it difficult for users to migrate from their existing DWs to Delta Lake.

Hence, we propose to support identity columns in Delta Lake. Because this change requires updating the Delta protocol, this PR updates `PROTOCOL.md` to describe Identity column support in the transaction log layer. We will work on the user-facing feature after the new protocol format is accepted.

Closes delta-io#904

Signed-off-by: Shixiong Zhu <zsxwing@gmail.com>
GitOrigin-RevId: 76d8f070c914539b3fe5a4aeb4147d10e7a4cfe8
mengtong-db authored and jbguerraz committed Jul 6, 2022
1 parent 644496a commit f53b493
Showing 1 changed file with 21 additions and 0 deletions.
21 changes: 21 additions & 0 deletions PROTOCOL.md
@@ -483,6 +483,26 @@ When the table property `delta.appendOnly` is set to `true`:
- The value of `delta.generationExpression` SHOULD be parsed as a SQL expression.
- Writers MUST enforce that any data written to the table satisfies the condition `(<value> <=> <generation expression>) IS TRUE`. `<=>` is the NULL-safe equality operator, which performs an equality comparison like the `=` operator but returns `TRUE` rather than `NULL` if both operands are `NULL`.

## Identity Columns

Delta supports defining Identity columns on Delta tables. Delta will generate unique values for Identity columns when users do not explicitly provide values for them when writing to such tables. The `metadata` for a column in the table schema MAY contain the following keys for Identity column properties:
- `delta.identity.start`: Starting value for the Identity column. This is a long type value. It should not be changed after table creation.
- `delta.identity.step`: Increment to the next Identity value. This is a long type value. It cannot be set to 0. It should not be changed after table creation.
- `delta.identity.highWaterMark`: The highest value generated for the Identity column. This is a long type value. When `delta.identity.step` is positive (negative), this should be the largest (smallest) value in the column.
- `delta.identity.allowExplicitInsert`: True if this column allows explicitly inserted values. This is a boolean type value. It should not be changed after table creation.
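
As an illustration of the keys above, the following Python sketch builds the schema entry for one Identity column and checks the stated constraints. The column name `id`, the concrete values, and the helper `validate_identity_metadata` are hypothetical and not part of the protocol. Requiring `start`, `step`, and `allowExplicitInsert` to all be present is an assumption made for this example, since the text only says the metadata MAY contain them.

```python
import json

# Range of a 64-bit signed long, the type used for start/step/highWaterMark.
LONG_MIN, LONG_MAX = -(2**63), 2**63 - 1

# Hypothetical column "id"; all values are illustrative only.
identity_field = {
    "name": "id",
    "type": "long",
    "nullable": False,
    "metadata": {
        "delta.identity.start": 1,
        "delta.identity.step": 1,
        "delta.identity.highWaterMark": 42,
        "delta.identity.allowExplicitInsert": False,
    },
}


def validate_identity_metadata(metadata):
    """Raise ValueError if the Identity keys violate the rules listed above.

    Assumption: start, step, and allowExplicitInsert are all present.
    """
    for key in ("delta.identity.start", "delta.identity.step"):
        value = metadata[key]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, int):
            raise ValueError(f"{key} must be a long value")
        if not LONG_MIN <= value <= LONG_MAX:
            raise ValueError(f"{key} does not fit in a 64-bit long")
    if metadata["delta.identity.step"] == 0:
        raise ValueError("delta.identity.step cannot be set to 0")
    if not isinstance(metadata["delta.identity.allowExplicitInsert"], bool):
        raise ValueError("delta.identity.allowExplicitInsert must be a boolean")


validate_identity_metadata(identity_field["metadata"])
print(json.dumps(identity_field, indent=2))
```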

When `delta.identity.allowExplicitInsert` is true, writers should meet the following requirements:
- Users should be allowed to provide their own values for Identity columns.

When `delta.identity.allowExplicitInsert` is false, writers should meet the following requirements:
- Users should not be allowed to provide their own values for Identity columns.
- Delta should generate values that satisfy the following requirements:
  - The new value does not already exist in the column.
  - The new value should satisfy `value = start + k * step` where k is a non-negative integer.
  - The new value should lie beyond `delta.identity.highWaterMark` in the direction of the step: when `delta.identity.step` is positive (negative), the new value should be greater (smaller) than `delta.identity.highWaterMark`.
- Overflow when calculating generated Identity values should be detected and such writes should not be allowed.
- `delta.identity.highWaterMark` should be updated to the new highest value when the write operation commits.
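
The generation rules above can be sketched in Python as follows. This is a minimal illustration, not the Delta implementation: the function name `next_identity_values` and the use of `None` for "no value generated yet" are assumptions. It also assumes every existing value in the column was generated (that is, `delta.identity.allowExplicitInsert` is false), so that moving past the high water mark is enough to guarantee uniqueness.

```python
# Range of a 64-bit signed long.
LONG_MIN, LONG_MAX = -(2**63), 2**63 - 1


def next_identity_values(start, step, high_water_mark, count):
    """Return (values, new_high_water_mark) for `count` newly written rows.

    `high_water_mark` is None when no value has been generated yet
    (a convention chosen for this sketch only).
    """
    if step == 0:
        raise ValueError("delta.identity.step cannot be set to 0")
    if high_water_mark is None:
        k = 0
    else:
        # Smallest non-negative k such that start + k * step lies strictly
        # beyond the high water mark in the direction of the step.
        k = max((high_water_mark - start) // step + 1, 0)
    values = [start + (k + i) * step for i in range(count)]
    # Python integers never overflow, so emulate the long range check.
    # The sequence is monotonic, so checking the last value is sufficient.
    if values and not LONG_MIN <= values[-1] <= LONG_MAX:
        raise OverflowError("generated Identity value overflows a 64-bit long")
    new_high_water_mark = values[-1] if values else high_water_mark
    return values, new_high_water_mark


# A fresh column with start=1, step=1, then a second write of two rows.
first_batch, hwm = next_identity_values(1, 1, None, 3)
second_batch, hwm = next_identity_values(1, 1, hwm, 2)
print(first_batch, second_batch, hwm)
```

In this sketch the caller would persist the returned high water mark into `delta.identity.highWaterMark` only when the write commits, and would reject the write if `OverflowError` is raised.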

## Writer Version Requirements

The requirements of the writers according to the protocol versions are summarized in the table below. Each row inherits the requirements from the preceding row.
@@ -493,6 +513,7 @@ Writer Version 2 | - Support [`delta.appendOnly`](#append-only-tables)<br>- Supp
Writer Version 3 | Enforce:<br>- `delta.checkpoint.writeStatsAsJson`<br>- `delta.checkpoint.writeStatsAsStruct`<br>- `CHECK` constraints
Writer Version 4 | - Support Change Data Feed<br>- Support [Generated Columns](#generated-columns)
Writer Version 5 | Respect [Column Mapping](#column-mapping)
Writer Version 6 | Support [Identity Columns](#identity-columns)

# Requirements for Readers
