Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Protocol update for Table Features #1450

Closed
wants to merge 11 commits into from
Closed

Conversation

xupefei
Copy link
Collaborator

@xupefei xupefei commented Oct 20, 2022

Description

This PR proposes a change to the Delta Protocol to accommodate Table Features discussed in [Feature Request] Table Features to specify the features needed to read/write to a table #1408.

TOC is updated using doctoc.

How was this patch tested?

Not needed.

@xupefei xupefei marked this pull request as ready for review October 26, 2022 13:22
@xupefei
Copy link
Collaborator Author

xupefei commented Nov 30, 2022

Hey @tdas and @allisonport-db, could you review this PR?

PROTOCOL.md Show resolved Hide resolved
PROTOCOL.md Outdated Show resolved Hide resolved
PROTOCOL.md Show resolved Hide resolved
PROTOCOL.md Outdated Show resolved Hide resolved
PROTOCOL.md Outdated Show resolved Hide resolved
PROTOCOL.md Outdated Show resolved Hide resolved
@xupefei xupefei requested review from tdas and removed request for scottsand-db and allisonport-db December 13, 2022 13:30
@xupefei
Copy link
Collaborator Author

xupefei commented Dec 13, 2022

Hey @scottsand-db and @allisonport-db, for some reason Github removed you two from the reviewer list after I re-requested a review from @tdas. Please add yourself back :)

allisonport-db pushed a commit that referenced this pull request Dec 16, 2022
This PR implements Table Features proposed in the feature request (#1408) and the PROTOCOL doc (#1450).

This PR implements the basic functionality, including
- The protocol structure and necessary APIs
- Protocol upgrade logic
- Append-only feature ported to Table Features
- Protocol upgrade path
- User-facing APIs, such as allowing referencing features manually
- Partial test coverage

Not covered by this PR:
- Adapt all features
- Full test coverage
- Make `DESCRIBE TABLE` show referenced features
- Enable table clone and time travel paths

Table Features support starts from reader protocol version `3` and writer version `7`. When supported, features can be **referenced** by a protocol by placing a `DeltaFeatureDescriptor` into the protocol's `readerFeatures` and/or `writerFeatures`.

A feature can be one of two types: writer-only and reader-writer. The first type means that only writers must care about such a feature, while the latter means that in addition to writers, readers must also be aware of the feature to read the data correctly. We now have the following features released:

- `appendOnly`: legacy, writer-only
- `invariants`: legacy, writer-only
- `checkConstriants`: legacy, writer-only
- `changeDataFeed`: legacy, writer-only
- `generatedColumns`: legacy, writer-only
- `columnMapping`: legacy, reader-writer
- `identityColumn`: legacy, writer-only
- `deletionVector`: native, reader-writer

Some examples of the `protocol` action:

```scala
// Valid protocol. Both reader and writer versions are capable.
Protocol(
  minReaderVersion = 3,
  minWriterVersion = 7,
  readerFeatures = {(columnMapping,enabled), (changeDataFeed,enabled)},
  writerFeatures = {(appendOnly,enabled), (columnMapping,enabled), (changeDataFeed,enabled)})

// Valid protocol. Only writer version is capable. "columnMapping" is implicitly enabled by readers.
Protocol(
  minReaderVersion = 2,
  minWriterVersion = 7,
  readerFeatures = None,
  writerFeatures = {(columnMapping,enabled)})

// Invalid protocol. Reader version does enable "columnMapping" implicitly.
Protocol(
  minReaderVersion = 1,
  minWriterVersion = 7,
  readerFeatures = None,
  writerFeatures = {(columnMapping,enabled)})
```

When reading or writing a table, clients MUST respect all enabled features.

Upon table creation, the system assigns the table a minimum protocol that satisfies all features that are **automatically enabled** in the table's metadata. This means the table can be on a "legacy" protocol with both `readerFeatures` and `writerFeatures` set to `None` (if all active features are legacy, which is the current behavior) or be on a Table Features-capable protocol with all active features explicitly referenced in `readerFeatures` and/or `writerFeatures` (if one of the active features is Table Features-native or the user has specified a Table Features-capable protocol version).

It's possible to upgrade an existing table to support table features. The update can be either for writers or for both readers and writers. During the upgrade, the system will explicitly reference all legacy features that are implicitly supported by the old protocol.

Users can mark a feature to be required by a table by using the following commands:
```sql
-- for an existing table
ALTER TABLE table_name SET TBLPROPERTIES (delta.feature.featureName = 'enabled')
-- for a new table
CREATE TABLE table_name ... TBLPROPERTIES (delta.feature.featureName = 'enabled')
-- for all new tables
SET spark.databricks.delta.properties.defaults.feature.featureName = 'enabled'
```
When some features are set to `enabled` in table properties and some others in Spark sessions, the final table will enable all features defined in two places:
```sql
SET spark.databricks.delta.properties.defaults.feature.featureA = 'enabled';
CREATE TABLE table_name ... TBLPROPERTIES (delta.feature.featureB = 'enabled')
--- 'table_name' will have 'featureA' and 'featureB' enabled.
```
Closes #1520

GitOrigin-RevId: 2b05f397b24e57f1804761b3242a0f29098a209c
Copy link
Collaborator

@bart-samwel bart-samwel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here's a bunch of nits!

PROTOCOL.md Outdated Show resolved Hide resolved
PROTOCOL.md Outdated Show resolved Hide resolved
PROTOCOL.md Outdated Show resolved Hide resolved
PROTOCOL.md Outdated Show resolved Hide resolved
PROTOCOL.md Outdated Show resolved Hide resolved
PROTOCOL.md Outdated Show resolved Hide resolved
PROTOCOL.md Outdated Show resolved Hide resolved
PROTOCOL.md Outdated Show resolved Hide resolved
PROTOCOL.md Outdated Show resolved Hide resolved
PROTOCOL.md Outdated Show resolved Hide resolved
PROTOCOL.md Outdated Show resolved Hide resolved
PROTOCOL.md Outdated Show resolved Hide resolved
PROTOCOL.md Outdated Show resolved Hide resolved
@scottsand-db scottsand-db self-requested a review December 19, 2022 19:11
PROTOCOL.md Outdated Show resolved Hide resolved
PROTOCOL.md Outdated Show resolved Hide resolved
PROTOCOL.md Outdated Show resolved Hide resolved
Copy link
Collaborator

@scottsand-db scottsand-db left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left some minor comments/questions.

vkorukanti pushed a commit that referenced this pull request Dec 29, 2022
This change addresses feedback in #1450 (comment) to remove all uses of `TableFeatureDescriptor` in the codebase because that `status` field in the descriptor can only be `enabled` which can be omitted. Therefore we could simplify store feature names as `Set[String]` in the protocol and assume all listed features are `enabled`.

`Protocol` action before this change:
```json
{
  "protocol":{
    "readerVersion":3,
    "writerVersion":7,
    "readerFeatures":[
      {"name":"columnMapping","status":"enabled"}
    ],
    "writerFeatures":[
      {"name":"columnMapping","status":"enabled"},
      {"name":"identityColumns","status":"enabled"}
    ]
  }
}
```
`Protocol` action after this change:
```json
{
  "protocol":{
    "readerVersion":3,
    "writerVersion":7,
    "readerFeatures":["columnMapping"],
    "writerFeatures":["columnMapping","identityColumns"]
  }
}
```

GitOrigin-RevId: 445601bbaebf39f1cd41f8257ece7aeaff48e101
@xupefei xupefei requested review from scottsand-db, tdas and bart-samwel and removed request for tdas, scottsand-db and bart-samwel January 3, 2023 16:53
@xupefei xupefei requested review from bart-samwel and scottsand-db and removed request for scottsand-db and bart-samwel January 5, 2023 12:44
Copy link
Contributor

@tdas tdas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Will merge

PROTOCOL.md Outdated Show resolved Hide resolved
PROTOCOL.md Outdated Show resolved Hide resolved
PROTOCOL.md Show resolved Hide resolved
PROTOCOL.md Outdated Show resolved Hide resolved
PROTOCOL.md Outdated Show resolved Hide resolved
PROTOCOL.md Outdated Show resolved Hide resolved
PROTOCOL.md Outdated Show resolved Hide resolved
PROTOCOL.md Outdated Show resolved Hide resolved
PROTOCOL.md Outdated Show resolved Hide resolved
PROTOCOL.md Outdated Show resolved Hide resolved
@xupefei xupefei deleted the tf-protocol branch March 3, 2023 16:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants