[Spark] Resolve inconsistencies between the V2 Checkpoint specification and implementation #2249

Closed
18 changes: 8 additions & 10 deletions PROTOCOL.md
@@ -648,20 +648,18 @@ The schema of `sidecar` action is as follows:

Field Name | Data Type | Description | optional/required
-|-|-|-
-fileName | String | Name of the sidecar file (not a path). The file must reside in the _delta_log/_sidecars directory. | required
+path | String | URI-encoded path to the sidecar file. Because sidecar files must always reside in the table's own _delta_log/_sidecars directory, implementations are encouraged to store only the file's name (without scheme or parent directories). | required
sizeInBytes | Long | Size of the sidecar file. | required
modificationTime | Long | The time this logical file was created, as milliseconds since the epoch. | required
-type | String | Type of sidecar. Valid values are: "fileaction". This could be extended in future to allow different kinds of sidecars. | required
tags|`Map[String, String]`|Map containing any additional metadata about the checkpoint sidecar file. | optional
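
As a rough illustration of the schema above, the following sketch validates a `sidecar` action and resolves it to a location under `_delta_log/_sidecars`. It assumes the field set this change proposes (`path`, `sizeInBytes`, `modificationTime`, optional `tags`) and that `path` holds only the file name, as the description encourages. The `Sidecar` class, `parse_sidecar` helper, and `table_root` parameter are hypothetical names for this example and are not part of any Delta library.

```python
from dataclasses import dataclass, field
from typing import Dict
from urllib.parse import unquote

# Directory named in the `path` field description.
SIDECAR_DIR = "_delta_log/_sidecars"
REQUIRED_FIELDS = ("path", "sizeInBytes", "modificationTime")


@dataclass
class Sidecar:
    """Hypothetical in-memory form of a `sidecar` action."""
    path: str
    size_in_bytes: int
    modification_time: int
    tags: Dict[str, str] = field(default_factory=dict)

    def location(self, table_root: str) -> str:
        # Assumption: `path` is a bare, URI-encoded file name.
        name = unquote(self.path)
        return f"{table_root.rstrip('/')}/{SIDECAR_DIR}/{name}"


def parse_sidecar(action: dict) -> Sidecar:
    """Validate the required fields and build a Sidecar."""
    body = action["sidecar"]
    missing = [key for key in REQUIRED_FIELDS if key not in body]
    if missing:
        raise ValueError(f"sidecar action is missing required fields: {missing}")
    return Sidecar(
        path=body["path"],
        size_in_bytes=int(body["sizeInBytes"]),
        modification_time=int(body["modificationTime"]),
        tags=dict(body.get("tags") or {}),  # `tags` is optional
    )
```

A reader that must also accept full paths in `path` would need extra handling in `location`; this sketch deliberately leaves that out.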

The following is an example `sidecar` action:
```json
{
"sidecar":{
-"fileName": "016ae953-37a9-438e-8683-9a9a4a79a395.parquet",
+"path": "016ae953-37a9-438e-8683-9a9a4a79a395.parquet",
"sizeInBytes": 2304522,
"modificationTime": 1512909768000,
-"type": "fileaction",
"tags": {}
}
}
@@ -673,14 +671,14 @@ It describes the details about the checkpoint. It has the following schema:

Field Name | Data Type | Description | optional/required
-|-|-|-
-flavor|`String`|The flavor of the V2 checkpoint. Allowed values: "flat".| required
+version|`Long`|The checkpoint version.| required
tags|`Map[String, String]`|Map containing any additional metadata about the v2 spec checkpoint.| optional

E.g.
```json
{
"checkpointMetadata":{
-"flavor":"flat",
+"version":1,
"tags":{}
}
}
@@ -1124,18 +1122,18 @@ the `_last_checkpoint` file, so that readers don't have to read the V2 Checkpoin

E.g. showing the content of V2 spec checkpoint:
```
-{"checkpointMetadata":{"flavor":"flat","tags":{}}}
+{"checkpointMetadata":{"version":364475,"tags":{}}}
{"metaData":{...}}
{"protocol":{...}}
{"txn":{"appId":"3ba13872-2d47-4e17-86a0-21afd2a22395","version":364475}}
{"txn":{"appId":"3ae45b72-24e1-865a-a211-34987ae02f2a","version":4389}}
-{"sidecar":{"path":"3a0d65cd-4056-49b8-937b-95f9e3ee90e5.parquet","sizeInBytes":2341330,"modificationTime":1512909768000,"type":"fileaction","tags":{}}
-{"sidecar":{"path":"016ae953-37a9-438e-8683-9a9a4a79a395.parquet","sizeInBytes":8468120,"modificationTime":1512909848000,"type":"fileaction","tags":{}}
+{"sidecar":{"path":"3a0d65cd-4056-49b8-937b-95f9e3ee90e5.parquet","sizeInBytes":2341330,"modificationTime":1512909768000,"tags":{}}
+{"sidecar":{"path":"016ae953-37a9-438e-8683-9a9a4a79a395.parquet","sizeInBytes":8468120,"modificationTime":1512909848000,"tags":{}}
```
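
To make the layout of such a checkpoint concrete, here is a minimal sketch that reads a JSON-format V2 checkpoint line by line and groups the actions by name. It assumes one action per line, each wrapped in a single-key object, and it assumes exactly one `checkpointMetadata` action per checkpoint; that check is an assumption of this sketch. The `read_v2_checkpoint` function is a hypothetical helper, and the sample uses only actions whose bodies are fully shown in the example, written out as complete JSON objects.

```python
import json
from typing import Dict, Iterable, List


def read_v2_checkpoint(lines: Iterable[str]) -> Dict[str, List[dict]]:
    """Group the actions of a JSON V2 checkpoint by action name."""
    actions: Dict[str, List[dict]] = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        record = json.loads(line)
        # Assumption: each line holds exactly one action keyed by its name.
        [(name, body)] = record.items()
        actions.setdefault(name, []).append(body)
    # Assumption of this sketch: exactly one checkpointMetadata action.
    if len(actions.get("checkpointMetadata", [])) != 1:
        raise ValueError("expected exactly one checkpointMetadata action")
    return actions


sample = [
    '{"checkpointMetadata":{"version":364475,"tags":{}}}',
    '{"txn":{"appId":"3ba13872-2d47-4e17-86a0-21afd2a22395","version":364475}}',
    '{"sidecar":{"path":"3a0d65cd-4056-49b8-937b-95f9e3ee90e5.parquet",'
    '"sizeInBytes":2341330,"modificationTime":1512909768000,"tags":{}}}',
]

checkpoint = read_v2_checkpoint(sample)
sidecar_paths = [s["path"] for s in checkpoint.get("sidecar", [])]
```

A checkpoint without sidecars simply yields no `sidecar` key, so `checkpoint.get("sidecar", [])` returns an empty list.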

Another example of a v2 spec checkpoint without sidecars:
```
-{"checkpointMetadata":{"flavor":"flat","tags":{}}}
+{"checkpointMetadata":{"version":364475,"tags":{}}}
{"metaData":{...}}
{"protocol":{...}}
{"txn":{"appId":"3ba13872-2d47-4e17-86a0-21afd2a22395","version":364475}}