Description
_latest.manifest poses problems because it is a mutable file and we read it with multiple get_range requests. These two things are incompatible because they can lead to inconsistent views of the file, as found in #1356.
I propose we make two changes:
- In addition to writing _latest.manifest, write a JSON file _latest_version.json that points to the current latest version, and
- Zero-pad the manifest files so they can be efficiently listed on object storage. Because of backwards-compatibility concerns, we'll want to put this behind a feature flag and wait until later to activate it by default.
Stage 1: Replacing _latest.manifest with _latest_version.json
Because _latest_version.json will be a small file, we can read it with a single request, avoiding any consistency issues. At minimum, we will need two fields: one for version and one for manifest_path. The latter will provide a migration path to change the format of the manifest paths.
{
"version": 24,
"manifest_path": "24.manifest",
"manifest_size": 2342501,
"reader_flags": 4,
"writer_flags": 5
}
The new read path will be:
- Read _latest_version.json and check whether we support the reader_flags listed.
- Get the correct manifest:
  - If looking for the latest, list the _versions directory and choose the one with the highest version. (Later, zero-padding will optimize this call.)
  - If looking for a particular version, go straight to that version.
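The read path above could be sketched roughly as follows. This is a minimal illustration, not the actual implementation: it uses a local directory in place of object storage, assumes manifests live at `_versions/<version>.manifest`, and assumes `reader_flags` is a bitmask. `SUPPORTED_READER_FLAGS` and `resolve_manifest` are hypothetical names.

```python
import json
from pathlib import Path
from typing import Optional

# Assumption: reader_flags is a bitmask and this reader understands bits 1, 2 and 4.
SUPPORTED_READER_FLAGS = 0b111


def resolve_manifest(dataset: Path, version: Optional[int] = None) -> Path:
    """Return the manifest path for `version`, or for the latest version."""
    # The pointer file is small, so one read gives a consistent view of it.
    pointer = json.loads((dataset / "_latest_version.json").read_text())
    unknown = pointer["reader_flags"] & ~SUPPORTED_READER_FLAGS
    if unknown:
        raise RuntimeError(f"unsupported reader flags: {unknown:#x}")

    versions_dir = dataset / "_versions"
    if version is not None:
        # Particular version: go straight to it.
        return versions_dir / f"{version}.manifest"

    # Latest: list the directory, parse the version numbers, take the highest.
    # The pointer may lag behind, so it is only used as a fallback.
    found = [int(p.stem) for p in versions_dir.glob("*.manifest") if p.stem.isdigit()]
    latest = max(found, default=pointer["version"])
    return versions_dir / f"{latest}.manifest"
```

Note that the listing, not the pointer, decides which version is latest; the pointer is mainly there for the flag check.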
The write path will be:
- Commit the manifest file
- Overwrite the _latest_version.json
- Overwrite the _latest.manifest file
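A rough sketch of that write ordering, again against a local directory rather than an object store. `commit_version` is a hypothetical helper, and the exclusive-create open is only a stand-in for whatever atomic commit primitive the real store provides.

```python
import json
from pathlib import Path


def commit_version(dataset: Path, version: int, manifest_bytes: bytes,
                   reader_flags: int = 0, writer_flags: int = 0) -> None:
    versions_dir = dataset / "_versions"
    versions_dir.mkdir(parents=True, exist_ok=True)
    manifest_name = f"{version}.manifest"

    # 1. Commit the manifest. Mode "xb" fails if the file already exists,
    #    so two writers cannot both claim the same version.
    with open(versions_dir / manifest_name, "xb") as f:
        f.write(manifest_bytes)

    # 2. Overwrite the small JSON pointer.
    pointer = {
        "version": version,
        "manifest_path": manifest_name,
        "manifest_size": len(manifest_bytes),
        "reader_flags": reader_flags,
        "writer_flags": writer_flags,
    }
    (dataset / "_latest_version.json").write_text(json.dumps(pointer))

    # 3. Overwrite _latest.manifest so older readers keep working.
    (dataset / "_latest.manifest").write_bytes(manifest_bytes)
```

If a writer crashes between steps 1 and 2, the manifest is committed but the pointer is stale, which is why readers cannot treat the pointer as authoritative.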
Backwards compatibility
Currently, readers will look for the _latest.manifest file. They will also be checking for the reader_flags in that file, so they will be able to detect if something changes in the flags.
Stage 2: Padding the manifest file names
The _latest_version.json isn't guaranteed to point to the real latest version. It will be written after the manifest is committed, so if the most recent writer is late or crashed, there could be a newer version. To detect that, we'd like to list the _versions directory. On S3 and GCS, we can list starting at a specific key and sorting lexicographically. But in order to have manifest files sort lexicographically, they need to be padded with zeros.
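To illustrate why padding matters, here is a small sketch comparing the lexicographic order of unpadded and padded names. The width of 20 digits is an assumption (enough for any unsigned 64-bit version), and `padded_name` and `list_after` are hypothetical helpers; `list_after` only imitates a "list starting at a key" call on a sorted key space.

```python
def padded_name(version: int, width: int = 20) -> str:
    # Assumption: a fixed width of 20 digits; the proposal does not fix a width.
    return f"{version:0{width}d}.manifest"


def list_after(sorted_keys, start_key):
    # Imitates listing keys that sort strictly after `start_key`.
    return [k for k in sorted_keys if k > start_key]


versions = (9, 10, 100)
plain = sorted(f"{v}.manifest" for v in versions)
padded = sorted(padded_name(v) for v in versions)

# Unpadded: "9.manifest" sorts last even though 100 is the highest version.
print(plain)
# Padded: lexicographic order matches numeric order.
print(padded)
# Anything newer than version 9 can be found by listing after its key.
print(list_after(padded, padded_name(9)))
```

With padded names, a reader that knows about version N from _latest_version.json only needs to list keys after N's manifest to find any newer commits.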
(Now, if a table is running cleanup_old_versions() regularly, it's not obvious there will be that many versions to look through. It's possible just listing, parsing the versions, and choosing the highest one would be fine for most tables.)
We can add a feature flag for zero padding. This is important on the write side, because writers need to all be committing to the same location; we can't have some writers trying to commit a zero-padded manifest and some committing to an old-style one.
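One way to keep writers in agreement is to derive the manifest name from the table's flags rather than from each writer's own preference. The sketch below assumes the flag is a bit in `writer_flags`; the bit value `ZERO_PADDED_MANIFESTS` and the 20-digit width are hypothetical, since the proposal does not specify them.

```python
# Hypothetical bit; the real flag value is not specified in this proposal.
ZERO_PADDED_MANIFESTS = 1 << 3


def manifest_name(version: int, writer_flags: int) -> str:
    """Pick the commit location from the table's flags, so all writers agree."""
    if writer_flags & ZERO_PADDED_MANIFESTS:
        return f"{version:020d}.manifest"
    return f"{version}.manifest"
```

Every writer that reads the same flags computes the same name for a given version, so a commit conflict is still detected as a collision on a single path.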
We can make this opt-in at first and then after a while can make it default on new tables.
Backwards compatibility
This is a very hard breaking change, since older readers would not know anything about the new format. They will be able to load _latest.manifest, but time travel will be broken.
Only readers that are new enough will detect the reader flags and know they can't read it. So we should let this feature hang out deactivated in the background for a bit before we make it more public, let alone the default.