forked from cockroachdb/pebble
Commit
db: track files' pre-zeroing largest sequence numbers
When there exist no keys beneath a compaction's key range, Pebble performs sequence number zeroing. This is an optimization that allows for easier detection of new user keys during iteration. This commit introduces a new FileMetadata field LargestSeqNumAbsolute that provides an upper bound on the sequence numbers of an sstable's keys before they were zeroed. This is useful as a stable upper bound on the recency of an sstable's keys.

In this commit we use this new upper bound to provide an alternative solution to the interaction between delete-only compactions and sequence number zeroing. Previously, any compaction that zeroed sequence numbers and overlapped a delete-only compaction hint was required to clear the conflicting hints, to ensure a delete-only compaction did not accidentally drop a table containing keys more recent than any of the hint's constituent tombstones. This interaction was a bit indirect and subtle. Encoding the pre-zeroing sequence number on the file metadata is more direct, and will allow us to use the sequence number for recency ordering of sstables' keys more generally, including in cockroachdb#2112.

When the database is closed and re-opened, the new field LargestSeqNumAbsolute is initialized to LargestSeqNum for all existing sstables. This means that LargestSeqNumAbsolute only provides an upper bound on an sstable's keys' sequence numbers over the lifetime of the database instance in the current process. This is sufficient for many use cases, including delete-only compaction accounting.

The reason this is sufficient in the delete-only compaction use case is subtle. The problem we're seeking to avoid is a range tombstone [start,end)#n deleting a key k#m such that start ≤ k < end and m ≥ n. Because of the sequence number invariant, within the LSM the range tombstone can never fall beneath a key k that it does not delete. However, our in-memory delete-only compaction hints are not atomically updated with transformations of the LSM. They represent the state of the LSM at a single instant when the table stats collector observed range deletion(s) within a particular file. This stale view of the LSM is what necessitates a mechanism like LargestSeqNumAbsolute to avoid erroneous applications of a deletion hint.

After a database restart, none of the previous instance's in-memory delete-only compaction hints exist. The table stats collector must re-populate the hints by scanning range deletions in sstables in the background. However, because the sequence number invariant prevents inversion of sequence numbers across process restarts, any hints we construct from the LSM will be correct with respect to that view of the LSM.
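As a rough illustration of the check described above, the following Go sketch models a stale hint being tested against a file whose sequence numbers were zeroed after the hint was created. It is not Pebble's implementation: the types `fileMeta` and `deleteHint`, the helpers `zeroSeqNums` and `canDrop`, and the exact comparison are all simplifying assumptions made for this example. Only the field names LargestSeqNum and LargestSeqNumAbsolute come from the commit message.

```go
package main

import "fmt"

// fileMeta is a hypothetical, simplified stand-in for Pebble's FileMetadata.
type fileMeta struct {
	Smallest, Largest     string // inclusive user key bounds (assumed representation)
	LargestSeqNum         uint64 // becomes 0 once a compaction zeroes sequence numbers
	LargestSeqNumAbsolute uint64 // upper bound on the keys' sequence numbers before zeroing
}

// deleteHint is a hypothetical, simplified delete-only compaction hint: it
// covers user keys in [Start, End) and records the smallest sequence number
// among the range tombstones it was built from.
type deleteHint struct {
	Start, End              string
	TombstoneSmallestSeqNum uint64
}

// zeroSeqNums models a compaction that zeroes sequence numbers. The zeroed
// bound is lowered to 0 while the absolute bound is left untouched.
func zeroSeqNums(f fileMeta) fileMeta {
	f.LargestSeqNum = 0
	return f
}

// canDrop reports whether the (possibly stale) hint may drop the file
// wholesale: the file must lie inside the hint's key range, and every key in
// it must be older than every tombstone that makes up the hint.
func canDrop(h deleteHint, f fileMeta) bool {
	contained := h.Start <= f.Smallest && f.Largest < h.End
	older := f.LargestSeqNumAbsolute < h.TombstoneSmallestSeqNum
	return contained && older
}

func main() {
	hint := deleteHint{Start: "a", End: "z", TombstoneSmallestSeqNum: 10}

	// Written after the tombstones (seqnum 15), then zeroed by a compaction.
	// Comparing LargestSeqNum (now 0) would wrongly allow the drop.
	newer := zeroSeqNums(fileMeta{Smallest: "b", Largest: "c", LargestSeqNum: 15, LargestSeqNumAbsolute: 15})
	fmt.Println(canDrop(hint, newer)) // false

	// Written before the tombstones (seqnum 5): safe to drop.
	older := fileMeta{Smallest: "b", Largest: "c", LargestSeqNum: 5, LargestSeqNumAbsolute: 5}
	fmt.Println(canDrop(hint, older)) // true
}
```

The sketch compares against the smallest tombstone sequence number as a conservative choice; the actual comparison Pebble performs may differ.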
Showing 14 changed files with 212 additions and 178 deletions.