-
Notifications
You must be signed in to change notification settings - Fork 456
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
db: change FileMetadata (Version) consistency check for L0 files to allow flushed sstables that have overlapping seqnums #587
Comments
@itsbilal This is something that it would be good to look at in this milestone. @sumeerbhola Do you have thoughts on what the new consistency checks should look like? Perhaps something like: overlapping seqnum ranges in L0 sstables are allowed if the sstables do not overlap in key space. Is that sufficient? |
Sumeer confirmed the invariant @petermattis mentioned would work. Thinking about this more, perfectly implementing an overlap seqnum range detector (and extending each seqnum range on both ends as necessary) and then checking for the key range invariant in all those ranges would be a bit complex in what is otherwise a quick ordering check. Is it okay enough to just check the "adjacent" (in largest seqnum order) SSTs for not having overlapping key ranges if they have overlapping seqnum ranges? It wouldn't be a perfect check, but it'd catch the obvious errors. Another, crazier idea I had was to build in some sort of pebble version detection, and relax invariants only if the version is higher than the current version. Presumably we'd also need to avoid overwriting higher version numbers in OPTIONS files. |
Not sure how you're using "adjacent" here. Since we already have L0 tables sorted by largest seqnum, I was imagining a loop over the sstables finding contiguous sets of tables which overlap in their seqnums. For each set we'd verify that the tables within that set do not overlap in their key ranges. Is this the complexity you are referring to? Yes, definitely more complex than the existing checks, but not horribly so.
I'm not sure how this would work, so it sounds scary to me. |
Changes Manifest.CheckOrdering to allow L0 SSTables to overlap in sequence numbers as long as every set of SSTables overlapping in sequence numbers have no key range overlaps within them. Implemented for better compatibility with future versions of Pebble which could support partitioned flushes. Fixes cockroachdb#587.
Changes Manifest.CheckOrdering to allow L0 SSTables to overlap in sequence numbers. Implemented for better compatibility with future versions of Pebble which could support partitioned flushes and L0 sublevels. Fixes cockroachdb#587.
Changes Manifest.CheckOrdering to allow L0 SSTables to overlap in sequence numbers. Implemented for better compatibility with future versions of Pebble which could support partitioned flushes and L0 sublevels. Fixes cockroachdb#587.
Changes Manifest.CheckOrdering to allow L0 SSTables to overlap in sequence numbers. Implemented for better compatibility with future versions of Pebble which could support partitioned flushes and L0 sublevels. Fixes #587.
The Pebble prototype to improve import performance using L0 sub-levels for compaction (and other compaction heuristics, in https://github.com/sumeerbhola/pebble/tree/sublevel and https://github.com/sumeerbhola/cockroach/tree/sublevel) showed very promising results (about 3.5x faster for a large TPCC import, compared to RocksDB -- import finished in ~65 min).
One of the changes this prototype relies on is splitting sstables being flushed based on split points provided by the
L0SubLevels
data-structure to avoid generating wide L0 files that block a sub-level from accepting more files. Without this change, the wide files overwhelm the ability to do narrow (in key space, and consequently bytes) compactions out of L0 into Lbase, and to do narrow intra-L0 compactions.This change means flushed files have overlapping seqnums, which runs afoul of the current consistency checking (for the prototype I temporarily disabled it https://github.com/sumeerbhola/pebble/blob/sublevel/internal/manifest/version.go#L607-L631). Since we plan to support such flush splitting in the very near future, it is worthwhile to make the first production version of Pebble be compatible if it encounters such files. Also, we should look into how RocksDB, which historically had weaker consistency checking, handles such files -- if it permits them, our compatibility story is better.
I've attached a MANIFEST from one of the import runs that we can use, in addition to unit testing, to check that the consistency checking is compatible with the future Pebble.
MANIFEST1.gz
The text was updated successfully, but these errors were encountered: