Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

db: change FileMetadata (Version) consistency check for L0 files to allow flushed sstables that have overlapping seqnums #587

Closed
sumeerbhola opened this issue Mar 25, 2020 · 3 comments · Fixed by #591
Assignees

Comments

@sumeerbhola
Copy link
Collaborator

The Pebble prototype to improve import performance using L0 sub-levels for compaction (and other compaction heuristics, in https://github.com/sumeerbhola/pebble/tree/sublevel and https://github.com/sumeerbhola/cockroach/tree/sublevel) showed very promising results (about 3.5x faster for a large TPCC import, compared to RocksDB -- import finished in ~65 min).

One of the changes this prototype relies on is splitting sstables being flushed based on split points provided by the L0SubLevels data-structure to avoid generating wide L0 files that block a sub-level from accepting more files. Without this change, the wide files overwhelm the ability to do narrow (in key space, and consequently bytes) compactions out of L0 into Lbase, and to do narrow intra-L0 compactions.

This change means flushed files have overlapping seqnums, which runs afoul of the current consistency checking (for the prototype I temporarily disabled it https://github.com/sumeerbhola/pebble/blob/sublevel/internal/manifest/version.go#L607-L631). Since we plan to support such flush splitting in the very near future, it is worthwhile to make the first production version of Pebble be compatible if it encounters such files. Also, we should look into how RocksDB, which historically had weaker consistency checking, handles such files -- if it permits them, our compatibility story is better.

I've attached a MANIFEST from one of the import runs that we can use, in addition to unit testing, to check that the consistency checking is compatible with the future Pebble.
MANIFEST1.gz

@petermattis
Copy link
Collaborator

@itsbilal This is something that it would be good to look at in this milestone.

@sumeerbhola Do you have thoughts on what the new consistency checks should look like? Perhaps something like: overlapping seqnum ranges in L0 sstables are allowed if the sstables do not overlap in key space. Is that sufficient?

@itsbilal
Copy link
Member

Sumeer confirmed the invariant @petermattis mentioned would work.

Thinking about this more, perfectly implementing an overlap seqnum range detector (and extending each seqnum range on both ends as necessary) and then checking for the key range invariant in all those ranges would be a bit complex in what is otherwise a quick ordering check.

Is it okay enough to just check the "adjacent" (in largest seqnum order) SSTs for not having overlapping key ranges if they have overlapping seqnum ranges? It wouldn't be a perfect check, but it'd catch the obvious errors.

Another, crazier idea I had was to build in some sort of pebble version detection, and relax invariants only if the version is higher than the current version. Presumably we'd also need to avoid overwriting higher version numbers in OPTIONS files.

@petermattis
Copy link
Collaborator

Is it okay enough to just check the "adjacent" (in largest seqnum order) SSTs for not having overlapping key ranges if they have overlapping seqnum ranges? It wouldn't be a perfect check, but it'd catch the obvious errors.

Not sure how you're using "adjacent" here. Since we already have L0 tables sorted by largest seqnum, I was imagining a loop over the sstables finding contiguous sets of tables which overlap in their seqnums. For each set we'd verify that the tables within that set do not overlap in their key ranges. Is this the complexity you are referring to? Yes, definitely more complex than the existing checks, but not horribly so.

Another, crazier idea I had was to build in some sort of pebble version detection, and relax invariants only if the version is higher than the current version. Presumably we'd also need to avoid overwriting higher version numbers in OPTIONS files.

I'm not sure how this would work, so it sounds scary to me.

itsbilal added a commit to itsbilal/pebble that referenced this issue Mar 27, 2020
Changes Manifest.CheckOrdering to allow L0 SSTables to overlap
in sequence numbers as long as every set of SSTables overlapping
in sequence numbers have no key range overlaps within them.

Implemented for better compatibility with future versions of
Pebble which could support partitioned flushes.

Fixes cockroachdb#587.
itsbilal added a commit to itsbilal/pebble that referenced this issue Mar 30, 2020
Changes Manifest.CheckOrdering to allow L0 SSTables to overlap
in sequence numbers.

Implemented for better compatibility with future versions of
Pebble which could support partitioned flushes and L0 sublevels.

Fixes cockroachdb#587.
itsbilal added a commit to itsbilal/pebble that referenced this issue Mar 30, 2020
Changes Manifest.CheckOrdering to allow L0 SSTables to overlap
in sequence numbers.

Implemented for better compatibility with future versions of
Pebble which could support partitioned flushes and L0 sublevels.

Fixes cockroachdb#587.
itsbilal added a commit that referenced this issue Mar 30, 2020
Changes Manifest.CheckOrdering to allow L0 SSTables to overlap
in sequence numbers.

Implemented for better compatibility with future versions of
Pebble which could support partitioned flushes and L0 sublevels.

Fixes #587.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants