-
Notifications
You must be signed in to change notification settings - Fork 451
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
db: use delete-only compaction hints to do excises #3926
base: master
Are you sure you want to change the base?
Conversation
e473e96
to
6110932
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 1 of 6 files at r1, all commit messages.
Reviewable status: 1 of 8 files reviewed, 6 unresolved discussions (waiting on @itsbilal)
compaction.go
line 2404 at r2 (raw file):
continue } if d.cmp(h.end, f.Smallest.UserKey) <= 0 || !h.canDeleteOrExcise(d.cmp, f, snapshots, d.FormatMajorVersion()) {
canDeleteOrExcise doesn't only check bounds: it checks sequence number and key types too. doesn't that mean we don't necessarily want to step forward to the next hint? I.e., the next file may also fall wholly within h
's bounds and it may be eligible for deletion.
compaction.go
line 2488 at r2 (raw file):
d.mu.Lock() defer d.mu.Unlock() return d.mu.versions.getNextFileNum()
should we update the nextFileNum to be an atomic and rid ourselves of the mutex concern (and the threading of the closure through parameters)?
compaction.go
line 2496 at r2 (raw file):
sort.Slice(c.deletionHints, func(i, j int) bool { return d.cmp(c.deletionHints[i].start, c.deletionHints[j].start) < 0 })
nit: use slices.SortFunc
ingest.go
line 1710 at r2 (raw file):
ve *versionEdit, level int, nextFileNumGetter func() base.FileNum,
previously excise
always held DB.mu. now the delete compaction body may call excise
without holding DB.mu
. I don't see anything, but just making sure excise
is truly agnostic to whether DB.mu
is held now.
Executing this function without the mutex honestly makes much more sense, since we're performing I/O to constrain the bounds. Maybe we should lift the off the *DB
, taking only what we need to make the relationship to DB
state more obvious.
table_stats.go
line 636 at r2 (raw file):
if hintSeqNum > file.SmallestSeqNum { hintSeqNum = file.SmallestSeqNum }
should we pull out the logic of the other branch that handles the hint type and avoids updating the hint sequence number for files that should not be affected by the hint?
6110932
to
28d400b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TFTR!
Reviewable status: 1 of 8 files reviewed, 5 unresolved discussions (waiting on @jbowens)
compaction.go
line 2404 at r2 (raw file):
Previously, jbowens (Jackson Owens) wrote…
canDeleteOrExcise doesn't only check bounds: it checks sequence number and key types too. doesn't that mean we don't necessarily want to step forward to the next hint? I.e., the next file may also fall wholly within
h
's bounds and it may be eligible for deletion.
Good catch and you're right. I had to significantly rework this method to address this, seeing as we basically had to go through every overlapping hint for a file before we could move on from a file.
I've moved towards a quadratic-ish function here (but only in particularly pathological cases), but also capped the number of deletion only compaction hints we resolve in one delete-only compaction to prevent us from scheduling really CPU-hot or massive delete-only compactions. PTAL!
compaction.go
line 2488 at r2 (raw file):
Previously, jbowens (Jackson Owens) wrote…
should we update the nextFileNum to be an atomic and rid ourselves of the mutex concern (and the threading of the closure through parameters)?
I left a TODO for that here, as we seem to update it in a lot of places and can be left to a quick follow-up PR.
compaction.go
line 2496 at r2 (raw file):
Previously, jbowens (Jackson Owens) wrote…
nit: use
slices.SortFunc
Done.
ingest.go
line 1710 at r2 (raw file):
Previously, jbowens (Jackson Owens) wrote…
previously
excise
always held DB.mu. now the delete compaction body may callexcise
without holdingDB.mu
. I don't see anything, but just making sureexcise
is truly agnostic to whetherDB.mu
is held now.Executing this function without the mutex honestly makes much more sense, since we're performing I/O to constrain the bounds. Maybe we should lift the off the
*DB
, taking only what we need to make the relationship toDB
state more obvious.
Yeah, d.excise is now mu-agnostic. The only case where we needed the mutex was if we were getting the next file num, which is also abstracted out now. Clarified in a comment.
I tried just removing *DB as a receiver but I had to add a whole bunch of new arguments to this method, from options to table cache to newIters to comparer, so I figured it's best to just leave it as-is.
table_stats.go
line 636 at r2 (raw file):
Previously, jbowens (Jackson Owens) wrote…
should we pull out the logic of the other branch that handles the hint type and avoids updating the hint sequence number for files that should not be affected by the hint?
Good catch! Done - the hint seqnum updating is now also conditional on key type in this branch.
Previously, itsbilal (Bilal Akhtar) wrote…
Maybe what we need is a min heap of files (sorted by file smallest keys), and the iteration to work through a sorted list of hints. If we part-excise a file, we add its remnants to the heap. I'll try to think of whether that has any edge cases that I couldn't think of, or if that's the strictly better solution |
28d400b
to
f6be059
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still reviewing runDeleteOnlyCompactionForLevel
Reviewed 2 of 6 files at r1, 1 of 3 files at r2, 1 of 3 files at r3, all commit messages.
Reviewable status: 5 of 8 files reviewed, 8 unresolved discussions (waiting on @itsbilal)
compaction.go
line 2395 at r3 (raw file):
snapshots compact.Snapshots, ) (newFiles []manifest.NewFileEntry, err error) { if !h.canDeleteOrExcise(d.cmp, f, snapshots, d.FormatMajorVersion()) {
nit: I think we should return an enum from canDeleteOrExcise
telling us whether we can delete the file, excise the file or neither. always nice to avoid the cost of the extra key comparisons, but more importantly we avoid duplicating fiddly bounds checking logic
compaction.go
line 2413 at r3 (raw file):
// to use d.excise. if d.FormatMajorVersion() < FormatVirtualSSTables { // A future hint will apply to this file and delete it in its entirety.
nit: is this comment meant to say "A future compaction will find the entirety of this file deleted by range tombstones." or the like?
compaction.go
line 2425 at r3 (raw file):
FileNum: f.FileNum, }]; !ok { // This hint did not touch this file.
Can you add a comment describing under what circumstances this can happen?
compaction.go
line 2443 at r3 (raw file):
iter := cl.files.Iter() // Outer loop loops on files. Middle lop loops on fragments. Inner loop
nit: s/lop/loop/
f6be059
to
9e12f85
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TFTR!
Reviewable status: 4 of 8 files reviewed, 6 unresolved discussions (waiting on @jbowens)
compaction.go
line 2395 at r3 (raw file):
Previously, jbowens (Jackson Owens) wrote…
nit: I think we should return an enum from
canDeleteOrExcise
telling us whether we can delete the file, excise the file or neither. always nice to avoid the cost of the extra key comparisons, but more importantly we avoid duplicating fiddly bounds checking logic
Done.
compaction.go
line 2413 at r3 (raw file):
Previously, jbowens (Jackson Owens) wrote…
nit: is this comment meant to say "A future compaction will find the entirety of this file deleted by range tombstones." or the like?
I just meant that one of the other hints must be deleting the entirety of this file, otherwise we wouldn't have picked it for this delete-only compaction. If we don't find such a hint, the caller will panic anyway.
compaction.go
line 2425 at r3 (raw file):
Previously, jbowens (Jackson Owens) wrote…
Can you add a comment describing under what circumstances this can happen?
Done - removed this case as it should be impossible. Artifact from a previous iteration.
compaction.go
line 2443 at r3 (raw file):
Previously, jbowens (Jackson Owens) wrote…
nit: s/lop/loop/
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 4 of 8 files reviewed, 6 unresolved discussions (waiting on @itsbilal)
compaction.go
line 2488 at r2 (raw file):
Previously, itsbilal (Bilal Akhtar) wrote…
I left a TODO for that here, as we seem to update it in a lot of places and can be left to a quick follow-up PR.
nit: it'd reduce the code churn within this PR (since this would no longer need to pass a func into excise
) to do it first (and can still be a quick precursor PR)
compaction.go
line 2413 at r3 (raw file):
Previously, itsbilal (Bilal Akhtar) wrote…
I just meant that one of the other hints must be deleting the entirety of this file, otherwise we wouldn't have picked it for this delete-only compaction. If we don't find such a hint, the caller will panic anyway.
I'm still confused by this comment then. Wouldn't that have been a different hint? And this hint would be hintDoesNotApply
? because we do:
if fmv < FormatVirtualSSTables {
// The file's keys must be completely contained within the hint range; excises
// aren't allowed.
return hintDoesNotApply
}
seems like this should be a panic right here.
compaction.go
line 2530 at r4 (raw file):
} type deleteCompactionHintFragment struct {
i think we need some comments describing what this fragmentation is and what it's achieving. I didn't see that anywhere and it's not obvious to me
9e12f85
to
1be57fc
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TFTR!
Reviewable status: 2 of 9 files reviewed, 5 unresolved discussions (waiting on @jbowens)
compaction.go
line 2488 at r2 (raw file):
Previously, jbowens (Jackson Owens) wrote…
nit: it'd reduce the code churn within this PR (since this would no longer need to pass a func into
excise
) to do it first (and can still be a quick precursor PR)
Done #3986 , and rebased.
compaction.go
line 2413 at r3 (raw file):
Previously, jbowens (Jackson Owens) wrote…
I'm still confused by this comment then. Wouldn't that have been a different hint? And this hint would be
hintDoesNotApply
? because we do:if fmv < FormatVirtualSSTables { // The file's keys must be completely contained within the hint range; excises // aren't allowed. return hintDoesNotApply }
seems like this should be a panic right here.
Yeah, now that we do hintDoesNotApply in this case, this should be a panic. The idea was that if you can't virtualize sstables, and you see a hint that partially overlaps with a file, you skip it because you'll definitely see another hint that fully overlaps with that file later on (otherwise we wouldn't have scheduled this file in the compaction to begin with). And if you didn't, the caller already has a "did I try deleting every single file in the compaction" check that would panic.
It's all a little easier to reason about if you think of delete-only compactions as how they were before this PR. They didn't have any hints in them, they just blindly deleted every file they were created with.
Turned this into a panic now.
compaction.go
line 2530 at r4 (raw file):
Previously, jbowens (Jackson Owens) wrote…
i think we need some comments describing what this fragmentation is and what it's achieving. I didn't see that anywhere and it's not obvious to me
Added one. Let me know if that makes sense. The idea is to simplify the iteration instead of having to backtrack through the hints (or binary search through a list for every file) to answer "what all hints apply to this file"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What prevents us from excising for very small ranges, potentially leading to a ton of small vssts over time? Is it that a delete-only compaction hint only exists if there is a range-del that encompasses an entire file underneath (so that we eliminate at least a table for every excise?)
It feels like maybe we want to have a heuristic along the lines of: only do a delete-only compaction if the total number of ssts doesn't increase? (so e.g. eliminating one sst while splitting 4 other ssts into two each would not happen).
Reviewable status: 2 of 9 files reviewed, 11 unresolved discussions (waiting on @itsbilal and @jbowens)
compaction.go
line 1927 at r5 (raw file):
// hintDoesNotApply indicates that the hint does not apply to the file. hintDoesNotApply deletionHintOverlap = iota // hintExcisesFile indicates that the hint excises a portion of the file.
[nit] and the database format supports excises.
compaction.go
line 1997 at r5 (raw file):
return hintDoesNotApply } // Check for any overlap. In cases of partial overlap, we excise the part of the file
[nit] we can excise
compaction.go
line 2006 at r5 (raw file):
} func checkDeleteCompactionHints(
[nit] could use a comment, especially explaining the return values
compaction.go
line 2016 at r5 (raw file):
var byLevel [numLevels][]*fileMetadata // Deletion only compactions can be quadratic (O(mn)) in terms of runtime
[nit] Delete-only
compaction.go
line 2079 at r5 (raw file):
iter := overlaps.Iter() for m := iter.First(); m != nil; m = iter.Next() { if m.IsCompacting() || h.canDeleteOrExcise(cmp, m, snapshots, fmv) == hintDoesNotApply || files[m] {
Should we check/assert that for every hint, there is at least one file with hintDeletesFile
?
compaction.go
line 2404 at r5 (raw file):
} func (d *DB) applyHintOnFile(
[nit] needs a comment
Previously, we'd only act on delete-only compaction hints if the entirety of a file was covered by a delete-only compaction hint. This change leverages the fact that we can now partially excise files using virtual sstables, to also use delete-only compactions to do excises of partial sstables and reduce the w-amp impact of writing rangedels and rangekeydels. Fixes cockroachdb#3820.
1be57fc
to
e3fe2c7
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@RaduBerinde Good point. I was hoping to do some size-based heuristic as opposed to an sstable count one, but a count one is easier to implement like you've described so I've gone ahead and implemented it. We should only be scheduling delete-only compactions if the sstable count for resolving a given hint goes down or stays flat. PTAL!
Reviewable status: 2 of 9 files reviewed, 6 unresolved discussions (waiting on @jbowens and @RaduBerinde)
compaction.go
line 1927 at r5 (raw file):
Previously, RaduBerinde wrote…
[nit] and the database format supports excises.
Done.
compaction.go
line 1997 at r5 (raw file):
Previously, RaduBerinde wrote…
[nit] we can excise
Done.
compaction.go
line 2006 at r5 (raw file):
Previously, RaduBerinde wrote…
[nit] could use a comment, especially explaining the return values
Done.
compaction.go
line 2016 at r5 (raw file):
Previously, RaduBerinde wrote…
[nit] Delete-only
Done.
compaction.go
line 2079 at r5 (raw file):
Previously, RaduBerinde wrote…
Should we check/assert that for every hint, there is at least one file with
hintDeletesFile
?
That shouldn't be an issue even if it were to happen. Realized that it would actually happen a fair bit right now because we throw all resolved hints into the delete-only compaction.
compaction.go
line 2404 at r5 (raw file):
Previously, RaduBerinde wrote…
[nit] needs a comment
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice, this is great! I still need to look some more at the implementation.
Reviewable status: 2 of 9 files reviewed, 6 unresolved discussions (waiting on @jbowens)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for the delay. Nice work!
As always, all nits are optional suggestions.
Reviewable status: 2 of 9 files reviewed, 17 unresolved discussions (waiting on @itsbilal and @jbowens)
compaction.go
line 2530 at r4 (raw file):
Previously, itsbilal (Bilal Akhtar) wrote…
Added one. Let me know if that makes sense. The idea is to simplify the iteration instead of having to backtrack through the hints (or binary search through a list for every file) to answer "what all hints apply to this file"
Could use a comment on the struct as well, maybe
// deleteCompactionHintFragment represents a fragment of the key space and
// contains a set of deleteCompactionHints that apply to that fragment; a
// fragment starts at the start field and ends where the next fragment starts.
compaction.go
line 2105 at r6 (raw file):
// file on the right, decrement the counter once. if cmp(h.start, m.Smallest.UserKey) > 0 { filesDeletedByCurrentHint--
[nit] to make it easier to follow, we could have a separate virtualFilesAddedByCurrentHint
counter. The condition will become super clear: if filesDeletedByCurrentHint < virtualFilesAddedByCurrentHint
compaction.go
line 2107 at r6 (raw file):
filesDeletedByCurrentHint-- } if base.UserKeyExclusive(h.end).CompareUpperBounds(cmp, m.UserKeyBounds().End) < 0 {
m.UserKeyBounds().End.IsUpperBoundFor(h.end)
compaction.go
line 2502 at r6 (raw file):
curFragment := 0 iter := cl.files.Iter()
[nit] perhaps assert that cl.level > 0?
compaction.go
line 2509 at r6 (raw file):
for f := iter.First(); f != nil; f = iter.Next() { // curFile usually matches f, except if f got excised in which case // it maps to the "current state" of f.
[nit] maps to a virtual file that replaces f, or nil if f got removed in its entirety
compaction.go
line 2511 at r6 (raw file):
// it maps to the "current state" of f. curFile := f for curFragment < len(fragments) && d.cmp(fragments[curFragment].start, f.Smallest.UserKey) <= 0 {
[nit] It would be cleaner if curFragment
never went back; we could do curFragment < len()-1 && d.cmp(fragments[curFragment+1].start
here; below we can do
for {
if curFragment == len(fragments)-1 || !f.UserKeyBounds().End.IsUpperBound(fragments[curFragment+1].start) {
break
}
curFragment++
..
compaction.go
line 2539 at r6 (raw file):
return err } if _, ok := ve.DeletedFiles[manifest.DeletedFileEntry{Level: cl.level, FileNum: curFile.FileNum}]; ok {
Couldn't we do this at the very end?
ve.NewFiles = slices.DeleteFunc(ve.NewFiles, func(e manifest.NewFileEntry) bool {
deletedFileEntry := manifest.DeletedFileEntry{Level: cl.level, FileNum: e.Meta.FileNum}
if _, deleted := ve.DeletedFiles[deletedFileEntry]; deleted {
delete(ve.DeletedFiles, deletedFileEntry)
return true
}
return false
})
This could actually be a method on VersionEdit
(with a unit test).
compaction.go
line 2595 at r6 (raw file):
return cmp(i.start, j.start) }) j := 0
[nit] fragments = slices.CompactFunc(fragments, ...)
table_stats.go
line 559 at r6 (raw file):
iter := overlaps.Iter() for file := iter.First(); file != nil; file = iter.Next() { startCmp := d.cmp(start, file.Smallest.UserKey)
[nit] can move these close to where they are first used
table_stats.go
line 605 at r6 (raw file):
hintSeqNum = file.SmallestSeqNum } } else if d.cmp(file.Smallest.UserKey, end) <= 0 && d.cmp(start, file.Largest.UserKey) <= 0 {
Shouldn't the first one be < 0
? And the second one should check if Largest is exclusive?
testdata/compaction_delete_only_hints
line 468 at r6 (raw file):
first next next
[nit] remove the last two next
s
testdata/compaction_delete_only_hints
line 495 at r6 (raw file):
first next next
ditto
testdata/compaction_delete_only_hints
line 571 at r6 (raw file):
first next next
ditto
Previously, we'd only act on delete-only compaction hints if the entirety of a file was covered by a delete-only compaction hint. This change leverages the fact that we can now partially excise files using virtual sstables, to also use delete-only compactions to do excises of partial sstables and reduce the w-amp impact of writing rangedels and rangekeydels.
Fixes #3820.