Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

db: use delete-only compaction hints to do excises #3926

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

itsbilal
Copy link
Member

@itsbilal itsbilal commented Sep 6, 2024

Previously, we'd only act on delete-only compaction hints if the entirety of a file was covered by a delete-only compaction hint. This change leverages the fact that we can now partially excise files using virtual sstables, to also use delete-only compactions to do excises of partial sstables and reduce the w-amp impact of writing rangedels and rangekeydels.

Fixes #3820.

@itsbilal itsbilal requested a review from a team as a code owner September 6, 2024 20:12
@cockroach-teamcity
Copy link
Member

This change is Reviewable

Copy link
Collaborator

@jbowens jbowens left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 6 files at r1, all commit messages.
Reviewable status: 1 of 8 files reviewed, 6 unresolved discussions (waiting on @itsbilal)


compaction.go line 2404 at r2 (raw file):

			continue
		}
		if d.cmp(h.end, f.Smallest.UserKey) <= 0 || !h.canDeleteOrExcise(d.cmp, f, snapshots, d.FormatMajorVersion()) {

canDeleteOrExcise doesn't only check bounds: it checks sequence number and key types too. doesn't that mean we don't necessarily want to step forward to the next hint? I.e., the next file may also fall wholly within h's bounds and it may be eligible for deletion.


compaction.go line 2488 at r2 (raw file):

		d.mu.Lock()
		defer d.mu.Unlock()
		return d.mu.versions.getNextFileNum()

should we update the nextFileNum to be an atomic and rid ourselves of the mutex concern (and the threading of the closure through parameters)?


compaction.go line 2496 at r2 (raw file):

	sort.Slice(c.deletionHints, func(i, j int) bool {
		return d.cmp(c.deletionHints[i].start, c.deletionHints[j].start) < 0
	})

nit: use slices.SortFunc


ingest.go line 1710 at r2 (raw file):

	ve *versionEdit,
	level int,
	nextFileNumGetter func() base.FileNum,

previously excise always held DB.mu. now the delete compaction body may call excise without holding DB.mu. I don't see anything, but just making sure excise is truly agnostic to whether DB.mu is held now.

Executing this function without the mutex honestly makes much more sense, since we're performing I/O to constrain the bounds. Maybe we should lift the off the *DB, taking only what we need to make the relationship to DB state more obvious.


table_stats.go line 636 at r2 (raw file):

				if hintSeqNum > file.SmallestSeqNum {
					hintSeqNum = file.SmallestSeqNum
				}

should we pull out the logic of the other branch that handles the hint type and avoids updating the hint sequence number for files that should not be affected by the hint?

Copy link
Member Author

@itsbilal itsbilal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TFTR!

Reviewable status: 1 of 8 files reviewed, 5 unresolved discussions (waiting on @jbowens)


compaction.go line 2404 at r2 (raw file):

Previously, jbowens (Jackson Owens) wrote…

canDeleteOrExcise doesn't only check bounds: it checks sequence number and key types too. doesn't that mean we don't necessarily want to step forward to the next hint? I.e., the next file may also fall wholly within h's bounds and it may be eligible for deletion.

Good catch and you're right. I had to significantly rework this method to address this, seeing as we basically had to go through every overlapping hint for a file before we could move on from a file.

I've moved towards a quadratic-ish function here (but only in particularly pathological cases), but also capped the number of deletion only compaction hints we resolve in one delete-only compaction to prevent us from scheduling really CPU-hot or massive delete-only compactions. PTAL!


compaction.go line 2488 at r2 (raw file):

Previously, jbowens (Jackson Owens) wrote…

should we update the nextFileNum to be an atomic and rid ourselves of the mutex concern (and the threading of the closure through parameters)?

I left a TODO for that here, as we seem to update it in a lot of places and can be left to a quick follow-up PR.


compaction.go line 2496 at r2 (raw file):

Previously, jbowens (Jackson Owens) wrote…

nit: use slices.SortFunc

Done.


ingest.go line 1710 at r2 (raw file):

Previously, jbowens (Jackson Owens) wrote…

previously excise always held DB.mu. now the delete compaction body may call excise without holding DB.mu. I don't see anything, but just making sure excise is truly agnostic to whether DB.mu is held now.

Executing this function without the mutex honestly makes much more sense, since we're performing I/O to constrain the bounds. Maybe we should lift the off the *DB, taking only what we need to make the relationship to DB state more obvious.

Yeah, d.excise is now mu-agnostic. The only case where we needed the mutex was if we were getting the next file num, which is also abstracted out now. Clarified in a comment.

I tried just removing *DB as a receiver but I had to add a whole bunch of new arguments to this method, from options to table cache to newIters to comparer, so I figured it's best to just leave it as-is.


table_stats.go line 636 at r2 (raw file):

Previously, jbowens (Jackson Owens) wrote…

should we pull out the logic of the other branch that handles the hint type and avoids updating the hint sequence number for files that should not be affected by the hint?

Good catch! Done - the hint seqnum updating is now also conditional on key type in this branch.

@itsbilal
Copy link
Member Author

compaction.go line 2404 at r2 (raw file):

Previously, itsbilal (Bilal Akhtar) wrote…

Good catch and you're right. I had to significantly rework this method to address this, seeing as we basically had to go through every overlapping hint for a file before we could move on from a file.

I've moved towards a quadratic-ish function here (but only in particularly pathological cases), but also capped the number of deletion only compaction hints we resolve in one delete-only compaction to prevent us from scheduling really CPU-hot or massive delete-only compactions. PTAL!

Maybe what we need is a min heap of files (sorted by file smallest keys), and the iteration to work through a sorted list of hints. If we part-excise a file, we add its remnants to the heap. I'll try to think of whether that has any edge cases that I couldn't think of, or if that's the strictly better solution

Copy link
Collaborator

@jbowens jbowens left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still reviewing runDeleteOnlyCompactionForLevel

Reviewed 2 of 6 files at r1, 1 of 3 files at r2, 1 of 3 files at r3, all commit messages.
Reviewable status: 5 of 8 files reviewed, 8 unresolved discussions (waiting on @itsbilal)


compaction.go line 2395 at r3 (raw file):

	snapshots compact.Snapshots,
) (newFiles []manifest.NewFileEntry, err error) {
	if !h.canDeleteOrExcise(d.cmp, f, snapshots, d.FormatMajorVersion()) {

nit: I think we should return an enum from canDeleteOrExcise telling us whether we can delete the file, excise the file or neither. always nice to avoid the cost of the extra key comparisons, but more importantly we avoid duplicating fiddly bounds checking logic


compaction.go line 2413 at r3 (raw file):

	// to use d.excise.
	if d.FormatMajorVersion() < FormatVirtualSSTables {
		// A future hint will apply to this file and delete it in its entirety.

nit: is this comment meant to say "A future compaction will find the entirety of this file deleted by range tombstones." or the like?


compaction.go line 2425 at r3 (raw file):

		FileNum: f.FileNum,
	}]; !ok {
		// This hint did not touch this file.

Can you add a comment describing under what circumstances this can happen?


compaction.go line 2443 at r3 (raw file):

	iter := cl.files.Iter()

	// Outer loop loops on files. Middle lop loops on fragments. Inner loop

nit: s/lop/loop/

Copy link
Member Author

@itsbilal itsbilal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TFTR!

Reviewable status: 4 of 8 files reviewed, 6 unresolved discussions (waiting on @jbowens)


compaction.go line 2395 at r3 (raw file):

Previously, jbowens (Jackson Owens) wrote…

nit: I think we should return an enum from canDeleteOrExcise telling us whether we can delete the file, excise the file or neither. always nice to avoid the cost of the extra key comparisons, but more importantly we avoid duplicating fiddly bounds checking logic

Done.


compaction.go line 2413 at r3 (raw file):

Previously, jbowens (Jackson Owens) wrote…

nit: is this comment meant to say "A future compaction will find the entirety of this file deleted by range tombstones." or the like?

I just meant that one of the other hints must be deleting the entirety of this file, otherwise we wouldn't have picked it for this delete-only compaction. If we don't find such a hint, the caller will panic anyway.


compaction.go line 2425 at r3 (raw file):

Previously, jbowens (Jackson Owens) wrote…

Can you add a comment describing under what circumstances this can happen?

Done - removed this case as it should be impossible. Artifact from a previous iteration.


compaction.go line 2443 at r3 (raw file):

Previously, jbowens (Jackson Owens) wrote…

nit: s/lop/loop/

Done.

Copy link
Collaborator

@jbowens jbowens left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 4 of 8 files reviewed, 6 unresolved discussions (waiting on @itsbilal)


compaction.go line 2488 at r2 (raw file):

Previously, itsbilal (Bilal Akhtar) wrote…

I left a TODO for that here, as we seem to update it in a lot of places and can be left to a quick follow-up PR.

nit: it'd reduce the code churn within this PR (since this would no longer need to pass a func into excise) to do it first (and can still be a quick precursor PR)


compaction.go line 2413 at r3 (raw file):

Previously, itsbilal (Bilal Akhtar) wrote…

I just meant that one of the other hints must be deleting the entirety of this file, otherwise we wouldn't have picked it for this delete-only compaction. If we don't find such a hint, the caller will panic anyway.

I'm still confused by this comment then. Wouldn't that have been a different hint? And this hint would be hintDoesNotApply? because we do:

if fmv < FormatVirtualSSTables {
		// The file's keys must be completely contained within the hint range; excises
		// aren't allowed.
		return hintDoesNotApply
}

seems like this should be a panic right here.


compaction.go line 2530 at r4 (raw file):

}

type deleteCompactionHintFragment struct {

i think we need some comments describing what this fragmentation is and what it's achieving. I didn't see that anywhere and it's not obvious to me

Copy link
Member Author

@itsbilal itsbilal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TFTR!

Reviewable status: 2 of 9 files reviewed, 5 unresolved discussions (waiting on @jbowens)


compaction.go line 2488 at r2 (raw file):

Previously, jbowens (Jackson Owens) wrote…

nit: it'd reduce the code churn within this PR (since this would no longer need to pass a func into excise) to do it first (and can still be a quick precursor PR)

Done #3986 , and rebased.


compaction.go line 2413 at r3 (raw file):

Previously, jbowens (Jackson Owens) wrote…

I'm still confused by this comment then. Wouldn't that have been a different hint? And this hint would be hintDoesNotApply? because we do:

if fmv < FormatVirtualSSTables {
		// The file's keys must be completely contained within the hint range; excises
		// aren't allowed.
		return hintDoesNotApply
}

seems like this should be a panic right here.

Yeah, now that we do hintDoesNotApply in this case, this should be a panic. The idea was that if you can't virtualize sstables, and you see a hint that partially overlaps with a file, you skip it because you'll definitely see another hint that fully overlaps with that file later on (otherwise we wouldn't have scheduled this file in the compaction to begin with). And if you didn't, the caller already has a "did I try deleting every single file in the compaction" check that would panic.

It's all a little easier to reason about if you think of delete-only compactions as how they were before this PR. They didn't have any hints in them, they just blindly deleted every file they were created with.

Turned this into a panic now.


compaction.go line 2530 at r4 (raw file):

Previously, jbowens (Jackson Owens) wrote…

i think we need some comments describing what this fragmentation is and what it's achieving. I didn't see that anywhere and it's not obvious to me

Added one. Let me know if that makes sense. The idea is to simplify the iteration instead of having to backtrack through the hints (or binary search through a list for every file) to answer "what all hints apply to this file"

Copy link
Member

@RaduBerinde RaduBerinde left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What prevents us from excising for very small ranges, potentially leading to a ton of small vssts over time? Is it that a delete-only compaction hint only exists if there is a range-del that encompasses an entire file underneath (so that we eliminate at least a table for every excise?)

It feels like maybe we want to have a heuristic along the lines of: only do a delete-only compaction if the total number of ssts doesn't increase? (so e.g. eliminating one sst while splitting 4 other ssts into two each would not happen).

Reviewable status: 2 of 9 files reviewed, 11 unresolved discussions (waiting on @itsbilal and @jbowens)


compaction.go line 1927 at r5 (raw file):

	// hintDoesNotApply indicates that the hint does not apply to the file.
	hintDoesNotApply deletionHintOverlap = iota
	// hintExcisesFile indicates that the hint excises a portion of the file.

[nit] and the database format supports excises.


compaction.go line 1997 at r5 (raw file):

		return hintDoesNotApply
	}
	// Check for any overlap. In cases of partial overlap, we excise the part of the file

[nit] we can excise


compaction.go line 2006 at r5 (raw file):

}

func checkDeleteCompactionHints(

[nit] could use a comment, especially explaining the return values


compaction.go line 2016 at r5 (raw file):

	var byLevel [numLevels][]*fileMetadata

	// Deletion only compactions can be quadratic (O(mn)) in terms of runtime

[nit] Delete-only


compaction.go line 2079 at r5 (raw file):

			iter := overlaps.Iter()
			for m := iter.First(); m != nil; m = iter.Next() {
				if m.IsCompacting() || h.canDeleteOrExcise(cmp, m, snapshots, fmv) == hintDoesNotApply || files[m] {

Should we check/assert that for every hint, there is at least one file with hintDeletesFile?


compaction.go line 2404 at r5 (raw file):

}

func (d *DB) applyHintOnFile(

[nit] needs a comment

Previously, we'd only act on delete-only compaction hints if
the entirety of a file was covered by a delete-only compaction
hint. This change leverages the fact that we can now partially
excise files using virtual sstables, to also use delete-only
compactions to do excises of partial sstables and reduce the
w-amp impact of writing rangedels and rangekeydels.

Fixes cockroachdb#3820.
Copy link
Member Author

@itsbilal itsbilal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@RaduBerinde Good point. I was hoping to do some size-based heuristic as opposed to an sstable count one, but a count one is easier to implement like you've described so I've gone ahead and implemented it. We should only be scheduling delete-only compactions if the sstable count for resolving a given hint goes down or stays flat. PTAL!

Reviewable status: 2 of 9 files reviewed, 6 unresolved discussions (waiting on @jbowens and @RaduBerinde)


compaction.go line 1927 at r5 (raw file):

Previously, RaduBerinde wrote…

[nit] and the database format supports excises.

Done.


compaction.go line 1997 at r5 (raw file):

Previously, RaduBerinde wrote…

[nit] we can excise

Done.


compaction.go line 2006 at r5 (raw file):

Previously, RaduBerinde wrote…

[nit] could use a comment, especially explaining the return values

Done.


compaction.go line 2016 at r5 (raw file):

Previously, RaduBerinde wrote…

[nit] Delete-only

Done.


compaction.go line 2079 at r5 (raw file):

Previously, RaduBerinde wrote…

Should we check/assert that for every hint, there is at least one file with hintDeletesFile?

That shouldn't be an issue even if it were to happen. Realized that it would actually happen a fair bit right now because we throw all resolved hints into the delete-only compaction.


compaction.go line 2404 at r5 (raw file):

Previously, RaduBerinde wrote…

[nit] needs a comment

Done.

Copy link
Member

@RaduBerinde RaduBerinde left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, this is great! I still need to look some more at the implementation.

Reviewable status: 2 of 9 files reviewed, 6 unresolved discussions (waiting on @jbowens)

Copy link
Member

@RaduBerinde RaduBerinde left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm: Sorry for the delay. Nice work!

As always, all nits are optional suggestions.

Reviewable status: 2 of 9 files reviewed, 17 unresolved discussions (waiting on @itsbilal and @jbowens)


compaction.go line 2530 at r4 (raw file):

Previously, itsbilal (Bilal Akhtar) wrote…

Added one. Let me know if that makes sense. The idea is to simplify the iteration instead of having to backtrack through the hints (or binary search through a list for every file) to answer "what all hints apply to this file"

Could use a comment on the struct as well, maybe

// deleteCompactionHintFragment represents a fragment of the key space and
// contains a set of deleteCompactionHints that apply to that fragment; a
// fragment starts at the start field and ends where the next fragment starts.

compaction.go line 2105 at r6 (raw file):

					// file on the right, decrement the counter once.
					if cmp(h.start, m.Smallest.UserKey) > 0 {
						filesDeletedByCurrentHint--

[nit] to make it easier to follow, we could have a separate virtualFilesAddedByCurrentHint counter. The condition will become super clear: if filesDeletedByCurrentHint < virtualFilesAddedByCurrentHint


compaction.go line 2107 at r6 (raw file):

						filesDeletedByCurrentHint--
					}
					if base.UserKeyExclusive(h.end).CompareUpperBounds(cmp, m.UserKeyBounds().End) < 0 {

m.UserKeyBounds().End.IsUpperBoundFor(h.end)


compaction.go line 2502 at r6 (raw file):

	curFragment := 0
	iter := cl.files.Iter()

[nit] perhaps assert that cl.level > 0?


compaction.go line 2509 at r6 (raw file):

	for f := iter.First(); f != nil; f = iter.Next() {
		// curFile usually matches f, except if f got excised in which case
		// it maps to the "current state" of f.

[nit] maps to a virtual file that replaces f, or nil if f got removed in its entirety


compaction.go line 2511 at r6 (raw file):

		// it maps to the "current state" of f.
		curFile := f
		for curFragment < len(fragments) && d.cmp(fragments[curFragment].start, f.Smallest.UserKey) <= 0 {

[nit] It would be cleaner if curFragment never went back; we could do curFragment < len()-1 && d.cmp(fragments[curFragment+1].start here; below we can do

for {
  if curFragment == len(fragments)-1 || !f.UserKeyBounds().End.IsUpperBound(fragments[curFragment+1].start) {
    break
  }
  curFragment++
  ..

compaction.go line 2539 at r6 (raw file):

					return err
				}
				if _, ok := ve.DeletedFiles[manifest.DeletedFileEntry{Level: cl.level, FileNum: curFile.FileNum}]; ok {

Couldn't we do this at the very end?

	ve.NewFiles = slices.DeleteFunc(ve.NewFiles, func(e manifest.NewFileEntry) bool {
		deletedFileEntry := manifest.DeletedFileEntry{Level: cl.level, FileNum: e.Meta.FileNum}
		if _, deleted := ve.DeletedFiles[deletedFileEntry]; deleted {
			delete(ve.DeletedFiles, deletedFileEntry)
			return true
		}
		return false
	})

This could actually be a method on VersionEdit (with a unit test).


compaction.go line 2595 at r6 (raw file):

		return cmp(i.start, j.start)
	})
	j := 0

[nit] fragments = slices.CompactFunc(fragments, ...)


table_stats.go line 559 at r6 (raw file):

		iter := overlaps.Iter()
		for file := iter.First(); file != nil; file = iter.Next() {
			startCmp := d.cmp(start, file.Smallest.UserKey)

[nit] can move these close to where they are first used


table_stats.go line 605 at r6 (raw file):

					hintSeqNum = file.SmallestSeqNum
				}
			} else if d.cmp(file.Smallest.UserKey, end) <= 0 && d.cmp(start, file.Largest.UserKey) <= 0 {

Shouldn't the first one be < 0? And the second one should check if Largest is exclusive?


testdata/compaction_delete_only_hints line 468 at r6 (raw file):

first
next
next

[nit] remove the last two nexts


testdata/compaction_delete_only_hints line 495 at r6 (raw file):

first
next
next

ditto


testdata/compaction_delete_only_hints line 571 at r6 (raw file):

first
next
next

ditto

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

db: add ability to excise without ingest
4 participants