-
Notifications
You must be signed in to change notification settings - Fork 447
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
internal/manifest: Relax SeqNum overlap invariant in L0 #591
internal/manifest: Relax SeqNum overlap invariant in L0 #591
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks like what I was imagining. Mostly small comments.
Reviewable status: 0 of 5 files reviewed, 5 unresolved discussions (waiting on @itsbilal, @jbowens, @petermattis, and @sumeerbhola)
internal/manifest/version.go, line 596 at r1 (raw file):
overlapRangeStartIdx = i } if f.SmallestSeqNum < smallestOverlapRangeFlushedSeqNum {
Nit: for cases where we're trying to find the smallest or largest, I like to arrange the comparison to match the subsequent assignment:
if smallestOverlapRangeFlushedSeqNum > f.SmallestSeqNum {
smallestOverlapRangeFlushedSeqNum = f.SmallestSeqNum
}
internal/manifest/version.go, line 605 at r1 (raw file):
// range and check for any key range overlaps in set of // files. overlapRange := expandOverlapRange(i-1)
I'm confused about what overlapRangeStartIdx
is being used for. It seems to either be unnecessary, or could be used to optimize expandOverlapRange
. Perhaps it is too late on Friday afternoon and I'm missing something.
internal/manifest/version.go, line 607 at r1 (raw file):
overlapRange := expandOverlapRange(i-1) overlapRangeCopy := make([]*FileMetadata, len(overlapRange)) copy(overlapRangeCopy, overlapRange)
Rather than having checkOverlapRangeForKeyOverlap
mutate the slice it is passed, I think it would be cleaner and safer to have that function make a copy internally. This would get rid of a small amount of duplication, and ensure that a new caller of checkOverlapRangeForKeyOverlap
doesn't have to be aware of this requirement.
internal/manifest/testdata/version_check_ordering, line 104 at r1 (raw file):
m.SET.20-n.SET.20 ---- L0 flushed file 000007 has an ingested file coincident with smallest seqnum: 15-17
I realize you didn't touch this error message in this change, but it reads a bit strangely. How does a "flushed file" have an "ingested file coincident"? I think this should be something like: L0 flushed file 000007 has smallest seqnum coincident with ingested file: 15 vs 15-17
.
tool/testdata/manifest_dump, line 173 at r1 (raw file):
added: L0 000002:0<#1-#4>[#0,DEL-#0,DEL] EOF pebble: internal error: L0 files 000002 and 000001 have overlapping key and seqnum ranges: #0,DEL-#0,DEL vs #0,DEL-#0,DEL, and 1-4 vs 2-5
I think we should use the format used in manifest dump
: <#SmallestSeq-#LargestSeq>[SmallestKey-LargestKey]
.
Something else I wanted to mention. We don't really document this anywhere, but in the commit/PR title, the bit before the colon (i.e. the "scope") is supposed to be the directory containing the major portion of the change. In this case it should be |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 0 of 5 files reviewed, 7 unresolved discussions (waiting on @itsbilal, @jbowens, and @sumeerbhola)
internal/manifest/version.go, line 482 at r1 (raw file):
// // Partitioned flushes are currently not implemented in Pebble, but this case // is handled here for compatibility with future versions of pebble.
I was going to suggest adding something like:
Key-partitioned intra-L0 compactions, also not currently implemented, similarly result in overlapping seqnum files that satisfy the same invariant.
But then realized that the invariant there is even weaker and can't be checked without peering into the files, so inclined to say that we should not do the overlap checking below.
Consider the case of two sublevel key intervals [a, c] [d, e]
[a, c] [d, e]
f3[10,15] f4[11,16]
f1[0,5] f2[1,6]
An intra-L0 compaction is allowed to compact f2, f3, f4
which will produce an output file, say f5
with key range [a, e]
and seqnum range [1,16]
. Both the keys and the seqnum intervals overlap. There are no keys in f5
in the interval [a, c]
that overlap with the seqnum [0, 5]
which is why the sublevel invariant still holds.
___
*[internal/manifest/version.go, line 516 at r1](https://reviewable.io/reviews/cockroachdb/pebble/591#-M3fPPkT4B4dJYD0gKnJ:-M3fPPkT4B4dJYD0gKnK:b793nhx) ([raw file](https://github.com/cockroachdb/pebble/blob/92c153d65426f92e617cc0f864908de87d099429/internal/manifest/version.go#L516)):*
> ```Go
> if f.SmallestSeqNum == f.LargestSeqNum {
> // Ingested file.
> continue
> ```
why the special case for ingested files? Since the files are ordered with non-decreasing `LargestSeqNum` isn't it sufficient to just use the if-condition below for all files?
<!-- Sent from Reviewable.io -->
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 0 of 5 files reviewed, 7 unresolved discussions (waiting on @itsbilal, @jbowens, and @sumeerbhola)
internal/manifest/version.go, line 482 at r1 (raw file):
Previously, sumeerbhola wrote…
I was going to suggest adding something like:
Key-partitioned intra-L0 compactions, also not currently implemented, similarly result in overlapping seqnum files that satisfy the same invariant.But then realized that the invariant there is even weaker and can't be checked without peering into the files, so inclined to say that we should not do the overlap checking below.
Consider the case of two sublevel key intervals [a, c] [d, e][a, c] [d, e] f3[10,15] f4[11,16] f1[0,5] f2[1,6]
An intra-L0 compaction is allowed to compact
f2, f3, f4
which will produce an output file, sayf5
with key range[a, e]
and seqnum range[1,16]
. Both the keys and the seqnum intervals overlap. There are no keys inf5
in the interval[a, c]
that overlap with the seqnum[0, 5]
which is why the sublevel invariant still holds.
Interesting. Seems like there are very few invariants that can be checked.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 0 of 5 files reviewed, 7 unresolved discussions (waiting on @itsbilal, @jbowens, and @sumeerbhola)
internal/manifest/version.go, line 482 at r1 (raw file):
Previously, petermattis (Peter Mattis) wrote…
Interesting. Seems like there are very few invariants that can be checked.
Yeah. That's also in line with rocksdb; it practically does no invariant checks in L0 (except that the files after a sort are sorted in the order the sort produces them in). I'll remove the overlap cases then, and keep the ingestion checks around.
92c153d
to
6593c22
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the reviews and for pointing this out @sumeerbhola ! I've significantly relaxed the seqnum overlap invariants now. Also fixed the PR title and description. PTAL!
Reviewable status: 0 of 7 files reviewed, 6 unresolved discussions (waiting on @jbowens, @petermattis, and @sumeerbhola)
internal/manifest/version.go, line 482 at r1 (raw file):
Previously, itsbilal (Bilal Akhtar) wrote…
Yeah. That's also in line with rocksdb; it practically does no invariant checks in L0 (except that the files after a sort are sorted in the order the sort produces them in). I'll remove the overlap cases then, and keep the ingestion checks around.
Done.
internal/manifest/version.go, line 516 at r1 (raw file):
Previously, sumeerbhola wrote…
why the special case for ingested files? Since the files are ordered with non-decreasing
LargestSeqNum
isn't it sufficient to just use the if-condition below for all files?
Not applicable anymore.
internal/manifest/version.go, line 596 at r1 (raw file):
Previously, petermattis (Peter Mattis) wrote…
Nit: for cases where we're trying to find the smallest or largest, I like to arrange the comparison to match the subsequent assignment:
if smallestOverlapRangeFlushedSeqNum > f.SmallestSeqNum { smallestOverlapRangeFlushedSeqNum = f.SmallestSeqNum }
Not applicable anymore.
internal/manifest/version.go, line 605 at r1 (raw file):
Previously, petermattis (Peter Mattis) wrote…
I'm confused about what
overlapRangeStartIdx
is being used for. It seems to either be unnecessary, or could be used to optimizeexpandOverlapRange
. Perhaps it is too late on Friday afternoon and I'm missing something.
Not applicable anymore.
internal/manifest/version.go, line 607 at r1 (raw file):
Previously, petermattis (Peter Mattis) wrote…
Rather than having
checkOverlapRangeForKeyOverlap
mutate the slice it is passed, I think it would be cleaner and safer to have that function make a copy internally. This would get rid of a small amount of duplication, and ensure that a new caller ofcheckOverlapRangeForKeyOverlap
doesn't have to be aware of this requirement.
Not applicable anymore.
internal/manifest/testdata/version_check_ordering, line 104 at r1 (raw file):
Previously, petermattis (Peter Mattis) wrote…
I realize you didn't touch this error message in this change, but it reads a bit strangely. How does a "flushed file" have an "ingested file coincident"? I think this should be something like:
L0 flushed file 000007 has smallest seqnum coincident with ingested file: 15 vs 15-17
.
Done. I always found that error message weird too, so it's pretty worthwhile to address it now.
tool/testdata/manifest_dump, line 173 at r1 (raw file):
Previously, petermattis (Peter Mattis) wrote…
I think we should use the format used in
manifest dump
:<#SmallestSeq-#LargestSeq>[SmallestKey-LargestKey]
.
Not applicable anymore.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 0 of 7 files reviewed, 4 unresolved discussions (waiting on @itsbilal, @jbowens, and @sumeerbhola)
internal/manifest/testdata/version_check_ordering, line 35 at r2 (raw file):
k.SET.16-n.SET.19 ---- OK
Are these new test cases interesting given that the invariant check is so minimal now?
internal/manifest/testdata/version_check_ordering, line 116 at r2 (raw file):
a.SET.3-d.SET.5 ---- L0 flushed file 000002 has smallest sequence number coincident with an ingested file : 3-5 vs 3
Per my other comment, perhaps we should display the sequence numbers here as <#3-#5>
and <#3-?>
or something like that.
tool/testdata/manifest_dump, line 173 at r1 (raw file):
Previously, itsbilal (Bilal Akhtar) wrote…
Not applicable anymore.
I think it might still be worthwhile to represent key ranges as [<start>-<end>]
(i.e. add the square braces here).
Changes Manifest.CheckOrdering to allow L0 SSTables to overlap in sequence numbers. Implemented for better compatibility with future versions of Pebble which could support partitioned flushes and L0 sublevels. Fixes cockroachdb#587.
6593c22
to
10a1170
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 0 of 8 files reviewed, 4 unresolved discussions (waiting on @jbowens, @petermattis, and @sumeerbhola)
internal/manifest/testdata/version_check_ordering, line 35 at r2 (raw file):
Previously, petermattis (Peter Mattis) wrote…
Are these new test cases interesting given that the invariant check is so minimal now?
Not as much, but it's still worthwhile to show some of the real-world complexity that can emerge from this.
internal/manifest/testdata/version_check_ordering, line 116 at r2 (raw file):
Previously, petermattis (Peter Mattis) wrote…
Per my other comment, perhaps we should display the sequence numbers here as
<#3-#5>
and<#3-?>
or something like that.
Done. That makes sense.
tool/testdata/manifest_dump, line 173 at r1 (raw file):
Previously, petermattis (Peter Mattis) wrote…
I think it might still be worthwhile to represent key ranges as
[<start>-<end>]
(i.e. add the square braces here).
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 0 of 8 files reviewed, 2 unresolved discussions (waiting on @jbowens and @sumeerbhola)
TFTR! |
Changes Manifest.CheckOrdering to allow L0 SSTables to overlap
in sequence numbers.
Implemented for better compatibility with future versions of
Pebble which could support partitioned flushes and L0 sublevels.
Fixes #587 .