-
Notifications
You must be signed in to change notification settings - Fork 480
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
db: refactor merging iterator's prefix handling for TrySeekUsingNext #2178
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice. Thanks for sprinkling the extra documentation around too.
- though perhaps wait for a second set of 👀.
Reviewed 6 of 6 files at r1, all commit messages.
Reviewable status: all files reviewed, 2 unresolved discussions (waiting on @itsbilal, @jbowens, and @sumeerbhola)
-- commits
line 18 at r1:
nit: final ,
-> .
.
merging_iter.go
line 559 at r1 (raw file):
if m.prefix != nil { n := m.split(key) return !bytes.Equal(m.prefix, key[:n])
nit: I realize the equality check is sufficient here. However, in a strict sense, isn't this method behaving more like equalsPrefix
? Worth updating the method, or actually doing the comparison for prefix(key) > m.prefix
?
a502d7a
to
45ba088
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The new invariant checking in nextEntry
revealed another problem. Range key masking may result in the interleaving iterator Next-ing beyond the current iteration prefix. I made a fix to avoid Next-ing beyond a masked point if the masked point is already beyond the iteration prefix. I also documented this new variant we're adding on the internal iterator comment. I expect this change to have a large benefit to MVCCGets that are retrieving within a large swath of masked point keys. Previously, we'd next until we exhausted the merging iterator stopped surfacing keys or we found a non-masked point key. I'll try to get some numbers from a benchmark today.
Reviewable status: 6 of 9 files reviewed, 2 unresolved discussions (waiting on @itsbilal, @nicktrav, and @sumeerbhola)
merging_iter.go
line 559 at r1 (raw file):
Previously, nicktrav (Nick Travers) wrote…
nit: I realize the equality check is sufficient here. However, in a strict sense, isn't this method behaving more like
equalsPrefix
? Worth updating the method, or actually doing the comparison forprefix(key) > m.prefix
?
adjusted to unequalPrefix
. I want to keep it in the negative (eg, as opposed to equalsPrefix
) since if we're not in prefix iteration mode, it's a little nonsensical to return true
from equalsPrefix
.
d04149b
to
41160ce
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I really like that the code is simpler and we're imposing stronger invariants.
My only questions are related to performance.
Reviewed 2 of 3 files at r2, 1 of 2 files at r3, all commit messages.
Reviewable status: 8 of 10 files reviewed, 5 unresolved discussions (waiting on @itsbilal, @jbowens, and @nicktrav)
-- commits
line 38 at r3:
is b) rare?
This was a concern raised in #2151 (review)
merging_iter.go
line 556 at r3 (raw file):
// unequalPrefix returns true if the iterator is in prefix iteration mode and // the provided key does not contain the current iteration prefix. func (m *mergingIter) unequalPrefix(key []byte) bool {
is this performance sensitive? If yes, does this inline?
merging_iter.go
line 568 at r3 (raw file):
// exhausts the iterator by resetting the heap to empty if the iteration prefix // has already been exceeded. func (m *mergingIter) maybeNextEntry(item *mergingIterItem) {
same inlining question.
41160ce
to
aaa62c8
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here, also really like that the code is simpler now in findNextEntry
.
Reviewed 5 of 6 files at r1, 3 of 3 files at r2, 1 of 2 files at r3, 1 of 1 files at r4, all commit messages.
Reviewable status: all files reviewed, 5 unresolved discussions (waiting on @jbowens and @nicktrav)
Previously, nicktrav (Nick Travers) wrote…
nit: final
,
->.
.
also nit: missing #2151.
internal/keyspan/interleaving_iter.go
line 692 at r4 (raw file):
// iterating too far beyond the prefix when there's a // broad swath of point keys masked by a range key. if i.cmp(i.pointKey.UserKey, i.nextPrefixBuf) >= 0 {
Nice
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: all files reviewed, 4 unresolved discussions (waiting on @nicktrav and @sumeerbhola)
Previously, sumeerbhola wrote…
is b) rare?
This was a concern raised in #2151 (review)
i'm not sure. i originally thought it was because snapshots are rare in Cockroach (and prefix iteration through a snapshot more rare, I believe), and the mutable memtable keys make up a small portion of the keys in a database. in thinking about it some more, i realized that successful bloom filter exclusions make it more likely that the next key in the mutable memtable will be a heap root.
an alternative: we require InternalIterator
to return keys within the iteration prefix. if n levels are unable to efficiently exclude the level through bloom filter checks, etc, then we must perform n new prefix equality checks. however if k prefix equality checks then exclude those levels, we reduce the size of the merging iterator heap to n-k, which reduces the heapify key comparisons by O(k).
in the (typical?) case that there are 0 or 1 keys with the prefix (even if n is large), there's no regression. instead of our current O(n) heapify key comparisons plus one prefix check in the top-level Iterator, we'd perform O(n) faster prefix checks.
in the case where n-k ≥ 2 (multiple levels contain keys with the iteration prefix), i don't see how we'd avoid some regression, at least on seeks that do not then scan the remainder of the prefix.
merging_iter.go
line 556 at r3 (raw file):
Previously, sumeerbhola wrote…
is this performance sensitive? If yes, does this inline?
it did not inline, so I manually inlined it.
merging_iter.go
line 568 at r3 (raw file):
Previously, sumeerbhola wrote…
same inlining question.
this one also did not inline. I refactored it into a separate method that's only invoked within prefix iteration mode, so that only prefix-iteration mode needs to suffer the function call overhead.
841c72e
to
b83901c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 9 of 10 files reviewed, 5 unresolved discussions (waiting on @itsbilal, @nicktrav, and @sumeerbhola)
internal/keyspan/interleaving_iter.go
line 687 at r5 (raw file):
// all subsequent keys with the same prefix must also be // masked according to the key ordering. We can stop and // return nil.
missed this opportunity the first time around
b83901c
to
923535f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 3 of 6 files at r1, 1 of 3 files at r2, 1 of 1 files at r5, 1 of 1 files at r6, all commit messages.
Reviewable status: all files reviewed, 7 unresolved discussions (waiting on @itsbilal, @jbowens, and @nicktrav)
i realized that successful bloom filter exclusions make it more likely that the next key in the mutable memtable will be a heap root.
Hmm, I am a bit confused. What I was concerned about earlier is
if !item.key.Visible(m.snapshot, m.batchSnapshot) {
if m.prefix == nil {
m.nextEntry(item)
} else {
m.maybeNextEntryWithinPrefix(item)
}
continue
}
where the key found repeatedly matches the prefix and is not visible. And so we are doing repeated prefix comparisons when previously we did not. I buy the fact that we shouldn't be concerned about snapshots. So the remaining concern is a batch where the key is not visible due to the value of m.batchSnapshot
. I didn't quite follow what that has to do with mutable memtables. Also, I think my worry was contrived -- I suspect this is not a real concern.
You raised a good point in that to keep iterating we are doing O(log heap-size) comparisons for each iteration step. And we are adding 1 more prefix comparison.
If the top element is a prefix and not visible, it is likely to keep coming from the same level, so there are 2 heap comparisons involved (if there are >= 3 element in the heap) per step. So 2 heap comparisons + 1 prefix comparison. If that level would have its InternalIterator
do the prefix comparison instead (as you outlined), it would just shift that prefix comparison into that InternalIterator
. So it doesn't really save during iteration.
in the (typical?) case that there are 0 or 1 keys with the prefix (even if n is large), there's no regression. instead of our current O(n) heapify key comparisons plus one prefix check in the top-level Iterator, we'd perform O(n) faster prefix checks.
This is compelling. I am not sure we need to do this now -- perhaps you could make an issue for a future optimization?
-- commits
line 47 at r6:
nice
merging_iter.go
line 555 at r6 (raw file):
// maybeNextEntryWithinPrefix steps to the next entry, as long as the iteration // prefix has not alreday been exceeded. If it has, it exhausts the iterator by
nit: already
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
tftrs!
Reviewable status: all files reviewed, 6 unresolved discussions (waiting on @itsbilal, @nicktrav, and @sumeerbhola)
I didn't quite follow what that has to do with mutable memtables.
I think an iterator's mutable memtable iterator can encounter keys (in the arenaskl skiplist) committed after the iterator was opened, so these newer keys would be filtered out here.
perhaps you could make an issue for a future optimization?
filed #2182.
Builds off of cockroachdb#2151. This commit refactors the merging iterator's handling of prefix-related invariants for enabling TrySeekUsingNext optimizations when using SeekPrefixGE. The TrySeekUsingNext optimization notes successively larger seek keys, and attempts to use next to find the appropriate key in the later seeks, avoiding seeking from scratch. In a level's file metadata, we now use the presence of the TrySeekUsingNext flag to signal that we can Next through the file metadata. This saves key comparisons, especially in larger LSMs with many files within a level. Ever since the introduction of cockroachdb#2048, we've encountered a slew of metamorphic test failures due to interactions between this optimization and prefix iteration. We've made various fixes in cockroachdb#2087, cockroachdb#2111, cockroachdb#2123 and cockroachdb#2151. Most of these failures revolve around the progression of a level iterator beyond the iterator's current iteration prefix. This is problematic, because the incomplete view of the LSM during prefix iteration can result in inconsistent positioning outside of the current iterator prefix. The internal iterators do not guarantee that keys or range deletions outside the prefix are observed. This commit fixes the latest instance of this (cockroachdb#2174), and attempts to restructure the code to clarify the required invariants. Now, any method that may advance the iterator during prefix iteration (eg, nextEntry or isNextEntryDeleted) check that the iteration prefix has not already been exceeded. This check is only performed when the merging iterator is advancing internal iterators of its own volition (eg, due to non-visible keys or keys deleted by range tombstones). It does NOT perform this check when the top-level Iterator Nexts the merging iterator because of a user Next, a point-delete, etc. This means these prefix checks are only redundant with the top-level Iterator Nexts in two cases: a) the heap root is deleted by a range tombstone b) the heap root is not visible at the iterator's sequence number Both of these cases are expected to be rare, and the redundant prefix check is expected to be insignificant. Additionally, to obey these new stricter invariants, the interleaving iterator is adjusted to exhaust the iterator when the remainder of a prefix is masked by a range key. Besides ensuring invariants are respected, this change provides a large benefit to prefix iteration within a large swath of masked point keys. ``` name old time/op new time/op delta Iterator_RangeKeyMasking/range-keys-suffixes=@10/forward/seekprefix-24 59.8ms ± 2% 60.3ms ± 2% +0.92% (p=0.013 n=25+25) Iterator_RangeKeyMasking/range-keys-suffixes=@10/forward/next-24 4.50ms ± 6% 4.52ms ± 3% ~ (p=0.388 n=25+24) Iterator_RangeKeyMasking/range-keys-suffixes=@10/backward-24 5.64ms ± 7% 5.51ms ± 3% -2.26% (p=0.001 n=25+25) Iterator_RangeKeyMasking/range-keys-suffixes=@50/forward/seekprefix-24 64.0ms ± 5% 60.2ms ± 4% -5.89% (p=0.000 n=25+25) Iterator_RangeKeyMasking/range-keys-suffixes=@50/forward/next-24 4.11ms ± 3% 4.16ms ± 5% +1.06% (p=0.040 n=25+24) Iterator_RangeKeyMasking/range-keys-suffixes=@50/backward-24 4.94ms ± 3% 4.89ms ± 3% -0.97% (p=0.011 n=25+24) Iterator_RangeKeyMasking/range-keys-suffixes=@75/forward/seekprefix-24 69.0ms ± 2% 60.1ms ± 2% -12.88% (p=0.000 n=22+24) Iterator_RangeKeyMasking/range-keys-suffixes=@75/forward/next-24 2.55ms ± 2% 2.59ms ± 3% +1.90% (p=0.000 n=25+25) Iterator_RangeKeyMasking/range-keys-suffixes=@75/backward-24 3.20ms ± 2% 3.21ms ± 3% ~ (p=0.418 n=25+25) Iterator_RangeKeyMasking/range-keys-suffixes=@100/forward/seekprefix-24 2.19s ± 3% 0.06s ± 2% -97.27% (p=0.000 n=24+25) Iterator_RangeKeyMasking/range-keys-suffixes=@100/forward/next-24 346µs ± 2% 346µs ± 2% ~ (p=0.898 n=24+25) Iterator_RangeKeyMasking/range-keys-suffixes=@100/backward-24 687µs ± 3% 696µs ± 2% +1.31% (p=0.000 n=25+25) ``` Fix cockroachdb#2174.
923535f
to
73b34ab
Compare
Builds off of #2151. This commit refactors the merging iterator's
handling of prefix-related invariants for enabling TrySeekUsingNext
optimizations when using SeekPrefixGE.
The TrySeekUsingNext optimization notes successively larger seek keys,
and attempts to use next to find the appropriate key in the later seeks,
avoiding seeking from scratch. In a level's file metadata, we now use
the presence of the TrySeekUsingNext flag to signal that we can Next
through the file metadata. This saves key comparisons, especially in
larger LSMs with many files within a level.
Ever since the introduction of #2048, we've encountered a slew of
metamorphic test failures due to interactions between this optimization
and prefix iteration. We've made various fixes in #2087, #2111, #2123
and #2151.
Most of these failures revolve around the progression of a level
iterator beyond the iterator's current iteration prefix. This is
problematic, because the incomplete view of the LSM during prefix
iteration can result in inconsistent positioning outside of the current
iterator prefix. The internal iterators do not guarantee that keys or
range deletions outside the prefix are observed.
This commit fixes the latest instance of this (#2174), and attempts to
restructure the code to clarify the required invariants. Now, any method
that may advance the iterator during prefix iteration (eg, nextEntry or
isNextEntryDeleted) check that the iteration prefix has not already been
exceeded. This check is only performed when the merging iterator is
advancing internal iterators of its own volition (eg, due to non-visible
keys or keys deleted by range tombstones). It does NOT perform this
check when the top-level Iterator Nexts the merging iterator because
of a user Next, a point-delete, etc. This means these prefix checks are
only redundant with the top-level Iterator Nexts in two cases:
a) the heap root is deleted by a range tombstone
b) the heap root is not visible at the iterator's sequence number
Both of these cases are expected to be rare, and the redundant prefix
check is expected to be insignificant.
Additionally, to obey these new stricter invariants, the interleaving
iterator is adjusted to exhaust the iterator when the remainder of a
prefix is masked by a range key. Besides ensuring invariants are
respected, this change provides a large benefit to prefix iteration
within a large swath of masked point keys.
Fix #2174.