Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

db: bump 21.1 to include levelIter optimization #1093

Merged
merged 2 commits into from
Mar 19, 2021

Conversation

sumeerbhola
Copy link
Collaborator

This optimization was identified as helping with the tpccbench regression
as discussed in cockroachdb/cockroach#62078

See
cockroachdb/cockroach#62078 (comment)
for the risk discussion.

Limited iteration mode is a best-effort way for the caller
to specify an exclusive forward or inclusive reverse limit
on each limited-iteration call. The limit lasts only for
the duration of the call. The limit is exclusive for
forward iteration and inclusive for reverse iteration.

Iterator supports WithLimit variants for SeekGE, SeekLT,
Next, Prev. These are motivated by the O(N^2) complexity
we observe when doing coordinated iteration over the
lock table and MVCC keys in CockroachDB where all the
intents in the lock table are deleted but have not yet
been compacted away. The O(N^2) complexity arises in two
cases:
- Limited scans: the extreme is a limited scan with a
  limit of 1. Each successive scan traverses N, N-1, ...
  locks.
- Mix of forward/reverse iteration in a single scan: A
  Next/SeekGE will incur a cost of N and the Prev/SeekLT
  another cost of N.
  This situation could be potentially fixed with a narrower
  solution, given the constrained way that the combination of
  forward and backward iteration is used in CockroachDB
  code, but it would be fragile. The limited iteration mode
  solves it in a more general way.

The implementation of limited iteration eschews
best-effort behavior (though that would reduce some
comparisons) so that it can be deterministic and be
tested via the metamorphic test.

The existing seek optimizations are impacted in the
following way:
- Use next before seeking:
  - Repeated seeks (forward or reverse) with monotonic
    bounds: this optimization operates at mergingIter
    and below and is therefore unaffected by whatever
    Iterator may be doing with the keys exposed by
    mergingIter.
  - monotonic SeekPrefixGE: SeekPrefixGE does not
    expose a limited iteration variant since the
    prefix comparison performed by Iterator already
    ensures the absence of quadratic behavior.
- Noop seeks (forward and reverse): this optimization
  happens purely in Iterator. It is altered to account
  for limited iteration, and forks into 2 paths,
  (a) truly a noop, (b) the limit on the preceding
  seek has positioned the iterator at a key that is
  outside the limit. In case (b) the underlying
  mergingIter does not need to be seeked, but there
  is work to be done in finding an appropriate
  prev or next entry.

See https://github.com/sumeerbhola/cockroach/tree/iter_coord
for the corresponding CockroachDB change that uses
these methods.
When SeekPrefixGE on the underlying file returns false
due to a bloom filter non-match, levelIter would skip
to the next file. This is wasteful if the upper bound
of the file is beyond the prefix. Additionally, it
defeats the optimization for sparse key spaces like
CockroachDB's lock table, where we try to reuse the
current position of the iterator -- by skipping to the
next file the subsequent SeekPrefixGE will again need
to reload the previous file.

This behavior was first noticed when diagnosing tpcc
slowness in CockroacbDB, where almost half the
overhead of seeking in the lock table could be
attributed to this (see
cockroachdb/cockroach#62078
for details).

The benchmark numbers for bloom=true/with-tombstone=false
are the ones intended to benefit from this change.

name                                                                        old time/op    new time/op    delta
IteratorSeqSeekPrefixGENotFound/skip=1/bloom=false/with-tombstone=false-16     441ns ± 9%     445ns ± 7%     ~     (p=0.332 n=19+20)
IteratorSeqSeekPrefixGENotFound/skip=1/bloom=false/with-tombstone=true-16      299ns ± 3%     300ns ± 3%     ~     (p=0.455 n=20+20)
IteratorSeqSeekPrefixGENotFound/skip=1/bloom=true/with-tombstone=false-16     3.73µs ± 8%    0.82µs ± 2%  -78.02%  (p=0.000 n=20+16)
IteratorSeqSeekPrefixGENotFound/skip=1/bloom=true/with-tombstone=true-16      1.78µs ±73%    1.21µs ± 7%  -32.15%  (p=0.000 n=20+20)
IteratorSeqSeekPrefixGENotFound/skip=2/bloom=false/with-tombstone=false-16     484ns ±27%     427ns ± 2%  -11.83%  (p=0.000 n=19+19)
IteratorSeqSeekPrefixGENotFound/skip=2/bloom=false/with-tombstone=true-16      320ns ± 7%     300ns ± 3%   -6.11%  (p=0.000 n=16+19)
IteratorSeqSeekPrefixGENotFound/skip=2/bloom=true/with-tombstone=false-16     5.07µs ±41%    0.82µs ± 2%  -83.84%  (p=0.000 n=20+18)
IteratorSeqSeekPrefixGENotFound/skip=2/bloom=true/with-tombstone=true-16      1.76µs ±37%    1.21µs ± 9%  -30.92%  (p=0.000 n=20+20)
IteratorSeqSeekPrefixGENotFound/skip=4/bloom=false/with-tombstone=false-16     439ns ± 4%     436ns ± 6%     ~     (p=0.109 n=20+20)
IteratorSeqSeekPrefixGENotFound/skip=4/bloom=false/with-tombstone=true-16      435ns ±29%     307ns ± 5%  -29.40%  (p=0.000 n=20+19)
IteratorSeqSeekPrefixGENotFound/skip=4/bloom=true/with-tombstone=false-16     5.63µs ±19%    0.82µs ± 2%  -85.40%  (p=0.000 n=20+19)
IteratorSeqSeekPrefixGENotFound/skip=4/bloom=true/with-tombstone=true-16      1.87µs ±36%    1.24µs ± 8%  -33.66%  (p=0.000 n=20+20)

name                                                                        old alloc/op   new alloc/op   delta
IteratorSeqSeekPrefixGENotFound/skip=1/bloom=false/with-tombstone=false-16     0.00B          0.00B          ~     (all equal)
IteratorSeqSeekPrefixGENotFound/skip=1/bloom=false/with-tombstone=true-16      0.00B          0.00B          ~     (all equal)
IteratorSeqSeekPrefixGENotFound/skip=1/bloom=true/with-tombstone=false-16      0.00B          0.00B          ~     (all equal)
IteratorSeqSeekPrefixGENotFound/skip=1/bloom=true/with-tombstone=true-16        271B ± 0%      271B ± 0%     ~     (all equal)
IteratorSeqSeekPrefixGENotFound/skip=2/bloom=false/with-tombstone=false-16     0.00B          0.00B          ~     (all equal)
IteratorSeqSeekPrefixGENotFound/skip=2/bloom=false/with-tombstone=true-16      0.00B          0.00B          ~     (all equal)
IteratorSeqSeekPrefixGENotFound/skip=2/bloom=true/with-tombstone=false-16      0.00B          0.00B          ~     (all equal)
IteratorSeqSeekPrefixGENotFound/skip=2/bloom=true/with-tombstone=true-16        271B ± 0%      271B ± 0%     ~     (all equal)
IteratorSeqSeekPrefixGENotFound/skip=4/bloom=false/with-tombstone=false-16     0.00B          0.00B          ~     (all equal)
IteratorSeqSeekPrefixGENotFound/skip=4/bloom=false/with-tombstone=true-16      0.00B          0.00B          ~     (all equal)
IteratorSeqSeekPrefixGENotFound/skip=4/bloom=true/with-tombstone=false-16      0.00B          0.00B          ~     (all equal)
IteratorSeqSeekPrefixGENotFound/skip=4/bloom=true/with-tombstone=true-16        271B ± 0%      271B ± 0%     ~     (all equal)

name                                                                        old allocs/op  new allocs/op  delta
IteratorSeqSeekPrefixGENotFound/skip=1/bloom=false/with-tombstone=false-16      0.00           0.00          ~     (all equal)
IteratorSeqSeekPrefixGENotFound/skip=1/bloom=false/with-tombstone=true-16       0.00           0.00          ~     (all equal)
IteratorSeqSeekPrefixGENotFound/skip=1/bloom=true/with-tombstone=false-16       0.00           0.00          ~     (all equal)
IteratorSeqSeekPrefixGENotFound/skip=1/bloom=true/with-tombstone=true-16        1.00 ± 0%      1.00 ± 0%     ~     (all equal)
IteratorSeqSeekPrefixGENotFound/skip=2/bloom=false/with-tombstone=false-16      0.00           0.00          ~     (all equal)
IteratorSeqSeekPrefixGENotFound/skip=2/bloom=false/with-tombstone=true-16       0.00           0.00          ~     (all equal)
IteratorSeqSeekPrefixGENotFound/skip=2/bloom=true/with-tombstone=false-16       0.00           0.00          ~     (all equal)
IteratorSeqSeekPrefixGENotFound/skip=2/bloom=true/with-tombstone=true-16        1.00 ± 0%      1.00 ± 0%     ~     (all equal)
IteratorSeqSeekPrefixGENotFound/skip=4/bloom=false/with-tombstone=false-16      0.00           0.00          ~     (all equal)
IteratorSeqSeekPrefixGENotFound/skip=4/bloom=false/with-tombstone=true-16       0.00           0.00          ~     (all equal)
IteratorSeqSeekPrefixGENotFound/skip=4/bloom=true/with-tombstone=false-16       0.00           0.00          ~     (all equal)
IteratorSeqSeekPrefixGENotFound/skip=4/bloom=true/with-tombstone=true-16        1.00 ± 0%      1.00 ± 0%     ~     (all equal)
@cockroach-teamcity
Copy link
Member

This change is Reviewable

Copy link
Collaborator

@jbowens jbowens left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:LGTM:

Reviewable status: 0 of 19 files reviewed, all discussions resolved (waiting on @itsbilal and @petermattis)

@sumeerbhola
Copy link
Collaborator Author

TFTR!
The TestCacheEvict failure seems unrelated.

@sumeerbhola sumeerbhola merged commit bc44610 into cockroachdb:crl-release-21.1 Mar 19, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants