Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

db: enhance Iterator with limited iteration mode #1079

Merged
merged 1 commit into from
Mar 13, 2021

Conversation

sumeerbhola
Copy link
Collaborator

Limited iteration mode is a best-effort way for the caller
to specify an exclusive forward or inclusive reverse limit
on each limited-iteration call. The limit lasts only for
the duration of the call. The limit is exclusive for
forward iteration and inclusive for reverse iteration.

Iterator supports WithLimit variants for SeekGE, SeekLT,
Next, Prev. These are motivated by the O(N^2) complexity
we observe when doing coordinated iteration over the
lock table and MVCC keys in CockroachDB where all the
intents in the lock table are deleted but have not yet
been compacted away. The O(N^2) complexity arises in two
cases:

  • Limited scans: the extreme is a limited scan with a
    limit of 1. Each successive scan traverses N, N-1, ...
    locks.
  • Mix of forward/reverse iteration in a single scan: A
    Next/SeekGE will incur a cost of N and the Prev/SeekLT
    another cost of N.
    This situation could be potentially fixed with a narrower
    solution, given the constrained way that the combination of
    forward and backward iteration is used in CockroachDB
    code, but it would be fragile. The limited iteration mode
    solves it in a more general way.

The implementation of limited iteration eschews
best-effort behavior (though that would reduce some
comparisons) so that it can be deterministic and be
tested via the metamorphic test.

The existing seek optimizations are impacted in the
following way:

  • Use next before seeking:
    • Repeated seeks (forward or reverse) with monotonic
      bounds: this optimization operates at mergingIter
      and below and is therefore unaffected by whatever
      Iterator may be doing with the keys exposed by
      mergingIter.
    • monotonic SeekPrefixGE: SeekPrefixGE does not
      expose a limited iteration variant since the
      prefix comparison performed by Iterator already
      ensures the absence of quadratic behavior.
  • Noop seeks (forward and reverse): this optimization
    happens purely in Iterator. It is altered to account
    for limited iteration, and forks into 2 paths,
    (a) truly a noop, (b) the limit on the preceding
    seek has positioned the iterator at a key that is
    outside the limit. In case (b) the underlying
    mergingIter does not need to be seeked, but there
    is work to be done in finding an appropriate
    prev or next entry.

See https://github.com/sumeerbhola/cockroach/tree/iter_coord
for the corresponding CockroachDB change that uses
these methods.

@cockroach-teamcity
Copy link
Member

This change is Reviewable

Copy link
Member

@itsbilal itsbilal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

I find it unfortunate that we have to forego a performance optimization just to please the metamorphic test. Maybe we can make the iter*Ops in the metamorphic test "work around" that? Like, say, do the limit check after each positioning operation in the metamorphic test, and if it's past the limit, just spit out the limit and deterministically force the iterator into the related paused state (eg. iter.SeekGEWithLimit(limit, limit) or something) ?

Reviewed 11 of 11 files at r1.
Reviewable status: all files reviewed, 2 unresolved discussions (waiting on @jbowens, @petermattis, and @sumeerbhola)


iterator.go, line 115 at r1 (raw file):

	iterValidityState IterValidityState
	// The position of iter. When this is iterPos{Prev,Next} the iter has been
	// moved past the current key-value, which can only happen if iterValidityState=true,

nit: iterValidityState=IterValid


iterator.go, line 270 at r1 (raw file):

	// to avoid double counting a key by sampling both when we pause at it,
	// and when we return it to the client.
	if i.iterValidityState != IterValid {

Since the paused key isn't being returned to the caller, maybe this is the right behaviour - we only sample the keys we return.

Limited iteration mode is a best-effort way for the caller
to specify an exclusive forward or inclusive reverse limit
on each limited-iteration call. The limit lasts only for
the duration of the call. The limit is exclusive for
forward iteration and inclusive for reverse iteration.

Iterator supports WithLimit variants for SeekGE, SeekLT,
Next, Prev. These are motivated by the O(N^2) complexity
we observe when doing coordinated iteration over the
lock table and MVCC keys in CockroachDB where all the
intents in the lock table are deleted but have not yet
been compacted away. The O(N^2) complexity arises in two
cases:
- Limited scans: the extreme is a limited scan with a
  limit of 1. Each successive scan traverses N, N-1, ...
  locks.
- Mix of forward/reverse iteration in a single scan: A
  Next/SeekGE will incur a cost of N and the Prev/SeekLT
  another cost of N.
  This situation could be potentially fixed with a narrower
  solution, given the constrained way that the combination of
  forward and backward iteration is used in CockroachDB
  code, but it would be fragile. The limited iteration mode
  solves it in a more general way.

The implementation of limited iteration eschews
best-effort behavior (though that would reduce some
comparisons) so that it can be deterministic and be
tested via the metamorphic test.

The existing seek optimizations are impacted in the
following way:
- Use next before seeking:
  - Repeated seeks (forward or reverse) with monotonic
    bounds: this optimization operates at mergingIter
    and below and is therefore unaffected by whatever
    Iterator may be doing with the keys exposed by
    mergingIter.
  - monotonic SeekPrefixGE: SeekPrefixGE does not
    expose a limited iteration variant since the
    prefix comparison performed by Iterator already
    ensures the absence of quadratic behavior.
- Noop seeks (forward and reverse): this optimization
  happens purely in Iterator. It is altered to account
  for limited iteration, and forks into 2 paths,
  (a) truly a noop, (b) the limit on the preceding
  seek has positioned the iterator at a key that is
  outside the limit. In case (b) the underlying
  mergingIter does not need to be seeked, but there
  is work to be done in finding an appropriate
  prev or next entry.

See https://github.com/sumeerbhola/cockroach/tree/iter_coord
for the corresponding CockroachDB change that uses
these methods.
Copy link
Collaborator Author

@sumeerbhola sumeerbhola left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TFTR!

I find it unfortunate that we have to forego a performance optimization just to please the metamorphic test. Maybe we can make the iter*Ops in the metamorphic test "work around" that? Like, say, do the limit check after each positioning operation in the metamorphic test, and if it's past the limit, just spit out the limit and deterministically force the iterator into the related paused state (eg. iter.SeekGEWithLimit(limit, limit) or something) ?

That is an interesting idea, and I'll keep that in mind. It is a bit trickier, and I believe these extra comparisons aren't actually costing us much. And doing additional ops that perturb the Iterator state to restore determinism may hide bugs in the pre-perturbation state.

Reviewable status: 4 of 10 files reviewed, 1 unresolved discussion (waiting on @itsbilal, @jbowens, and @petermattis)


iterator.go, line 115 at r1 (raw file):

Previously, itsbilal (Bilal Akhtar) wrote…

nit: iterValidityState=IterValid

Done


iterator.go, line 270 at r1 (raw file):

Previously, itsbilal (Bilal Akhtar) wrote…

Since the paused key isn't being returned to the caller, maybe this is the right behaviour - we only sample the keys we return.

The comment was more leading towards the question of whether we should only sample the keys we return. If we don't sample the deleted keys we would not use those to direct compactions even though those deleted keys were read. I've updated the comment and it is no longer a TODO, since it isn't clear whether we should address this deficiency.

@sumeerbhola sumeerbhola merged commit 639dfce into cockroachdb:master Mar 13, 2021
@sumeerbhola sumeerbhola deleted the limited_iter branch October 26, 2021 13:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants