Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

storage,kv: faster intent resolution over a key range #66268

Merged
merged 1 commit into from
Sep 1, 2021

Conversation

sumeerbhola
Copy link
Collaborator

@sumeerbhola sumeerbhola commented Jun 9, 2021

Closes #66900

When all the intents on a range are known to be separated,
as indicated by MCCStats, and a ResolveIntentRangeRequest
needs to be processed (for a transaction that did enough
writes that we did not track individual intent keys), we
avoid iterating over MVCC key-value pairs to find those
intents, and instead iterate over the separated lock table
key space.

A new iterForKeyVersions interface is introduced which
provides a narrow iterator interface to be used by
mvccResolveWriteIntent that resolves individual intents.
This interface is only for iterating over a single key,
and is implemented by the usual MVCCIterator and by the
new separatedIntentAndVersionIter. The latter serves
both as a way to find intents over a range and for
individual intent resolution, in an optimized manner.

There are new benchmarks which vary the sparseness
(sparseness of N means that 1 in every N keys has an
intent that is for the txn for which intent resolution
is being performed) and whether the remaining N-1 keys
have an intent from a different transaction
(other-txn-intents=true) or not (the former results
in the cost of at least parsing the MVCCMetadata to
compare txn IDs).

Full results are below. Two things to note are:

  • The percent-flushed={0,50} sometimes show a regression.
    These are atypical cases where the SingleDelete to
    remove the intent has not been flushed so the intent
    Set and SingleDelete are both alive. This would be
    a pathalogical case when the LSM is unhealthy. Even
    then the regression is < 170%.
  • The percent-flushed=100 cases are the typical one.
    Once sparseness increases to 100 or 1000 we see speed
    gains close to 100x, as indicated by the following:
IntentRangeResolution/separated=true/versions=10/sparseness=1/other-txn-intents=false/percent-flushed=100-16        235µs ± 2%      58µs ± 3%   -75.52%  (p=0.008 n=5+5)
IntentRangeResolution/separated=true/versions=10/sparseness=100/other-txn-intents=false/percent-flushed=100-16      154µs ± 2%       2µs ± 4%   -98.72%  (p=0.008 n=5+5)
IntentRangeResolution/separated=true/versions=10/sparseness=1000/other-txn-intents=false/percent-flushed=100-16     137µs ± 1%       1µs ± 2%   -99.07%  (p=0.008 n=5+5)

Full results:

name                                                                                                             old time/op    new time/op    delta
IntentRangeResolution/separated=true/versions=10/sparseness=1/other-txn-intents=false/percent-flushed=0-16          450µs ± 5%     234µs ± 2%   -48.07%  (p=0.008 n=5+5)
IntentRangeResolution/separated=true/versions=10/sparseness=1/other-txn-intents=false/percent-flushed=50-16         294µs ± 3%     101µs ± 4%   -65.49%  (p=0.008 n=5+5)
IntentRangeResolution/separated=true/versions=10/sparseness=1/other-txn-intents=false/percent-flushed=100-16        235µs ± 2%      58µs ± 3%   -75.52%  (p=0.008 n=5+5)
IntentRangeResolution/separated=true/versions=10/sparseness=100/other-txn-intents=false/percent-flushed=0-16        494µs ± 7%     218µs ± 6%   -55.84%  (p=0.008 n=5+5)
IntentRangeResolution/separated=true/versions=10/sparseness=100/other-txn-intents=false/percent-flushed=50-16       251µs ± 2%      57µs ± 3%   -77.29%  (p=0.008 n=5+5)
IntentRangeResolution/separated=true/versions=10/sparseness=100/other-txn-intents=false/percent-flushed=100-16      154µs ± 2%       2µs ± 4%   -98.72%  (p=0.008 n=5+5)
IntentRangeResolution/separated=true/versions=10/sparseness=100/other-txn-intents=true/percent-flushed=0-16         343µs ± 5%     236µs ± 2%   -31.28%  (p=0.008 n=5+5)
IntentRangeResolution/separated=true/versions=10/sparseness=100/other-txn-intents=true/percent-flushed=50-16        240µs ± 1%      74µs ± 2%   -69.02%  (p=0.008 n=5+5)
IntentRangeResolution/separated=true/versions=10/sparseness=100/other-txn-intents=true/percent-flushed=100-16       203µs ± 1%      30µs ± 1%   -84.97%  (p=0.008 n=5+5)
IntentRangeResolution/separated=true/versions=10/sparseness=1000/other-txn-intents=false/percent-flushed=0-16       516µs ± 4%     274µs ±13%   -46.91%  (p=0.008 n=5+5)
IntentRangeResolution/separated=true/versions=10/sparseness=1000/other-txn-intents=false/percent-flushed=50-16      256µs ± 2%      57µs ± 1%   -77.67%  (p=0.016 n=4+5)
IntentRangeResolution/separated=true/versions=10/sparseness=1000/other-txn-intents=false/percent-flushed=100-16     137µs ± 1%       1µs ± 2%   -99.07%  (p=0.008 n=5+5)
IntentRangeResolution/separated=true/versions=10/sparseness=1000/other-txn-intents=true/percent-flushed=0-16        325µs ± 2%     237µs ± 1%   -26.98%  (p=0.008 n=5+5)
IntentRangeResolution/separated=true/versions=10/sparseness=1000/other-txn-intents=true/percent-flushed=50-16       232µs ± 4%      72µs ± 1%   -68.89%  (p=0.008 n=5+5)
IntentRangeResolution/separated=true/versions=10/sparseness=1000/other-txn-intents=true/percent-flushed=100-16      199µs ± 1%      30µs ± 0%   -84.87%  (p=0.016 n=5+4)
IntentRangeResolution/separated=true/versions=100/sparseness=1/other-txn-intents=false/percent-flushed=0-16        2.08ms ± 1%    2.68ms ± 8%   +28.57%  (p=0.016 n=4+5)
IntentRangeResolution/separated=true/versions=100/sparseness=1/other-txn-intents=false/percent-flushed=50-16        592µs ± 4%     596µs ± 7%      ~     (p=1.000 n=5+5)
IntentRangeResolution/separated=true/versions=100/sparseness=1/other-txn-intents=false/percent-flushed=100-16       242µs ± 2%      59µs ± 3%   -75.73%  (p=0.016 n=4+5)
IntentRangeResolution/separated=true/versions=100/sparseness=100/other-txn-intents=false/percent-flushed=0-16      3.04ms ± 5%    2.60ms ± 3%   -14.59%  (p=0.008 n=5+5)
IntentRangeResolution/separated=true/versions=100/sparseness=100/other-txn-intents=false/percent-flushed=50-16      840µs ± 4%     539µs ± 3%   -35.87%  (p=0.008 n=5+5)
IntentRangeResolution/separated=true/versions=100/sparseness=100/other-txn-intents=false/percent-flushed=100-16     163µs ± 6%       2µs ± 6%   -98.75%  (p=0.008 n=5+5)
IntentRangeResolution/separated=true/versions=100/sparseness=100/other-txn-intents=true/percent-flushed=0-16        936µs ±38%    2497µs ±12%  +166.83%  (p=0.008 n=5+5)
IntentRangeResolution/separated=true/versions=100/sparseness=100/other-txn-intents=true/percent-flushed=50-16       281µs ±12%     551µs ± 2%   +95.64%  (p=0.008 n=5+5)
IntentRangeResolution/separated=true/versions=100/sparseness=100/other-txn-intents=true/percent-flushed=100-16      209µs ±13%      31µs ± 9%   -84.95%  (p=0.008 n=5+5)
IntentRangeResolution/separated=true/versions=100/sparseness=1000/other-txn-intents=false/percent-flushed=0-16     3.04ms ± 6%    2.55ms ± 0%      ~     (p=0.333 n=5+1)

name                                                                                                             old alloc/op   new alloc/op   delta
IntentRangeResolution/separated=true/versions=10/sparseness=1/other-txn-intents=false/percent-flushed=0-16         12.8kB ± 0%    13.0kB ± 0%    +0.92%  (p=0.008 n=5+5)
IntentRangeResolution/separated=true/versions=10/sparseness=1/other-txn-intents=false/percent-flushed=50-16        12.8kB ± 0%    13.0kB ± 0%    +0.94%  (p=0.016 n=5+4)
IntentRangeResolution/separated=true/versions=10/sparseness=1/other-txn-intents=false/percent-flushed=100-16       12.9kB ± 0%    13.0kB ± 0%    +0.59%  (p=0.016 n=5+4)
IntentRangeResolution/separated=true/versions=10/sparseness=100/other-txn-intents=false/percent-flushed=0-16       1.75kB ± 0%    0.29kB ± 0%      ~     (p=0.079 n=4+5)
IntentRangeResolution/separated=true/versions=10/sparseness=100/other-txn-intents=false/percent-flushed=50-16      1.75kB ± 0%    0.29kB ± 0%   -83.18%  (p=0.008 n=5+5)
IntentRangeResolution/separated=true/versions=10/sparseness=100/other-txn-intents=false/percent-flushed=100-16     1.75kB ± 0%    0.29kB ± 0%   -83.22%  (p=0.008 n=5+5)
IntentRangeResolution/separated=true/versions=10/sparseness=100/other-txn-intents=true/percent-flushed=0-16        12.8kB ± 0%    11.4kB ± 0%   -11.37%  (p=0.008 n=5+5)
IntentRangeResolution/separated=true/versions=10/sparseness=100/other-txn-intents=true/percent-flushed=50-16       12.8kB ± 0%    11.4kB ± 0%      ~     (p=0.079 n=4+5)
IntentRangeResolution/separated=true/versions=10/sparseness=100/other-txn-intents=true/percent-flushed=100-16      12.8kB ± 0%    11.4kB ± 0%   -11.41%  (p=0.000 n=5+4)
IntentRangeResolution/separated=true/versions=10/sparseness=1000/other-txn-intents=false/percent-flushed=0-16      1.66kB ± 0%    0.17kB ± 0%   -89.61%  (p=0.008 n=5+5)
IntentRangeResolution/separated=true/versions=10/sparseness=1000/other-txn-intents=false/percent-flushed=50-16     1.66kB ± 0%    0.17kB ± 0%   -89.61%  (p=0.008 n=5+5)
IntentRangeResolution/separated=true/versions=10/sparseness=1000/other-txn-intents=false/percent-flushed=100-16    1.66kB ± 0%    0.17kB ± 0%   -89.63%  (p=0.008 n=5+5)
IntentRangeResolution/separated=true/versions=10/sparseness=1000/other-txn-intents=true/percent-flushed=0-16       12.8kB ± 0%    11.4kB ± 0%   -11.50%  (p=0.008 n=5+5)
IntentRangeResolution/separated=true/versions=10/sparseness=1000/other-txn-intents=true/percent-flushed=50-16      12.8kB ± 0%    11.4kB ± 0%      ~     (p=0.079 n=4+5)
IntentRangeResolution/separated=true/versions=10/sparseness=1000/other-txn-intents=true/percent-flushed=100-16     12.8kB ± 0%    11.4kB ± 0%   -11.53%  (p=0.029 n=4+4)
IntentRangeResolution/separated=true/versions=100/sparseness=1/other-txn-intents=false/percent-flushed=0-16        12.8kB ± 0%    13.0kB ± 0%    +0.95%  (p=0.016 n=5+4)
IntentRangeResolution/separated=true/versions=100/sparseness=1/other-txn-intents=false/percent-flushed=50-16       12.8kB ± 0%    13.0kB ± 0%    +0.95%  (p=0.008 n=5+5)
IntentRangeResolution/separated=true/versions=100/sparseness=1/other-txn-intents=false/percent-flushed=100-16      12.9kB ± 0%    13.0kB ± 0%    +0.54%  (p=0.016 n=5+4)
IntentRangeResolution/separated=true/versions=100/sparseness=100/other-txn-intents=false/percent-flushed=0-16      1.75kB ± 0%    0.29kB ± 0%   -83.14%  (p=0.008 n=5+5)
IntentRangeResolution/separated=true/versions=100/sparseness=100/other-txn-intents=false/percent-flushed=50-16     1.75kB ± 0%    0.29kB ± 0%   -83.19%  (p=0.008 n=5+5)
IntentRangeResolution/separated=true/versions=100/sparseness=100/other-txn-intents=false/percent-flushed=100-16    1.75kB ± 0%    0.29kB ± 0%   -83.22%  (p=0.008 n=5+5)
IntentRangeResolution/separated=true/versions=100/sparseness=100/other-txn-intents=true/percent-flushed=0-16       12.8kB ± 0%    11.4kB ± 0%   -11.37%  (p=0.000 n=5+4)
IntentRangeResolution/separated=true/versions=100/sparseness=100/other-txn-intents=true/percent-flushed=50-16      12.8kB ± 0%    11.4kB ± 0%   -11.37%  (p=0.008 n=5+5)
IntentRangeResolution/separated=true/versions=100/sparseness=100/other-txn-intents=true/percent-flushed=100-16     12.8kB ± 0%    11.4kB ± 0%   -11.41%  (p=0.000 n=5+4)
IntentRangeResolution/separated=true/versions=100/sparseness=1000/other-txn-intents=false/percent-flushed=0-16     1.66kB ± 0%    0.17kB ± 0%   -89.56%  (p=0.000 n=5+1)

name                                                                                                             old allocs/op  new allocs/op  delta
IntentRangeResolution/separated=true/versions=10/sparseness=1/other-txn-intents=false/percent-flushed=0-16            401 ± 0%       404 ± 0%    +0.75%  (p=0.008 n=5+5)
IntentRangeResolution/separated=true/versions=10/sparseness=1/other-txn-intents=false/percent-flushed=50-16           401 ± 0%       404 ± 0%    +0.75%  (p=0.008 n=5+5)
IntentRangeResolution/separated=true/versions=10/sparseness=1/other-txn-intents=false/percent-flushed=100-16          401 ± 0%       404 ± 0%    +0.75%  (p=0.008 n=5+5)
IntentRangeResolution/separated=true/versions=10/sparseness=100/other-txn-intents=false/percent-flushed=0-16          104 ± 0%         8 ± 0%   -92.31%  (p=0.008 n=5+5)
IntentRangeResolution/separated=true/versions=10/sparseness=100/other-txn-intents=false/percent-flushed=50-16         104 ± 0%         8 ± 0%   -92.31%  (p=0.008 n=5+5)
IntentRangeResolution/separated=true/versions=10/sparseness=100/other-txn-intents=false/percent-flushed=100-16        104 ± 0%         8 ± 0%   -92.31%  (p=0.008 n=5+5)
IntentRangeResolution/separated=true/versions=10/sparseness=100/other-txn-intents=true/percent-flushed=0-16           401 ± 0%       305 ± 0%   -23.94%  (p=0.008 n=5+5)
IntentRangeResolution/separated=true/versions=10/sparseness=100/other-txn-intents=true/percent-flushed=50-16          401 ± 0%       305 ± 0%   -23.94%  (p=0.008 n=5+5)
IntentRangeResolution/separated=true/versions=10/sparseness=100/other-txn-intents=true/percent-flushed=100-16         401 ± 0%       305 ± 0%   -23.94%  (p=0.008 n=5+5)
IntentRangeResolution/separated=true/versions=10/sparseness=1000/other-txn-intents=false/percent-flushed=0-16         102 ± 0%         3 ± 0%   -97.06%  (p=0.008 n=5+5)
IntentRangeResolution/separated=true/versions=10/sparseness=1000/other-txn-intents=false/percent-flushed=50-16        102 ± 0%         3 ± 0%   -97.06%  (p=0.008 n=5+5)
IntentRangeResolution/separated=true/versions=10/sparseness=1000/other-txn-intents=false/percent-flushed=100-16       102 ± 0%         3 ± 0%   -97.06%  (p=0.008 n=5+5)
IntentRangeResolution/separated=true/versions=10/sparseness=1000/other-txn-intents=true/percent-flushed=0-16          401 ± 0%       303 ± 0%   -24.44%  (p=0.008 n=5+5)
IntentRangeResolution/separated=true/versions=10/sparseness=1000/other-txn-intents=true/percent-flushed=50-16         401 ± 0%       303 ± 0%   -24.44%  (p=0.008 n=5+5)
IntentRangeResolution/separated=true/versions=10/sparseness=1000/other-txn-intents=true/percent-flushed=100-16        401 ± 0%       303 ± 0%   -24.44%  (p=0.008 n=5+5)
IntentRangeResolution/separated=true/versions=100/sparseness=1/other-txn-intents=false/percent-flushed=0-16           401 ± 0%       404 ± 0%    +0.75%  (p=0.008 n=5+5)
IntentRangeResolution/separated=true/versions=100/sparseness=1/other-txn-intents=false/percent-flushed=50-16          401 ± 0%       404 ± 0%    +0.75%  (p=0.008 n=5+5)
IntentRangeResolution/separated=true/versions=100/sparseness=1/other-txn-intents=false/percent-flushed=100-16         401 ± 0%       404 ± 0%    +0.75%  (p=0.008 n=5+5)
IntentRangeResolution/separated=true/versions=100/sparseness=100/other-txn-intents=false/percent-flushed=0-16         104 ± 0%         8 ± 0%   -92.31%  (p=0.008 n=5+5)
IntentRangeResolution/separated=true/versions=100/sparseness=100/other-txn-intents=false/percent-flushed=50-16        104 ± 0%         8 ± 0%   -92.31%  (p=0.008 n=5+5)
IntentRangeResolution/separated=true/versions=100/sparseness=100/other-txn-intents=false/percent-flushed=100-16       104 ± 0%         8 ± 0%   -92.31%  (p=0.008 n=5+5)
IntentRangeResolution/separated=true/versions=100/sparseness=100/other-txn-intents=true/percent-flushed=0-16          401 ± 0%       305 ± 0%   -23.94%  (p=0.008 n=5+5)
IntentRangeResolution/separated=true/versions=100/sparseness=100/other-txn-intents=true/percent-flushed=50-16         401 ± 0%       305 ± 0%   -23.94%  (p=0.008 n=5+5)
IntentRangeResolution/separated=true/versions=100/sparseness=100/other-txn-intents=true/percent-flushed=100-16        401 ± 0%       305 ± 0%   -23.94%  (p=0.008 n=5+5)
IntentRangeResolution/separated=true/versions=100/sparseness=1000/other-txn-intents=false/percent-flushed=0-16        102 ± 0%         3 ± 0%   -97.06%  (p=0.000 n=5+1)

Release note (performance improvement): Intent resolution for transactions
that write many intents such that we track intent ranges, for the purpose
of intent resolution, is much faster (potentially 100x) when using the
separated lock table.

Release justification: Fix for high-severity poor performance in existing
functionality, and is a low risk, high benefit change, that potentially
improves ranged intent resolution speed by 100x.

@cockroach-teamcity
Copy link
Member

This change is Reviewable

Copy link
Collaborator Author

@sumeerbhola sumeerbhola left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are no tests yet -- I'll add after initial opinions.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @itsbilal, @nvanbenschoten, and @tbg)


pkg/kv/kvserver/batcheval/cmd_resolve_intent_range.go, line 49 at r1 (raw file):

	onlySeparatedIntents := false
	stats := cArgs.EvalCtx.GetMVCCStats()
	if stats.ContainsEstimates == 0 && stats.IntentCount == stats.SeparatedIntentCount {

This question is whether we should trust this or not
@tbg @nvanbenschoten

One way to strengthen this would be to additionally gate this on the long -running migration being written by @itsbilal having finished running across the whole cluster. The plan is for that to run soon after the cluster version is finalized at 21.2. It is a correctness requirement for that to not leave any interleaved intents, since we will rip out the code that looks for interleaved intents in 22.1.

Copy link
Member

@tbg tbg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Glad to see this. Clearly we want this, the only question in my mind is whether we can afford to have it wait until later (i.e. one release after the migration), but I don't think so since there isn't much (any, really?) complexity here that would be avoided later. I heavily lean towards "just" using the cluster version to gate this functionality, unless there are arguments for being more aggressive. It's just the boring, safe, option.

I haven't given this a detailed review yet, have focused on the specific question of gating the mechanism, and want to understand the comment below:

The percent-flushed={0,50} sometimes show a regression.
These are atypical cases where the SingleDelete to
remove the intent has not been flushed so the intent
Set and SingleDelete are both alive. This would be
a pathalogical case when the LSM is unhealthy. Even
then the regression is < 100%.

For my education, can you explain what scenario this is? In the benchmark you set up intents, so there aren't any single deletes in the dataset on top of which you repeatedly open the batches. MVCCResolveWriteIntentRange is the thing writing the SingleDeletes and once it's deleted something it has handled that intent and won't return in the same batch, right? I'm missing how you even care about where the SingleDeletes go. Also, and this is unsurprising since I'm already lost at this point, I don't understand why this would be the case in practice in pathological LSMs.
Also, there is a +166% regression in the data you posted.

Reviewed 8 of 8 files at r1.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @itsbilal and @nvanbenschoten)


pkg/kv/kvserver/batcheval/cmd_resolve_intent_range.go, line 49 at r1 (raw file):
There's reason to be hesitant about using the stats for something that could affect correctness or availability. For a while we had our fair share of stats bugs where the numbers just weren't right. This is a few years back, and our testing has improved, but I don't know that future changes to the stats couldn't have introduced a silent, so far benign, problem. That said, I don't know of anything concrete here. This should just work. We could gather more evidence from CockroachCloud - if the stats were "wrong" in the sense that they don't agree with recomputation (but all of the replica have the same stats), the consistency checker issues a delta that patches them up. I think SREs could look for this log line

// We've found that there's something to correct; send an RecomputeStatsRequest. Note that this
// code runs only on the lease holder (at the time of initiating the computation), so this work
// isn't duplicated except in rare leaseholder change scenarios (and concurrent invocation of
// RecomputeStats is allowed because these requests block on one another). Also, we're
// essentially paced by the consistency checker so we won't call this too often.
log.Infof(ctx, "triggering stats recomputation to resolve delta of %+v", results[0].Response.Delta)
req := roachpb.RecomputeStatsRequest{
RequestHeader: roachpb.RequestHeader{Key: startKey},
}

which they shouldn't generally see outside of a few expected cases (we introduced AbortSpanBytes, etc); none of which touch the IntentCount field.

Still, there hasn't been much attention paid to the correctness of stats and so I wouldn't want to start relying on it if we don't have to. The cluster version seems like the obvious choice. Using stats would require additional motivation - such as avoiding a performance drop prior to the migration completing. I'd hope that we've gotten the perf overhead "close enough" to vanilla 21.1 to avoid issues here.

As an aside, if the version is active, but IntentCount is not equal to SeparatedIntentCount, something is amiss, so we should catch that at least in our internal testing.

Copy link
Collaborator Author

@sumeerbhola sumeerbhola left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The percent-flushed={0,50} sometimes show a regression.
These are atypical cases where the SingleDelete to
remove the intent has not been flushed so the intent
Set and SingleDelete are both alive. This would be
a pathalogical case when the LSM is unhealthy. Even
then the regression is < 100%.

For my education, can you explain what scenario this is? In the benchmark you set up intents, so there aren't any single deletes in the dataset on top of which you repeatedly open the batches. MVCCResolveWriteIntentRange is the thing writing the SingleDeletes and once it's deleted something it has handled that intent and won't return in the same batch, right?

The SingleDeletes are indeed happening during intent resolution, but not for the part that the benchmark is timing, where we just discard the batch and don't measure the time to commit it (since we want to be able to repeatedly resolve the same intents). The setupKeysWithIntent is writing each version with an intent and resolving all the intents except some of the latest versions that the benchmark iterations will work with. That is where the SingleDeletes are getting written to Pebble.

I'm missing how you even care about where the SingleDeletes go. Also, and this is unsurprising since I'm already lost at this point, I don't understand why this would be the case in practice in pathological LSMs.

The bad case is when all the SingleDeletes, Sets still exist in the LSM, since with the separated locks, which have the txnID in the key suffix, the latest live intent (i.e., without the SingleDelete) is probabilistically somewhere in the middle of all these keys which share the same roachpb.Key. The iteration code (neither the one on master, or the optimized path here) can avoid iterating over them inside pebble.Iterator, which is wasteful. The unoptimized path only needs to iterate over 1/2 of them, the ones preceding the still live intent, after which it seeks using nextKey. The optimized path does a NextEngineKey (since seeks generally are more expensive) and also encounters the deleted ones that are immediately after the live intent (hence the 166% regression). I have never seen this in practice in our tpcc or kv roachtests, but it occurs often in our unit tests since we have so little data that everything is still sitting in the memtable. setupKeysWithIntent explicitly compensates for this by flushing (based on the numFlushedVersions parameter). Another case where it can occur is if the LSM became unhealthy -- say lots of bytes accumulated in L0 (and the Set and SingleDelete were in different flushes to L0, hence in different L0 sstables).

Also, there is a +166% regression in the data you posted.

oops, I misread. Fixed.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @itsbilal, @nvanbenschoten, and @tbg)

@tbg tbg self-requested a review June 10, 2021 15:15
Copy link
Member

@tbg tbg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the explanation! I understand this now, though trying to corroborate it in setupKeysWithIntent was challenging. That method has a ton of params and logic and perhaps some of your prose above can be turned into documentation for it. I have a feeling that this is the kind of method that might both be touched by someone else in the future and be challenging for them to understand the original motivations for.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @itsbilal, @nvanbenschoten, and @tbg)

Copy link
Member

@nvanbenschoten nvanbenschoten left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 7 of 8 files at r1, 1 of 1 files at r2.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @itsbilal and @sumeerbhola)


pkg/kv/kvserver/batcheval/cmd_resolve_intent_range.go, line 49 at r1 (raw file):

Clearly we want this, the only question in my mind is whether we can afford to have it wait until later (i.e. one release after the migration), but I don't think so since there isn't much (any, really?) complexity here that would be avoided later. I heavily lean towards "just" using the cluster version to gate this functionality, unless there are arguments for being more aggressive.

I think it's clearer to you two than it is to me what the alternatives for a migration we have are. When we say "use the cluster version" here, do you mean wait until 22.1 to ensure that the long-running migration added in 21.2 will already be complete? Or do you mean wait until that long-running migration has completed in 22.1 and then switch over? The second option is what I would have expected, and I think it allows us to avoid any dependency on the correctness of MVCCStats.

Is there a downside to waiting on this long-running migration to complete? The only one I can think of is that we can't start using this fast-path on some ranges until all ranges have been migrated over. But is that much of a concern?


pkg/storage/mvcc.go, line 2788 at r2 (raw file):

// mvccResolveWriteIntent. The MVCCIterator is positioned lazily, only if
// needed -- the fast path for intent resolution when a transaction is
// committing does not need to position the MVCCIterator.

Is the iterator positioned if the transaction is committing and needs to change its commit timestamp from the provisional values existing timestamp? What about when a transaction is aborted? I believe the answer is "yes" and "yes", but it would be worth confirming and adding to this comment.


pkg/storage/mvcc.go, line 2884 at r2 (raw file):

}

func mvccGetIntent(

Could we give this a comment?


pkg/storage/mvcc.go, line 3154 at r2 (raw file):

	var unsafeNextKey MVCCKey
	var unsafeNextValue []byte
	if nextKey.IsValue() {

Is this just checking that lastKey's timestamp wasn't 0,1? If so, it may be worth a comment that this is the common case.


pkg/storage/mvcc.go, line 3309 at r2 (raw file):

	var putBuf *putBuffer
	// Exactly one of sepIter and mvccIter is non-nil.

Some commentary on which cases each is used would be nice.


pkg/storage/mvcc.go, line 3394 at r2 (raw file):

		var ok bool
		if !key.IsValue() {
			// When we start allowing existence of multiple intents on the same key,

This was optimistic.

Copy link
Collaborator Author

@sumeerbhola sumeerbhola left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @itsbilal and @sumeerbhola)


pkg/kv/kvserver/batcheval/cmd_resolve_intent_range.go, line 49 at r1 (raw file):

Or do you mean wait until that long-running migration has completed in 22.1 and then switch over? The second option is what I would have expected, and I think it allows us to avoid any dependency on the correctness of MVCCStats.

I assume you meant 21.2. Yes, that was the option I was thinking of, and I believe that was what Tobi was saying too. Waiting for 22.1 will not really save us much code -- the changes here are mostly for the optimized path, and there are only small tweaks to the unoptimized legacy path, and the latter is what we would delete in 22.1.

Copy link
Member

@tbg tbg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @itsbilal and @sumeerbhola)


pkg/kv/kvserver/batcheval/cmd_resolve_intent_range.go, line 49 at r1 (raw file):

Previously, sumeerbhola wrote…

Or do you mean wait until that long-running migration has completed in 22.1 and then switch over? The second option is what I would have expected, and I think it allows us to avoid any dependency on the correctness of MVCCStats.

I assume you meant 21.2. Yes, that was the option I was thinking of, and I believe that was what Tobi was saying too. Waiting for 22.1 will not really save us much code -- the changes here are mostly for the optimized path, and there are only small tweaks to the unoptimized legacy path, and the latter is what we would delete in 22.1.

Yes, if 21.1 is active, there shouldn't (mustn't) be any more inline intents, so we can always take this path. I think that's the canonical choice here and looks like we all agree.

Copy link
Collaborator Author

@sumeerbhola sumeerbhola left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TFTRs!
I'll hold off on this PR until the separated intents migration is merged since I need to gate this fast path on that migration being complete.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @itsbilal, @nvanbenschoten, and @tbg)


pkg/storage/mvcc.go, line 2788 at r2 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Is the iterator positioned if the transaction is committing and needs to change its commit timestamp from the provisional values existing timestamp? What about when a transaction is aborted? I believe the answer is "yes" and "yes", but it would be worth confirming and adding to this comment.

Correct. Done.


pkg/storage/mvcc.go, line 2884 at r2 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Could we give this a comment?

Done


pkg/storage/mvcc.go, line 3154 at r2 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Is this just checking that lastKey's timestamp wasn't 0,1? If so, it may be worth a comment that this is the common case.

Done


pkg/storage/mvcc.go, line 3309 at r2 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Some commentary on which cases each is used would be nice.

Done


pkg/storage/mvcc.go, line 3394 at r2 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

This was optimistic.

:)
Removed since it isn't relevant. The separated intents case already looks at each intent.

@sumeerbhola
Copy link
Collaborator Author

I've updated this with tests, and to use the post migration cluster version. PTAL

@sumeerbhola sumeerbhola force-pushed the opt_rir branch 3 times, most recently from 4e9a819 to 97159cc Compare August 19, 2021 22:58
@otan otan removed the request for review from a team August 24, 2021 05:51
@sumeerbhola
Copy link
Collaborator Author

@nvanbenschoten @itsbilal gentle ping on this -- I'd like to merge this early in the stability period.

Copy link
Member

@itsbilal itsbilal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm: code looks great to me! minor comments mostly just about comments

Reviewed 13 of 14 files at r4, 3 of 3 files at r5.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @nvanbenschoten and @sumeerbhola)


pkg/kv/kvserver/replica_write.go, line 757 at r5 (raw file):

		// The lock spans are insufficient for ranged intent resolution, which
		// does not declare lock spans and directly calls
		// spanSetBatch.NewEngineIterator. TODO(sumeer): we can't keep adding

TODO should be on a new line.


pkg/kv/kvserver/store.go, line 237 at r5 (raw file):

		// This is before PostSeparatedIntentsMigration, so we may have
		// interleaved intents.
		version = clusterversion.ByKey(clusterversion.SeparatedIntentsMigration)

Why not do version = clusterversion.ByKey(clusterversion.SeparatedIntentsMigration - 1) so it's clear the migration hasn't run? Makes little difference though, but just makes the purpose clearer. I can see a cluster version at clusterversion.SeparatedIntentsMigration being seen as "all live replicas have been migrated" while in reality we could still be writing interleaved intents with the knob in tests (a criticism that applies more with the other instance of this line below).


pkg/kv/kvserver/batcheval/cmd_resolve_intent_test.go, line 277 at r5 (raw file):

		// Before PostSeparatedIntentsMigration, so intent resolution will
		// not assume only separated intents.
		version = clusterversion.ByKey(clusterversion.SeparatedIntentsMigration)

Same


pkg/storage/mvcc.go, line 3324 at r5 (raw file):

	var putBuf *putBuffer
	// Exactly one of sepIter and mvccIter is non-nil. sepIter is used when
	// onlySeparatedIntents=true, and mvccIter when false. The former allows for

sepIter is used when onlySeparatedIntents=true and rw provides consistent iterators right? maybe the latter is not important if only pebble snapshots/batches will be passed in, and never an engine.


pkg/storage/mvcc.go, line 3336 at r5 (raw file):

	// ConsistentIterators() since we want the two iterators to be mutually
	// consistent (and production code will have consistent iterators).
	// TODO(sumeer): when removing the slow path, use

Nit: empty line before TODO

Copy link
Member

@nvanbenschoten nvanbenschoten left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewed 13 of 14 files at r4, 3 of 3 files at r5.
Reviewable status: :shipit: complete! 2 of 0 LGTMs obtained (waiting on @sumeerbhola)


pkg/kv/kvserver/batcheval/cmd_end_transaction.go, line 463 at r5 (raw file):

	// Some tests have st == nil.
	if st != nil {
		onlySeparatedIntents = !st.Version.ActiveVersionOrEmpty(ctx).Less(

Are you intentionally avoiding IsActive? I have the same question elsewhere.


pkg/kv/kvserver/batcheval/cmd_resolve_intent_test.go, line 228 at r5 (raw file):

			rir.EndKey = endKey

			declareKeysResolveIntentRange(&desc, h, &rir, &spans, nil)

I know this isn't all new, but why are we declaring keys in this test? Is that needed?


pkg/storage/mvcc.go, line 3403 at r5 (raw file):

			}
		} else {
			mvccIter.SeekGE(nextKey)

It's a little strange to me as a reader where the two iterators are seeked. sepIter is seeked before the loop and then at the end of each iteration. mvccIter is seeked at the beginning of each iteration. Would it be worth reconciling this difference?

Copy link
Collaborator Author

@sumeerbhola sumeerbhola left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 2 stale) (waiting on @itsbilal and @nvanbenschoten)


pkg/kv/kvserver/replica_write.go, line 757 at r5 (raw file):

Previously, itsbilal (Bilal Akhtar) wrote…

TODO should be on a new line.

Done


pkg/kv/kvserver/store.go, line 237 at r5 (raw file):

Previously, itsbilal (Bilal Akhtar) wrote…

Why not do version = clusterversion.ByKey(clusterversion.SeparatedIntentsMigration - 1) so it's clear the migration hasn't run? Makes little difference though, but just makes the purpose clearer. I can see a cluster version at clusterversion.SeparatedIntentsMigration being seen as "all live replicas have been migrated" while in reality we could still be writing interleaved intents with the knob in tests (a criticism that applies more with the other instance of this line below).

Done


pkg/kv/kvserver/batcheval/cmd_end_transaction.go, line 463 at r5 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Are you intentionally avoiding IsActive? I have the same question elsewhere.

Silliness. Fixed here and elsewhere.


pkg/kv/kvserver/batcheval/cmd_resolve_intent_test.go, line 228 at r5 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

I know this isn't all new, but why are we declaring keys in this test? Is that needed?

They are being used by spanSetBatch that we created earlier. We are just initializing the spans later, but before doing reads and writes. I've added a comment.


pkg/kv/kvserver/batcheval/cmd_resolve_intent_test.go, line 277 at r5 (raw file):

Previously, itsbilal (Bilal Akhtar) wrote…

Same

Done


pkg/storage/mvcc.go, line 3324 at r5 (raw file):

Previously, itsbilal (Bilal Akhtar) wrote…

sepIter is used when onlySeparatedIntents=true and rw provides consistent iterators right? maybe the latter is not important if only pebble snapshots/batches will be passed in, and never an engine.

Updated the comment. We almost never use an engine for production paths. But tests often do.


pkg/storage/mvcc.go, line 3336 at r5 (raw file):

Previously, itsbilal (Bilal Akhtar) wrote…

Nit: empty line before TODO

Done


pkg/storage/mvcc.go, line 3403 at r5 (raw file):

sepIter is seeked before the loop and then at the end of each iteration. mvccIter is seeked at the beginning of each iteration. Would it be worth reconciling this difference?

I've added some code comments to explain the difference. The main reason is that we step the sepIter and only seek it once, so we do the seek before the loop starts. The calls to sepIter.nextEngineKey need to happen at the end of an iteration to prepare for the next iteration. This is important for performance. While mvccIter is always being seeked, so we can consistently do it at the beginning of the iteration.

@sumeerbhola sumeerbhola force-pushed the opt_rir branch 4 times, most recently from 0dfdc68 to f2ee987 Compare August 30, 2021 16:25
@sumeerbhola
Copy link
Collaborator Author

I can't reproduce the TestCCLLogic failure locally, so possibly flaky test.
The Example_demo failure is reproducible and looks to be a test setup inconsistency in that we randomize the separated intents setting for the sticky engine, which can be inconsistent with that the KV layer thinks. I'm going to change the sticky engine to always run with separated intents.

@sumeerbhola
Copy link
Collaborator Author

logictestccl is timing out, which happens on my machine even when I disable this optimization.

I've added a randomized ranged intent resolution test to compare the results using the fast and slow paths.
It turns out there is a bug where if the txn is aborting, but there are partial rollbacks in the ignored seq nums, we do the latter and don't fully resolve the intent. This in the single intent resolution code, and unrelated to this PR. This is harmless since the intent resolution will eventually get retried. I am not going to fix it in this PR.

When all the intents on a range are known to be separated,
as indicated by MCCStats, and a ResolveIntentRangeRequest
needs to be processed (for a transaction that did enough
writes that we did not track individual intent keys), we
avoid iterating over MVCC key-value pairs to find those
intents, and instead iterate over the separated lock table
key space.

A new iterForKeyVersions interface is introduced which
provides a narrow iterator interface to be used by
mvccResolveWriteIntent that resolves individual intents.
This interface is only for iterating over a single key,
and is implemented by the usual MVCCIterator and by the
new separatedIntentAndVersionIter. The latter serves
both as a way to find intents over a range and for
individual intent resolution, in an optimized manner.

There are new benchmarks which vary the sparseness
(sparseness of N means that 1 of every N key has an
intent that is for the txn for which intent resolution
is being performed) and whether the remaining N-1 keys
have an intent from a different transaction
(other-txn-intents=true) or not (the former results
in the cost of at least parsing the MVCCMetadata to
compare txn IDs).

Full results are below. Two things to note are:
- The percent-flushed={0,50} sometimes show a regression.
  These are atypical cases where the SingleDelete to
  remove the intent has not been flushed so the intent
  Set and SingleDelete are both alive. This would be
  a pathalogical case when the LSM is unhealthy. Even
  then the regression is < 170%.
- The percent-flushed=100 cases are the typical one.
  Once sparseness increases to 100 or 1000 we see speed
  gains close to 100x, as indicated by the following:

```
IntentRangeResolution/separated=true/versions=10/sparseness=1/other-txn-intents=false/percent-flushed=100-16        235µs ± 2%      58µs ± 3%   -75.52%  (p=0.008 n=5+5)
IntentRangeResolution/separated=true/versions=10/sparseness=100/other-txn-intents=false/percent-flushed=100-16      154µs ± 2%       2µs ± 4%   -98.72%  (p=0.008 n=5+5)
IntentRangeResolution/separated=true/versions=10/sparseness=1000/other-txn-intents=false/percent-flushed=100-16     137µs ± 1%       1µs ± 2%   -99.07%  (p=0.008 n=5+5)
```

Full results:
```
name                                                                                                             old time/op    new time/op    delta
IntentRangeResolution/separated=true/versions=10/sparseness=1/other-txn-intents=false/percent-flushed=0-16          450µs ± 5%     234µs ± 2%   -48.07%  (p=0.008 n=5+5)
IntentRangeResolution/separated=true/versions=10/sparseness=1/other-txn-intents=false/percent-flushed=50-16         294µs ± 3%     101µs ± 4%   -65.49%  (p=0.008 n=5+5)
IntentRangeResolution/separated=true/versions=10/sparseness=1/other-txn-intents=false/percent-flushed=100-16        235µs ± 2%      58µs ± 3%   -75.52%  (p=0.008 n=5+5)
IntentRangeResolution/separated=true/versions=10/sparseness=100/other-txn-intents=false/percent-flushed=0-16        494µs ± 7%     218µs ± 6%   -55.84%  (p=0.008 n=5+5)
IntentRangeResolution/separated=true/versions=10/sparseness=100/other-txn-intents=false/percent-flushed=50-16       251µs ± 2%      57µs ± 3%   -77.29%  (p=0.008 n=5+5)
IntentRangeResolution/separated=true/versions=10/sparseness=100/other-txn-intents=false/percent-flushed=100-16      154µs ± 2%       2µs ± 4%   -98.72%  (p=0.008 n=5+5)
IntentRangeResolution/separated=true/versions=10/sparseness=100/other-txn-intents=true/percent-flushed=0-16         343µs ± 5%     236µs ± 2%   -31.28%  (p=0.008 n=5+5)
IntentRangeResolution/separated=true/versions=10/sparseness=100/other-txn-intents=true/percent-flushed=50-16        240µs ± 1%      74µs ± 2%   -69.02%  (p=0.008 n=5+5)
IntentRangeResolution/separated=true/versions=10/sparseness=100/other-txn-intents=true/percent-flushed=100-16       203µs ± 1%      30µs ± 1%   -84.97%  (p=0.008 n=5+5)
IntentRangeResolution/separated=true/versions=10/sparseness=1000/other-txn-intents=false/percent-flushed=0-16       516µs ± 4%     274µs ±13%   -46.91%  (p=0.008 n=5+5)
IntentRangeResolution/separated=true/versions=10/sparseness=1000/other-txn-intents=false/percent-flushed=50-16      256µs ± 2%      57µs ± 1%   -77.67%  (p=0.016 n=4+5)
IntentRangeResolution/separated=true/versions=10/sparseness=1000/other-txn-intents=false/percent-flushed=100-16     137µs ± 1%       1µs ± 2%   -99.07%  (p=0.008 n=5+5)
IntentRangeResolution/separated=true/versions=10/sparseness=1000/other-txn-intents=true/percent-flushed=0-16        325µs ± 2%     237µs ± 1%   -26.98%  (p=0.008 n=5+5)
IntentRangeResolution/separated=true/versions=10/sparseness=1000/other-txn-intents=true/percent-flushed=50-16       232µs ± 4%      72µs ± 1%   -68.89%  (p=0.008 n=5+5)
IntentRangeResolution/separated=true/versions=10/sparseness=1000/other-txn-intents=true/percent-flushed=100-16      199µs ± 1%      30µs ± 0%   -84.87%  (p=0.016 n=5+4)
IntentRangeResolution/separated=true/versions=100/sparseness=1/other-txn-intents=false/percent-flushed=0-16        2.08ms ± 1%    2.68ms ± 8%   +28.57%  (p=0.016 n=4+5)
IntentRangeResolution/separated=true/versions=100/sparseness=1/other-txn-intents=false/percent-flushed=50-16        592µs ± 4%     596µs ± 7%      ~     (p=1.000 n=5+5)
IntentRangeResolution/separated=true/versions=100/sparseness=1/other-txn-intents=false/percent-flushed=100-16       242µs ± 2%      59µs ± 3%   -75.73%  (p=0.016 n=4+5)
IntentRangeResolution/separated=true/versions=100/sparseness=100/other-txn-intents=false/percent-flushed=0-16      3.04ms ± 5%    2.60ms ± 3%   -14.59%  (p=0.008 n=5+5)
IntentRangeResolution/separated=true/versions=100/sparseness=100/other-txn-intents=false/percent-flushed=50-16      840µs ± 4%     539µs ± 3%   -35.87%  (p=0.008 n=5+5)
IntentRangeResolution/separated=true/versions=100/sparseness=100/other-txn-intents=false/percent-flushed=100-16     163µs ± 6%       2µs ± 6%   -98.75%  (p=0.008 n=5+5)
IntentRangeResolution/separated=true/versions=100/sparseness=100/other-txn-intents=true/percent-flushed=0-16        936µs ±38%    2497µs ±12%  +166.83%  (p=0.008 n=5+5)
IntentRangeResolution/separated=true/versions=100/sparseness=100/other-txn-intents=true/percent-flushed=50-16       281µs ±12%     551µs ± 2%   +95.64%  (p=0.008 n=5+5)
IntentRangeResolution/separated=true/versions=100/sparseness=100/other-txn-intents=true/percent-flushed=100-16      209µs ±13%      31µs ± 9%   -84.95%  (p=0.008 n=5+5)
IntentRangeResolution/separated=true/versions=100/sparseness=1000/other-txn-intents=false/percent-flushed=0-16     3.04ms ± 6%    2.55ms ± 0%      ~     (p=0.333 n=5+1)

name                                                                                                             old alloc/op   new alloc/op   delta
IntentRangeResolution/separated=true/versions=10/sparseness=1/other-txn-intents=false/percent-flushed=0-16         12.8kB ± 0%    13.0kB ± 0%    +0.92%  (p=0.008 n=5+5)
IntentRangeResolution/separated=true/versions=10/sparseness=1/other-txn-intents=false/percent-flushed=50-16        12.8kB ± 0%    13.0kB ± 0%    +0.94%  (p=0.016 n=5+4)
IntentRangeResolution/separated=true/versions=10/sparseness=1/other-txn-intents=false/percent-flushed=100-16       12.9kB ± 0%    13.0kB ± 0%    +0.59%  (p=0.016 n=5+4)
IntentRangeResolution/separated=true/versions=10/sparseness=100/other-txn-intents=false/percent-flushed=0-16       1.75kB ± 0%    0.29kB ± 0%      ~     (p=0.079 n=4+5)
IntentRangeResolution/separated=true/versions=10/sparseness=100/other-txn-intents=false/percent-flushed=50-16      1.75kB ± 0%    0.29kB ± 0%   -83.18%  (p=0.008 n=5+5)
IntentRangeResolution/separated=true/versions=10/sparseness=100/other-txn-intents=false/percent-flushed=100-16     1.75kB ± 0%    0.29kB ± 0%   -83.22%  (p=0.008 n=5+5)
IntentRangeResolution/separated=true/versions=10/sparseness=100/other-txn-intents=true/percent-flushed=0-16        12.8kB ± 0%    11.4kB ± 0%   -11.37%  (p=0.008 n=5+5)
IntentRangeResolution/separated=true/versions=10/sparseness=100/other-txn-intents=true/percent-flushed=50-16       12.8kB ± 0%    11.4kB ± 0%      ~     (p=0.079 n=4+5)
IntentRangeResolution/separated=true/versions=10/sparseness=100/other-txn-intents=true/percent-flushed=100-16      12.8kB ± 0%    11.4kB ± 0%   -11.41%  (p=0.000 n=5+4)
IntentRangeResolution/separated=true/versions=10/sparseness=1000/other-txn-intents=false/percent-flushed=0-16      1.66kB ± 0%    0.17kB ± 0%   -89.61%  (p=0.008 n=5+5)
IntentRangeResolution/separated=true/versions=10/sparseness=1000/other-txn-intents=false/percent-flushed=50-16     1.66kB ± 0%    0.17kB ± 0%   -89.61%  (p=0.008 n=5+5)
IntentRangeResolution/separated=true/versions=10/sparseness=1000/other-txn-intents=false/percent-flushed=100-16    1.66kB ± 0%    0.17kB ± 0%   -89.63%  (p=0.008 n=5+5)
IntentRangeResolution/separated=true/versions=10/sparseness=1000/other-txn-intents=true/percent-flushed=0-16       12.8kB ± 0%    11.4kB ± 0%   -11.50%  (p=0.008 n=5+5)
IntentRangeResolution/separated=true/versions=10/sparseness=1000/other-txn-intents=true/percent-flushed=50-16      12.8kB ± 0%    11.4kB ± 0%      ~     (p=0.079 n=4+5)
IntentRangeResolution/separated=true/versions=10/sparseness=1000/other-txn-intents=true/percent-flushed=100-16     12.8kB ± 0%    11.4kB ± 0%   -11.53%  (p=0.029 n=4+4)
IntentRangeResolution/separated=true/versions=100/sparseness=1/other-txn-intents=false/percent-flushed=0-16        12.8kB ± 0%    13.0kB ± 0%    +0.95%  (p=0.016 n=5+4)
IntentRangeResolution/separated=true/versions=100/sparseness=1/other-txn-intents=false/percent-flushed=50-16       12.8kB ± 0%    13.0kB ± 0%    +0.95%  (p=0.008 n=5+5)
IntentRangeResolution/separated=true/versions=100/sparseness=1/other-txn-intents=false/percent-flushed=100-16      12.9kB ± 0%    13.0kB ± 0%    +0.54%  (p=0.016 n=5+4)
IntentRangeResolution/separated=true/versions=100/sparseness=100/other-txn-intents=false/percent-flushed=0-16      1.75kB ± 0%    0.29kB ± 0%   -83.14%  (p=0.008 n=5+5)
IntentRangeResolution/separated=true/versions=100/sparseness=100/other-txn-intents=false/percent-flushed=50-16     1.75kB ± 0%    0.29kB ± 0%   -83.19%  (p=0.008 n=5+5)
IntentRangeResolution/separated=true/versions=100/sparseness=100/other-txn-intents=false/percent-flushed=100-16    1.75kB ± 0%    0.29kB ± 0%   -83.22%  (p=0.008 n=5+5)
IntentRangeResolution/separated=true/versions=100/sparseness=100/other-txn-intents=true/percent-flushed=0-16       12.8kB ± 0%    11.4kB ± 0%   -11.37%  (p=0.000 n=5+4)
IntentRangeResolution/separated=true/versions=100/sparseness=100/other-txn-intents=true/percent-flushed=50-16      12.8kB ± 0%    11.4kB ± 0%   -11.37%  (p=0.008 n=5+5)
IntentRangeResolution/separated=true/versions=100/sparseness=100/other-txn-intents=true/percent-flushed=100-16     12.8kB ± 0%    11.4kB ± 0%   -11.41%  (p=0.000 n=5+4)
IntentRangeResolution/separated=true/versions=100/sparseness=1000/other-txn-intents=false/percent-flushed=0-16     1.66kB ± 0%    0.17kB ± 0%   -89.56%  (p=0.000 n=5+1)

name                                                                                                             old allocs/op  new allocs/op  delta
IntentRangeResolution/separated=true/versions=10/sparseness=1/other-txn-intents=false/percent-flushed=0-16            401 ± 0%       404 ± 0%    +0.75%  (p=0.008 n=5+5)
IntentRangeResolution/separated=true/versions=10/sparseness=1/other-txn-intents=false/percent-flushed=50-16           401 ± 0%       404 ± 0%    +0.75%  (p=0.008 n=5+5)
IntentRangeResolution/separated=true/versions=10/sparseness=1/other-txn-intents=false/percent-flushed=100-16          401 ± 0%       404 ± 0%    +0.75%  (p=0.008 n=5+5)
IntentRangeResolution/separated=true/versions=10/sparseness=100/other-txn-intents=false/percent-flushed=0-16          104 ± 0%         8 ± 0%   -92.31%  (p=0.008 n=5+5)
IntentRangeResolution/separated=true/versions=10/sparseness=100/other-txn-intents=false/percent-flushed=50-16         104 ± 0%         8 ± 0%   -92.31%  (p=0.008 n=5+5)
IntentRangeResolution/separated=true/versions=10/sparseness=100/other-txn-intents=false/percent-flushed=100-16        104 ± 0%         8 ± 0%   -92.31%  (p=0.008 n=5+5)
IntentRangeResolution/separated=true/versions=10/sparseness=100/other-txn-intents=true/percent-flushed=0-16           401 ± 0%       305 ± 0%   -23.94%  (p=0.008 n=5+5)
IntentRangeResolution/separated=true/versions=10/sparseness=100/other-txn-intents=true/percent-flushed=50-16          401 ± 0%       305 ± 0%   -23.94%  (p=0.008 n=5+5)
IntentRangeResolution/separated=true/versions=10/sparseness=100/other-txn-intents=true/percent-flushed=100-16         401 ± 0%       305 ± 0%   -23.94%  (p=0.008 n=5+5)
IntentRangeResolution/separated=true/versions=10/sparseness=1000/other-txn-intents=false/percent-flushed=0-16         102 ± 0%         3 ± 0%   -97.06%  (p=0.008 n=5+5)
IntentRangeResolution/separated=true/versions=10/sparseness=1000/other-txn-intents=false/percent-flushed=50-16        102 ± 0%         3 ± 0%   -97.06%  (p=0.008 n=5+5)
IntentRangeResolution/separated=true/versions=10/sparseness=1000/other-txn-intents=false/percent-flushed=100-16       102 ± 0%         3 ± 0%   -97.06%  (p=0.008 n=5+5)
IntentRangeResolution/separated=true/versions=10/sparseness=1000/other-txn-intents=true/percent-flushed=0-16          401 ± 0%       303 ± 0%   -24.44%  (p=0.008 n=5+5)
IntentRangeResolution/separated=true/versions=10/sparseness=1000/other-txn-intents=true/percent-flushed=50-16         401 ± 0%       303 ± 0%   -24.44%  (p=0.008 n=5+5)
IntentRangeResolution/separated=true/versions=10/sparseness=1000/other-txn-intents=true/percent-flushed=100-16        401 ± 0%       303 ± 0%   -24.44%  (p=0.008 n=5+5)
IntentRangeResolution/separated=true/versions=100/sparseness=1/other-txn-intents=false/percent-flushed=0-16           401 ± 0%       404 ± 0%    +0.75%  (p=0.008 n=5+5)
IntentRangeResolution/separated=true/versions=100/sparseness=1/other-txn-intents=false/percent-flushed=50-16          401 ± 0%       404 ± 0%    +0.75%  (p=0.008 n=5+5)
IntentRangeResolution/separated=true/versions=100/sparseness=1/other-txn-intents=false/percent-flushed=100-16         401 ± 0%       404 ± 0%    +0.75%  (p=0.008 n=5+5)
IntentRangeResolution/separated=true/versions=100/sparseness=100/other-txn-intents=false/percent-flushed=0-16         104 ± 0%         8 ± 0%   -92.31%  (p=0.008 n=5+5)
IntentRangeResolution/separated=true/versions=100/sparseness=100/other-txn-intents=false/percent-flushed=50-16        104 ± 0%         8 ± 0%   -92.31%  (p=0.008 n=5+5)
IntentRangeResolution/separated=true/versions=100/sparseness=100/other-txn-intents=false/percent-flushed=100-16       104 ± 0%         8 ± 0%   -92.31%  (p=0.008 n=5+5)
IntentRangeResolution/separated=true/versions=100/sparseness=100/other-txn-intents=true/percent-flushed=0-16          401 ± 0%       305 ± 0%   -23.94%  (p=0.008 n=5+5)
IntentRangeResolution/separated=true/versions=100/sparseness=100/other-txn-intents=true/percent-flushed=50-16         401 ± 0%       305 ± 0%   -23.94%  (p=0.008 n=5+5)
IntentRangeResolution/separated=true/versions=100/sparseness=100/other-txn-intents=true/percent-flushed=100-16        401 ± 0%       305 ± 0%   -23.94%  (p=0.008 n=5+5)
IntentRangeResolution/separated=true/versions=100/sparseness=1000/other-txn-intents=false/percent-flushed=0-16        102 ± 0%         3 ± 0%   -97.06%  (p=0.000 n=5+1)
```

Release note (performance improvement): Intent resolution for transactions
that write many intents such that we track intent ranges, for the purpose
of intent resolution, is much faster (potentially 100x) when using the
separated lock table.

Release justification: Fix for high-severity poor performance in existing
functionality, and is a low risk, high benefit change, that potentially
improves ranged intent resolution speed by 100x.
@sumeerbhola
Copy link
Collaborator Author

bors r+

@craig
Copy link
Contributor

craig bot commented Sep 1, 2021

Build succeeded:

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

storage,kv: improve the performance intent resolution over a key range
5 participants