kvserver: lower priority level for mvcc gc work #85823

Merged · 2 commits into cockroachdb:master · Aug 10, 2022

Conversation

@irfansharif (Contributor) commented Aug 9, 2022

GC could be expected to run at LowPri, so that it does not impact
user-facing traffic when resources (e.g. CPU, write capacity of the
store) are scarce. However, long delays in GC can slow down user-facing
traffic due to more versions in the store, and can increase write
amplification of the store since there is more live data. Ideally, we
should adjust this priority based on how far behind we are with respect
to GC-ing a particular range. Keeping the priority level static at
NormalPri proved disruptive when a large volume of MVCC GC work was
suddenly accrued (for example, when an old protected timestamp record
was just released after a long-paused backup job was completed/canceled,
or after an old, long-running backup job finished).

After dynamic priority adjustment, it's not yet clear whether we'll need
additional pacing mechanisms to provide better latency isolation,
similar to ongoing work for backups. MVCC GC work is CPU intensive:
#82955. This patch is also speculative in nature, written in response to
observed incidents where NormalPri proved too disruptive. Fuller
treatment would entail working off of reliable reproductions of this
behaviour.
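
As a rough illustration of the "adjust this priority based on how far behind we are" idea, a minimal sketch follows; the function, its inputs, and its thresholds are hypothetical and not part of this patch.

package gc

import (
	"time"

	"github.com/cockroachdb/cockroach/pkg/util/admission/admissionpb"
)

// gcAdmissionPriority is a hypothetical helper: the further a range is past
// its GC TTL, the higher the admission priority its GC work gets, so
// chronically delayed GC eventually competes with foreground traffic instead
// of starving indefinitely.
func gcAdmissionPriority(overdueBy, ttl time.Duration) admissionpb.WorkPriority {
	if overdueBy > 10*ttl {
		// Far behind: let GC compete with user-facing work.
		return admissionpb.NormalPri
	}
	// Common case: stay below NormalPri so sudden bursts of GC work don't
	// disrupt user-facing traffic (the static choice this patch makes).
	return admissionpb.BulkNormalPri
}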

We also added a cluster setting (kv.mvcc_gc.queue_interval)
that controls how long the MVCC GC queue waits between
processing replicas. It was previously hardcoded to 1s (which remains
the default value), but being able to change it would've come in handy
in support incidents as a form of manual pacing of MVCC GC work (we have
a similarly useful knob for the merge queue).
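
For reference, a sketch of what registering this knob plausibly looks like, mirroring the duration-setting fragment shown later in this review; the description string and validator are assumptions.

package kvserver

import (
	"time"

	"github.com/cockroachdb/cockroach/pkg/settings"
)

// mvccGCQueueInterval controls how long the MVCC GC queue waits between
// processing replicas. The 1s default matches the previously hardcoded value.
var mvccGCQueueInterval = settings.RegisterDurationSetting(
	settings.SystemOnly,
	"kv.mvcc_gc.queue_interval",
	"how long the mvcc gc queue waits between processing replicas", // assumed wording
	time.Second,
	settings.NonNegativeDuration, // assumed validator
)

An operator could then pace MVCC GC manually with, for example, SET CLUSTER SETTING kv.mvcc_gc.queue_interval = '5s';.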

Release note (performance improvement): Previously, a sudden increase in
the volume of pending MVCC GC work had an impact on foreground
latencies. These sudden increases commonly occurred when:

  • gc.ttlseconds was reduced dramatically over tables/indexes that accrue
    a lot of MVCC garbage (think "rows being frequently deleted")
  • a paused backup job from a while ago (think > 1 day) was
    canceled/failed
  • a backup job that started a while ago (think > 1 day) just finished

An indicator of a large increase in the volume of pending MVCC GC work
is a steep climb in the "GC Queue" graph in the DB Console, found by
navigating to 'Metrics' and selecting the 'Queues' dashboard. With this
patch, the effect of such a sudden build-up on foreground latencies
should be reduced.

@irfansharif requested a review from a team as a code owner, August 9, 2022 16:50
@cockroach-teamcity (Member)

This change is Reviewable

pkg/kv/kvserver/gc/gc.go (diff excerpt):

@@ -115,6 +116,20 @@ var MaxIntentKeyBytesPerCleanupBatch = settings.RegisterIntSetting(
settings.NonNegativeInt,
)

// AdmissionPriority determines the admission priority level to use for MVCC GC
// work.
var AdmissionPriority = settings.RegisterEnumSetting(
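
The diff view is truncated at the review anchor. Pieced together from the fragments quoted later in this thread (the description string, the r1 "low_pri" default, and the admissionpb values under discussion), the full r1 registration plausibly looks like the sketch below; the setting key and settings class are assumptions, not taken from this page.

package gc

import (
	"github.com/cockroachdb/cockroach/pkg/settings"
	"github.com/cockroachdb/cockroach/pkg/util/admission/admissionpb"
)

// AdmissionPriority determines the admission priority level to use for MVCC GC
// work.
var AdmissionPriority = settings.RegisterEnumSetting(
	settings.SystemOnly,        // assumed class, mirroring the queue_interval setting
	"kv.gc.admission_priority", // assumed key, not shown in this diff
	"the admission priority to use for mvcc gc work",
	"low_pri", // r1 default; later changed to bulk_normal_pri per review
	map[int64]string{
		int64(admissionpb.LowPri):    "low_pri",
		int64(admissionpb.NormalPri): "normal_pri",
		int64(admissionpb.HighPri):   "high_pri", // review below asks whether this should be UserHighPri
	},
)
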
@lunevalex (Collaborator) commented on the diff:

nit: I don't think a setting is necessary, but also have no objections to adding one.

@nvanbenschoten (Member) left a comment

:lgtm:

Should we follow this up with a change that makes mvccGCQueueTimerDuration a cluster setting? We have precedent for that with kv.range_merge.queue_interval.

Reviewed 3 of 3 files at r1.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @irfansharif, @lunevalex, @sumeerbhola, and @tbg)


-- commits line 16 at r1:
I suspect that we will eventually need additional pacing in the keyspace iteration that the MVCC GC queue performs (e.g. processReplicatedKeyRange) to compute which keys to GC. In cases where many keys are being garbage collected, it is probably the case that we will issue batches of GCRequests frequently enough for this change to serve as a form of pacing. However, for MVCC GC queue passes where few keys are actually GCed, this probably won't be enough. In those cases, we'll loop quickly through the entire range with no intermittent GC requests, which you've demonstrated elsewhere to be a CPU hog with adverse effects on goroutine scheduling latency.
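
As a loose illustration of the intra-scan pacing suggested here (not CockroachDB code; all names are invented), one minimal approach is to yield periodically while iterating, so a pass that ends up GC-ing few keys still cannot monopolize a CPU:

package gc

import (
	"context"
	"time"
)

// scanPacer is a hypothetical helper: every everyN keys visited it yields for
// pause. A real implementation would more likely submit an admission request
// or consume from a rate limiter rather than sleep.
type scanPacer struct {
	everyN int
	pause  time.Duration
	seen   int
}

// maybePace is called once per key visited; it blocks briefly every everyN
// keys, or returns early if the context is canceled.
func (p *scanPacer) maybePace(ctx context.Context) error {
	p.seen++
	if p.everyN == 0 || p.seen%p.everyN != 0 {
		return nil
	}
	select {
	case <-time.After(p.pause):
		return nil
	case <-ctx.Done():
		return ctx.Err()
	}
}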


-- commits line 18 at r1:

MVCC GC work is CPU intensive

I think it's also worth considering that a future revision of MVCC GC might do away with most of the CPU intensive work. In such a world, the synchronous keyspace iteration might be the only work left in the MVCC GC queue, assuming we want accurate stats even with delayed GC during compaction.

That doesn't change anything in this PR, but it might shift how we think about the solution space around disruptive MVCC GC going forward.


pkg/kv/kvserver/gc/gc.go line 121 at r1 (raw file):

Previously, lunevalex wrote…

nit: I don't think a setting is necessary, but also have no objections to adding one.

I think it's a good idea to add one, given the potential for a negative feedback loop with starved MVCC GC and foreground traffic whose efficiency depends on timely MVCC GC. We might as well build the tool that we would need to combat this if it ever comes up in a support escalation.


pkg/kv/kvserver/gc/gc.go line 126 at r1 (raw file):

	"the admission priority to use for mvcc gc work",
	"low_pri",
	map[int64]string{

Not for now, but I wouldn't be surprised if we follow this pattern in other places and decide to package it up as a known enum in the settings package.
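
If this pattern were packaged up as suggested, a shared mapping might look like the hypothetical sketch below; neither the variable nor its location exists in the settings package.

package kvserverbase // hypothetical home for the shared enum

import "github.com/cockroachdb/cockroach/pkg/util/admission/admissionpb"

// AdmissionPriorityEnumValues could back any RegisterEnumSetting call that
// selects an admission priority level, so callers don't re-declare the map.
var AdmissionPriorityEnumValues = map[int64]string{
	int64(admissionpb.BulkNormalPri): "bulk_normal_pri",
	int64(admissionpb.NormalPri):     "normal_pri",
	int64(admissionpb.UserHighPri):   "user_high_pri",
}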

@nvanbenschoten (Member) left a comment

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @irfansharif, @lunevalex, @sumeerbhola, and @tbg)


pkg/kv/kvserver/gc/gc.go line 127 at r1 (raw file):

	"low_pri",
	map[int64]string{
		int64(admissionpb.LowPri):    "low_pri",

Actually, are these the correct priority levels, and is this the correct default? We use BulkNormalPri for all other bulk jobs. LowPri doesn't seem to be used outside of the admission package.

@irfansharif (Contributor, Author) left a comment

Should we follow this up with a change that makes mvccGCQueueTimerDuration a cluster setting?

Sure, just added a commit here. Will merge on green.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @lunevalex, @nvanbenschoten, @sumeerbhola, and @tbg)


-- commits line 16 at r1:

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

I suspect that we will eventually need additional pacing in the keyspace iteration that the MVCC GC queue performs (e.g. processReplicatedKeyRange) to compute which keys to GC. In cases where many keys are being garbage collected, it is probably the case that we will issue batches of GCRequests frequently enough for this change to serve as a form of pacing. However, for MVCC GC queue passes where few keys are actually GCed, this probably won't be enough. In those cases, we'll loop quickly through the entire range with no intermittent GC requests, which you've demonstrated elsewhere to be a CPU hog with adverse effects on goroutine scheduling latency.

Agreed, more experimentation here will tell. Added a sentence below.


-- commits line 18 at r1:

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

MVCC GC work is CPU intensive

I think it's also worth considering that a future revision of MVCC GC might do away with most of the CPU intensive work. In such a world, the synchronous keyspace iteration might be the only work left in the MVCC GC queue, assuming we want accurate stats even with delayed GC during compaction.

That doesn't change anything in this PR, but it might shift how we think about the solution space around disruptive MVCC GC going forward.

Doing MVCC GC as part of compactions makes a lot of sense (I need to periodically sift through X-noremind PRs to dig up such things). Added a sentence.


pkg/kv/kvserver/gc/gc.go line 121 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

I think it's a good idea to add one, given the potential for a negative feedback loop with starved MVCC GC and foreground traffic whose efficiency depends on timely MVCC GC. We might as well build the tool that we would need to combat this if it ever comes up in a support escalation.

Added it for that same reason, especially given it's something we're looking to backport to 22.1. We should want this knob to go away after more experimentation work though, if that's what you're getting at, Alex.


pkg/kv/kvserver/gc/gc.go line 127 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Actually, are these the correct priority levels, and is this the correct default? We use BulkNormalPri for all other bulk jobs. LowPri doesn't seem to be used outside of the admission package.

LowPri was mentioned in an earlier revision of a comment, but I think BulkNormalPri wasn't a thing then. Changed, and removed LowPri. The important thing is to have something lower than NormalPri.
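
The property being relied on is only the relative ordering of the admissionpb levels; a quick sanity-check sketch (assuming the constants keep their documented ordering, exact numeric values aside):

package main

import (
	"fmt"

	"github.com/cockroachdb/cockroach/pkg/util/admission/admissionpb"
)

func main() {
	// MVCC GC at BulkNormalPri queues behind NormalPri foreground traffic
	// but ahead of LowPri work under admission control.
	fmt.Println(admissionpb.LowPri < admissionpb.BulkNormalPri)    // expect: true
	fmt.Println(admissionpb.BulkNormalPri < admissionpb.NormalPri) // expect: true
	fmt.Println(admissionpb.NormalPri < admissionpb.UserHighPri)   // expect: true
}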

@nvanbenschoten (Member) left a comment

Reviewed 1 of 2 files at r2, 1 of 1 files at r3, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @irfansharif, @sumeerbhola, and @tbg)


pkg/kv/kvserver/mvcc_gc_queue.go line 72 at r3 (raw file):

var mvccGCQueueInterval = settings.RegisterDurationSetting(
	settings.SystemOnly,
	"kv.mvcc_gc_merge.queue_interval",

s/kv.mvcc_gc_merge.queue_interval/kv.mvcc_gc.queue_interval/


pkg/kv/kvserver/gc/gc.go line 127 at r1 (raw file):

Previously, irfansharif (irfan sharif) wrote…

LowPri was mentioned in an earlier revision of a comment, but I think BulkNormalPri wasn't a thing then. Changed, and removed LowPri. The important thing is to have something lower than NormalPri.

Should HighPri also be changed to UserHighPri?

@irfansharif (Contributor, Author) left a comment

bors r+

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @nvanbenschoten, @sumeerbhola, and @tbg)


pkg/kv/kvserver/mvcc_gc_queue.go line 72 at r3 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

s/kv.mvcc_gc_merge.queue_interval/kv.mvcc_gc.queue_interval/

🤦

@sumeerbhola (Collaborator) left a comment

Reviewed 1 of 3 files at r1, 2 of 2 files at r4.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @irfansharif, @nvanbenschoten, @sumeerbhola, and @tbg)


pkg/kv/kvserver/mvcc_gc_queue.go line 465 at r4 (raw file):

			// BulkNormalPri.
			//
			// After we implement dynamic priority adjustment, it's not clear

It is good to have this knob to start with. But I hope we never have to tune it, or do dynamic priority adjustment in the code (it will just make things harder to understand). If someone has not provisioned well enough for GC to function, then it is their problem to add more resources.

@craig bot commented Aug 9, 2022

Build failed (retrying...):

@irfansharif (Contributor, Author) commented
bors r-

@craig bot commented Aug 10, 2022

Canceled.

Second commit message:

Cluster setting that controls how long the MVCC GC queue waits between
processing replicas. It was previously hardcoded to 1s (which is the
default value), but changing it would've come in handy in support
incidents as a form of manual pacing of MVCC GC work (we have a similar
useful knob for the merge queue).

Release note: None
@irfansharif (Contributor, Author) commented
bors r+

@craig bot commented Aug 10, 2022

🕐 Waiting for PR status (Github check) to be set, probably by CI. Bors will automatically try to run when all required PR statuses are set.

@craig bot commented Aug 10, 2022

Build succeeded:

@craig bot merged commit a3cb8aa into cockroachdb:master on Aug 10, 2022
@blathers-crl bot commented Aug 10, 2022

Encountered an error creating backports. Some common things that can go wrong:

  1. The backport branch might have already existed.
  2. There was a merge conflict.
  3. The backport branch contained merge commits.

You might need to create your backport manually using the backport tool.


error creating merge commit from d5e7e0a to blathers/backport-release-22.1-85823: POST https://api.github.com/repos/cockroachdb/cockroach/merges: 409 Merge conflict []

you may need to manually resolve merge conflicts with the backport tool.

Backport to branch 22.1.x failed. See errors above.


🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is otan.
