kvserver: lower priority level for mvcc gc work
GC could be expected to be LowPri, so that it does not impact
user-facing traffic when resources (e.g. CPU, write capacity of the
store) are scarce. However, long delays in GC can slow down user-facing
traffic due to more versions in the store, and can increase write
amplification of the store since there is more live data. Ideally, we
should adjust this priority based on how far behind we are with respect
to GC-ing a particular range. Keeping the priority level static at
NormalPri proved disruptive when a large volume of MVCC GC work was
suddenly accrued (for example, when an old protected timestamp record
was just released after a long-paused backup job was completed/canceled,
or when an old, long-running backup job just finished).
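
The dynamic adjustment described above is left as a TODO; nothing in
this commit implements it. Purely as a hypothetical sketch (not part of
this patch), such an adjustment could map how far a range's GC
threshold lags behind its configured TTL to one of the admission
priorities this change already references. The package, helper name,
and the 2x/10x cutoffs below are arbitrary placeholders:

// Hypothetical sketch only -- not part of this commit. Illustrates picking an
// admission priority from how far a range has fallen behind on MVCC GC.
package gcpriority

import (
	"time"

	"github.com/cockroachdb/cockroach/pkg/util/admission/admissionpb"
)

// priorityForGCLag bumps up the priority of MVCC GC work for ranges that have
// fallen far behind (accumulating versions and write amplification), while
// keeping mildly-behind ranges at a background priority. The cutoffs are
// placeholders, not tuned values.
func priorityForGCLag(lag, ttl time.Duration) admissionpb.WorkPriority {
	switch {
	case lag < 2*ttl:
		return admissionpb.BulkNormalPri // barely behind: stay out of the way
	case lag < 10*ttl:
		return admissionpb.NormalPri // moderately behind: compete with foreground work
	default:
		return admissionpb.UserHighPri // far behind: catch up aggressively
	}
}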

Even once dynamic priority adjustment is implemented, it's not yet
clear whether we need additional pacing mechanisms to provide better
latency isolation, similar to ongoing work for backups (MVCC GC work is
CPU-intensive: #82955). This patch is also speculative in nature, made
in response to observed incidents where NormalPri proved too
disruptive. Fuller treatment would entail working off of reliable
reproductions of this behaviour.

Release note (performance improvement): Previously, a sudden increase
in the volume of pending MVCC GC work could impact foreground
latencies. These sudden increases commonly occurred when:
- gc.ttlseconds was reduced dramatically over tables/indexes that accrue
  a lot of MVCC garbage (think "rows being frequently deleted")
- a paused backup job from a while ago (think > 1 day) was
  canceled/failed
- a backup job that started a while ago (think > 1 day) just finished

An indicator of a large increase in the volume of pending MVCC GC work
is a steep climb in the "GC Queue" graph in the DB Console, found by
navigating to 'Metrics' and selecting the 'Queues' dashboard. With this
patch, the effect of such a sudden build-up on foreground latencies
should be reduced.
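
If the lower default ever lets MVCC GC fall too far behind for a given
deployment, the kv.gc.admission_priority cluster setting added in this
commit can be raised to one of the other registered values. A minimal
sketch of doing so from Go, assuming a locally running insecure cluster
and the lib/pq driver (both assumptions, not part of this patch):

package main

import (
	"database/sql"
	"log"

	_ "github.com/lib/pq" // Postgres-wire driver; CockroachDB speaks the same protocol
)

func main() {
	// Hypothetical connection string for a local insecure single-node cluster.
	db, err := sql.Open("postgres",
		"postgresql://root@localhost:26257/defaultdb?sslmode=disable")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// Valid values per the enum registered in pkg/kv/kvserver/gc/gc.go below:
	// bulk_normal_pri (default), normal_pri, user_high_pri.
	if _, err := db.Exec(
		`SET CLUSTER SETTING kv.gc.admission_priority = 'normal_pri'`,
	); err != nil {
		log.Fatal(err)
	}
}
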
irfansharif committed Aug 9, 2022
1 parent 24906bd commit d5e7e0a
Showing 3 changed files with 40 additions and 9 deletions.
1 change: 1 addition & 0 deletions pkg/kv/kvserver/gc/BUILD.bazel
@@ -19,6 +19,7 @@ go_library(
"//pkg/storage",
"//pkg/storage/enginepb",
"//pkg/util",
"//pkg/util/admission/admissionpb",
"//pkg/util/bufalloc",
"//pkg/util/contextutil",
"//pkg/util/hlc",
15 changes: 15 additions & 0 deletions pkg/kv/kvserver/gc/gc.go
@@ -30,6 +30,7 @@ import (
"github.com/cockroachdb/cockroach/pkg/storage"
"github.com/cockroachdb/cockroach/pkg/storage/enginepb"
"github.com/cockroachdb/cockroach/pkg/util"
"github.com/cockroachdb/cockroach/pkg/util/admission/admissionpb"
"github.com/cockroachdb/cockroach/pkg/util/bufalloc"
"github.com/cockroachdb/cockroach/pkg/util/contextutil"
"github.com/cockroachdb/cockroach/pkg/util/hlc"
@@ -115,6 +116,20 @@ var MaxIntentKeyBytesPerCleanupBatch = settings.RegisterIntSetting(
settings.NonNegativeInt,
)

// AdmissionPriority determines the admission priority level to use for MVCC GC
// work.
var AdmissionPriority = settings.RegisterEnumSetting(
settings.SystemOnly,
"kv.gc.admission_priority",
"the admission priority to use for mvcc gc work",
"bulk_normal_pri",
map[int64]string{
int64(admissionpb.BulkNormalPri): "bulk_normal_pri",
int64(admissionpb.NormalPri): "normal_pri",
int64(admissionpb.UserHighPri): "user_high_pri",
},
)

// CalculateThreshold calculates the GC threshold given the policy and the
// current view of time.
func CalculateThreshold(now hlc.Timestamp, gcttl time.Duration) (threshold hlc.Timestamp) {
33 changes: 24 additions & 9 deletions pkg/kv/kvserver/mvcc_gc_queue.go
@@ -445,17 +445,32 @@ func (r *replicaGCer) send(ctx context.Context, req roachpb.GCRequest) error {
// admission control here, as we are bypassing server.Node.
var admissionHandle interface{}
if r.admissionController != nil {
pri := admissionpb.WorkPriority(gc.AdmissionPriority.Get(&r.repl.ClusterSettings().SV))
ba.AdmissionHeader = roachpb.AdmissionHeader{
// GC is currently assigned NormalPri.
// TODO(irfansharif): GC could be expected to be BulkNormalPri, so
// that it does not impact user-facing traffic when resources (e.g.
// CPU, write capacity of the store) are scarce. However long delays
// in GC can slow down user-facing traffic due to more versions in
// the store, and can increase write amplification of the store
// since there is more live data. Ideally, we should adjust this
// priority based on how far behind we are with respect to GC-ing
// data in this range. Keeping it static at NormalPri proved
// disruptive when a large volume of MVCC GC work is suddenly
// accrued (if an old protected timestamp record was just released
// for ex. following a long paused backup job being
// completed/canceled, or just an old, long running backup job
// finishing). For now, use a cluster setting that defaults to
// BulkNormalPri.
//
// TODO(kv): GC could be expected to be LowPri, so that it does not
// impact user-facing traffic when resources (e.g. CPU, write capacity
// of the store) are scarce. However long delays in GC can slow down
// user-facing traffic due to more versions in the store, and can
// increase write amplification of the store since there is more live
// data. Ideally, we should adjust this priority based on how far behind
// we are wrt GCing in this range.
Priority: int32(admissionpb.NormalPri),
// After we implement dynamic priority adjustment, it's not clear
// whether we need additional pacing mechanisms to provide better
// latency isolation similar to ongoing work for backups (since MVCC
// GC work is CPU intensive): #82955. It's also worth noting that we
// might be able to do most MVCC GC work as part of regular
// compactions (#42514) -- the CPU use by the MVCC GC queue during
// keyspace might still be worth explicitly accounting/limiting, but
// it'll be lessened overall.
Priority: int32(pri),
CreateTime: timeutil.Now().UnixNano(),
Source: roachpb.AdmissionHeader_ROOT_KV,
NoMemoryReservedAtSource: true,
