kvserver: check L0 sub-levels on allocation
Previously, the only store health signal used as a hard allocation and
rebalancing constraint was disk capacity. This patch introduces L0
sub-levels as an additional constraint, to avoid allocating or
rebalancing replicas to stores that are unhealthy, as indicated by a
high number of L0 sub-levels.

A store's sub-level count must exceed both (1) the threshold and (2) the
cluster average in order to be considered unhealthy. The average check
ensures that a cluster composed entirely of stores with moderately high
read amplification can still make progress, while positively skewed
distributions still have their positive tail excluded.

A simulation of the effect on candidate exclusion under different L0
sub-level distributions, using the mean as an additional check vs.
percentiles, can be found here:
https://gist.github.com/kvoli/be27efd4662e89e8918430a9c7117858
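The dual check described above can be sketched in a few lines of Go. This is an illustrative sketch only; the names `isStoreUnhealthy` and `maxL0SublevelThreshold` are hypothetical stand-ins, not the actual kvserver identifiers:

```go
package main

import "fmt"

// maxL0SublevelThreshold stands in for the
// kv.allocator.l0_sublevels_threshold cluster setting (default 20).
const maxL0SublevelThreshold = 20

// isStoreUnhealthy applies the dual check: a store is considered
// unhealthy only when its L0 sub-level count exceeds both the fixed
// threshold and the cluster-wide average.
func isStoreUnhealthy(l0Sublevels int, clusterAvg float64) bool {
	return l0Sublevels > maxL0SublevelThreshold &&
		float64(l0Sublevels) > clusterAvg
}

func main() {
	// A store at 25 sub-levels in a cluster averaging 10 is excluded...
	fmt.Println(isStoreUnhealthy(25, 10)) // true
	// ...but not when the whole cluster averages 30: the average check
	// keeps a uniformly struggling cluster from blocking all progress.
	fmt.Println(isStoreUnhealthy(25, 30)) // false
}
```

The second case is the point of the average check: with a positively skewed distribution the tail store fails both checks, but a uniformly loaded cluster fails neither.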

The threshold corresponds to the cluster setting
`kv.allocator.l0_sublevels_threshold`: the number of L0 sub-levels above
which a candidate store may be excluded as a target for rebalancing, or
for both rebalancing and allocation of replicas.

This threshold can be enforced at 4 different levels of strictness,
configured by the cluster setting
`kv.allocator.l0_sublevels_threshold_enforce`.

The 4 levels are:

`block_none`: L0 sub-levels are ignored entirely.
`block_none_log`: L0 sub-levels are logged when the threshold is
exceeded.
`block_rebalance_to`: L0 sub-levels are considered when excluding stores
as rebalance targets.
`block_all`: L0 sub-levels are considered when excluding stores as both
rebalance and allocation targets.

Both `block_rebalance_to` and `block_all` also log, as `block_none_log`
does.

By default, `kv.allocator.l0_sublevels_threshold` is `20`. This
corresponds to admission control's threshold, above which it begins
limiting admission of work to a store based on store health. The default
enforcement level of `kv.allocator.l0_sublevels_threshold_enforce` is
`block_none_log`.
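As a rough sketch of how the four enforcement levels gate candidate exclusion (the patch does define a `storeHealthEnforcement` type, but the constant names and the `excludeAsTarget` helper below are illustrative, not the actual kvserver code):

```go
package main

import "fmt"

// storeHealthEnforcement mirrors the four levels of the
// kv.allocator.l0_sublevels_threshold_enforce setting.
type storeHealthEnforcement int

const (
	blockNone        storeHealthEnforcement = iota // ignore L0 sub-levels entirely
	blockNoneLog                                   // log only when threshold exceeded
	blockRebalanceTo                               // exclude as rebalance target
	blockAll                                       // exclude as rebalance and allocation target
)

// excludeAsTarget reports whether an unhealthy store should be excluded
// as a candidate for the given action under the enforcement level.
func excludeAsTarget(level storeHealthEnforcement, isAllocation bool) bool {
	switch level {
	case blockRebalanceTo:
		return !isAllocation // rebalance targets only
	case blockAll:
		return true
	default: // blockNone, blockNoneLog: never exclude, possibly log
		return false
	}
}

func main() {
	fmt.Println(excludeAsTarget(blockNoneLog, false))     // false
	fmt.Println(excludeAsTarget(blockRebalanceTo, false)) // true
	fmt.Println(excludeAsTarget(blockRebalanceTo, true))  // false
	fmt.Println(excludeAsTarget(blockAll, true))          // true
}
```

Note that exclusion only applies once the store has already failed the threshold-and-average health check; healthy stores are never excluded at any level.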

resolves cockroachdb#73714

Release justification: low risk, high benefit during high read
amplification scenarios, where an operator may wish to prevent
rebalancing to stores with high read amplification, to stop fueling the
flame.

Release note (ops change): introduce cluster settings
`kv.allocator.l0_sublevels_threshold` and
`kv.allocator.l0_sublevels_threshold_enforce`, which enable excluding
stores as targets for allocation and rebalancing of replicas when they
have high read amplification, as indicated by the number of sub-levels
in level 0 of the store's LSM. When both
`kv.allocator.l0_sublevels_threshold` and the cluster average are
exceeded, the action corresponding to
`kv.allocator.l0_sublevels_threshold_enforce` is taken. `block_none`
will exclude no candidate stores, `block_none_log` will exclude no
candidates but log an event, `block_rebalance_to` will exclude
candidate stores from being targets of rebalance actions, and `block_all`
will exclude candidate stores from being targets of both allocation and
rebalancing. By default, `kv.allocator.l0_sublevels_threshold` is set to
`20` and `kv.allocator.l0_sublevels_threshold_enforce` is set to
`block_none_log`.
kvoli committed Apr 7, 2022
1 parent 4940bc1 commit deb3bf4
Showing 17 changed files with 1,060 additions and 132 deletions.
2 changes: 1 addition & 1 deletion docs/generated/settings/settings-for-tenants.txt
@@ -193,4 +193,4 @@ trace.jaeger.agent string the address of a Jaeger agent to receive traces using
 trace.opentelemetry.collector string address of an OpenTelemetry trace collector to receive traces using the otel gRPC protocol, as <host>:<port>. If no port is specified, 4317 will be used.
 trace.span_registry.enabled boolean true if set, ongoing traces can be seen at https://<ui>/#/debug/tracez
 trace.zipkin.collector string the address of a Zipkin instance to receive traces, as <host>:<port>. If no port is specified, 9411 will be used.
-version version 21.2-106 set the active cluster version in the format '<major>.<minor>'
+version version 21.2-108 set the active cluster version in the format '<major>.<minor>'
2 changes: 1 addition & 1 deletion docs/generated/settings/settings.html
@@ -209,6 +209,6 @@
 <tr><td><code>trace.opentelemetry.collector</code></td><td>string</td><td><code></code></td><td>address of an OpenTelemetry trace collector to receive traces using the otel gRPC protocol, as <host>:<port>. If no port is specified, 4317 will be used.</td></tr>
 <tr><td><code>trace.span_registry.enabled</code></td><td>boolean</td><td><code>true</code></td><td>if set, ongoing traces can be seen at https://<ui>/#/debug/tracez</td></tr>
 <tr><td><code>trace.zipkin.collector</code></td><td>string</td><td><code></code></td><td>the address of a Zipkin instance to receive traces, as <host>:<port>. If no port is specified, 9411 will be used.</td></tr>
-<tr><td><code>version</code></td><td>version</td><td><code>21.2-106</code></td><td>set the active cluster version in the format '<major>.<minor>'</td></tr>
+<tr><td><code>version</code></td><td>version</td><td><code>21.2-108</code></td><td>set the active cluster version in the format '<major>.<minor>'</td></tr>
 </tbody>
 </table>
8 changes: 8 additions & 0 deletions pkg/clusterversion/cockroach_versions.go
@@ -341,6 +341,10 @@ const (
 	// to 'yes|no|only'
 	EnableNewChangefeedOptions
 
+	// GossipL0Sublevels is the version where L0 sublevels are gossiped
+	// in a store descriptor's capacity and can be used for allocation
+	// decisions.
+	GossipL0Sublevels
 	// *************************************************
 	// Step (1): Add new versions here.
 	// Do not add new versions to a patch release.
@@ -580,6 +584,10 @@ var versionsSingleton = keyedVersions{
 		Version: roachpb.Version{Major: 21, Minor: 2, Internal: 106},
 	},
 
+	{
+		Key:     GossipL0Sublevels,
+		Version: roachpb.Version{Major: 21, Minor: 2, Internal: 108},
+	},
 	// *************************************************
 	// Step (2): Add new versions here.
 	// Do not add new versions to a patch release.
5 changes: 3 additions & 2 deletions pkg/clusterversion/key_string.go

Some generated files are not rendered by default.

35 changes: 30 additions & 5 deletions pkg/kv/kvserver/allocator.go
@@ -19,6 +19,7 @@ import (
 	"strings"
 	"time"
 
+	"github.com/cockroachdb/cockroach/pkg/clusterversion"
 	"github.com/cockroachdb/cockroach/pkg/kv/kvserver/constraint"
 	"github.com/cockroachdb/cockroach/pkg/roachpb"
 	"github.com/cockroachdb/cockroach/pkg/settings"
@@ -899,7 +900,7 @@ func (a *Allocator) allocateTarget(
 		conf,
 		existingVoters,
 		existingNonVoters,
-		a.scorerOptions(),
+		a.scorerOptions(ctx),
 		// When allocating a *new* replica, we explicitly disregard nodes with any
 		// existing replicas. This is important for multi-store scenarios as
 		// otherwise, stores on the nodes that have existing replicas are simply
@@ -1122,6 +1123,7 @@ func (a Allocator) removeTarget(
 
 	replicaSetForDiversityCalc := getReplicasForDiversityCalc(targetType, existingVoters, existingReplicas)
 	rankedCandidates := candidateListForRemoval(
+		ctx,
 		candidateStoreList,
 		constraintsChecker,
 		a.storePool.getLocalitiesByStore(replicaSetForDiversityCalc),
@@ -1451,16 +1453,18 @@ func (a Allocator) RebalanceNonVoter(
 	)
 }
 
-func (a *Allocator) scorerOptions() *rangeCountScorerOptions {
+func (a *Allocator) scorerOptions(ctx context.Context) *rangeCountScorerOptions {
 	return &rangeCountScorerOptions{
+		storeHealthOptions:      a.storeHealthOptions(ctx),
 		deterministic:           a.storePool.deterministic,
 		rangeRebalanceThreshold: rangeRebalanceThreshold.Get(&a.storePool.st.SV),
 	}
 }
 
-func (a *Allocator) scorerOptionsForScatter() *scatterScorerOptions {
+func (a *Allocator) scorerOptionsForScatter(ctx context.Context) *scatterScorerOptions {
 	return &scatterScorerOptions{
 		rangeCountScorerOptions: rangeCountScorerOptions{
+			storeHealthOptions:      a.storeHealthOptions(ctx),
 			deterministic:           a.storePool.deterministic,
 			rangeRebalanceThreshold: 0,
 		},
@@ -1588,6 +1592,24 @@ func (a *Allocator) leaseholderShouldMoveDueToPreferences(
 	return true
 }
 
+// storeHealthOptions returns the store health options, currently only
+// considering the threshold for L0 sub-levels. This threshold is not
+// considered in allocation or rebalancing decisions (excluding candidate
+// stores as targets) when enforcementLevel is set to storeHealthNoAction or
+// storeHealthLogOnly. By default storeHealthLogOnly is the action taken. When
+// there is a mixed version cluster, storeHealthNoAction is set instead.
+func (a *Allocator) storeHealthOptions(ctx context.Context) storeHealthOptions {
+	enforcementLevel := storeHealthNoAction
+	if a.storePool.st.Version.IsActive(ctx, clusterversion.GossipL0Sublevels) {
+		enforcementLevel = storeHealthEnforcement(l0SublevelsThresholdEnforce.Get(&a.storePool.st.SV))
+	}
+
+	return storeHealthOptions{
+		enforcementLevel:    enforcementLevel,
+		l0SublevelThreshold: l0SublevelsThreshold.Get(&a.storePool.st.SV),
+	}
+}
+
 // TransferLeaseTarget returns a suitable replica to transfer the range lease
 // to from the provided list. It includes the current lease holder replica
 // unless asked to do otherwise by the excludeLeaseRepl parameter.
@@ -1735,11 +1757,14 @@ func (a *Allocator) TransferLeaseTarget(
 		// https://github.com/cockroachdb/cockroach/issues/75630.
 		bestStore, noRebalanceReason := bestStoreToMinimizeQPSDelta(
 			leaseReplQPS,
-			qpsRebalanceThreshold.Get(&a.storePool.st.SV),
-			minQPSDifferenceForTransfers.Get(&a.storePool.st.SV),
 			leaseRepl.StoreID(),
 			candidates,
 			storeDescMap,
+			&qpsScorerOptions{
+				storeHealthOptions:    a.storeHealthOptions(ctx),
+				qpsRebalanceThreshold: qpsRebalanceThreshold.Get(&a.storePool.st.SV),
+				minRequiredQPSDiff:    minQPSDifferenceForTransfers.Get(&a.storePool.st.SV),
+			},
 		)
 
 		switch noRebalanceReason {