kvserver: actuate load-based replica rebalancing under heterogeneous localities #65379
Commits on Sep 8, 2021
kvserver: clean up convergesScore computation during rebalancing
Release justification: Fixes high priority bug
Release note: None
Commit 5eba047

kvserver: allow computing `balanceScore` off of QPS for rebalancing
Previously, the replica rebalancing logic inside the allocator only computed `balanceScore` (a score of whether a store is overfull, underfull or balanced based on some signal) based on range count. This commit augments the replica rebalancing logic to support an option to allow computing `balanceScore` based on QPS instead.
When the `balanceScore` is being computed off of QPS, we disable `convergesScore` (which we can only compute off of RangeCount and would typically take precedence over `balanceScore`).
A future commit in this patchset will leverage this option in the `StoreRebalancer` to make zone-aware rebalancing decisions based on QPS.
Release justification: Fixes high priority bug
Release note: None
Commit b88a2b7

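As a rough illustration of what an overfull/underfull/balanced classification driven by QPS could look like, here is a minimal sketch. It is not the allocator's actual code: the names `balanceStatus` and `qpsBalanceStatus` and the `threshold` parameter are assumptions made for this example.

```go
package main

// balanceStatus is a hypothetical stand-in for the three outcomes the
// commit message mentions: underfull, balanced, or overfull.
type balanceStatus int

const (
	underfull balanceStatus = iota
	balanced
	overfull
)

// qpsBalanceStatus classifies a store by comparing its QPS against the mean
// QPS of a set of comparable stores. threshold is an assumed fractional
// tolerance around the mean (0.1 means +/-10%).
func qpsBalanceStatus(storeQPS float64, comparableQPS []float64, threshold float64) balanceStatus {
	if len(comparableQPS) == 0 {
		// Nothing to compare against, so treat the store as balanced.
		return balanced
	}
	var sum float64
	for _, q := range comparableQPS {
		sum += q
	}
	mean := sum / float64(len(comparableQPS))
	switch {
	case storeQPS > mean*(1+threshold):
		return overfull
	case storeQPS < mean*(1-threshold):
		return underfull
	default:
		return balanced
	}
}
```

Swapping the QPS inputs for range counts would give the range-count flavor of the same signal, which is the option this commit makes selectable.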
kvserver: sharpen computation of load based signals for replica removal
This commit improves the computation of `convergesScore` and `balanceScore` during replica removal by computing these scores only in relation to the set of candidates that are the least diverse (i.e. the candidates that are actually being considered for removal). This is necessary for these load based signals to be meaningful in heterogeneously loaded localities.
Release justification: Fixes high priority bug
Release note: None
Commit 3f4ed4e

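To make the idea of scoring only against the least diverse candidates concrete, the sketch below narrows a candidate set before computing a load average. The `removalCandidate` type, its fields, and both helper functions are hypothetical, and exact equality on the diversity score is a simplification for the example.

```go
package main

// removalCandidate is a hypothetical record for a replica that could be
// removed: the store it lives on, a diversity score (lower means the replica
// contributes less locality diversity), and the store's QPS.
type removalCandidate struct {
	storeID   int
	diversity float64
	qps       float64
}

// leastDiverse returns only the candidates that share the minimum diversity
// score, i.e. the ones that would actually be considered for removal.
func leastDiverse(cands []removalCandidate) []removalCandidate {
	if len(cands) == 0 {
		return nil
	}
	minDiv := cands[0].diversity
	for _, c := range cands[1:] {
		if c.diversity < minDiv {
			minDiv = c.diversity
		}
	}
	var out []removalCandidate
	for _, c := range cands {
		if c.diversity == minDiv {
			out = append(out, c)
		}
	}
	return out
}

// meanQPSOf averages the QPS of the given candidates' stores.
func meanQPSOf(cands []removalCandidate) float64 {
	if len(cands) == 0 {
		return 0
	}
	var sum float64
	for _, c := range cands {
		sum += c.qps
	}
	return sum / float64(len(cands))
}
```

Computing the average over `leastDiverse(cands)` rather than over all of `cands` keeps a heavily loaded store in another locality from skewing the comparison.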
kvserver: actuate load-based replica rebalancing under heterogeneous localities
This commit teaches the `StoreRebalancer` to make load-based rebalancing decisions that are meaningful within the context of the replication constraints placed on the ranges being relocated and the set of stores that can legally receive replicas for such ranges.
Previously, the `StoreRebalancer` would compute the QPS underfull and overfull thresholds based on the overall average QPS being served by all stores in the cluster. Notably, this included stores that were in replication zones that would not satisfy required constraints for the range being considered for rebalancing. This meant that the store rebalancer would effectively never be able to rebalance ranges within the stores inside heavily loaded replication zones (since all the _valid_ stores would be above the overfull thresholds).
This patch is a move away from the bespoke relocation logic in the `StoreRebalancer`. Instead, we have the `StoreRebalancer` rely on the rebalancing logic used by the `replicateQueue` that already has the machinery to compute load based signals for candidates _relative to other comparable stores_. The main difference here is that the `StoreRebalancer` uses this machinery to promote convergence of QPS across stores, whereas the `replicateQueue` uses it to promote convergence of range counts. A series of preceding commits in this patchset generalize the existing replica rebalancing logic, and this commit teaches the `StoreRebalancer` to use it.
This generalization also addresses another key limitation (see cockroachdb#62922) of the `StoreRebalancer` regarding its inability to make partial improvements to a range. Previously, if the `StoreRebalancer` couldn't move a range _entirely_ off of overfull stores, it would give up and not even move the subset of replicas it could. This is no longer the case.
Resolves cockroachdb#61883
Resolves cockroachdb#62992
Release justification: Fixes high priority bug
Release note (performance improvement): QPS-based replica rebalancing is now aware of different constraints placed on different replication zones. This means that heterogeneously loaded replication zones (for instance, regions) will achieve a more even distribution of QPS within the stores inside each such zone.
/cc @cockroachdb/kv
Commit d611828

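The problem with cluster-wide thresholds described above can be seen with a small worked example. This is only a sketch under stated assumptions: the `store` type, the `region` field used as a stand-in for "satisfies the range's constraints", and the `threshold` parameter are all hypothetical and not taken from the `StoreRebalancer`.

```go
package main

// store is a hypothetical, simplified view of a store: where it lives and
// how much QPS it is serving.
type store struct {
	id     int
	region string
	qps    float64
}

// meanQPS returns the average QPS across the given stores.
func meanQPS(stores []store) float64 {
	if len(stores) == 0 {
		return 0
	}
	var sum float64
	for _, s := range stores {
		sum += s.qps
	}
	return sum / float64(len(stores))
}

// comparableStores keeps only the stores in the given region. Matching on
// region is an assumed simplification of checking replication constraints.
func comparableStores(all []store, region string) []store {
	var out []store
	for _, s := range all {
		if s.region == region {
			out = append(out, s)
		}
	}
	return out
}

// isUnderfull reports whether s serves less than (1-threshold) times the
// mean QPS of peers, making it a reasonable rebalancing target among them.
func isUnderfull(s store, peers []store, threshold float64) bool {
	return s.qps < meanQPS(peers)*(1-threshold)
}
```

With a hot region serving 800, 1000 and 1200 QPS and a cold region serving 100 QPS per store, the 800 QPS store is never underfull against the cluster-wide mean of 550, but it is underfull against its own region's mean of 1000. Comparing only against stores that can legally hold the replica is what lets rebalancing make progress inside the hot region.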
kvserver: refactor allocator's scorer options
This commit turns the allocator's `scorerOptions` into an interface that has two implementations: one that promotes the balancing of range count across comparable stores, and another that promotes the balancing of QPS across comparable stores. The `replicateQueue` uses the former, whereas the `StoreRebalancer` uses the latter.
Release justification: Fixes high priority bug
Release note: None
Commit 3efcecf

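The shape of such a refactor, one interface with a range-count implementation and a QPS implementation, might look roughly like the following. Every name here (`loadSignal`, `rangeCountSignal`, `qpsSignal`, `storeStats`, `leastLoaded`) is made up for illustration; the real `scorerOptions` interface and its methods are not shown in this commit message.

```go
package main

// storeStats is a hypothetical summary of a store's load.
type storeStats struct {
	rangeCount int
	qps        float64
}

// loadSignal is an illustrative interface: each implementation decides which
// dimension of load is used when comparing stores.
type loadSignal interface {
	of(s storeStats) float64
}

// rangeCountSignal compares stores by the number of ranges they hold.
type rangeCountSignal struct{}

func (rangeCountSignal) of(s storeStats) float64 { return float64(s.rangeCount) }

// qpsSignal compares stores by the QPS they serve.
type qpsSignal struct{}

func (qpsSignal) of(s storeStats) float64 { return s.qps }

// leastLoaded returns the index of the store with the lowest signal, or -1
// if stores is empty. Ties go to the earliest store.
func leastLoaded(sig loadSignal, stores []storeStats) int {
	best := -1
	for i, s := range stores {
		if best == -1 || sig.of(s) < sig.of(stores[best]) {
			best = i
		}
	}
	return best
}
```

The selection logic in `leastLoaded` is written once, and callers pick the dimension by passing `rangeCountSignal{}` or `qpsSignal{}`, which mirrors how the two callers named above each use a different implementation.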
kvserver: rename `StoreList.filter`
This commit renames `StoreList`'s `filter()` method to `excludeInvalid()` as the existing name was ambiguous.
Release justification: Fixes high priority bug
Release note: None
Commit 153fd6b

kvserver: promote QPS convergence during load-based lease rebalancing
This commit augments `TransferLeaseTarget()` by adding a mode that picks the best lease transfer target that would lead to QPS convergence across the stores that have a replica for a given range. This commit implements a strategy that predicates lease transfer decisions on whether they would serve to reduce the QPS delta between existing replicas' stores.
Resolves cockroachdb#31135
Release justification: Fixes high priority bug
Release note (bug fix): Previously, the store rebalancer was unable to rebalance leases for hot ranges that received a disproportionate amount of traffic relative to the rest of the cluster. This often led to prolonged single-node hotspots in certain workloads that led to hot ranges. This bug is now fixed.
Commit d61f474
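One way to picture a lease transfer decision predicated on reducing the QPS delta is sketched below. It assumes a deliberately simple model that is not taken from `TransferLeaseTarget()`: moving the lease moves the range's entire QPS from the leaseholder's store to the target store, and the delta is the gap between the busiest and least busy replica stores. The function names are hypothetical.

```go
package main

// qpsDelta returns the gap between the highest and lowest QPS in the slice.
// It expects a non-empty slice.
func qpsDelta(qps []float64) float64 {
	lo, hi := qps[0], qps[0]
	for _, q := range qps[1:] {
		if q < lo {
			lo = q
		}
		if q > hi {
			hi = q
		}
	}
	return hi - lo
}

// bestLeaseTarget returns the index of the replica store that, after
// receiving the lease, yields the smallest QPS delta across the replica
// stores. It returns -1 if no transfer reduces the current delta.
// storeQPS[i] is the QPS of the i-th store holding a replica, leaseholder is
// the index of the current leaseholder's store, and rangeQPS is the load the
// range contributes (assumed to follow the lease entirely).
func bestLeaseTarget(storeQPS []float64, leaseholder int, rangeQPS float64) int {
	if len(storeQPS) == 0 || leaseholder < 0 || leaseholder >= len(storeQPS) {
		return -1
	}
	best, bestDelta := -1, qpsDelta(storeQPS)
	for i := range storeQPS {
		if i == leaseholder {
			continue
		}
		// Simulate the transfer on a copy so the input is left untouched.
		after := append([]float64(nil), storeQPS...)
		after[leaseholder] -= rangeQPS
		after[i] += rangeQPS
		if d := qpsDelta(after); d < bestDelta {
			best, bestDelta = i, d
		}
	}
	return best
}
```

For replica stores serving 1000, 200 and 400 QPS with a 300 QPS range leased on the first store, transferring to the second store shrinks the delta from 800 to 300, so it is chosen. When every possible transfer would widen the gap, the sketch returns -1 and the lease stays put.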