-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
kvserver: make the StoreRebalancer aware of non-voters #61600
kvserver: make the StoreRebalancer aware of non-voters #61600
Conversation
There's a few things about |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 2 of 2 files at r1.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @aayushshah15)
pkg/kv/kvserver/store_rebalancer.go, line 514 at r1 (raw file):
) rangeDesc, zone := replWithStats.repl.DescAndZone()
It feels wrong that these four lines are duplicated here and at the very beginning of getRebalanceCandidatesBasedOnQPS
.
pkg/kv/kvserver/store_rebalancer.go, line 536 at r1 (raw file):
} // If the new set of replicas has lower diversity scores than the existing
Do you think it's worth pulling this check into a helper method as well?
pkg/kv/kvserver/store_rebalancer.go, line 627 at r1 (raw file):
currentNonVoters := rangeDesc.Replicas().NonVoterDescriptors() pickReplsToKeep := func(
Give each of these helpers a one-sentence comment explaining what they do.
pkg/kv/kvserver/store_rebalancer.go, line 627 at r1 (raw file):
currentNonVoters := rangeDesc.Replicas().NonVoterDescriptors() pickReplsToKeep := func(
These closures are questionable from a clean code perspective. The problem is that it's very difficult to know which of the 16 variables in the outer scope are captured here, so it's hard to reason about the functions, what influences their output, and whether they have any side-effects. Can be they moved outside of the outer function and live on their own? That will force their contract to be clarified. It's possible they aren't actually capturing any state, but I can't tell.
pkg/kv/kvserver/store_rebalancer.go, line 641 at r1 (raw file):
} } return false
nit: just fall through here instead of branching.
pkg/kv/kvserver/store_rebalancer.go, line 685 at r1 (raw file):
pickRemainingRepls := func( targetRepls *[]roachpb.ReplicaDescriptor, numDesiredRepls int, targetType targetReplicaType,
nit: prefer returning a new slice over mutating a reference to a slice, which is much easier to miss.
pkg/kv/kvserver/store_rebalancer_test.go, line 248 at r1 (raw file):
testCases := []struct { voterStoreIDs, nonVoterStoreIDs []roachpb.StoreID
nit: You can probably drop "StoreIDs" from each of these field names and just use voters
, nonVoters
, etc.
But I think it would be worth renaming the "expected" fields to something that better indicates what's expected of them. Something like expectedRebalanceVoters
.
pkg/kv/kvserver/store_rebalancer_test.go, line 267 at r1 (raw file):
nonLive: []roachpb.StoreID{4, 5}, qps: 100, expectedVoterStoreIDs: []roachpb.StoreID{},
nit: You started with listing out expectedVoterStoreIDs
when empty, but then stopped. Any reason for that? In general, I think it's nice to be explicit about what a test case expects, even if it's the zero value.
@aayushshah15 are we considering this a GA-blocker for v21.1? If so, the window in which it can land is getting smaller, so I think we'll want to prioritize it early next week. What do you think? |
e64d9b3
to
3af9602
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten)
pkg/kv/kvserver/store_rebalancer.go, line 514 at r1 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
It feels wrong that these four lines are duplicated here and at the very beginning of
getRebalanceCandidatesBasedOnQPS
.
I hoisted a bunch of these range specific bits of info into its own struct. Let me know if you dislike.
pkg/kv/kvserver/store_rebalancer.go, line 536 at r1 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Do you think it's worth pulling this check into a helper method as well?
Done, let me know if you dislike what I've done here.
pkg/kv/kvserver/store_rebalancer.go, line 627 at r1 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Give each of these helpers a one-sentence comment explaining what they do.
Done.
pkg/kv/kvserver/store_rebalancer.go, line 627 at r1 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
These closures are questionable from a clean code perspective. The problem is that it's very difficult to know which of the 16 variables in the outer scope are captured here, so it's hard to reason about the functions, what influences their output, and whether they have any side-effects. Can be they moved outside of the outer function and live on their own? That will force their contract to be clarified. It's possible they aren't actually capturing any state, but I can't tell.
I struggled more than I'd like to admit at refactoring this. Let me know if you dislike what I've come up with here.
pkg/kv/kvserver/store_rebalancer.go, line 641 at r1 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
nit: just fall through here instead of branching.
Done.
pkg/kv/kvserver/store_rebalancer.go, line 685 at r1 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
nit: prefer returning a new slice over mutating a reference to a slice, which is much easier to miss.
Done.
pkg/kv/kvserver/store_rebalancer_test.go, line 248 at r1 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
nit: You can probably drop "StoreIDs" from each of these field names and just use
voters
,nonVoters
, etc.But I think it would be worth renaming the "expected" fields to something that better indicates what's expected of them. Something like
expectedRebalanceVoters
.
Done.
pkg/kv/kvserver/store_rebalancer_test.go, line 267 at r1 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
nit: You started with listing out
expectedVoterStoreIDs
when empty, but then stopped. Any reason for that? In general, I think it's nice to be explicit about what a test case expects, even if it's the zero value.
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 2 of 2 files at r2.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @aayushshah15)
pkg/kv/kvserver/store_rebalancer.go, line 549 at r2 (raw file):
continue } if len(targetNonVoterRepls) < rebalanceCtx.numDesiredNonVoters {
Could there be adverse effects from not rebalancing voting replicas when a range can't find enough non-voting replica targets? Will this disable the store rebalancer in important cases that we otherwise wouldn't want it disabled in?
pkg/kv/kvserver/store_rebalancer.go, line 568 at r2 (raw file):
} allTargetRepls := append(targetVoterRepls, targetNonVoterRepls...)
Let's give this a comment as well. Specifically, explain why we need to check this twice and what the difference between these two checks is.
pkg/kv/kvserver/store_rebalancer.go, line 609 at r2 (raw file):
} func (sr *StoreRebalancer) worsensDiversity(
Let's give this method a comment.
pkg/kv/kvserver/store_rebalancer_test.go, line 541 at r2 (raw file):
) require.Lenf(t, voterTargets, len(tc.expectedRebalancedVoters), "DEBUG!!! %+v", tc)
Did you mean to leave the DEBUG
in?
This commit teaches the `StoreRebalancer` to rebalance non-voting replicas. Release justification: needed for non-voting replicas Release note: None
3af9602
to
d70e286
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @nvanbenschoten)
pkg/kv/kvserver/store_rebalancer.go, line 549 at r2 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Could there be adverse effects from not rebalancing voting replicas when a range can't find enough non-voting replica targets? Will this disable the store rebalancer in important cases that we otherwise wouldn't want it disabled in?
My rationale while writing this was to try and not introduce a behavior change -- if you consider the old behavior to be "find a replacement for all replicas on overfull stores or don't do anything".
I did wonder about what you're suggesting, but note that that desire isn't exclusive to non-voting replicas. For instance, if we can only find valid replacements for 2 out of 5 voting replicas, we currently won't do anything but we probably should. In fact, being able to do this for voters is more important than non-voters because voters will have more restrictive constraints (so fewer compatible stores to receive them). Maybe this is what the TODO here is referring to as well, but I can't tell.
Given all of this, and just our lack of large scale testing for the store rebalancer, I don't feel particularly comfortable trying to do that in this PR. I was thinking that we can try it out in its own PR once we're through our backlog of release commitments and then later determine if it fits the bill for backport into some patch release.
What do you think? Ofcourse, I'm happy to do it here if you think it makes sense.
pkg/kv/kvserver/store_rebalancer.go, line 568 at r2 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Let's give this a comment as well. Specifically, explain why we need to check this twice and what the difference between these two checks is.
Done.
pkg/kv/kvserver/store_rebalancer.go, line 609 at r2 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Let's give this method a comment.
Done.
pkg/kv/kvserver/store_rebalancer_test.go, line 541 at r2 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Did you mean to leave the
DEBUG
in?
Nope, fixed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 2 of 2 files at r3.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @aayushshah15)
pkg/kv/kvserver/store_rebalancer.go, line 549 at r2 (raw file):
Previously, aayushshah15 (Aayush Shah) wrote…
My rationale while writing this was to try and not introduce a behavior change -- if you consider the old behavior to be "find a replacement for all replicas on overfull stores or don't do anything".
I did wonder about what you're suggesting, but note that that desire isn't exclusive to non-voting replicas. For instance, if we can only find valid replacements for 2 out of 5 voting replicas, we currently won't do anything but we probably should. In fact, being able to do this for voters is more important than non-voters because voters will have more restrictive constraints (so fewer compatible stores to receive them). Maybe this is what the TODO here is referring to as well, but I can't tell.
Given all of this, and just our lack of large scale testing for the store rebalancer, I don't feel particularly comfortable trying to do that in this PR. I was thinking that we can try it out in its own PR once we're through our backlog of release commitments and then later determine if it fits the bill for backport into some patch release.
What do you think? Ofcourse, I'm happy to do it here if you think it makes sense.
Ack. Thanks for the detailed response. Let's just be sure to file an issue detailing this behavior and ways we could improve it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the reviews on this!
bors r+
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @nvanbenschoten)
pkg/kv/kvserver/store_rebalancer.go, line 549 at r2 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Ack. Thanks for the detailed response. Let's just be sure to file an issue detailing this behavior and ways we could improve it.
Created #62992 to track.
Build failed (retrying...): |
Build failed: |
bors r+ |
Build succeeded: |
This commit teaches the
StoreRebalancer
to rebalance non-votingreplicas.
Release justification: needed for non-voting replicas
Release note: None