Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

kvserver: make the StoreRebalancer aware of non-voters #61600

Conversation

aayushshah15
Copy link
Contributor

This commit teaches the StoreRebalancer to rebalance non-voting
replicas.

Release justification: needed for non-voting replicas
Release note: None

@cockroach-teamcity
Copy link
Member

This change is Reviewable

@aayushshah15
Copy link
Contributor Author

There's a few things about store_rebalancer{_test}.go that are incompatible with our current style guide, but I have not made an effort to refactor / clean them up in this PR.

Copy link
Member

@nvanbenschoten nvanbenschoten left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 2 of 2 files at r1.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @aayushshah15)


pkg/kv/kvserver/store_rebalancer.go, line 514 at r1 (raw file):

		)

		rangeDesc, zone := replWithStats.repl.DescAndZone()

It feels wrong that these four lines are duplicated here and at the very beginning of getRebalanceCandidatesBasedOnQPS.


pkg/kv/kvserver/store_rebalancer.go, line 536 at r1 (raw file):

		}

		// If the new set of replicas has lower diversity scores than the existing

Do you think it's worth pulling this check into a helper method as well?


pkg/kv/kvserver/store_rebalancer.go, line 627 at r1 (raw file):

	currentNonVoters := rangeDesc.Replicas().NonVoterDescriptors()

	pickReplsToKeep := func(

Give each of these helpers a one-sentence comment explaining what they do.


pkg/kv/kvserver/store_rebalancer.go, line 627 at r1 (raw file):

	currentNonVoters := rangeDesc.Replicas().NonVoterDescriptors()

	pickReplsToKeep := func(

These closures are questionable from a clean code perspective. The problem is that it's very difficult to know which of the 16 variables in the outer scope are captured here, so it's hard to reason about the functions, what influences their output, and whether they have any side-effects. Can be they moved outside of the outer function and live on their own? That will force their contract to be clarified. It's possible they aren't actually capturing any state, but I can't tell.


pkg/kv/kvserver/store_rebalancer.go, line 641 at r1 (raw file):

					}
				}
				return false

nit: just fall through here instead of branching.


pkg/kv/kvserver/store_rebalancer.go, line 685 at r1 (raw file):

	pickRemainingRepls := func(
		targetRepls *[]roachpb.ReplicaDescriptor, numDesiredRepls int, targetType targetReplicaType,

nit: prefer returning a new slice over mutating a reference to a slice, which is much easier to miss.


pkg/kv/kvserver/store_rebalancer_test.go, line 248 at r1 (raw file):

	testCases := []struct {
		voterStoreIDs, nonVoterStoreIDs []roachpb.StoreID

nit: You can probably drop "StoreIDs" from each of these field names and just use voters, nonVoters, etc.

But I think it would be worth renaming the "expected" fields to something that better indicates what's expected of them. Something like expectedRebalanceVoters.


pkg/kv/kvserver/store_rebalancer_test.go, line 267 at r1 (raw file):

			nonLive:               []roachpb.StoreID{4, 5},
			qps:                   100,
			expectedVoterStoreIDs: []roachpb.StoreID{},

nit: You started with listing out expectedVoterStoreIDs when empty, but then stopped. Any reason for that? In general, I think it's nice to be explicit about what a test case expects, even if it's the zero value.

@nvanbenschoten
Copy link
Member

@aayushshah15 are we considering this a GA-blocker for v21.1? If so, the window in which it can land is getting smaller, so I think we'll want to prioritize it early next week. What do you think?

@aayushshah15 aayushshah15 force-pushed the relocate-non-voters-via-store-rebalancer branch 2 times, most recently from e64d9b3 to 3af9602 Compare March 28, 2021 01:26
Copy link
Contributor Author

@aayushshah15 aayushshah15 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten)


pkg/kv/kvserver/store_rebalancer.go, line 514 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

It feels wrong that these four lines are duplicated here and at the very beginning of getRebalanceCandidatesBasedOnQPS.

I hoisted a bunch of these range specific bits of info into its own struct. Let me know if you dislike.


pkg/kv/kvserver/store_rebalancer.go, line 536 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Do you think it's worth pulling this check into a helper method as well?

Done, let me know if you dislike what I've done here.


pkg/kv/kvserver/store_rebalancer.go, line 627 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Give each of these helpers a one-sentence comment explaining what they do.

Done.


pkg/kv/kvserver/store_rebalancer.go, line 627 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

These closures are questionable from a clean code perspective. The problem is that it's very difficult to know which of the 16 variables in the outer scope are captured here, so it's hard to reason about the functions, what influences their output, and whether they have any side-effects. Can be they moved outside of the outer function and live on their own? That will force their contract to be clarified. It's possible they aren't actually capturing any state, but I can't tell.

I struggled more than I'd like to admit at refactoring this. Let me know if you dislike what I've come up with here.


pkg/kv/kvserver/store_rebalancer.go, line 641 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

nit: just fall through here instead of branching.

Done.


pkg/kv/kvserver/store_rebalancer.go, line 685 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

nit: prefer returning a new slice over mutating a reference to a slice, which is much easier to miss.

Done.


pkg/kv/kvserver/store_rebalancer_test.go, line 248 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

nit: You can probably drop "StoreIDs" from each of these field names and just use voters, nonVoters, etc.

But I think it would be worth renaming the "expected" fields to something that better indicates what's expected of them. Something like expectedRebalanceVoters.

Done.


pkg/kv/kvserver/store_rebalancer_test.go, line 267 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

nit: You started with listing out expectedVoterStoreIDs when empty, but then stopped. Any reason for that? In general, I think it's nice to be explicit about what a test case expects, even if it's the zero value.

Done.

Copy link
Member

@nvanbenschoten nvanbenschoten left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewed 2 of 2 files at r2.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @aayushshah15)


pkg/kv/kvserver/store_rebalancer.go, line 549 at r2 (raw file):

			continue
		}
		if len(targetNonVoterRepls) < rebalanceCtx.numDesiredNonVoters {

Could there be adverse effects from not rebalancing voting replicas when a range can't find enough non-voting replica targets? Will this disable the store rebalancer in important cases that we otherwise wouldn't want it disabled in?


pkg/kv/kvserver/store_rebalancer.go, line 568 at r2 (raw file):

		}

		allTargetRepls := append(targetVoterRepls, targetNonVoterRepls...)

Let's give this a comment as well. Specifically, explain why we need to check this twice and what the difference between these two checks is.


pkg/kv/kvserver/store_rebalancer.go, line 609 at r2 (raw file):

}

func (sr *StoreRebalancer) worsensDiversity(

Let's give this method a comment.


pkg/kv/kvserver/store_rebalancer_test.go, line 541 at r2 (raw file):

			)

			require.Lenf(t, voterTargets, len(tc.expectedRebalancedVoters), "DEBUG!!! %+v", tc)

Did you mean to leave the DEBUG in?

This commit teaches the `StoreRebalancer` to rebalance non-voting
replicas.

Release justification: needed for non-voting replicas
Release note: None
@aayushshah15 aayushshah15 force-pushed the relocate-non-voters-via-store-rebalancer branch from 3af9602 to d70e286 Compare March 30, 2021 18:23
Copy link
Contributor Author

@aayushshah15 aayushshah15 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @nvanbenschoten)


pkg/kv/kvserver/store_rebalancer.go, line 549 at r2 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Could there be adverse effects from not rebalancing voting replicas when a range can't find enough non-voting replica targets? Will this disable the store rebalancer in important cases that we otherwise wouldn't want it disabled in?

My rationale while writing this was to try and not introduce a behavior change -- if you consider the old behavior to be "find a replacement for all replicas on overfull stores or don't do anything".

I did wonder about what you're suggesting, but note that that desire isn't exclusive to non-voting replicas. For instance, if we can only find valid replacements for 2 out of 5 voting replicas, we currently won't do anything but we probably should. In fact, being able to do this for voters is more important than non-voters because voters will have more restrictive constraints (so fewer compatible stores to receive them). Maybe this is what the TODO here is referring to as well, but I can't tell.

Given all of this, and just our lack of large scale testing for the store rebalancer, I don't feel particularly comfortable trying to do that in this PR. I was thinking that we can try it out in its own PR once we're through our backlog of release commitments and then later determine if it fits the bill for backport into some patch release.

What do you think? Ofcourse, I'm happy to do it here if you think it makes sense.


pkg/kv/kvserver/store_rebalancer.go, line 568 at r2 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Let's give this a comment as well. Specifically, explain why we need to check this twice and what the difference between these two checks is.

Done.


pkg/kv/kvserver/store_rebalancer.go, line 609 at r2 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Let's give this method a comment.

Done.


pkg/kv/kvserver/store_rebalancer_test.go, line 541 at r2 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Did you mean to leave the DEBUG in?

Nope, fixed.

Copy link
Member

@nvanbenschoten nvanbenschoten left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewed 2 of 2 files at r3.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @aayushshah15)


pkg/kv/kvserver/store_rebalancer.go, line 549 at r2 (raw file):

Previously, aayushshah15 (Aayush Shah) wrote…

My rationale while writing this was to try and not introduce a behavior change -- if you consider the old behavior to be "find a replacement for all replicas on overfull stores or don't do anything".

I did wonder about what you're suggesting, but note that that desire isn't exclusive to non-voting replicas. For instance, if we can only find valid replacements for 2 out of 5 voting replicas, we currently won't do anything but we probably should. In fact, being able to do this for voters is more important than non-voters because voters will have more restrictive constraints (so fewer compatible stores to receive them). Maybe this is what the TODO here is referring to as well, but I can't tell.

Given all of this, and just our lack of large scale testing for the store rebalancer, I don't feel particularly comfortable trying to do that in this PR. I was thinking that we can try it out in its own PR once we're through our backlog of release commitments and then later determine if it fits the bill for backport into some patch release.

What do you think? Ofcourse, I'm happy to do it here if you think it makes sense.

Ack. Thanks for the detailed response. Let's just be sure to file an issue detailing this behavior and ways we could improve it.

Copy link
Contributor Author

@aayushshah15 aayushshah15 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the reviews on this!

bors r+

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @nvanbenschoten)


pkg/kv/kvserver/store_rebalancer.go, line 549 at r2 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Ack. Thanks for the detailed response. Let's just be sure to file an issue detailing this behavior and ways we could improve it.

Created #62992 to track.

@craig
Copy link
Contributor

craig bot commented Apr 1, 2021

Build failed (retrying...):

@craig
Copy link
Contributor

craig bot commented Apr 1, 2021

Build failed:

@aayushshah15 aayushshah15 reopened this Apr 1, 2021
@aayushshah15
Copy link
Contributor Author

bors r+

@craig
Copy link
Contributor

craig bot commented Apr 2, 2021

Build succeeded:

@craig craig bot merged commit 0966b27 into cockroachdb:master Apr 2, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants