
kvserver: improve handling for removal of a replica, when multiple replicas already exist on the same node #60546

Merged
merged 1 commit into cockroachdb:master from alex/fix-rebalancing
Feb 16, 2021

Conversation

lunevalex
Collaborator

Fixes #60545

The allocator in some cases allows a range to have replicas on multiple stores of the same node. When that happens, the allocator should be able to fix the situation by removing one of the offending replicas. This was only half working, due to an ordering problem in how the replicas appeared in the descriptor: it could remove the first replica, but not the second.
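To illustrate the idea behind the fix, here is a minimal, self-contained sketch (the `replica` type and `replicasPerNode` helper are illustrative stand-ins, not the actual cockroachdb code): grouping the descriptor's replicas by node up front makes the "multiple replicas on one node" check independent of the order in which the replicas appear in the descriptor.

```go
package main

import "fmt"

// replica is a hypothetical, simplified stand-in for roachpb.ReplicaDescriptor.
type replica struct {
	NodeID  int
	StoreID int
}

// replicasPerNode groups a range's replicas by node so that a node carrying
// more than one replica (one per store) can be detected regardless of the
// order in which the replicas appear in the descriptor.
func replicasPerNode(replicas []replica) map[int][]replica {
	byNode := make(map[int][]replica)
	for _, r := range replicas {
		byNode[r.NodeID] = append(byNode[r.NodeID], r)
	}
	return byNode
}

func main() {
	// Node 1 holds replicas on stores 1 and 3; node 2 holds one on store 2.
	desc := []replica{
		{NodeID: 1, StoreID: 1},
		{NodeID: 2, StoreID: 2},
		{NodeID: 1, StoreID: 3},
	}
	for nodeID, rs := range replicasPerNode(desc) {
		if len(rs) > 1 {
			fmt.Printf("node %d has %d replicas; removing either is valid\n", nodeID, len(rs))
		}
	}
}
```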


Release note: None


Member

@nvanbenschoten nvanbenschoten left a comment


:lgtm: but @aayushshah15 should also sign off. Thanks for the quick patch.

How far back will we need to backport this? Also, if we're going to backport it, we should add a release note.

Reviewed 2 of 2 files at r1.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @aayushshah15)

@aayushshah15
Contributor

How far back will we need to backport this?

IIUC, the bug should only affect 20.2.

Contributor

@aayushshah15 aayushshah15 left a comment


I think the backport to 20.2 may not be clean; my apologies for that in advance.

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @lunevalex)


pkg/kv/kvserver/replica_command.go, line 1153 at r1 (raw file):

}

func validateReplicationChangesMultipleReplicasOnTheSameStore(

s/OnTheSameStore/OnTheSameNode


pkg/kv/kvserver/replica_command.go, line 1208 at r1 (raw file):

	}

	// Then, check that we're not adding a second replica on nodes that already

should this be s/nodes/stores?


pkg/kv/kvserver/replica_command.go, line 1211 at r1 (raw file):

	// have one, or "re-add" an existing replica. We delete from byNodeAndStoreID so that
	// after this loop, it contains only Nodes that we haven't seen in desc.
	for nodeID, descs := range descriptorsByNodeID {

should we rename descs to something like rDescsForNode, just to be very explicit?


pkg/kv/kvserver/replica_command.go, line 1217 at r1 (raw file):

		}
		delete(byNodeAndStoreID, nodeID)
		// The only valid thing to do when we already have multiple replicas on the

I'm having a bit of a hard time convincing myself that this is always true. When there's a (VOTER, LEARNER) pair on the same node, you're right that we'd expect the allocator to always try to remove that LEARNER, but I'm not sure if we want to make that assumption here?

For instance, the patch below produces an error; should it? I haven't tested this out, but the situation in the patch should be possible if the user intentionally or inadvertently constrains, say, 3 replicas to a node that has 3 stores.

Index: pkg/kv/kvserver/replica_command_test.go
===================================================================
diff --git a/pkg/kv/kvserver/replica_command_test.go b/pkg/kv/kvserver/replica_command_test.go
--- a/pkg/kv/kvserver/replica_command_test.go	(revision 688740b898700cd8ea5cd9db3ff0def457b83505)
+++ b/pkg/kv/kvserver/replica_command_test.go	(date 1613169923000)
@@ -198,7 +198,7 @@
 		InternalReplicas: []roachpb.ReplicaDescriptor{
 			{NodeID: 1, StoreID: 1},
 			{NodeID: 2, StoreID: 2},
-			{NodeID: 1, StoreID: 2, Type: &learnerType},
+			{NodeID: 1, StoreID: 3, Type: &learnerType},
 		},
 	}
 	err = validateReplicationChanges(descRebalancing, roachpb.ReplicationChanges{
@@ -208,7 +208,12 @@
 
 	// Test Case 15: same as 14 but remove the second node
 	err = validateReplicationChanges(descRebalancing, roachpb.ReplicationChanges{
-		{ChangeType: roachpb.REMOVE_VOTER, Target: roachpb.ReplicationTarget{NodeID: 1, StoreID: 2}},
+		{ChangeType: roachpb.REMOVE_VOTER, Target: roachpb.ReplicationTarget{NodeID: 1, StoreID: 3}},
+	})
+	require.NoError(t, err)
+
+	err = validateReplicationChanges(descRebalancing, roachpb.ReplicationChanges{
+		{ChangeType: roachpb.ADD_VOTER, Target: roachpb.ReplicationTarget{NodeID: 1, StoreID: 4}},
 	})
 	require.NoError(t, err)
 

pkg/kv/kvserver/replica_command_test.go, line 201 at r1 (raw file):

			{NodeID: 1, StoreID: 1},
			{NodeID: 2, StoreID: 2},
			{NodeID: 1, StoreID: 2, Type: &learnerType},

nit: we should probably have 3 distinct store ids here.

Collaborator Author

@lunevalex lunevalex left a comment


That's right, only 20.2. Good call on adding a release note.

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @aayushshah15)


pkg/kv/kvserver/replica_command.go, line 1153 at r1 (raw file):

Previously, aayushshah15 (Aayush Shah) wrote…

s/OnTheSameStore/OnTheSameNode

ack


pkg/kv/kvserver/replica_command.go, line 1208 at r1 (raw file):

Previously, aayushshah15 (Aayush Shah) wrote…

should this be s/nodes/stores?

we check for both and return a slightly different error message in each case
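A rough sketch of what such a dual check could look like (`target` and `validateAdd` are hypothetical names for illustration, not the actual `validateReplicationChanges` code): the same-store case and the same-node case are checked separately so each can produce its own error message.

```go
package main

import (
	"errors"
	"fmt"
)

// target is an illustrative stand-in for roachpb.ReplicationTarget.
type target struct{ NodeID, StoreID int }

// validateAdd distinguishes the two cases: adding to a store that already
// holds the replica, versus adding to a different store on a node that
// already holds one. Each case returns a distinct error.
func validateAdd(existing []target, add target) error {
	for _, r := range existing {
		if r.NodeID == add.NodeID && r.StoreID == add.StoreID {
			return errors.New("store already has a replica of the range")
		}
	}
	for _, r := range existing {
		if r.NodeID == add.NodeID {
			return errors.New("node already has a replica on a different store")
		}
	}
	return nil
}

func main() {
	existing := []target{{NodeID: 1, StoreID: 1}}
	fmt.Println(validateAdd(existing, target{NodeID: 1, StoreID: 1})) // same store
	fmt.Println(validateAdd(existing, target{NodeID: 1, StoreID: 2})) // same node, different store
	fmt.Println(validateAdd(existing, target{NodeID: 2, StoreID: 2})) // different node: no error
}
```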


pkg/kv/kvserver/replica_command.go, line 1211 at r1 (raw file):

Previously, aayushshah15 (Aayush Shah) wrote…

should we rename descs to something like rDescsForNode, just to be very explicit?

ack


pkg/kv/kvserver/replica_command.go, line 1217 at r1 (raw file):

Previously, aayushshah15 (Aayush Shah) wrote…

I'm having a bit of a hard time convincing myself that this is always true. When there's a (VOTER, LEARNER) pair on the same node, you're right that we'd expect the allocator to always try to remove that LEARNER, but I'm not sure if we want to make that assumption here?

For instance, the patch below produces an error, should it? I haven't tested this out, but the situation in the patch should be possible if the user intentionally / inadvertently constrains, say, 3 replicas to a node that has 3 stores.


We don't make that assumption here: Test Case 14 tests removing the voter and Test Case 15 tests removing the learner; either is allowed.
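That point can be sketched as follows (illustrative types and helper, not the real validation code): when a node carries two replicas, a removal targeting either store's replica validates.

```go
package main

import "fmt"

// replica is a hypothetical, simplified stand-in for roachpb.ReplicaDescriptor.
type replica struct{ NodeID, StoreID int }

// canRemove reports whether a removal target matches an existing replica.
// With a voter on store 1 and a learner on store 3 of node 1, a removal
// aimed at either store should validate; no preference is imposed here.
func canRemove(desc []replica, nodeID, storeID int) bool {
	for _, r := range desc {
		if r.NodeID == nodeID && r.StoreID == storeID {
			return true
		}
	}
	return false
}

func main() {
	desc := []replica{
		{NodeID: 1, StoreID: 1}, // voter
		{NodeID: 2, StoreID: 2}, // voter
		{NodeID: 1, StoreID: 3}, // learner, same node as store 1
	}
	fmt.Println(canRemove(desc, 1, 1)) // removing the voter validates
	fmt.Println(canRemove(desc, 1, 3)) // removing the learner validates too
}
```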


pkg/kv/kvserver/replica_command_test.go, line 201 at r1 (raw file):

Previously, aayushshah15 (Aayush Shah) wrote…

nit: we should probably have 3 distinct store ids here.

ack

Contributor

@aayushshah15 aayushshah15 left a comment


:lgtm:

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (and 1 stale) (waiting on @aayushshah15, @lunevalex, and @nvanbenschoten)


pkg/kv/kvserver/replica_command.go, line 1217 at r1 (raw file):

Previously, lunevalex wrote…

We don't make that assumption here, Test Case 14 tests removing the voter and Test Case 15 tests removing the learner, either is allowed.

I was under the wrong impression that nodes with multiple stores are allowed to persistently hold multiple replicas. This all makes sense now; the scenario in my patch above shouldn't be possible.

@lunevalex lunevalex force-pushed the alex/fix-rebalancing branch 2 times, most recently from 7226b9a to 40c0524, on February 16, 2021 at 17:37
kvserver: improve handling for removal of a replica, when multiple replicas already exist on the same node

Fixes cockroachdb#60545

The allocator in some cases allows a range to have replicas on multiple stores of the same node. When that happens, the allocator should be able to fix the situation by removing one of the offending replicas. This was only half working, due to an ordering problem in how the replicas appeared in the descriptor: it could remove the first replica, but not the second.

Release note (bug fix): 20.2 introduced the ability to rebalance replicas between multiple stores on the same node. This change fixes a problem with that feature where occasionally an intra-node rebalance would fail and a range would get stuck permanently under-replicated.
@lunevalex
Collaborator Author

bors r+

@craig
Contributor

craig bot commented Feb 16, 2021

Build succeeded:

@craig craig bot merged commit 959750d into cockroachdb:master Feb 16, 2021
craig bot pushed a commit that referenced this pull request Feb 17, 2021
60633: release-20.2: kvserver: improve handling for removal of a replica, when multiple replicas already exist on the same node r=aayushshah15 a=lunevalex

Backport 1/1 commits from #60546.

/cc @cockroachdb/release

---

Fixes #60545

The allocator in some cases allows a range to have replicas on multiple stores of the same node. When that happens, the allocator should be able to fix the situation by removing one of the offending replicas. This was only half working, due to an ordering problem in how the replicas appeared in the descriptor: it could remove the first replica, but not the second.


Release note: None


Co-authored-by: Alex Lunev <alexl@cockroachlabs.com>

Successfully merging this pull request may close these issues.

kvserver: rebalancing between stores on the same node fails