
Add locking between replicaGCQueue and multiraft.state.createGroup. #2868

Merged (1 commit, Oct 21, 2015)

Conversation

bdarnell (Contributor)

This partially addresses the race seen in #2815. A similar race still
occurs but much less frequently.
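
A minimal sketch of the locking pattern this change is about, assuming hypothetical names (groupRegistry, createGroup, removeGroup) rather than the actual multiraft API: the creation path (driven by incoming raft messages) and the GC-driven removal path take the same mutex, so a group cannot be recreated in the middle of its removal.

    // Illustrative only: not CockroachDB code, just the shape of the
    // "serialize create and remove on one lock" idea from this PR.
    package main

    import (
        "fmt"
        "sync"
    )

    type groupRegistry struct {
        mu     sync.Mutex
        groups map[int64]struct{} // rangeID -> live raft group
    }

    // createGroup models multiraft creating a group for an incoming message.
    func (r *groupRegistry) createGroup(rangeID int64) {
        r.mu.Lock()
        defer r.mu.Unlock()
        r.groups[rangeID] = struct{}{}
    }

    // removeGroup models the replicaGCQueue tearing a group down. Holding the
    // same mutex for the entire check-and-delete is what prevents createGroup
    // from interleaving with it.
    func (r *groupRegistry) removeGroup(rangeID int64) {
        r.mu.Lock()
        defer r.mu.Unlock()
        delete(r.groups, rangeID)
    }

    func main() {
        r := &groupRegistry{groups: map[int64]struct{}{}}
        var wg sync.WaitGroup
        for i := 0; i < 100; i++ {
            wg.Add(1)
            go func() {
                defer wg.Done()
                r.createGroup(1)
                r.removeGroup(1)
            }()
        }
        wg.Wait()
        fmt.Println("create/remove serialized under a single lock")
    }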

bdarnell (Contributor, Author)

I'm not sure how to test this. I've been testing it locally by increasing the iteration count in TestRaftRemoveRace, although checking in that change would noticeably increase total test runtime. I don't see a good way to trigger the race in a more direct/controlled way.
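
One low-cost way to keep a higher iteration count available without slowing the default suite is to gate it behind an opt-in switch. A test-file sketch, with the environment variable name, counts, and test body purely illustrative rather than the real TestRaftRemoveRace:

    // Sketch only: placeholders, not the actual storage tests.
    package storage

    import (
        "os"
        "testing"
    )

    func raftRemoveIterations() int {
        if os.Getenv("COCKROACH_STRESS_RAFT_REMOVE") != "" {
            return 1000 // opt-in long run for chasing the race
        }
        return 10 // cheap default so normal test runtime is unaffected
    }

    func TestRaftRemoveRaceSketch(t *testing.T) {
        for i := 0; i < raftRemoveIterations(); i++ {
            // ... add and remove the replica, as the real test does ...
        }
    }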

Code under review:

    if _, err := rng.rm.GetReplica(desc.RangeID); err == nil {
        log.Infof("replica recreated during deletion; aborting deletion")
    }

    // TODO(bdarnell): add some sort of locking to prevent the range
Contributor (commenting on the TODO line above):

remove?

bdarnell (Contributor, Author):

Done.

tamird (Contributor) commented Oct 20, 2015

What is the similar race? Can you document it?

bdarnell (Contributor, Author)

I'm still working on identifying the similar race. All I know so far is that it produces the same error message as #2815, and it takes over a minute for the test to reproduce.

bdarnell (Contributor, Author)

One "similar race" is that I forgot to return nil after the "aborting deletion" log line. But even with that fixed I'm seeing other rare failures. I suspect that what may be happening is that the node is sometimes falling far enough behind that it is learning about multiple iterations of the add/remove loop at once.

tamird (Contributor) commented Oct 21, 2015

LGTM
