Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

etcdserver: enable "CheckQuorum" when starting with "ForceNewCluster" #9347

Merged
merged 1 commit into from
Feb 26, 2018

Conversation

gyuho
Copy link
Contributor

@gyuho gyuho commented Feb 23, 2018

We enable "raft.Config.CheckQuorum" by default in other
Raft initial starts. So should start with "ForceNewCluster".

If new members are added to standalone node with CheckQuorum == false and the standalone node is the leader, this may reduce cluster availabilities, since the leader won't check quorum activities.

For current etcd, it won't help a lot, but once we enable Pre-Vote, this will be useful. For instance, pb.MsgAppResp from isolated candidate may be dropped or not even be triggered with Pre-Vote, and then pb.Msg[Pre]Vote arrives with higher term. And if the check quorum is false, the vote request would force current leader to step down even if the leader lease is still valid. If check quorum is true, the vote request will be ignored when its leader lease is still valid, thus less disruptive.

Reproducible (will write a better test case after pre-vote feature):

func TestForceNewClusterCheckQuorum(t *testing.T) {
	defer testutil.AfterTest(t)

	clus := NewClusterV3(t, &ClusterConfig{Size: 3})
	defer clus.Terminate(t)

	clus.WaitLeader(t)

	_, err := clus.Client(0).Put(context.TODO(), "foo", "bar")
	if err != nil {
		t.Fatal(err)
	}

	clus.Members[0].Stop(t)
	clus.Members[1].Terminate(t)
	clus.Members[2].Terminate(t)
	clus.Members = clus.Members[:1]
	clus.Members[0].ForceNewCluster = true
	clus.Members[0].ElectionTicks *= 2 // greater leader lease

	if err = clus.Members[0].Restart(t); err != nil {
		t.Fatalf("unexpected ForceRestart error: %v", err)
	}

	resp, err := clus.Client(0).MemberList(context.Background())
	if err != nil {
		t.Fatal(err)
	}
	if len(resp.Members) != 1 {
		t.Fatalf("expected len(resp.Members) == 1, got %d", len(resp.Members))
	}

	clus.AddMember(t)
	clus.WaitLeader(t)

	clus.AddMember(t)
	lead := clus.WaitLeader(t)
	if lead != 0 {
		t.Skip()
	}

	// m[0] is leader, started with raft.Config.CheckQuorum == false
	// m[1] started with raft.Config.CheckQuorum == true
	// m[2] started with raft.Config.CheckQuorum == true

	// drop incoming messages to m[2]
	// m[2] will vote request to m[0]
	clus.Members[0].s.PauseSending()

	// prevent m[1] from voting for m[0]
	// and prevent m[2] from becoming leader
	clus.Members[1].s.PauseSending()

	// make m[2] candidate
	time.Sleep(clus.Members[2].ElectionTimeout() + 100*time.Millisecond)

	clus.Members[0].s.ResumeSending()

	// force leader m[0] to step down by sending
	//   1. MsgAppResp with higher term
	//   2. MsgVote with higher term
	time.Sleep(clus.Members[2].ElectionTimeout())

	// m[0] CheckQuorum == true, m[0] leader lease is still valid,
	// and MsgVote arrives firt, then m[0] would ignore MsgVote from m[2]
	// and m[0] may win the next election over m[2]
	//
	// m[0] CheckQuorum == false, whether m[0] leader lease is still
	// valid or not, MsgVote from m[2] would force m[0] to step down
	// right away
}

We enable "raft.Config.CheckQuorum" by default in other
Raft initial starts. So should start with "ForceNewCluster".

Signed-off-by: Gyuho Lee <gyuhox@gmail.com>
@gyuho
Copy link
Contributor Author

gyuho commented Feb 23, 2018

@xiang90 Think we forgot to set this from #4779?

@gyuho gyuho merged commit 04e932b into etcd-io:master Feb 26, 2018
@gyuho gyuho deleted the raft-force-new-cluster branch February 26, 2018 19:28
gyuho added a commit that referenced this pull request Mar 8, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.

1 participant