Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PD can get stuck electing a leader during a n/2+1 network partition #10576

Closed
aphyr opened this issue May 22, 2019 · 3 comments
Closed

PD can get stuck electing a leader during a n/2+1 network partition #10576

aphyr opened this issue May 22, 2019 · 3 comments

Comments

@aphyr
Copy link

aphyr commented May 22, 2019

Bug Report

  1. What did you do?
  • Pick any node in the cluster; call it slow-leader.
  • Set up a libfaketime wrapper for that node's PD server which runs it at 0.5x realtime.
  • For all other nodes, set their wrappers to run at 1x realtime.
  • Force slow-leader to become a leader by calling any PD node's /leader/transfer API.
  • Create a network partition which splits the network into randomly selected components: a majority component, with n/2+1 nodes, and a minority component, containing slow-leader, with n/2-1 nodes.
  • Wait.

You can reproduce this problem with Jepsen 479745e, by running

lein run test -w append --time-limit 60 --nemesis slow-primary

Note that the test will (at least for me) stall indefinitely as the nemesis waits for a new leader to come online.

  1. What did you expect to see?

I think under these conditions that the majority component should elect a new PD leader and go on to service requests; their clocks all run at 1x realtime, they have uninterrupted network connectivity to each other, a majority of the cluster, all are running...

  1. What did you see instead?

Sometimes, the majority-component PD nodes get stuck, and are unable to elect a leader. It looks like different components of PD are failing to restart, or initialize, due to dropped Raft proposals?

TiDB might log:

[2019/05/22 11:18:40.813 -07:00] [ERROR] [kv.go:245] ["fail to load safepoint from pd"] [error="context deadline exceeded"] [errorVerbose="context deadline exceeded\ngithub.com/pingcap/errors.AddStack\n\t/home/jenkins/workspace/tidb_ghpr_build/go/pkg/mod/github.com/pingcap/errors@v0.11.1/errors.go:174\ngithub.com/pingcap/errors.Trace\n\t/home/jenkins/workspace/tidb_ghpr_build/go/pkg/mod/github.com/pingcap/errors@v0.11.1/juju_adaptor.go:15\ngithub.com/pingcap/tidb/store/tikv.(*EtcdSafePointKV).Get\n\t/home/jenkins/workspace/tidb_ghpr_build/go/src/github.com/pingcap/tidb/store/tikv/safepoint.go:108\ngithub.com/pingcap/tidb/store/tikv.loadSafePoint\n\t/home/jenkins/workspace/tidb_ghpr_build/go/src/github.com/pingcap/tidb/store/tikv/safepoint.go:127\ngithub.com/pingcap/tidb/store/tikv.(*tikvStore).runSafePointChecker\n\t/home/jenkins/workspace/tidb_ghpr_build/go/src/github.com/pingcap/tidb/store/tikv/kv.go:238\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1337"] [stack="github.com/pingcap/tidb/store/tikv.(*tikvStore).runSafePointChecker\n\t/home/jenkins/workspace/tidb_ghpr_build/go/src/github.com/pingcap/tidb/store/tikv/kv.go:245"]
[2019/05/22 11:18:40.861 -07:00] [WARN] [manager.go:170] ["failed to new session to etcd"] [ownerInfo="[ddl] /tidb/ddl/all_schema_versions/cc9181ed-51b0-4b59-b714-44640432856d"] [error="rpc error: code = Unknown desc = raft proposal dropped"]
[2019/05/22 11:18:40.889 -07:00] [ERROR] [domain.go:417] ["server restart failed"] [error="rpc error: code = Unknown desc = raft proposal dropped"] [errorVerbose="rpc error: code = Unknown desc = raft proposal dropped\ngithub.com/pingcap/errors.AddStack\n\t/home/jenkins/workspace/tidb_ghpr_build/go/pkg/mod/github.com/pingcap/errors@v0.11.1/errors.go:174\ngithub.com/pingcap/errors.Trace\n\t/home/jenkins/workspace/tidb_ghpr_build/go/pkg/mod/github.com/pingcap/errors@v0.11.1/juju_adaptor.go:15\ngithub.com/pingcap/tidb/owner.NewSession\n\t/home/jenkins/workspace/tidb_ghpr_build/go/src/github.com/pingcap/tidb/owner/manager.go:176\ngithub.com/pingcap/tidb/domain.(*InfoSyncer).newSessionAndStoreServerInfo\n\t/home/jenkins/workspace/tidb_ghpr_build/go/src/github.com/pingcap/tidb/domain/info.go:166\ngithub.com/pingcap/tidb/domain.(*InfoSyncer).Restart\n\t/home/jenkins/workspace/tidb_ghpr_build/go/src/github.com/pingcap/tidb/domain/info.go:157\ngithub.com/pingcap/tidb/domain.(*Domain).infoSyncerKeeper\n\t/home/jenkins/workspace/tidb_ghpr_build/go/src/github.com/pingcap/tidb/domain/domain.go:416\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1337"] [stack="github.com/pingcap/tidb/domain.(*Domain).infoSyncerKeeper\n\t/home/jenkins/workspace/tidb_ghpr_build/go/src/github.com/pingcap/tidb/domain/domain.go:417"]
[2019/05/22 11:18:40.889 -07:00] [INFO] [domain.go:419] ["server info syncer restarted"]
[2019/05/22 11:18:40.889 -07:00] [INFO] [domain.go:415] ["server info syncer need to restart"]
[2019/05/22 11:18:40.889 -07:00] [WARN] [manager.go:170] ["failed to new session to etcd"] [ownerInfo="[Info-syncer] /tidb/server/info/cc9181ed-51b0-4b59-b714-44640432856d"] [error="rpc error: code = Unknown desc = raft proposal dropped"]

While PD seems to get stuck repeatedly trying to hold elections for a single term, because of leases which aren't expired? Lease values fluctuate though...

[logterm: 3, index: 463] at term 3: lease is not expired (remaining ticks: 4)
2019/05/22 11:21:21.081 raft.go:1135: [info] 5aa594b5d9b66c42 no leader at term 3; dropping proposal
2019/05/22 11:21:21.139 raft.go:857: [info] 5aa594b5d9b66c42 is starting a new election at term 3
2019/05/22 11:21:21.139 raft.go:684: [info] 5aa594b5d9b66c42 became pre-candidate at term 3
2019/05/22 11:21:21.139 raft.go:755: [info] 5aa594b5d9b66c42 received MsgPreVoteResp from 5aa594b5d9b66c42 at term 3
2019/05/22 11:21:21.139 raft.go:742: [info] 5aa594b5d9b66c42 [logterm: 3, index: 463] sent MsgPreVote request to 7f6143cbd22aca00 at term 3
2019/05/22 11:21:21.139 raft.go:742: [info] 5aa594b5d9b66c42 [logterm: 3, index: 463] sent MsgPreVote request to f82e563e5c75137e at term 3
2019/05/22 11:21:21.139 raft.go:742: [info] 5aa594b5d9b66c42 [logterm: 3, index: 463] sent MsgPreVote request to 9b116f88cab4dc9 at term 3
2019/05/22 11:21:21.139 raft.go:742: [info] 5aa594b5d9b66c42 [logterm: 3, index: 463] sent MsgPreVote request to 5440ff22fe632778 at term 3
2019/05/22 11:21:21.139 raft.go:782: [info] 5aa594b5d9b66c42 [logterm: 3, index: 463, vote: 9b116f88cab4dc9] ignored MsgPreVote from f82e563e5c75137e [logterm: 3, index: 463] at term 3: lease is not expired (remaining ticks: 6)
2019/05/22 11:21:21.282 raft.go:1135: [info] 5aa594b5d9b66c42 no leader at term 3; dropping proposal
2019/05/22 11:21:21.439 raft.go:857: [info] 5aa594b5d9b66c42 is starting a new election at term 3
2019/05/22 11:21:21.439 raft.go:684: [info] 5aa594b5d9b66c42 became pre-candidate at term 3
2019/05/22 11:21:21.439 raft.go:755: [info] 5aa594b5d9b66c42 received MsgPreVoteResp from 5aa594b5d9b66c42 at term 3
2019/05/22 11:21:21.439 raft.go:742: [info] 5aa594b5d9b66c42 [logterm: 3, index: 463] sent MsgPreVote request to 7f6143cbd22aca00 at term 3
2019/05/22 11:21:21.439 raft.go:742: [info] 5aa594b5d9b66c42 [logterm: 3, index: 463] sent MsgPreVote request to f82e563e5c75137e at term 3
2019/05/22 11:21:21.439 raft.go:742: [info] 5aa594b5d9b66c42 [logterm: 3, index: 463] sent MsgPreVote request to 9b116f88cab4dc9 at term 3
2019/05/22 11:21:21.439 raft.go:742: [info] 5aa594b5d9b66c42 [logterm: 3, index: 463] sent MsgPreVote request to 5440ff22fe632778 at term 3
2019/05/22 11:21:21.439 raft.go:782: [info] 5aa594b5d9b66c42 [logterm: 3, index: 463, vote: 9b116f88cab4dc9] ignored MsgPreVote from f82e563e5c75137e [logterm: 3, index: 463] at term 3: lease is not expired (remaining ticks: 6)
2019/05/22 11:21:21.441 raft.go:782: [info] 5aa594b5d9b66c42 [logterm: 3, index: 463, vote: 9b116f88cab4dc9] ignored MsgPreVote from 7f6143cbd22aca00 [logterm: 3, index: 463] at term 3: lease is not expired (remaining ticks: 6)
2019/05/22 11:21:21.482 raft.go:1135: [info] 5aa594b5d9b66c42 no leader at term 3; dropping proposal
2019/05/22 11:21:21.683 raft.go:1135: [info] 5aa594b5d9b66c42 no leader at term 3; dropping proposal
2019/05/22 11:21:21.739 raft.go:857: [info] 5aa594b5d9b66c42 is starting a new election at term 3
2019/05/22 11:21:21.739 raft.go:684: [info] 5aa594b5d9b66c42 became pre-candidate at term 3
2019/05/22 11:21:21.739 raft.go:755: [info] 5aa594b5d9b66c42 received MsgPreVoteResp from 5aa594b5d9b66c42 at term 3
2019/05/22 11:21:21.739 raft.go:742: [info] 5aa594b5d9b66c42 [logterm: 3, index: 463] sent MsgPreVote request to 9b116f88cab4dc9 at term 3
2019/05/22 11:21:21.739 raft.go:742: [info] 5aa594b5d9b66c42 [logterm: 3, index: 463] sent MsgPreVote request to 5440ff22fe632778 at term 3
2019/05/22 11:21:21.739 raft.go:742: [info] 5aa594b5d9b66c42 [logterm: 3, index: 463] sent MsgPreVote request to 7f6143cbd22aca00 at term 3
2019/05/22 11:21:21.739 raft.go:742: [info] 5aa594b5d9b66c42 [logterm: 3, index: 463] sent MsgPreVote request to f82e563e5c75137e at term 3
2019/05/22 11:21:21.739 raft.go:782: [info] 5aa594b5d9b66c42 [logterm: 3, index: 463, vote: 9b116f88cab4dc9] ignored MsgPreVote from f82e563e5c75137e [logterm: 3, index: 463] at term 3: lease is not expired (remaining ticks: 6)
2019/05/22 11:21:21.739 raft.go:755: [info] 5aa594b5d9b66c42 received MsgPreVoteResp from 7f6143cbd22aca00 at term 3
2019/05/22 11:21:21.739 raft.go:1148: [info] 5aa594b5d9b66c42 [quorum:3] has received 2 MsgPreVoteResp votes and 0 vote rejections
2019/05/22 11:21:21.884 raft.go:1135: [info] 5aa594b5d9b66c42 no leader at term 3; dropping proposal
2019/05/22 11:21:21.939 raft.go:782: [info] 5aa594b5d9b66c42 [logterm: 3, index: 463, vote: 9b116f88cab4dc9] ignored MsgPreVote from 7f6143cbd22aca00 [logterm: 3, index: 463] at term 3: lease is not expired (remaining ticks: 2)
2019/05/22 11:21:22.039 raft.go:857: [info] 5aa594b5d9b66c42 is starting a new election at term 3
2019/05/22 11:21:22.039 raft.go:684: [info] 5aa594b5d9b66c42 became pre-candidate at term 3
2019/05/22 11:21:22.039 raft.go:755: [info] 5aa594b5d9b66c42 received MsgPreVoteResp from 5aa594b5d9b66c42 at term 3
2019/05/22 11:21:22.039 raft.go:742: [info] 5aa594b5d9b66c42 [logterm: 3, index: 463] sent MsgPreVote request to 9b116f88cab4dc9 at term 3
2019/05/22 11:21:22.039 raft.go:742: [info] 5aa594b5d9b66c42 [logterm: 3, index: 463] sent MsgPreVote request to 5440ff22fe632778 at term 3
2019/05/22 11:21:22.039 raft.go:742: [info] 5aa594b5d9b66c42 [logterm: 3, index: 463] sent MsgPreVote request to 7f6143cbd22aca00 at term 3
2019/05/22 11:21:22.039 raft.go:742: [info] 5aa594b5d9b66c42 [logterm: 3, index: 463] sent MsgPreVote request to f82e563e5c75137e at term 3
2019/05/22 11:21:22.039 raft.go:782: [info] 5aa594b5d9b66c42 [logterm: 3, index: 463, vote: 9b116f88cab4dc9] ignored MsgPreVote from f82e563e5c75137e [logterm: 3, index: 463] at term 3: lease is not expired (remaining ticks: 6)
2019/05/22 11:21:22.085 raft.go:1135: [info] 5aa594b5d9b66c42 no leader at term 3; dropping proposal
2019/05/22 11:21:22.228 log.go:86: [warning] etcdserver: [timed out waiting for read index response]
2019/05/22 11:21:22.286 raft.go:1135: [info] 5aa594b5d9b66c42 no leader at term 3; dropping proposal
2019/05/22 11:21:22.339 raft.go:857: [info] 5aa594b5d9b66c42 is starting a new election at term 3
2019/05/22 11:21:22.339 raft.go:684: [info] 5aa594b5d9b66c42 became pre-candidate at term 3
2019/05/22 11:21:22.339 raft.go:755: [info] 5aa594b5d9b66c42 received MsgPreVoteResp from 5aa594b5d9b66c42 at term 3
2019/05/22 11:21:22.339 raft.go:742: [info] 5aa594b5d9b66c42 [logterm: 3, index: 463] sent MsgPreVote request to 9b116f88cab4dc9 at term 3
2019/05/22 11:21:22.339 raft.go:742: [info] 5aa594b5d9b66c42 [logterm: 3, index: 463] sent MsgPreVote request to 5440ff22fe632778 at term 3
2019/05/22 11:21:22.339 raft.go:742: [info] 5aa594b5d9b66c42 [logterm: 3, index: 463] sent MsgPreVote request to 7f6143cbd22aca00 at term 3
2019/05/22 11:21:22.339 raft.go:742: [info] 5aa594b5d9b66c42 [logterm: 3, index: 463] sent MsgPreVote request to f82e563e5c75137e at term 3
2019/05/22 11:21:22.339 raft.go:782: [info] 5aa594b5d9b66c42 [logterm: 3, index: 463, vote: 9b116f88cab4dc9] ignored MsgPreVote from f82e563e5c75137e [logterm: 3, index: 463] at term 3: lease is not expired (remaining ticks: 6)
  1. What version of TiDB are you using (tidb-server -V or run select tidb_version(); on TiDB)?

This occurred with TiDB 3.0.0-beta1; I'm confirming with TiDB 3.0.0-beta.1-40.

20190522T140201.000-0400.zip

@aphyr aphyr changed the title PD can't elect a leader during a n/2+1 network partition PD can get stuck electing a leader during a n/2+1 network partition May 22, 2019
@BusyJay
Copy link
Contributor

BusyJay commented May 23, 2019

It seems that the randomized election timeout doesn't work at all, n2 and n4 always start campaign every 300ms.

/cc @nolouch PTAL

@nolouch
Copy link
Member

nolouch commented May 23, 2019

Thanks, @aphyr, this is already known bug about PreVote in raft, there is an issue in detail: etcd-io/etcd#10355. We fix it in v3.0.0-rc.1 after upgrade etcd, can
you use the new version to try to test again?

@aphyr
Copy link
Author

aphyr commented May 23, 2019

Ah, excellent! Yeah, I've been having trouble reproducing this on 3.0.0-beta.1-40 too. I think it's likely fixed in later versions. I am still seeing slow leader elections (2+ minutes?), but the logs look different, so I think that might be a different issue.

@aphyr aphyr closed this as completed May 23, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants