Summary
When a user triggers a server node to leave a cluster, the following error appears on the other cluster nodes:
agent.server.raft: rejecting vote request since we have a leader
How To reproduce
Use the script documented in #9755
Working Assumptions
The cluster keeps quorum at all times, even after one node leaves. This is crucial for keeping the cluster stable after the node has left.
A leader leaving the cluster will always trigger an election (it should happen only once).
Analysis:
When calling consul leave, two scenarios can happen, and each leads to the same issue:
The node is the cluster Leader:
1. The node is removed from the raft servers list using s.autopilot.RemoveServer. This leads to the node no longer receiving heartbeats or raft updates, and to an election happening to establish a new leader.
2. The node is removed from serf on both LAN and WAN (not relevant in this case).
3. The node waits for 5 seconds (leave_drain_time). During these 5 seconds the node is still running its raft goroutines (including the leader loop), and the following happens:
   - The node sets its raft state to Follower because ShutdownOnRemove is false.
   - After some time the node times out on updates/heartbeats and sets its state to Candidate (this is because of step 1; a simplified sketch of this timeout behavior follows this list).
   - The Candidate loop tries to trigger an election.
   - Two possible cases at this point:
     - The other nodes refuse to grant the vote because a leader is already established and the requesting node is not that leader. The leaving node keeps running the Candidate loop, retrying and failing to trigger an election until it shuts down.
     - No leader is established yet, so the vote requests from the leaving node and the other nodes compete and only one is accepted (the newer term). The worst case is that the leaving node establishes leadership and then shuts down (see step 4), which triggers a second election. This should cause no more than 2 elections overall instead of 1, and the cluster stabilizes in the end.
4. After 5 seconds the node is stopped and all of its raft goroutines are stopped too.
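For illustration, here is a heavily simplified sketch of the timeout behavior in step 3: a node that stops receiving heartbeats eventually gives up on the leader and transitions to Candidate. All names and structure here are invented for illustration; this is not hashicorp/raft code.

```go
package main

import (
	"fmt"
	"time"
)

// followerLoop waits for heartbeats; if none arrives before the election
// timeout, the node assumes the leader is gone and becomes a Candidate.
func followerLoop(heartbeatCh <-chan struct{}, electionTimeout time.Duration) {
	timer := time.NewTimer(electionTimeout)
	defer timer.Stop()
	for {
		select {
		case <-heartbeatCh:
			// A heartbeat from the leader resets the election timer.
			if !timer.Stop() {
				<-timer.C
			}
			timer.Reset(electionTimeout)
		case <-timer.C:
			// No heartbeat arrived in time: transition to Candidate and start
			// an election (which the other nodes will reject, see below).
			fmt.Println("election timeout reached, transitioning to Candidate")
			return
		}
	}
}

func main() {
	// The removed node no longer receives heartbeats, so the timer always fires.
	followerLoop(make(chan struct{}), 100*time.Millisecond)
}
```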
The node is a follower:
1. The node is removed from serf on both LAN and WAN using serf.Leave, which sets the node's serf state to Left.
2. Serf triggers a reconcile based on the node's serf state change and removes the node from the raft server list. This leads to the node no longer receiving heartbeats or raft updates.
3. The node waits for 5 seconds (leave_drain_time). During these 5 seconds the node is still running its raft goroutines (including the follower loop), and the following happens:
   - After some time the node times out on updates/heartbeats and sets its state to Candidate (this is because of step 2).
   - The Candidate loop tries to trigger an election.
   - The other nodes refuse to grant the vote because a leader is already established and the requesting node is not that leader (see the sketch after this list).
   - The leaving node keeps running the Candidate loop, retrying and failing until it shuts down.
4. After 5 seconds the node is stopped and all of its raft goroutines are stopped too.
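And a minimal sketch of the shape of the vote handling that produces the error in the summary: a node that already knows a leader refuses to grant a vote to the leaving node. This is a paraphrase of the behavior described above, not the actual hashicorp/raft request-vote code.

```go
package main

import "log"

// handleRequestVote illustrates why the warning keeps appearing: while a
// leader is known, a vote request from any other node (here, the leaving node
// stuck in its Candidate loop) is simply rejected.
func handleRequestVote(knownLeader, candidate string) bool {
	if knownLeader != "" && knownLeader != candidate {
		log.Printf("rejecting vote request since we have a leader: from=%s leader=%s",
			candidate, knownLeader)
		return false
	}
	// Otherwise the node would go on to compare terms and log indexes.
	return true
}

func main() {
	handleRequestVote("server-a:8300", "leaving-server:8300")
}
```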
Workaround
The only workaround that can effectively reduce the number of errors is shrinking the time window in which this bug can happen by reducing leave_drain_time. That said, this could lead to more severe issues:
- RPC connections not being drained, causing RPC errors
- In the case of consul leave on a leader node, the leader might not be able to replicate all of its raft logs
Therefore, the workaround is not advised.
To minimize the impact of a possible unnecessary election and, in general, to keep the cluster as stable as possible, it is advised to replace all the follower nodes first (one node at a time, to keep quorum) and replace the leader node last. This should trigger only 1 election (2 in the condition described in the leader scenario above).
Fix
Set the raft config flag ShutdownOnRemove to true. This makes raft stop its goroutines cleanly when the node is removed from raft; the replication goroutines are not affected by this.
The only caveat is to thoroughly test the interaction with features like enterprise autopilot and to make sure it does not impact the single-server scenario (the flag was historically set when adding the ability for Consul to run as a single server).
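For illustration, a minimal sketch of what the proposed change looks like wherever the server builds its raft configuration, assuming the standard hashicorp/raft Config (this is a sketch, not the actual Consul patch):

```go
package main

import (
	"fmt"

	"github.com/hashicorp/raft"
)

// buildRaftConfig sketches the proposed fix: keep ShutdownOnRemove enabled so
// that a node removed from the raft server list shuts down its raft goroutines
// cleanly instead of dropping to Follower and later becoming a Candidate
// during leave_drain_time.
func buildRaftConfig(nodeID string) *raft.Config {
	conf := raft.DefaultConfig()
	conf.LocalID = raft.ServerID(nodeID)
	conf.ShutdownOnRemove = true // the fix: do not keep running after removal
	return conf
}

func main() {
	fmt.Println("ShutdownOnRemove:", buildRaftConfig("server-1").ShutdownOnRemove)
}
```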