Overview of the Issue
The process of restoring a snapshot has the potential to result in Autopilot not executing on the leader server in a cluster.
Reproduction Steps
Steps to reproduce this issue, eg:
`consul operator autopilot state` does not show the second server.
Note that it is non-deterministic whether this will trigger it.
Details
The snapshot restore process requests that the leader reassert its leadership after the snapshot is restored here: consul/agent/consul/snapshot_endpoint.go, line 113 in 851c44c.
The leader loop handles that request here: consul/agent/consul/leader.go, lines 271 to 272 in 851c44c.
`revokeLeadership` will stop autopilot here: consul/agent/consul/leader.go, line 398 in 851c44c.
`establishLeadership` will restart autopilot here: consul/agent/consul/leader.go, line 344 in 851c44c.
The `Autopilot.Stop` method returns a chan which can be selected on to determine when it has actually been shut down. We could just have `revokeLeadership` wait on that chan and the issue would be resolved. However, this is arguably an autopilot issue too, so we could instead fix the corresponding autopilot issue and then pull in the updated dependency.
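To illustrate the proposed fix, here is a minimal, self-contained sketch of the stop-then-restart sequence. It does not use the real Consul or autopilot code: `toyAutopilot`, `reassertLeadership`, and the `wait` flag are hypothetical names, and the sketch assumes that a `Start` call is ignored while the previous run is still shutting down, which the issue does not state explicitly.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// toyAutopilot is a hypothetical stand-in for the real Autopilot type.
type toyAutopilot struct {
	mu      sync.Mutex
	running bool
	stopCh  chan struct{} // closed to ask the background loop to exit
	doneCh  chan struct{} // closed by the loop once it has fully exited
}

// Start launches the background loop and reports whether it did so.
// Assumption: Start is a no-op while a previous loop is still running
// or shutting down.
func (a *toyAutopilot) Start() bool {
	a.mu.Lock()
	defer a.mu.Unlock()
	if a.running {
		return false
	}
	a.running = true
	a.stopCh = make(chan struct{})
	a.doneCh = make(chan struct{})
	go a.run(a.stopCh, a.doneCh)
	return true
}

// run blocks until asked to stop, then simulates a slow shutdown.
func (a *toyAutopilot) run(stopCh <-chan struct{}, doneCh chan<- struct{}) {
	<-stopCh
	time.Sleep(50 * time.Millisecond)
	a.mu.Lock()
	a.running = false
	a.mu.Unlock()
	close(doneCh)
}

// Stop requests shutdown and returns a chan that is closed once the
// loop has actually exited.
func (a *toyAutopilot) Stop() <-chan struct{} {
	a.mu.Lock()
	defer a.mu.Unlock()
	if !a.running {
		done := make(chan struct{})
		close(done)
		return done
	}
	select {
	case <-a.stopCh: // shutdown was already requested
	default:
		close(a.stopCh)
	}
	return a.doneCh
}

// reassertLeadership mimics revokeLeadership followed by
// establishLeadership. With wait set, it blocks on the chan returned
// by Stop before restarting.
func reassertLeadership(a *toyAutopilot, wait bool) bool {
	done := a.Stop()
	if wait {
		<-done
	}
	return a.Start()
}

func main() {
	a := &toyAutopilot{}
	a.Start()
	fmt.Println("restarted without waiting:", reassertLeadership(a, false))

	<-a.Stop() // let the pending shutdown finish before the second run
	a.Start()
	fmt.Println("restarted after waiting:", reassertLeadership(a, true))
}
```

In this toy model, the restart is only reliable when the caller waits on the returned chan. The real fix could live either in `revokeLeadership` or in the autopilot library itself, as described above.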