Consul servers won't elect a leader #993

Closed · eirslett opened this issue Jun 2, 2015 · 75 comments

@eirslett commented Jun 2, 2015

I have 3 consul servers running (+ a handful of other nodes), and they can all speak to each other - or so I think; at least they're sending UDP messages between themselves.
The logs still show [ERR] agent: failed to sync remote state: No cluster leader, so even though the servers know about each other, they appear to fail to perform an actual leader election...
Is there a way to trigger a leader election manually?

I'm running consul 0.5.2 on all nodes.
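
In the meantime, the agent status endpoints at least show what each server thinks is going on (standard HTTP status API; the address below is the local agent and is illustrative):

curl http://127.0.0.1:8500/v1/status/leader   # empty result means no leader is known
curl http://127.0.0.1:8500/v1/status/peers    # the raft peer set this server knows about
consul info | grep -E 'state|leader'          # raft state as seen by this agent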

@eirslett (Author) commented Jun 2, 2015

This is our consul.json config:

{
  "datacenter": "sit0",
  "server": true,
  "leave_on_terminate": true,
  "retry_join": [
    "mod02.finn.no",
    "mod04.finn.no",
    "mod06.finn.no"
  ],
  "bootstrap_expect": 3,
  "retry_join_wan": [
    "mod01.finn.no",
    "mod03.finn.no",
    "mod05.finn.no"
  ]
}

Even numbers belong to the sit0 datacenter, odd numbers to the sit1 datacenter. So it should already be in bootstrap mode? (We always start the consul servers in bootstrap mode; I guess it cannot hurt?)

@reversefold

I've had this same issue. I have brought up a cluster of 3 servers with -bootstrap-expect 3, and they come up and elect a leader just fine once I issue a join. If I then kill -TERM them all, wait for them all to shut down, and start them back up, they still elect a leader fine. However, if I kill -INT them all, they never elect a leader when I bring them back up.
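
Roughly, the sequence looks like this (a sketch only -- run one agent per host; addresses, paths, and PIDs below are illustrative, not my exact commands):

# Start three servers that expect each other, then join them.
consul agent -server -data-dir /var/consul -bootstrap-expect 3 -bind <this-host-ip>
consul join 172.16.204.152 172.16.204.153 172.16.204.154   # a leader is elected

# Case 1: stop every server with SIGTERM (no graceful leave by default),
# start them all again, re-issue the join -- a leader is elected again.
kill -TERM <consul-pid>

# Case 2: stop every server with SIGINT (triggers a graceful leave),
# start them all again -- they sit at "No cluster leader" forever.
kill -INT <consul-pid>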

@eirslett (Author) commented Jun 8, 2015

After some fiddling, I got it to work now. There are 2 suspects:

  1. All the consul servers were started with the bootstrap_expect setting - does that lead to a kind of split-brain scenario, where they're all trying to initiate an election, and none of them succeed?

  2. The WAN pool - it looks like the servers join the WAN pool before they have elected a leader. Could that be the problem?

@reversefold

@eirslett what did you change to get it to work?

@eirslett (Author) commented Jun 8, 2015

I removed the "retry_join_wan" setting from all servers, and removed the "bootstrap_expect" setting from all servers except one of them. Then, I restarted the servers, and the leader election worked.

@slackpad (Contributor) commented Jun 9, 2015

The servers should all be started with the same bootstrap_expect setting, so they all know how many servers to wait for before running the first leader election; you shouldn't have to configure them differently.

Could you link to some logs from your original configuration so we can see what's going on?
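
In other words, something like this, byte-for-byte identical on all three servers (a sketch based on the config posted above; the file path is an assumption):

# Same server config on every one of the three servers.
cat > /etc/consul.d/server.json <<'EOF'
{
  "datacenter": "sit0",
  "server": true,
  "bootstrap_expect": 3,
  "retry_join": ["mod02.finn.no", "mod04.finn.no", "mod06.finn.no"]
}
EOF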


@reversefold

My configuration didn't have any WAN connections at all, so I doubt that had anything to do with it (unless it's automatic).

@reversefold

Here's the log from the 3 servers in my cluster when this happens:

==> Caught signal: interrupt
==> Gracefully shutting down agent...
    2015/06/09 21:30:39 [INFO] consul: server starting leave
    2015/06/09 21:30:39 [INFO] serf: EventMemberLeave: devt-crate00.dc1 172.16.204.152
    2015/06/09 21:30:39 [INFO] consul: removing server devt-crate00.dc1 (Addr: 172.16.204.152:8300) (DC: dc1)
    2015/06/09 21:30:40 [INFO] serf: EventMemberLeave: devt-crate00 172.16.204.152
    2015/06/09 21:30:40 [INFO] consul: removing server devt-crate00 (Addr: 172.16.204.152:8300) (DC: dc1)
    2015/06/09 21:30:40 [INFO] serf: EventMemberLeave: devt-crate01 172.16.204.153
    2015/06/09 21:30:40 [INFO] consul: removing server devt-crate01 (Addr: 172.16.204.153:8300) (DC: dc1)
    2015/06/09 21:30:40 [WARN] raft: Heartbeat timeout reached, starting election
    2015/06/09 21:30:40 [INFO] raft: Node at 172.16.204.152:8300 [Candidate] entering Candidate state
    2015/06/09 21:30:41 [INFO] raft: Duplicate RequestVote for same term: 10
    2015/06/09 21:30:42 [WARN] raft: Election timeout reached, restarting election
    2015/06/09 21:30:42 [INFO] raft: Node at 172.16.204.152:8300 [Candidate] entering Candidate state
    2015/06/09 21:30:42 [INFO] raft: Election won. Tally: 2
    2015/06/09 21:30:42 [INFO] raft: Node at 172.16.204.152:8300 [Leader] entering Leader state
    2015/06/09 21:30:42 [INFO] consul: cluster leadership acquired
    2015/06/09 21:30:42 [INFO] consul: New leader elected: devt-crate00
    2015/06/09 21:30:42 [INFO] raft: pipelining replication to peer 172.16.204.154:8300
    2015/06/09 21:30:42 [WARN] consul: deregistering self (devt-crate00) should be done by follower
    2015/06/09 21:30:42 [INFO] consul: member 'devt-crate01' left, deregistering
    2015/06/09 21:30:43 [INFO] serf: EventMemberLeave: devt-crate02 172.16.204.154
    2015/06/09 21:30:43 [INFO] consul: removing server devt-crate02 (Addr: 172.16.204.154:8300) (DC: dc1)
    2015/06/09 21:30:43 [INFO] raft: Removed peer 172.16.204.154:8300, stopping replication (Index: 5825)
    2015/06/09 21:30:43 [INFO] consul: removed server 'devt-crate02' as peer
    2015/06/09 21:30:43 [INFO] consul: member 'devt-crate02' left, deregistering
    2015/06/09 21:30:43 [INFO] raft: aborting pipeline replication to peer 172.16.204.154:8300
    2015/06/09 21:30:43 [INFO] agent: requesting shutdown
    2015/06/09 21:30:43 [INFO] consul: shutting down server
    2015/06/09 21:30:43 [INFO] agent: shutdown complete
==> WARNING: Expect Mode enabled, expecting 3 servers
==> WARNING: It is highly recommended to set GOMAXPROCS higher than 1
==> Starting raft data migration...
==> Starting Consul agent...
==> Starting Consul agent RPC...
==> Consul agent running!
         Node name: 'devt-crate00'
        Datacenter: 'dc1'
            Server: true (bootstrap: false)
       Client Addr: 127.0.0.1 (HTTP: 8500, HTTPS: -1, DNS: 8600, RPC: 8400)
      Cluster Addr: 172.16.204.152 (LAN: 8301, WAN: 8302)
    Gossip encrypt: false, RPC-TLS: false, TLS-Incoming: false
             Atlas: <disabled>

==> Log data will now stream in as it occurs:

    2015/06/09 21:31:22 [INFO] serf: Ignoring previous leave in snapshot
    2015/06/09 21:31:22 [INFO] serf: Ignoring previous leave in snapshot
    2015/06/09 21:31:22 [INFO] serf: EventMemberJoin: devt-crate00 172.16.204.152
    2015/06/09 21:31:22 [INFO] serf: Ignoring previous leave in snapshot
    2015/06/09 21:31:22 [INFO] serf: Ignoring previous leave in snapshot
    2015/06/09 21:31:22 [INFO] serf: EventMemberJoin: devt-crate00.dc1 172.16.204.152
    2015/06/09 21:31:22 [INFO] raft: Node at 172.16.204.152:8300 [Follower] entering Follower state
    2015/06/09 21:31:22 [INFO] serf: Attempting re-join to previously known node: devt-crate02: 172.16.204.154:8301
    2015/06/09 21:31:22 [WARN] serf: Failed to re-join any previously known node
    2015/06/09 21:31:22 [INFO] serf: Attempting re-join to previously known node: devt-crate01: 172.16.204.153:8301
    2015/06/09 21:31:22 [INFO] consul: adding server devt-crate00 (Addr: 172.16.204.152:8300) (DC: dc1)
    2015/06/09 21:31:22 [INFO] consul: adding server devt-crate00.dc1 (Addr: 172.16.204.152:8300) (DC: dc1)
    2015/06/09 21:31:22 [ERR] agent: failed to sync remote state: No cluster leader
    2015/06/09 21:31:22 [INFO] serf: EventMemberJoin: devt-crate01 172.16.204.153
    2015/06/09 21:31:22 [INFO] serf: Re-joined to previously known node: devt-crate01: 172.16.204.153:8301
    2015/06/09 21:31:22 [INFO] consul: adding server devt-crate01 (Addr: 172.16.204.153:8300) (DC: dc1)
    2015/06/09 21:31:22 [INFO] serf: EventMemberJoin: devt-crate02 172.16.204.154
    2015/06/09 21:31:22 [INFO] consul: adding server devt-crate02 (Addr: 172.16.204.154:8300) (DC: dc1)
    2015/06/09 21:31:22 [INFO] serf: EventMemberJoin: devt-crate03 172.16.204.155
    2015/06/09 21:31:22 [ERR] http: Request /v1/catalog/nodes, error: No cluster leader
    2015/06/09 21:31:22 [INFO] agent.rpc: Accepted client: 127.0.0.1:49409
    2015/06/09 21:31:22 [INFO] agent: (LAN) joining: [172.16.204.152 172.16.204.153 172.16.204.154]
    2015/06/09 21:31:22 [INFO] agent: (LAN) joined: 3 Err: <nil>
    2015/06/09 21:31:23 [WARN] raft: EnableSingleNode disabled, and no known peers. Aborting election.
    2015/06/09 21:31:38 [ERR] agent: failed to sync remote state: No cluster leader
    2015/06/09 21:32:05 [ERR] agent: failed to sync remote state: No cluster leader
==> Caught signal: interrupt
==> Gracefully shutting down agent...
    2015/06/09 22:01:41 [INFO] consul: server starting leave
    2015/06/09 22:01:41 [INFO] serf: EventMemberLeave: devt-crate01.dc1 172.16.204.153
    2015/06/09 22:01:41 [INFO] consul: removing server devt-crate01.dc1 (Addr: 172.16.204.153:8300) (DC: dc1)
    2015/06/09 22:01:41 [INFO] serf: EventMemberLeave: devt-crate01 172.16.204.153
    2015/06/09 22:01:41 [INFO] consul: removing server devt-crate01 (Addr: 172.16.204.153:8300) (DC: dc1)
    2015/06/09 22:01:41 [INFO] serf: EventMemberFailed: devt-crate02 172.16.204.154
    2015/06/09 22:01:41 [INFO] consul: removing server devt-crate02 (Addr: 172.16.204.154:8300) (DC: dc1)
    2015/06/09 22:01:42 [INFO] agent: requesting shutdown
    2015/06/09 22:01:42 [INFO] consul: shutting down server
    2015/06/09 22:01:42 [INFO] agent: shutdown complete
==> WARNING: It is highly recommended to set GOMAXPROCS higher than 1
==> Starting raft data migration...
==> Starting Consul agent...
==> Starting Consul agent RPC...
==> Consul agent running!
         Node name: 'devt-crate01'
        Datacenter: 'dc1'
            Server: true (bootstrap: false)
       Client Addr: 127.0.0.1 (HTTP: 8500, HTTPS: -1, DNS: 8600, RPC: 8400)
      Cluster Addr: 172.16.204.153 (LAN: 8301, WAN: 8302)
    Gossip encrypt: false, RPC-TLS: false, TLS-Incoming: false
             Atlas: <disabled>

==> Log data will now stream in as it occurs:

    2015/06/09 22:05:17 [INFO] serf: Ignoring previous leave in snapshot
    2015/06/09 22:05:17 [INFO] serf: Ignoring previous leave in snapshot
    2015/06/09 22:05:17 [INFO] serf: EventMemberJoin: devt-crate01 172.16.204.153
    2015/06/09 22:05:17 [INFO] serf: Ignoring previous leave in snapshot
    2015/06/09 22:05:17 [INFO] serf: Ignoring previous leave in snapshot
    2015/06/09 22:05:17 [INFO] serf: EventMemberJoin: devt-crate01.dc1 172.16.204.153
    2015/06/09 22:05:17 [INFO] raft: Node at 172.16.204.153:8300 [Follower] entering Follower state
    2015/06/09 22:05:17 [INFO] serf: Attempting re-join to previously known node: devt-crate03: 172.16.204.155:8301
    2015/06/09 22:05:17 [WARN] serf: Failed to re-join any previously known node
    2015/06/09 22:05:17 [INFO] serf: Attempting re-join to previously known node: devt-crate02: 172.16.204.154:8301
    2015/06/09 22:05:17 [INFO] consul: adding server devt-crate01 (Addr: 172.16.204.153:8300) (DC: dc1)
    2015/06/09 22:05:17 [ERR] agent: failed to sync remote state: No cluster leader
    2015/06/09 22:05:17 [INFO] serf: EventMemberJoin: devt-crate00 172.16.204.152
    2015/06/09 22:05:17 [INFO] serf: EventMemberJoin: devt-crate02 172.16.204.154
    2015/06/09 22:05:17 [INFO] serf: Re-joined to previously known node: devt-crate02: 172.16.204.154:8301
    2015/06/09 22:05:17 [INFO] consul: adding server devt-crate00 (Addr: 172.16.204.152:8300) (DC: dc1)
    2015/06/09 22:05:17 [INFO] consul: adding server devt-crate02 (Addr: 172.16.204.154:8300) (DC: dc1)
    2015/06/09 22:05:17 [ERR] http: Request /v1/catalog/nodes, error: No cluster leader
    2015/06/09 22:05:17 [INFO] agent.rpc: Accepted client: 127.0.0.1:44608
    2015/06/09 22:05:17 [INFO] agent: (LAN) joining: [172.16.204.152 172.16.204.153 172.16.204.154]
    2015/06/09 22:05:17 [INFO] agent: (LAN) joined: 3 Err: <nil>
    2015/06/09 22:05:18 [WARN] raft: EnableSingleNode disabled, and no known peers. Aborting election.
    2015/06/09 22:05:36 [ERR] agent: failed to sync remote state: No cluster leader
==> Caught signal: interrupt
==> Gracefully shutting down agent...
    2015/06/09 22:01:41 [INFO] consul: server starting leave
    2015/06/09 22:01:41 [INFO] serf: EventMemberLeave: devt-crate02.dc1 172.16.204.154
    2015/06/09 22:01:41 [INFO] serf: EventMemberLeave: devt-crate02 172.16.204.154
    2015/06/09 22:01:41 [INFO] consul: removing server devt-crate02.dc1 (Addr: 172.16.204.154:8300) (DC: dc1)
    2015/06/09 22:01:41 [INFO] consul: removing server devt-crate02 (Addr: 172.16.204.154:8300) (DC: dc1)
    2015/06/09 22:01:41 [INFO] agent: requesting shutdown
    2015/06/09 22:01:41 [INFO] consul: shutting down server
    2015/06/09 22:01:41 [INFO] agent: shutdown complete
==> WARNING: It is highly recommended to set GOMAXPROCS higher than 1
==> Starting raft data migration...
==> Starting Consul agent...
==> Starting Consul agent RPC...
==> Consul agent running!
         Node name: 'devt-crate02'
        Datacenter: 'dc1'
            Server: true (bootstrap: false)
       Client Addr: 127.0.0.1 (HTTP: 8500, HTTPS: -1, DNS: 8600, RPC: 8400)
      Cluster Addr: 172.16.204.154 (LAN: 8301, WAN: 8302)
    Gossip encrypt: false, RPC-TLS: false, TLS-Incoming: false
             Atlas: <disabled>

==> Log data will now stream in as it occurs:

    2015/06/09 22:04:21 [INFO] serf: Ignoring previous leave in snapshot
    2015/06/09 22:04:21 [INFO] serf: Ignoring previous leave in snapshot
    2015/06/09 22:04:21 [INFO] serf: EventMemberJoin: devt-crate02 172.16.204.154
    2015/06/09 22:04:21 [INFO] serf: Ignoring previous leave in snapshot
    2015/06/09 22:04:21 [INFO] serf: Ignoring previous leave in snapshot
    2015/06/09 22:04:21 [INFO] serf: EventMemberJoin: devt-crate02.dc1 172.16.204.154
    2015/06/09 22:04:21 [INFO] raft: Node at 172.16.204.154:8300 [Follower] entering Follower state
    2015/06/09 22:04:21 [INFO] serf: Attempting re-join to previously known node: devt-crate03: 172.16.204.155:8301
    2015/06/09 22:04:21 [WARN] serf: Failed to re-join any previously known node
    2015/06/09 22:04:21 [INFO] serf: Attempting re-join to previously known node: devt-crate01: 172.16.204.153:8301
    2015/06/09 22:04:21 [INFO] consul: adding server devt-crate02 (Addr: 172.16.204.154:8300) (DC: dc1)
    2015/06/09 22:04:21 [INFO] consul: adding server devt-crate02.dc1 (Addr: 172.16.204.154:8300) (DC: dc1)
    2015/06/09 22:04:21 [ERR] agent: failed to sync remote state: No cluster leader
    2015/06/09 22:04:21 [WARN] serf: Failed to re-join any previously known node
    2015/06/09 22:04:21 [ERR] http: Request /v1/catalog/nodes, error: No cluster leader
    2015/06/09 22:04:21 [INFO] agent.rpc: Accepted client: 127.0.0.1:36990
    2015/06/09 22:04:21 [INFO] agent: (LAN) joining: [172.16.204.152 172.16.204.153 172.16.204.154]
    2015/06/09 22:04:21 [INFO] serf: EventMemberJoin: devt-crate00 172.16.204.152
    2015/06/09 22:04:21 [INFO] consul: adding server devt-crate00 (Addr: 172.16.204.152:8300) (DC: dc1)
    2015/06/09 22:04:21 [INFO] agent: (LAN) joined: 2 Err: <nil>
    2015/06/09 22:04:22 [WARN] raft: EnableSingleNode disabled, and no known peers. Aborting election.
    2015/06/09 22:04:42 [ERR] agent: failed to sync remote state: No cluster leader
    2015/06/09 22:05:01 [ERR] agent: failed to sync remote state: No cluster leader
    2015/06/09 22:05:17 [INFO] serf: EventMemberJoin: devt-crate01 172.16.204.153
    2015/06/09 22:05:17 [INFO] consul: adding server devt-crate01 (Addr: 172.16.204.153:8300) (DC: dc1)
    2015/06/09 22:05:19 [ERR] agent: failed to sync remote state: No cluster leader

@reversefold

Also, when this happens, killing the servers with either QUIT or INT still leaves them in this state when they come back up; they won't elect a leader.

@slackpad (Contributor) commented Jun 9, 2015

@reversefold Those look like the same node's log got pasted three times - can you grab the logs for the other two nodes?

@reversefold

Command-line I'm using:

consul agent -server -data-dir consul.data -config-dir etc -pid-file $consul.pid -bootstrap-expect 3 -rejoin

If I try killing them all, replacing -bootstrap-expect 3 with -bootstrap, and starting just one node, it elects itself leader but then drops it and ends up in the same situation...

    2015/06/09 22:06:50 [INFO] serf: Ignoring previous leave in snapshot
    2015/06/09 22:06:50 [INFO] serf: Ignoring previous leave in snapshot
    2015/06/09 22:06:50 [INFO] serf: Ignoring previous leave in snapshot
    2015/06/09 22:06:50 [INFO] serf: Ignoring previous leave in snapshot
    2015/06/09 22:06:50 [INFO] serf: Ignoring previous leave in snapshot
    2015/06/09 22:06:50 [INFO] serf: Ignoring previous leave in snapshot
    2015/06/09 22:06:50 [INFO] serf: Ignoring previous leave in snapshot
    2015/06/09 22:06:50 [INFO] serf: EventMemberJoin: devt-crate00 172.16.204.152
    2015/06/09 22:06:50 [INFO] serf: Ignoring previous leave in snapshot
    2015/06/09 22:06:50 [INFO] serf: Ignoring previous leave in snapshot
    2015/06/09 22:06:50 [INFO] serf: Ignoring previous leave in snapshot
    2015/06/09 22:06:50 [INFO] serf: Ignoring previous leave in snapshot
    2015/06/09 22:06:50 [INFO] serf: Ignoring previous leave in snapshot
    2015/06/09 22:06:50 [INFO] serf: Ignoring previous leave in snapshot
    2015/06/09 22:06:50 [INFO] serf: Ignoring previous leave in snapshot
    2015/06/09 22:06:50 [INFO] serf: EventMemberJoin: devt-crate00.dc1 172.16.204.152
    2015/06/09 22:06:50 [INFO] raft: Node at 172.16.204.152:8300 [Follower] entering Follower state
    2015/06/09 22:06:50 [WARN] serf: Failed to re-join any previously known node
    2015/06/09 22:06:50 [WARN] serf: Failed to re-join any previously known node
    2015/06/09 22:06:50 [INFO] consul: adding server devt-crate00 (Addr: 172.16.204.152:8300) (DC: dc1)
    2015/06/09 22:06:50 [INFO] consul: adding server devt-crate00.dc1 (Addr: 172.16.204.152:8300) (DC: dc1)
    2015/06/09 22:06:50 [ERR] agent: failed to sync remote state: No cluster leader
    2015/06/09 22:06:50 [ERR] http: Request /v1/catalog/nodes, error: No cluster leader
    2015/06/09 22:06:50 [INFO] agent.rpc: Accepted client: 127.0.0.1:50493
    2015/06/09 22:06:50 [INFO] agent: (LAN) joining: [172.16.204.152 172.16.204.153 172.16.204.154]
    2015/06/09 22:06:50 [INFO] agent: (LAN) joined: 1 Err: <nil>
    2015/06/09 22:06:51 [WARN] raft: Heartbeat timeout reached, starting election
    2015/06/09 22:06:51 [INFO] raft: Node at 172.16.204.152:8300 [Candidate] entering Candidate state
    2015/06/09 22:06:51 [INFO] raft: Election won. Tally: 1
    2015/06/09 22:06:51 [INFO] raft: Node at 172.16.204.152:8300 [Leader] entering Leader state
    2015/06/09 22:06:51 [INFO] consul: cluster leadership acquired
    2015/06/09 22:06:51 [INFO] consul: New leader elected: devt-crate00
    2015/06/09 22:06:51 [INFO] raft: Disabling EnableSingleNode (bootstrap)
    2015/06/09 22:06:51 [INFO] raft: Added peer 172.16.204.153:8300, starting replication
    2015/06/09 22:06:51 [INFO] raft: Added peer 172.16.204.154:8300, starting replication
    2015/06/09 22:06:51 [INFO] raft: Removed peer 172.16.204.153:8300, stopping replication (Index: 19)
    2015/06/09 22:06:51 [INFO] raft: Removed peer 172.16.204.154:8300, stopping replication (Index: 19)
    2015/06/09 22:06:51 [INFO] raft: Removed ourself, transitioning to follower
    2015/06/09 22:06:51 [ERR] raft: Failed to AppendEntries to 172.16.204.153:8300: dial tcp 172.16.204.153:8300: connection refused
    2015/06/09 22:06:51 [ERR] raft: Failed to AppendEntries to 172.16.204.154:8300: dial tcp 172.16.204.154:8300: connection refused
    2015/06/09 22:06:51 [ERR] raft: Failed to AppendEntries to 172.16.204.154:8300: dial tcp 172.16.204.154:8300: connection refused
    2015/06/09 22:06:52 [INFO] raft: Node at 172.16.204.152:8300 [Follower] entering Follower state
    2015/06/09 22:06:52 [INFO] consul: cluster leadership lost
    2015/06/09 22:06:52 [ERR] consul: failed to wait for barrier: node is not the leader
    2015/06/09 22:06:53 [WARN] raft: EnableSingleNode disabled, and no known peers. Aborting election.
    2015/06/09 22:06:53 [ERR] agent: failed to sync remote state: No cluster leader
    2015/06/09 22:07:21 [ERR] agent: failed to sync remote state: No cluster leader

Note that no other consul servers are up at this point, so I don't understand why it's losing leadership...

$ ./bin/consul members
Node          Address              Status  Type    Build  Protocol  DC
devt-crate00  172.16.204.152:8301  alive   server  0.5.2  2         dc1

@reversefold

@slackpad Apologies, I've updated the comment above with the other 2 logs.

@reversefold

I've also tried removing the data directory on one node and starting it with bootstrap. It elects itself as leader but when I start up the other 2 servers they won't accept it as leader.

    2015/06/09 22:31:45 [WARN] raft: Rejecting vote from 172.16.204.152:8300 since our last term is greater (9, 1)

@reversefold

Also worth mentioning that my config dir only has service entries in it.

@slackpad (Contributor)

@reversefold - After digging more I think you are seeing the same thing as is being discussed on #750 and #454.

I was able to reproduce your situation locally and verify that the consul.data/raft/peers.json file ends up with null inside on all three servers. That's why a TERM signal doesn't lead to this: it doesn't formally leave the set of peers (see leave_on_terminate, which defaults to false).

I'll let @armon and/or @ryanuber weigh in on the best practice to follow to get into a working state. Maybe we should add a new "Failure of all Servers in a Multi-Server Cluster" section in the outage recovery docs since this seems to be confusing to people. I think it will end up being something similar to the manual bootstrapping procedure, picking one of the servers as the lead. In my local testing I was able to get them going again by manually editing the peers.json file, but I'm not sure if that's a safe thing to do.

@ryanuber (Member)

@slackpad is correct here. @reversefold when you delete the entire data directory, you are also deleting the raft log entirely, which is why you get the error about the voting terms not matching up (servers with data joining a leader with no data). Editing the peers.json file and leaving the raft data intact is the current best practice for recovering in outage scenarios, and the null is described in #750, TL;DR being that it is purposely nulled out during a graceful leave for safety reasons.
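
For anyone following along, the state being described is easy to see directly (the path assumes the -data-dir from the command line above; this is just a peek, not a recovery step by itself):

# After all three servers have done a graceful leave, each one's peer store
# holds the JSON literal null instead of a list of peers:
cat consul.data/raft/peers.json
# -> null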

@reversefold

I understand; I was just throwing things at the wall to try to make the cluster come back up. This does seem to be a bug in the system, though. In general, if I am shutting down a consul agent I want to do it "nicely" so the consul cluster doesn't complain about failing checks. I know I shut it down, and I don't need spurious errors in my logs or reported by the consul cluster.

Regardless of that, though, this is a state that the consul cluster is putting itself into. If, when a multi-node consul server cluster is shut down with INT, the last server is leaving and is about to put itself into this state, it should throw a warning or error and not do that.

I've also tested this with a single node, and the same problem does not happen, which I would expect given your explanation. If I have a single server with -bootstrap-expect 1 and I kill it with INT, making it "leave", then when it comes back up it should end up in the same state as the 3-node cluster did when its last server member was killed with INT, but it happily comes back up and elects itself leader again. In fact, if I bring up two servers, each with -bootstrap-expect 1, they work fine; if I kill -INT them both, they both leave, and they will come back up and elect a leader again. However, if I change to -bootstrap-expect 2, they then fail to elect a leader.

@orclev commented Jun 18, 2015

I'm seeing this same issue. Can someone either provide a sample or else point me to the documentation of what the peers.json is supposed to look like (you know, before it gets corrupted with a null value)?

@slackpad (Contributor)

Hi @orclev - there's a sample in the Outage Recovery Guide. It's basically a list of IPs and port numbers for the peers:

[
  "10.0.1.8:8300",
  "10.0.1.6:8300",
  "10.0.1.7:8300"
]
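
For completeness, a rough outline of the manual recovery that uses this file, per the Outage Recovery Guide (a sketch only -- the data-dir path and addresses are illustrative, and every server must be stopped first):

# 1. Stop the consul agent on every server.
# 2. On EACH server, write the full list of server Raft addresses (port 8300 by default):
cat > consul.data/raft/peers.json <<'EOF'
[
  "10.0.1.8:8300",
  "10.0.1.6:8300",
  "10.0.1.7:8300"
]
EOF
# 3. Start the servers back up; with a real peer list restored they can hold an election.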

@orclev commented Jun 19, 2015

@slackpad thanks, I somehow missed that. I figured it was probably somewhere in the docs and I was just missing it.

@jwestboston

This bug is hitting me as well on 0.5.2

@juaby commented Jul 2, 2015

@slackpad can you provide a patch build to fix this issue instead of requiring a manual fix? It is almost impossible to fix something manually in a production environment.

@slackpad (Contributor) commented Jul 2, 2015

Hi @juaby. Unfortunately, there's not a good way we could automatically recover from the case where the servers have all left the cluster. I was just going to improve the documentation on how to recover from this case by manually editing the peers.json file per the comments above.

@reversefold

If this is a state that a human can recognize and fix, then it's a state that a computer can recognize and fix. At the very least there should be a command-line option that will fix this state.

@elephantfries

Perhaps we should always start with --bootstrap-expect=1. That way we would work around the bring-up problem. The remaining servers will join subsequently, so in the end we'll get our redundancy back. I believe joining servers need to nuke their raft data. I know this is not recommended, but in our case manual fiddling with descriptors is not an option; a rough sketch of what I mean is below. @slackpad do you see any issues with this approach?
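
Concretely, the idea is roughly this (a sketch only -- flags, paths, and the join target are illustrative):

# First server: allowed to elect itself immediately.
consul agent -server -data-dir /var/consul -bootstrap-expect 1

# Each remaining server: wipe its raft data, then join the first server.
rm -rf /var/consul/raft
consul agent -server -data-dir /var/consul -retry-join <first-server-ip>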

@eirslett (Author) commented Jul 6, 2015

I suppose --bootstrap-expect=1 would lead to split brain?

@elephantfries

I hope not, as the joining servers would run with --join whereas the first one would not.

@elephantfries

After a day of testing, this almost works. It starts with bootstrap-expect=1 and elects itself leader. The others join and I have my cluster back. Unfortunately, I am running into a case where it decides to give up as leader. For some reason it detects long-dead peers as active and wants to run an election, which it cannot win because, well... the peers are really dead. Is this a bug, or is there some reason for that?

http://pastebin.com/NR5RSvDq

@pikeas commented Jul 21, 2015

I've read through #750, #454 and this issue, but I don't feel like I'm any closer to understanding the Right Way (tm) to do things.

Here's the workflow I need and expect:

  1. An easy way to start an N (typically 3 or 5) node cluster. bootstrap-expect is trying to satisfy this need.
  2. An easy way to automatically recover when a server disappears and reappears, which could occur for multiple reasons: network outage, server outage, server upgrade, consul upgrade, etc. Bonus points if this works even when consul is running in a container.

#2 is where everyone is getting stuck. Consul should Just Work even when all servers bounce due to a planned or unplanned outage - even if a split brain situation occurs temporarily (such as when node 2 / 3 disappears and the cluster loses quorum), the cluster should heal itself when any 2 (or all?) are back up, without requiring manual intervention.

If this is infeasible due to technical limitations in raft or Consul, I would like to see explicit documentation detailing which failure (and maintenance!) modes support automatic remediation, best practices around said maintenance, and a clear description of cases requiring manual remediation.

@slackpad (Contributor) commented Dec 6, 2016

Ok - that RPC error looks like it may have talked to an old server. I'll need to look at the stale issue as that seems to have been reported by you and another person.

If a single server of a 3-server cluster going down causes an outage, that's likely from a stale peer in the Raft configuration. You can use the consul force-leave <node name> command to kick it if it has recently been removed (but still shows in consul members), or consul operator raft -remove-peer -address="<ip:port>" if it's stale and no longer known in consul members. consul operator raft -list-peers should let you inspect the configuration to see if this is the case.
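
Put together, that looks roughly like this (Consul 0.7+; the node name and address are placeholders):

# Inspect the current Raft configuration.
consul operator raft -list-peers

# If the dead server was removed recently and still shows up in consul members:
consul force-leave <node name>

# If it is stale and no longer known to consul members:
consul operator raft -remove-peer -address="<ip>:8300"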

@slackpad slackpad modified the milestones: 0.7.4, 0.7.3 Jan 17, 2017
@haf commented Feb 10, 2017

Our staging environment went down in a similar fashion; here's my write-up https://gist.github.com/haf/1983206cf11846f6f3f291f78acee5cf

@rhyas commented Feb 10, 2017

Raising my hand as another one hitting this issue. We had the same thing: the current leader AWS node died, a new one was spun up, and nothing converged. We ran into the same problem trying to fix it manually, with the "no leader" state making it difficult to find out which raft node was dead and remove it. We also tried the peers.json recovery, and that failed because the server wouldn't even start with that file in the documented format. ): Our ultimate fix was to blow away all 3 nodes and let the cluster bootstrap from scratch. That left it disconnected from all the agents, but doing a join to the agents, which were all still part of the old cluster, brought everything back into sync (services anyway; we didn't check KV data). Our cluster is all 0.7.2+. We're still in test mode, so there was no production impact, just some slowed development cycles and a yellow flag injected into the consul solution rollout.

This is very easy to reproduce. Set up a new 3-node cluster with --bootstrap 3, wait until it has all converged with a leader, then kill off the leader (terminate the instance). The cluster will never recover.

starlightcrafted pushed a commit to starlightcrafted/salt-consul that referenced this issue Mar 1, 2017
@dcrystalj

Isn't this the most basic feature consul should support? Unbelievable it's still not working. Any workarounds?

@slackpad (Contributor)

We've got automation coming in Consul 0.8 that'll fix this - https://github.com/hashicorp/consul/blob/master/website/source/docs/guides/autopilot.html.markdown.
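
For reference, the relevant Autopilot feature here is dead-server cleanup; a sketch of the server config stanza with its documented defaults (the file path is illustrative):

cat > /etc/consul.d/autopilot.json <<'EOF'
{
  "autopilot": {
    "cleanup_dead_servers": true,
    "last_contact_threshold": "200ms",
    "max_trailing_logs": 250,
    "server_stabilization_time": "10s"
  }
}
EOF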

@flypenguin

That is so good to hear :). Our workaround is to scratch the consul data dirs on EVERY master host and re-run Puppet, which then sets consul up again. Our setup automation can handle that pretty well; without it we'd have been lost a couple of times.

@slackpad slackpad removed this from the Triaged milestone Apr 18, 2017
@rsrini83

Hi, we are also facing this issue (no leader elected after a system restart). However, our consul instances are running in Docker containers on multiple EC2 instances. Can anyone suggest a simple workaround for the dockerized case?

@slackpad (Contributor) commented May 2, 2017

Closing this out now that Autopilot is available in 0.8.x - https://www.consul.io/docs/guides/autopilot.html.

We've also (in 0.7.x):

  • Changed the defaults of leave_on_terminate and skip_leave_on_interrupt for servers so that they no longer leave the cluster when shut down, which is safer by default (see the sketch after this list).
  • Removed the peer store, so there's no longer the confusing null.
  • Improved the outage documentation with more details and a peers.json example.
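
The equivalent explicit settings in a server config would look roughly like this (values match the new server defaults described above, shown only for clarity; the file path is illustrative):

cat > /etc/consul.d/leave-behavior.json <<'EOF'
{
  "leave_on_terminate": false,
  "skip_leave_on_interrupt": true
}
EOF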

@slackpad slackpad closed this as completed May 2, 2017
@slackpad (Contributor) commented May 2, 2017

We also (in 0.7.x) made this change:

Servers will now abort bootstrapping if they detect an existing cluster with configured Raft peers. This will help prevent safe but spurious leader elections when introducing new nodes with bootstrap_expect enabled into an existing cluster. [GH-2319]

@edbergavera commented May 14, 2017

@slackpad In our situation, we have a 3-member consul cluster deployed on Kubernetes. Each member is in its own pod. We recently made changes to our cluster and did a rolling update. After that, the 3 consuls are running fine according to their status in Kubernetes, but the logs on each member say there is no cluster leader. I am able to list all members with consul members (please see below).

Node             Address          Status  Type    Build  Protocol  DC
consul-consul-0  100.96.2.7:8301  alive   server  0.7.5  2         dc1
consul-consul-1  100.96.1.3:8301  alive   server  0.7.5  2         dc1
consul-consul-2  100.96.3.6:8301  alive   server  0.7.5  2         dc1

Should I try the peers.json file?

@slackpad (Contributor)

Hi @edbergavera if the servers are trying to elect a leader and there are dead servers in the quorum from the rolling update that's preventing it, then you would need to use peers.json per https://www.consul.io/docs/guides/outage.html#manual-recovery-using-peers-json.

@edbergavera commented May 18, 2017 via email

@eladitzhakian commented Jun 5, 2017

Having this exact same issue with 0.8.1. A new leader is elected and then leadership is lost, election is restarted. Was able to recover using peers.json, praise the lord.

@kyrelos commented Jun 9, 2017

Experienced this issue when I activated raft_protocol version 3; reverting to raft_protocol version 2 fixed it. I'm still investigating why the switch to v3 triggered the issue.

@dgulinobw

Cluster of 5 running 0.9.0 will not elect a leader w/raft_protocol = 3, but will elect with raft_protocol = 2.

working config:
consul.json:
{
  "bootstrap_expect": 5,
  "retry_join": ["a.a.a.a", "b.b.b.b", "c.c.c.c", "d.d.d.d", "e.e.e.e"],
  "server": true,
  "rejoin_after_leave": true,
  "enable_syslog": true,
  "data_dir": "/var/consul/data",
  "datacenter": "us-east-1",
  "recursor": "10.0.0.2",
  "advertise_addrs": {
    "serf_lan": "a.a.a.a:8301",
    "serf_wan": "a.a.a.a:8302",
    "rpc": "a.a.a.a:8300"
  },
  "bind_addr": "z.z.z.z",
  "encrypt": "NANANANA",
  "ui": true,
  "encrypt_verify_incoming": true,
  "encrypt_verify_outgoing": true,
  "key_file": "/var/consul/data/pki/private/server.key",
  "cert_file": "/var/consul/data/pki/certs/server.crt",
  "ca_file": "/var/consul/data/pki/certs/ca.crt",
  "raft_protocol": 2,
  "protocol": 3
}

@slackpad (Contributor) commented Aug 7, 2017

Hi @dgulinobw can you please open a new issue and include a gist with the server logs when you see this? Thanks!

abdennour added a commit to abdennour/vault-consul-goldfish-docker that referenced this issue Sep 13, 2019
@spuder (Contributor) commented Oct 16, 2019

I also ran into this on a new consul cluster running 1.6.0

As soon as I made sure all the consul servers had both a default token and an agent token, the cluster was able to elect a leader. Having only a default token or only an agent token was insufficient.

token=11111111111
export CONSUL_HTTP_TOKEN=00000000000000
consul acl set-agent-token default $token
consul acl set-agent-token agent $token
cat /opt/consul/acl-tokens.json
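
For context, the token above needs an ACL policy that lets the agent register its node. A hypothetical way to create such a policy and token under the 1.4+ ACL system (names and rules are illustrative, not necessarily what we used):

# Policy allowing node registration and service discovery for the agent.
cat > agent-policy.hcl <<'EOF'
node_prefix "" {
  policy = "write"
}
service_prefix "" {
  policy = "read"
}
EOF
consul acl policy create -name "agent-policy" -rules @agent-policy.hcl
consul acl token create -description "agent token" -policy-name "agent-policy"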

@kawsark commented Oct 17, 2019

@spuder Interesting. What was your token policy for $token that you set for both default and agent?

duckhan pushed a commit to duckhan/consul that referenced this issue Oct 24, 2021
@ckvtvm commented Aug 18, 2022

After a day of testing, this almost works. It starts with bootstrap-expect=1 and elects itself leader. The others join and I have my cluster back. Unfortunately, I am running into a case where it decides to give up as leader. For some reason it detects long-dead peers as active and wants to run an election, which it cannot win because, well... the peers are really dead. Is this a bug, or is there some reason for that?

http://pastebin.com/NR5RSvDq

You saved my day, dear!
