Servers can't agree on cluster leader after restart when gossiping on WAN #454
Can you provide more log output from the servers after a restart? Specifically, the lines prefixed with "raft:" are of interest. Also, you don't usually need …
I am having the same issue. Note that this only happens with … Here are my steps and the logs: https://gist.github.com/adrienbrault/ad8d13802913b095415a
It looks like both of you are forcing the cluster into an outage state. This is what is happening:
When you start all the servers again, you have 3 servers again, but this time there is no leader and no quorum. At this point it is unsafe for any of the servers to gain leadership (split-brain risk), so they will sit there until an operator intervenes. The best way is to avoid causing an outage in the first place. If quorum is lost, manual intervention is required; any other approach on the part of Consul would introduce safety issues.
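For reference, the quorum arithmetic behind this: Raft requires a majority of the known servers to elect a leader, i.e. quorum = floor(N/2) + 1, so a cluster can only tolerate a minority of servers failing or departing.

| Servers | Quorum | Failures tolerated |
|---------|--------|--------------------|
| 1       | 1      | 0                  |
| 3       | 2      | 1                  |
| 5       | 3      | 2                  |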
Very interesting @armon! Thanks for the explanation.
@armon What about being able to specify the expected quorum size? It is up to the user to use a correct value, like it is for …
@adrienbrault Not currently. There are issues around changing the quorum size once it is specified. I think the current approach makes a very reasonable trade-off: zero-touch bootstrap, and zero-touch scale up and down as long as quorum isn't lost. With a sensible amount of redundancy, it should be incredibly unlikely that an operator needs to intervene.
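As an illustration of the zero-touch bootstrap described above (the data dir, config dir, and join address are placeholders, not taken from this thread), each of the three servers can be started identically, and an election happens only once the expected number of servers have joined:

```sh
# Same command on all three servers; no single node is special.
consul agent -server -bootstrap-expect 3 \
  -data-dir /opt/consul \
  -config-dir /etc/consul \
  -join 203.0.113.11
```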
I can understand why the cluster can't currently recover from losing its quorum, but I too would like to see a way to allow automatic recovery by compromising elsewhere, e.g. having a fixed quorum size (or a fixed maximum number of Consul servers?) and losing the ability to zero-touch scale up and down.

Here's our use case: we have a bunch of servers running in AWS. I'd like to have Consul servers running on three of them, and Consul clients on the rest. So far so good. The catch is that we shut down all but one of the servers each night (to save money while they are idle). It seems that even if I put a Consul server on the lone surviving machine, I'd have to jump through some extra hoops to have the cluster recover each morning?

I don't see us wanting to dynamically add additional Consul servers anytime soon, but automatic recovery even after a complete shutdown (intentional or otherwise) would be very beneficial to us. Thoughts?
Reported the exact same issue when I interrupted the servers in #476. I expected …
Basically, we can safely provide one of two things: …

I don't see a way for us to safely provide both. Currently … So we could have a flag like …
Hi, I'm doing some testing of Consul and just ran into this myself, since restarting the whole cluster at once is trivial with config management tools. I would expect this type of user error to be frequent, especially with folks used to tools that regain quorum when possible (mongo, heartbeat, etc.). I think "never lose quorum" is not tenable in the long run.

It seems like the 80% use case is "nothing bad happened, I just restarted my cluster and would like it to return to operation." In that case it appears that Consul already has most of the information required to do so without having to sacrifice zero-touch scale or bootstrap? It knows the previous server list from raft/peers.json and could remember who the last leader was as a replicated fact. Quorum re-convergence then could simply be "all the previous peer members are gossiping again, and I was the previous leader, let's try an election"?

-n
Can we at least have the steps documented somewhere for restoring the cluster in case we lose quorum?
@mohitarora Is that not covered here? http://www.consul.io/docs/guides/outage.html
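For readers following along, the linked outage guide (for pre-0.7 Consul) comes down to writing a raft/peers.json file on every server before restarting it: a plain JSON array of each server's IP and Raft port (8300 by default). The addresses below are placeholders:

```json
[
  "203.0.113.11:8300",
  "203.0.113.12:8300",
  "203.0.113.13:8300"
]
```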
@ryanbreen That didn't help. Here is what I did: …

Everything looks good at this point. I forced the cluster to lose quorum by restarting 2 of the 3 nodes at the same time. The nodes came back online, but a leader was not elected on its own. I want to know what my next step should be here. Should I again start node 1 in bootstrap mode and re-execute the steps mentioned above?
I would suggest bootstrapping with -bootstrap-expect instead.
Thanks @ryanbreen. -bootstrap-expect is better than -bootstrap and I will start using it, but both of these are used when the cluster is initialized. I still need the steps for recovery once quorum is lost. In my case, no leader was elected once all the nodes came back to life after the quorum loss.
I also had a similar question to @mohitarora's. If we do lose quorum, and there is no leader, what do we do?
@saulshanabrook Then you're in an outage scenario (since you've lost quorum) and need to decide which server is authoritative, then follow the outage recovery guide. One thing I've also found: if your leaders actually leave the cluster when shut down cleanly (rather than …
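A rough sketch of that recovery flow for the older peers.json mechanism, assuming three servers whose IPs have not changed (paths, service names, and addresses are illustrative; the outage guide linked earlier is authoritative):

```sh
# 1. Stop Consul on every server.
sudo service consul stop

# 2. On each server, write the full peer list into the Raft data directory.
cat > /opt/consul/raft/peers.json <<'EOF'
["203.0.113.11:8300", "203.0.113.12:8300", "203.0.113.13:8300"]
EOF

# 3. Start the servers again and confirm a leader was elected.
sudo service consul start
consul info   # the raft section should show "state = Leader" on exactly one node
```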
@highlyunavailable What if all three go down at once? How do I restart then?
If they went down hard, or if you had leave-on-terminate set to false, you should be able to just start them back up in any order, assuming their IPs didn't change. If they did change, you need to do outage recovery.
I'm getting the same problem with … I'm not following why it doesn't use this value to know when it's safe to elect a leader, whether there was ever a leader before or not. Here are some logs as requested in the thread, after a restart of a node: …
This is becoming a problem for me in production and I may have to move away from Consul. If I stop all 3 Consul nodes (in my 3-node cluster), I cannot start the cluster back up without a major headache. Are there any thoughts as to how to properly handle this?
Most of these problems are caused by our default behavior of attempting a graceful leave. Our mental model is that servers are long-lived and don't shut down for any reason other than unexpected power loss, or graceful maintenance, in which case you need to leave the cluster. In retrospect that was a bad default. Almost all of this can be avoided by just killing the process rather than letting it leave gracefully.

There is clearly a UX issue here with Consul that we need to address, but this behavior is not a bug. It is a manifestation of bad UX leading to operator error that is causing a quorum loss in a way that is predictable and expected. You can either tune the settings to non-default behavior and force a non-graceful exit, or just "pull the plug" with kill.

This is a classic "damned if you do, damned if you don't": if we change the defaults to the inverse, we will have a new corresponding ticket where anybody who expected the leave to be graceful has now caused quorum loss by operator error in the reverse sense. I'm not sure what the best answer is here.
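A sketch of the relevant agent settings (these config keys exist, but defaults have shifted between releases, so treat the values as illustrative): telling servers not to gracefully leave on SIGINT/SIGTERM means a restart is treated like a crash, so the old quorum can re-form when the servers come back.

```json
{
  "server": true,
  "skip_leave_on_interrupt": true,
  "leave_on_terminate": false
}
```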
Hey @armon, I think there may actually be a bug here though, no? In order to fix the issue I have to: …

The issue with peers.json is that its actual contents seem to be written as "null" (i.e. the literal string "null").
@jwestboston From the perspective of Consul, all three servers have left that cluster. They should not rejoin any gossip peers or take part in any future replication. If they had not left the cluster, they would still have those peers and would rejoin the cluster on start. Because all of the servers, or a majority of them, have done this, it is an outage that now requires manual intervention. Does this make sense?
@armon Ahh, yes, that does make sense! :-) Thanks for the clarification. The peers.json file is actually a list of the peers expected to be in the cluster at the current time. Stopping Consul == gracefully exiting the cluster == removal from that list across the cluster. So really, indeed, things are operating as designed. And all we need (maybe? perhaps?) is an easier experience for cold-restarting an existing Consul cluster.
So we should potentially set skip_leave_on_interrupt to true?
I have an Ansible playbook that will bounce the cluster in this situation. It pushes out a valid peers.json based on the hosts file (I had to do a kludge to get the double quotes right with sed), then restarts the consul service (I run Consul as an installed service on Linux).
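Not the poster's actual playbook, but a minimal shell sketch of the same idea (the hosts file path, data directory, and service name are assumptions): build peers.json from a list of server IPs, one per line, then restart the service.

```sh
# Turn a one-IP-per-line hosts file into ["ip:8300","ip:8300",...] and install it.
awk '{printf "%s\"%s:8300\"", (NR==1 ? "[" : ","), $1} END {print "]"}' \
  /etc/consul/servers.txt > /opt/consul/raft/peers.json

sudo service consul restart
```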
I'm running into similar problems by just Ctrl-C shutting down agents and then starting them up again. So has this issue ever been solved?
@tkyang99 do you have … set?
@armon Regarding your post, this seems to be expected behaviour. Thanks for your clarification at this point! Is there any way to prepare the Consul cluster for such restarts, like shutting down 2 of the 3 instances before stopping the instances? Is there a feature planned to be implemented in future releases?

Cheers,
Hi, I am getting the following error:

2016/05/09 09:56:50 [ERR] consul: 'cf-vaultdemo-vault_consul-0' and 'cf-vaultdemo-vault_consul-1' are both in bootstrap mode. Only one node should be in bootstrap mode, not adding Raft peer.

Node 1: …
Node 2: …
Node 3: …
I am having the same issue with Consul 0.6.4. After playing with it for a couple of days, I found that the easiest way to fix this is: …

In my Ansible playbook, I have a shell task that deals with this problem: … Hope this helps.
I was having an issue with a 3-node cluster, so I figured I'd restart the nodes to see if that addressed it. First I created a /data/raft/peers.json file populated as described in this guide: … I tried @angelosanramon's suggestion, which got me slightly further but not far enough.

==> Log data will now stream in as it occurs: …
At this point I am actually going to tear down my entire Consul cluster and start from scratch. This is definitely an issue.
Hi @ljsommer, sorry about that.
It looks like you have the leave entry in your Raft log, which is un-doing your peers.json change. This should be fixed in master, as the peers.json contents are applied last; any chance you can try this with the 0.7.0-rc2 build?
I have the good fortune to be able to actually rebuild from scratch without losing any critical data, and yes, I'll definitely be using the latest version of Consul to do it. When I get fully rebuilt I will be simulating this same scenario and documenting the results. I'll make sure to update this thread with a step-by-step guide when I do.
Any updates on your attempts, @ljsommer? We are also facing this issue. We used the Consul on Kubernetes recipe from https://github.com/kelseyhightower/consul-on-kubernetes, hosted on GKE. It's a typical three-peer cluster. GKE crashed all the nodes when scaling up the cluster, and since then the nodes have stopped electing a leader. Finally I had to remove and re-deploy them.
Facing this issue as well with Consul on Kubernetes.
Same here!
I too am using Nomad + Consul in a multi-region setup (three AWS regions so far) with cloud auto-join settings. The option …
Not setting …
At least setting up …
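For context on the cloud auto-join setup being described: in that era of Consul, retry_join accepts a provider string and bootstrap_expect still controls when the first election happens. The tag names and server count below are hypothetical:

```json
{
  "server": true,
  "bootstrap_expect": 3,
  "retry_join": ["provider=aws tag_key=consul-role tag_value=server"]
}
```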
@armon This is definitely still an issue. Why was this bug closed? |
@mud5150 This is a very old issue - closed in 2014. If you are still seeing this behaviour, could you create a new issue with your setup and a reference to this issue?
This is totally still an issue. It would be great if there were a way to manually designate a leader in this case. I'm having a production outage right now because of this. Thanks @hashicorp-support
@davidhhessler You are commenting on a closed issue from 2014. If you are having an issue, I would suggest opening a new issue, providing the information required by the template, and sharing more details so the development team can evaluate it. If this is not a bug, then the best place to ask would be the community channels:
Hi,
I'm running Consul in an all-"WAN" environment, one DC. All my boxes are in the same rack, but they do not have a private LAN to gossip over.
The first time they join each other, with an empty /opt/consul directory, they manage to join and agree on a leader. If I restart the cluster, they still connect and find each other, but they never seem to agree on a leader. They just keep repeating

2014/11/05 13:09:41 [ERR] agent: failed to sync remote state: No cluster leader

in the consul monitor output.

All nodes are started with /usr/local/bin/consul agent -config-dir /etc/consul

server 1: …
server 2: …
server 3: …
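The three server configs are not shown above. As a purely illustrative sketch of a minimal server config for a single-DC cluster that gossips over public ("WAN") addresses (all IPs are placeholders):

```json
{
  "server": true,
  "bootstrap_expect": 3,
  "datacenter": "dc1",
  "data_dir": "/opt/consul",
  "bind_addr": "203.0.113.11",
  "advertise_addr": "203.0.113.11",
  "retry_join": ["203.0.113.12", "203.0.113.13"]
}
```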