Bootstrap Expect Doesn't Elect a Leader #370

wkennington · 2014-09-27T06:44:25Z

I need to collect more data and see if I can reproduce from a clean cluster. My current setup expects 3 servers and usually works fine, electing a leader. Sometimes when starting the cluster the nodes get stuck in a loop where they all sit as followers and never elect a leader. Simply by restarting one of the servers I can kickstart the bootstrap process.

wkennington · 2014-09-27T06:47:31Z

Never mind, they seem to be stuck in this state until I actually shut down all servers and bring them up again.

armon · 2014-09-29T05:04:48Z

I'm closing the ticket since it seems to be resolved? But please re-open if not!

wkennington · 2014-09-29T05:31:17Z

Unfortunately the only solution was to clear out the data dir on all of the nodes. So I'm sure this really isn't solved. Unfortunately I have no idea how to reproduce it. I'll try and collect more data on this.

armon · 2014-09-29T20:52:02Z

Maybe you had already bootstrapped once? The -bootstrap-expect flag only kicks in when there is no data (e.g. a fresh cluster). It is unsafe for us to do it once there is previously committed data.

wkennington · 2014-09-29T21:00:27Z

Oh okay, I had no idea. I feel like there should be a command to force a node into becoming the leader using HTTP or RPC so that I don't have to mess with my service files every time I reboot the cluster.

armon · 2014-09-29T21:11:13Z

If you reboot the cluster, it should heal automatically and elect a leader. An outage is not (and cannot be) automatically recovered from. An outage happens when you lose quorum, i.e. a majority of the servers. This requires outage recovery, which is outlined in our docs.
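
For reference, here is a minimal sketch of the quorum arithmetic behind this. It assumes the usual Raft majority rule (quorum = floor(N/2) + 1). The function names are illustrative only and are not part of Consul.

```python
def quorum_size(servers: int) -> int:
    """Smallest majority of a peer set with `servers` voting members."""
    if servers < 1:
        raise ValueError("a cluster needs at least one server")
    return servers // 2 + 1


def failures_tolerated(servers: int) -> int:
    """How many servers can be lost while a majority still remains."""
    return servers - quorum_size(servers)


if __name__ == "__main__":
    # Print quorum size and tolerated failures for a few cluster sizes.
    for n in (1, 3, 5):
        print(f"servers={n} quorum={quorum_size(n)} tolerated={failures_tolerated(n)}")
```

With 3 servers the quorum is 2, so losing a single server is survivable, while losing two leaves the remaining server unable to elect a leader on its own.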

wkennington · 2014-09-30T05:10:43Z

It doesn't seem to heal automatically if you shut down the cluster. Exactly how do I shut down the cluster safely? I always run into issues where I shut down all of the nodes by leaving, and then they fail to elect a leader at startup again. Is this because the final node which wasn't shut down entered the outage state as a single quorum member?

armon · 2014-09-30T18:01:20Z

So it depends on what you mean. All of the server nodes ever leaving is not considered a standard operating case. The servers are long running, and if you expect to run more than one for HA, it is an outage scenario if only one is running. In the case of all the machines losing power / failing, when they start up it will automatically heal. In the case of all nodes leaving the cluster and shutting down, that is not considered a normal mode of operation.

armon closed this as completed Sep 29, 2014

runswithd6s mentioned this issue Feb 25, 2015

consul should not 'leave' for init script 'stop' action voxpupuli/puppet-consul#85

Closed

duckhan pushed a commit to duckhan/consul that referenced this issue Oct 24, 2021

Merge pull request hashicorp#370 from hashicorp/replication-token-global

304aa72

Replication token needs acl write on all ns's
