Bootstrap Expect Doesn't Elect a Leader #370
Comments
Never mind, they seem to be stuck in this state until I actually shut down all servers and bring them up again.
I'm closing the ticket since it seems to be resolved? But please re-open if not!
Unfortunately, the only solution was to clear out the data dir on all of the nodes, so I'm sure this really isn't solved. I have no idea how to reproduce it, but I'll try to collect more data on this.
Maybe you had already bootstrapped once? The -bootstrap-expect flag only kicks in when there is no data (e.g. a fresh cluster). It is unsafe for us to bootstrap once there is previously committed data.
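The rule described in the comment above can be sketched as a small decision function. This is only an illustrative model, not Consul's actual code: `canBootstrap`, `hasExistingData`, `knownServers`, and `expect` are hypothetical names, and the "wait until the expected number of servers is known" condition is an assumption about how the expectation is checked.

```go
package main

import "fmt"

// canBootstrap is a hypothetical helper (not part of Consul) that models
// the rule from the thread: automatic bootstrapping is only considered
// when the data dir holds no previously committed data.
// The knownServers >= expect check is an assumption for illustration.
func canBootstrap(hasExistingData bool, knownServers, expect int) bool {
	if hasExistingData {
		// Previously committed data exists, so bootstrapping again is unsafe.
		return false
	}
	return knownServers >= expect
}

func main() {
	fmt.Println(canBootstrap(false, 3, 3)) // fresh cluster, all 3 servers known
	fmt.Println(canBootstrap(false, 2, 3)) // fresh cluster, still waiting for one server
	fmt.Println(canBootstrap(true, 3, 3))  // data dir already populated
}
```

Under this model, a cluster that has bootstrapped once never takes the automatic path again, which would be consistent with clearing the data dir being the only thing that helped.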
Oh okay, I had no idea. I feel like there should be a command to force a
It should re-heal automatically if you reboot the cluster and elect a leader. An outage is not (and cannot be) recovered from automatically. This happens when you lose a quorum of servers. This requires outage recovery, which is outlined in our docs.
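For reference, the quorum arithmetic behind this can be sketched as follows. It assumes the usual majority rule for a consensus group (more than half of the servers); the function names `quorum` and `failureTolerance` are made up for illustration.

```go
package main

import "fmt"

// quorum returns the smallest majority of a group of the given size,
// assuming the standard rule of floor(n/2) + 1.
func quorum(servers int) int {
	return servers/2 + 1
}

// failureTolerance returns how many servers can be lost while the
// remaining ones still form a quorum.
func failureTolerance(servers int) int {
	return servers - quorum(servers)
}

func main() {
	for _, n := range []int{1, 3, 5} {
		fmt.Printf("servers=%d quorum=%d tolerated_failures=%d\n",
			n, quorum(n), failureTolerance(n))
	}
}
```

With the three servers mentioned in this issue, two must be reachable to elect a leader, so losing two of them puts the cluster into the outage case described above.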
It doesn't seem to heal automatically if you shut down the cluster. Exactly how do I shut down the cluster safely? I always run into issues where I shut down all of the nodes by leaving, and then they fail to elect a leader when they start up again. Is this because the final node that wasn't shut down entered the outage state as a single quorum member?
So it depends on what you mean. All of the server nodes leaving is not considered a standard operating case. The servers are long-running, and if you expect to run more than one for HA, it is an outage scenario if only one is running. In the case of all the machines losing power or failing, the cluster will heal automatically when they start up. In the case of all nodes leaving the cluster and shutting down, that is not considered a normal mode of operation.
I need to collect more data and see if I can reproduce from a clean cluster. My current setup expects 3 servers and usually works fine, electing a leader. Sometimes when starting the cluster the nodes get stuck in a loop where they all sit as followers and never elect a leader. Simply by restarting one of the servers I can kickstart the bootstrap process.