Consul Node stuck at Leaving status #6882
Comments
I've been having the same issue with Consul 1.6.2 running on k8s. I can do a rolling redeploy via a statefulset update, and sometimes one of the nodes will show SerfStatus leaving while autopilot shows unhealthy. Like you said, only after I delete the container manually does it come back up again as healthy. Did you ever figure out what the problem was?
We are having the same issue: one of the Consul members goes to leaving when the underlying node is terminated. I have to kill the pod manually to bring it back to alive.
I am facing a similar issue, where one Consul client sees another as leaving while the latter is alive.
Hey @lwei-wish & @mssawant, may I ask which version(s) of Consul y'all are running? It would also be helpful to see any logs if you have them.
Hi @Amier3, I am running version 1.9.1. Whenever I delete a pod running a Consul client agent, on restart it fails to resolve the node name to the new IP address, and all the other nodes see this restarted pod as failed.
Any help will be appreciated.
Hello! It has been a while, and I do not recollect exactly which part helped us solve the problem. I'm pasting the docker-compose and consul-config below if that helps.
compose:
config:
Thanks @anshitabharti, thought
We are having the same issue, where one Consul client sees another as leaving while the latter is alive.
Overview of the issue:
Sometimes the SerfStatus of one of the nodes gets stuck in the leaving state. Even though the agent is initially started with retry-join, if it falls out of the cluster it is unable to join back. When the node drops out of the cluster, the container is still up and running. To resolve this, the container has to be restarted manually, which we want to avoid.
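For anyone trying to catch this state before restarting by hand, here is a minimal sketch that lists members the local agent currently sees as leaving. It assumes the agent's HTTP API is reachable at `http://127.0.0.1:8500` without ACLs, and that the `Status` field of `/v1/agent/members` carries the usual Serf integer codes (1 alive, 2 leaving, 3 left, 4 failed). The function names are made up for illustration, and this only detects the condition; it does not fix it.

```python
import json
import urllib.request

# Serf member status codes as they appear in the "Status" field
# of /v1/agent/members (assumed mapping, see the note above).
SERF_STATUS = {0: "none", 1: "alive", 2: "leaving", 3: "left", 4: "failed"}


def stuck_members(members, wanted=("leaving",)):
    """Return (name, addr, status) for members whose status is in `wanted`."""
    found = []
    for member in members:
        status = SERF_STATUS.get(member.get("Status"), "unknown")
        if status in wanted:
            found.append((member.get("Name"), member.get("Addr"), status))
    return found


def fetch_members(base_url="http://127.0.0.1:8500"):
    """Fetch the member list as seen by the local agent (hypothetical helper)."""
    with urllib.request.urlopen(base_url + "/v1/agent/members", timeout=5) as resp:
        return json.load(resp)
```

Calling `stuck_members(fetch_members())` from a cron job or sidecar gives a list that can be logged or alerted on. Whether a member reported this way can rejoin without a restart is exactly the open question in this issue.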
Even if just one of the nodes is in Leaving status, the monitoring API v1/operator/autopilot/health responds with Healthy: false, even though all k/v operations can be executed without any issues. Because of Healthy: false, the alerts kick in and create panic as if the cluster were actually unhealthy. What's the rationale behind considering the cluster unhealthy?
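One way to make the alert less alarming is to look past the top-level `Healthy` flag and report which servers are responsible. The sketch below assumes the documented response shape of `/v1/operator/autopilot/health` (top-level `Healthy`, `FailureTolerance`, and a `Servers` list with `Name`, `SerfStatus`, and `Healthy`), and that the endpoint answers with HTTP 429 when `Healthy` is false. The severity levels and function names are my own assumptions for illustration, not anything Consul defines.

```python
import json
import urllib.error
import urllib.request


def fetch_autopilot_health(base_url="http://127.0.0.1:8500", token=None):
    """Fetch autopilot health; a 429 response still carries the JSON body."""
    req = urllib.request.Request(base_url + "/v1/operator/autopilot/health")
    if token:
        req.add_header("X-Consul-Token", token)
    try:
        with urllib.request.urlopen(req, timeout=5) as resp:
            return json.load(resp)
    except urllib.error.HTTPError as err:
        if err.code == 429:  # returned when Healthy is false
            return json.load(err)
        raise


def classify(health):
    """Summarize a health payload into a severity plus the offending servers.

    Severity mapping (an assumption made for this sketch):
      ok       -> Healthy is true
      warning  -> unhealthy, but FailureTolerance > 0
      critical -> unhealthy and no remaining failure tolerance
    """
    servers = health.get("Servers") or []
    not_alive = [
        (s.get("Name"), s.get("SerfStatus"))
        for s in servers
        if s.get("SerfStatus") != "alive"
    ]
    unhealthy = [s.get("Name") for s in servers if not s.get("Healthy", False)]
    if health.get("Healthy"):
        severity = "ok"
    elif health.get("FailureTolerance", 0) > 0:
        severity = "warning"
    else:
        severity = "critical"
    return {"severity": severity, "unhealthy": unhealthy, "not_alive": not_alive}
```

An alert built on `classify(fetch_autopilot_health())` can at least name the server with `SerfStatus` leaving instead of only reporting that the cluster is unhealthy. Note that in a small cluster a single unhealthy server may already bring `FailureTolerance` to 0, so this mapping can still come out as critical.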
Consul version: 1.5.3, running inside Docker containers on OpenStack VMs.