Dead server not removed from consul manager's servers list when dead server's IP address alive as client #5650
Comments
I believe the problem is in a few places: consul/agent/consul/client_serf.go Line 77 in 9ef2829
consul/agent/consul/server_serf.go Line 146 in 9ef2829
consul/agent/router/serf_adapter.go Line 63 in 9ef2829
Basically, when member update events (or potentially member join events) come in, we should ensure that members that are servers are tracked as such, and that members that are not servers are removed from the tracked server list.
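A minimal sketch of the reconciliation described above, not Consul's actual code. The `Member` and `ServerTracker` types and the `HandleMemberEvent` method are hypothetical names introduced for illustration, and the sketch assumes that servers advertise themselves with a `role` tag set to `consul`:

```go
package main

import "sort"

// Member is a simplified, hypothetical stand-in for a Serf cluster member.
type Member struct {
	Name string
	Addr string
	Tags map[string]string
}

// isServer reports whether the member advertises itself as a server.
// Assumption: servers carry the tag role=consul; anything else is a client.
func isServer(m Member) bool {
	return m.Tags["role"] == "consul"
}

// ServerTracker is a hypothetical stand-in for the server list that
// the manager rebalances across. Entries are keyed by node name.
type ServerTracker struct {
	servers map[string]Member
}

// NewServerTracker returns an empty tracker.
func NewServerTracker() *ServerTracker {
	return &ServerTracker{servers: make(map[string]Member)}
}

// HandleMemberEvent reconciles the tracker with a join or update event:
// servers are added or refreshed, non-servers are removed if present.
func (t *ServerTracker) HandleMemberEvent(m Member) {
	if isServer(m) {
		t.servers[m.Name] = m
		return
	}
	// The node may have been a server earlier and come back as a client.
	delete(t.servers, m.Name)
}

// Names returns the tracked server names in sorted order.
func (t *ServerTracker) Names() []string {
	names := make([]string, 0, len(t.servers))
	for name := range t.servers {
		names = append(names, name)
	}
	sort.Strings(names)
	return names
}
```

With this shape, a node that was tracked as a server and later reappears under the same name with a client role is dropped from the list on its next join or update event, instead of lingering until some other cleanup path runs.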
Any update on this issue?
Is this issue fixed? Any update?
@rammohanganap Sorry for the extremely delayed response. Does this still occur for you in later releases (1.4.3+ or 1.7.0)?
1.4.3 added this fix: #5317. That mostly resolved the case where a server that had been in the failed/left state and was reaped never got fully removed from the RPC routing system.
1.7.0 added this fix: #6420. That one ensures that the routing infrastructure ignores left servers in the event that they might still be reachable somehow.
However, looking at the code again, it looks like your case might be a little different. Calling force-leave on a server and replacing it with a client might not hit any of the code points that have since been fixed. I think this function might need updating to remove the "server" from the routing infrastructure when the member is not a server but rather a client: consul/agent/consul/client_serf.go Lines 128 to 144 in 2282847
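To illustrate the idea behind the second fix mentioned above (routing ignores servers that have left, even if they are still reachable), here is a rough sketch. It is not taken from #6420; the `Status`, `Candidate`, and `RoutableServers` names are hypothetical and exist only for this example:

```go
package main

// Status is a hypothetical enum for the membership states discussed here.
type Status int

const (
	StatusAlive Status = iota
	StatusFailed
	StatusLeft
)

// Candidate is a hypothetical routing candidate.
type Candidate struct {
	Name   string
	Server bool
	Status Status
}

// RoutableServers returns the names of candidates that are servers and
// have not left the cluster. Clients and left servers are skipped even
// if their address is still reachable.
func RoutableServers(cands []Candidate) []string {
	var out []string
	for _, c := range cands {
		if !c.Server || c.Status == StatusLeft {
			continue
		}
		out = append(out, c.Name)
	}
	return out
}
```

Under these assumptions, a force-left server whose address is reused by a client is excluded twice over: once because its old server entry is in the left state, and once because the new member is not a server.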
@mkeeler, we don't see this issue anymore. We can close it.
Overview of the Issue
A dead server is not removed from the consul manager's server list when the dead server's IP address comes back alive as a client. A new server was added, killed, and force-left (consul force-leave). The same IP address was then assigned to a new client, and even after 24h the consul manager is still rebalancing across 4 servers although we have only 3 servers.
Reproduction Steps
Server log before adding a server:
Server debug logs after adding a new server node (ezk-0760588ca5d7c54e6-a-wo) and killing the consul process on ezk-0760588ca5d7c54e6-a-wo:
From one of the server nodes, called force-leave on the dead node (ezk-0760588ca5d7c54e6-a-wo):
ezk-0760588ca5d7c54e6-a-wo now joins as a client:
Debug logs from the server after force-leaving the dead server:
Autopilot config:
Autopilot health:
After 24 hours I still see consul trying to rebalance across 4 servers.
Consul info for both Client and Server
Server info:
Operating system and Environment details
Amazon Linux AMI release 2018.03
Log Fragments
Included in reproduction steps