pods not deregistering from service catalog on termination #1817

Closed
mr-miles opened this issue Dec 21, 2022 · 3 comments
Labels
type/bug Something isn't working

Comments

@mr-miles
Contributor

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request. Searching for pre-existing feature requests helps us consolidate datapoints for identical requirements into a single place, thank you!
  • Please do not leave "+1" or other comments that do not add relevant new information or questions, they generate extra noise for issue followers and do not help prioritize the request.
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment.

Overview of the Issue

I am using consul-k8s helm chart 1.0.2 with external servers v1.14.3.
I am using connect-inject with the transparent proxy and my pods are in the service mesh.

The individual pods start, register, and receive traffic as expected, but over time the service catalog accumulates entries for pods that no longer exist. It is as if they do not deregister when they are stopped; however, when I stop pods manually, deregistration works correctly. Even more strangely, the Consul servers report the stale instances as healthy.

If I manually deregister the zombie services via the HTTP API, their details are still returned when I call /catalog/connect/. This causes problems for the service mesh, because the Envoy clusters try to contact the non-existent pods.

I'm having trouble working out where to start troubleshooting. Can you give any pointers on which log entries to search for, either in the pod logs or on the servers themselves?

Additionally, I see entries like this in the consul-dataplane logs, although the mesh itself appears to be working correctly. Is that expected, or is additional transparent proxy config required?

[DEBUG] consul-dataplane.dns-proxy.udp: timeout waiting for read: error="read udp 127.0.0.1:8600: i/o timeout"

Additional Context

AWS EKS - 1.22

mr-miles added the type/bug label Dec 21, 2022
@mr-miles
Contributor Author

I solved one of these issues (the deregistered services still appearing as connect-enabled). I also had to explicitly deregister the corresponding sidecar proxies; with that done, the service mesh works again.
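For anyone hitting the same thing, here is a rough sketch of the cleanup in Python. It is not an official tool, only an illustration of the workaround. It assumes the catalog entries carry `pod-name` and `k8s-namespace` keys in `ServiceMeta` (which is how consul-k8s appears to tag them), and that you run it for both the service and its sidecar proxy service (typically `<service>-sidecar-proxy`). The function names and the `live_pods` input are hypothetical; the endpoints used are Consul's `GET /v1/catalog/service/<name>` and `PUT /v1/catalog/deregister`.

```python
import json
import urllib.request


def stale_deregistrations(catalog_entries, live_pods):
    """Build /v1/catalog/deregister payloads for entries whose pod is gone.

    catalog_entries: list of dicts as returned by GET /v1/catalog/service/<name>
    live_pods: set of (namespace, pod_name) tuples that currently exist
    """
    payloads = []
    for entry in catalog_entries:
        meta = entry.get("ServiceMeta") or {}
        # Assumption: consul-k8s sets these two meta keys on what it registers.
        pod = (meta.get("k8s-namespace"), meta.get("pod-name"))
        if None in pod:
            continue  # no pod metadata, so leave the entry alone
        if pod not in live_pods:
            payloads.append({"Node": entry["Node"], "ServiceID": entry["ServiceID"]})
    return payloads


def deregister(consul_addr, payload, token=None):
    """Send one payload to PUT /v1/catalog/deregister; True on HTTP 200."""
    req = urllib.request.Request(
        f"{consul_addr}/v1/catalog/deregister",
        data=json.dumps(payload).encode(),
        method="PUT",
    )
    if token:
        req.add_header("X-Consul-Token", token)
    with urllib.request.urlopen(req) as resp:
        return resp.status == 200
```

Fetch the entries for the service and for its sidecar proxy, pass both lists through `stale_deregistrations` with the set of pods Kubernetes currently reports, and call `deregister` on each payload. Deregistering only the service entry and leaving the proxy behind is what kept the instances showing up as connect-enabled in my case.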

@mr-miles
Contributor Author

See also hashicorp/consul#15908

@david-yu
Contributor

Closing, as the PR is now merged: #2571. The fix should be released in 1.2.x, 1.1.x, and 1.0.x in our next set of patch releases, around mid-August.
