pods not deregistering from service catalog on termination #1817

mr-miles · 2022-12-21T23:13:33Z

Community Note

Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request. Searching for pre-existing feature requests helps us consolidate datapoints for identical requirements into a single place, thank you!
Please do not leave "+1" or other comments that do not add relevant new information or questions, they generate extra noise for issue followers and do not help prioritize the request.
If you are interested in working on this issue or have submitted a pull request, please leave a comment.

Overview of the Issue

I am using consul-k8s helm chart 1.0.2 with external servers v1.14.3.
I am using connect-inject with the transparent proxy and my pods are in the service mesh.

The individual pods start, register, and receive traffic as expected, but over time the service catalog accumulates entries for pods that no longer exist. It is as if the pods do not deregister when they are stopped; however, when I stop pods manually, deregistration works correctly. Even more strangely, the Consul servers report the stale instances as healthy.

If I manually deregister the zombie services via the HTTP API, their details are still returned when I call /catalog/connect/. This causes problems for the service mesh, because the Envoy clusters try to contact the non-existent pods.
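
For anyone trying to confirm the same symptom, here is a minimal sketch of how stale entries could be spotted by comparing catalog entries against the IPs of pods that are actually running. It assumes the entries come from Consul's /v1/catalog/service/<name> endpoint (which returns Node, ServiceID, and ServiceAddress fields) and that the list of live pod IPs was gathered separately, for example from kubectl. The function name and the sample data are hypothetical and are not taken from this issue.

```python
from typing import Dict, Iterable, List


def find_stale_instances(catalog_entries: List[Dict], live_pod_ips: Iterable[str]) -> List[Dict]:
    """Return the catalog entries whose ServiceAddress matches no live pod IP.

    catalog_entries: parsed JSON list from /v1/catalog/service/<name>.
    live_pod_ips: IPs of pods that currently exist (collected out of band).
    """
    live = set(live_pod_ips)
    return [
        {"Node": entry["Node"], "ServiceID": entry["ServiceID"]}
        for entry in catalog_entries
        if entry.get("ServiceAddress") not in live
    ]


# Hypothetical sample data, for illustration only.
sample_catalog = [
    {"Node": "node-a", "ServiceID": "web-1", "ServiceAddress": "10.0.0.5"},
    {"Node": "node-a", "ServiceID": "web-2", "ServiceAddress": "10.0.0.9"},
]
stale = find_stale_instances(sample_catalog, ["10.0.0.5"])
```

Matching on IP alone is a simplification: pod IPs can be reused, so a real check would probably also compare pod names or UIDs.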

I'm having trouble working out where to start troubleshooting - can you give any pointers as to what log entries to search for, either on the pod logs or the servers themselves?

Additionally I see entries like this in the consul-dataplane logs, but the mesh itself appears to be working correctly. Is that expected or is there additional transparent proxy config required?

[DEBUG] consul-dataplane.dns-proxy.udp: timeout waiting for read: error="read udp 127.0.0.1:8600: i/o timeout"

Additional Context

AWS EKS - 1.22

mr-miles · 2022-12-22T11:31:56Z

I solved one of these issues (the services still appearing as connect-enabled). I also had to explicitly deregister the corresponding sidecar proxies, and now the service mesh works again.
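
As a rough sketch of that workaround, the snippet below builds the two requests involved: one to deregister the service instance and one for its sidecar proxy, both against Consul's PUT /v1/catalog/deregister endpoint. The "-sidecar-proxy" suffix for the proxy's service ID is an assumption about how the proxy was registered, so check the actual IDs in your catalog first. The function names, server address, node name, and service ID are hypothetical.

```python
import json
import urllib.request
from typing import Dict, List, Optional


def deregister_payloads(node: str, service_id: str,
                        proxy_suffix: str = "-sidecar-proxy") -> List[Dict]:
    """Build request bodies for the service instance and its sidecar proxy.

    ASSUMPTION: the proxy's service ID is the service ID plus proxy_suffix.
    """
    return [
        {"Node": node, "ServiceID": service_id},
        {"Node": node, "ServiceID": service_id + proxy_suffix},
    ]


def build_request(consul_addr: str, payload: Dict,
                  token: Optional[str] = None) -> urllib.request.Request:
    """Prepare (but do not send) a PUT /v1/catalog/deregister request."""
    req = urllib.request.Request(
        url=consul_addr.rstrip("/") + "/v1/catalog/deregister",
        data=json.dumps(payload).encode("utf-8"),
        method="PUT",
    )
    req.add_header("Content-Type", "application/json")
    if token:
        # ACL token header; only needed when ACLs are enabled.
        req.add_header("X-Consul-Token", token)
    return req


def deregister(consul_addr: str, node: str, service_id: str,
               token: Optional[str] = None) -> None:
    """Send both deregistration requests; raises on an HTTP error."""
    for payload in deregister_payloads(node, service_id):
        with urllib.request.urlopen(build_request(consul_addr, payload, token)) as resp:
            resp.read()


# Hypothetical values; nothing is sent until deregister() is called.
payloads = deregister_payloads("node-a", "web-1")
request = build_request("http://consul.example:8500/", payloads[1], token="secret")
```

This is only a manual cleanup, so entries can reappear or pile up again until the underlying bug is fixed.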

mr-miles · 2023-04-25T23:05:30Z

See also hashicorp/consul#15908.

david-yu · 2023-07-20T21:28:45Z

Closing as the PR is now merged: #2571. The fix should be released in 1.2.x, 1.1.x, and 1.0.x by mid-August, in our next set of patch releases.

mr-miles added the type/bug Something isn't working label Dec 21, 2022

Stephani0106 mentioned this issue Apr 18, 2023

Nodes and dead services remaining in the consul catalog #2065

Closed

mr-miles mentioned this issue Jun 2, 2023

Handle errors properly when services are de-registered from the catalog #2258

Closed

2 tasks

mr-miles mentioned this issue Jun 29, 2023

BUG+FIX: Endpoints controller fails to deregister services #2491

Closed

curtbushko mentioned this issue Jul 20, 2023

Handle errors properly when services are de-registered from the catalog #2571

Merged

2 tasks

david-yu closed this as completed Jul 20, 2023

david-yu mentioned this issue Jul 20, 2023

Orphan sidecar proxies affect service mesh hashicorp/consul#15908

Closed
