tproxy remove cleanup controller #476
Conversation
WOOT!! so good!!
Looks great!! so nice to see all these red lines 😍
The only question I have is about the test you were backfilling for health checks when a pod is pending. I'm not going to block on that, but I feel it's better to address it in a separate PR, since it's possible the logic might change too (depending on what you think about my comment).
},
Subsets: []corev1.EndpointSubset{
    corev1.EndpointSubset{
        Addresses: []corev1.EndpointAddress{
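For context, here is a minimal, self-contained sketch of the shape this fixture fragment is building; the function name, service name, IP, and namespace are illustrative, not the actual test data:

```go
// Illustrative only, not the actual test fixture: a minimal Endpoints
// object of the shape the fragment above is constructing.
package main

import (
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

func exampleEndpoints() *corev1.Endpoints {
	return &corev1.Endpoints{
		ObjectMeta: metav1.ObjectMeta{Name: "service-created", Namespace: "default"},
		Subsets: []corev1.EndpointSubset{
			{
				Addresses: []corev1.EndpointAddress{
					{
						IP: "1.2.3.4",
						// TargetRef is what links an address back to its pod.
						TargetRef: &corev1.ObjectReference{
							Kind:      "Pod",
							Name:      "pod1",
							Namespace: "default",
						},
					},
				},
			},
		},
	}
}
```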
In my testing I saw that when a pod is pending, it is not included in the list of endpoint addresses, and at that point this test would be the same as the "Empty Endpoints" one, since we have no link to the pod from the Endpoints object. I think that means we need to re-think the logic around health checks when a pod is pending. Perhaps we just don't need that logic anymore?
For reference, I tested this by deploying the helm chart with 3 Consul server instances on kind. Because there's only one node, only one instance will be running and the other two will be stuck in the Pending phase. Then I looked at the endpoints of the Consul server service and saw that there was only one address:
$ kubectl get pods
NAME READY STATUS RESTARTS AGE
iryna-consul-gk744 0/1 Running 0 4m58s
iryna-consul-server-0 0/1 Running 0 4m58s
iryna-consul-server-1 0/1 Pending 0 4m58s
iryna-consul-server-2 0/1 Pending 0 4m58s
$ kubectl get endpoints iryna-consul-server -o yaml
...
subsets:
- addresses:
- hostname: iryna-consul-server-0
ip: 10.244.0.33
nodeName: kind-control-plane
targetRef:
kind: Pod
name: iryna-consul-server-0
namespace: default
resourceVersion: "171854"
uid: f3f0ab69-e67c-49d1-a8e3-b013abd801e4
...
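To make the implication concrete, here is a hedged sketch in the spirit of the discussion (this is not the actual endpoints controller code, and the function name is made up): a reconciler can only reach pods through the addresses on the Endpoints object, and since Pending pods never appear in Subsets[].Addresses, any PodPending handling downstream of this walk is unreachable.

```go
// Hypothetical sketch, not the real controller: resolve the pods backing
// an Endpoints object by following each address's TargetRef. Pending pods
// are never listed in Subsets[].Addresses, so they can never be reached.
package main

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/types"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

func podsBehindEndpoints(ctx context.Context, c client.Client, ep *corev1.Endpoints) ([]corev1.Pod, error) {
	var pods []corev1.Pod
	for _, subset := range ep.Subsets {
		for _, addr := range subset.Addresses {
			// Skip addresses that don't point back at a Pod.
			if addr.TargetRef == nil || addr.TargetRef.Kind != "Pod" {
				continue
			}
			var pod corev1.Pod
			key := types.NamespacedName{Namespace: addr.TargetRef.Namespace, Name: addr.TargetRef.Name}
			if err := c.Get(ctx, key, &pod); err != nil {
				return nil, err
			}
			pods = append(pods, pod)
		}
	}
	return pods, nil
}
```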
haha!! that makes things even more convenient. Delete more code from the PR 😛
Oooo! Great catch Iryna! I will remove that test and also remove the code in the endpoint controller which handles that specific case. Thanks!
The reasoning here, echoing Iryna, is that since a Pending pod is not in the endpoints list, it would never have been referenced in the endpoints controller in the first place.
On second thought, I've removed the test so this PR is only about the cleanup controller, and I will submit a second PR to remove the PodPending handling (which is currently a no-op) from the endpoints controller!
Remove the cleanup controller and replace it with the supporting logic from #457.
Add icon to chart.yaml
Changes proposed in this PR:
- Use `mgr.Start(ctx)` to run the webhook with a context, and remove signal handling from the connect command and its associated tests (see the sketch after these lists).
- Backfill `endpoints_controller_test` to provide coverage for creation of health checks when pods are in the PodPending phase.

How I've tested this PR:
- Manually deploy with an application and force-kill pods.
- Unit tests pass.
- Run connect acceptance tests against this build.
How I expect reviewers to test this PR:
- Code review
- Test by manually deploying and force-killing pods
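For the `mgr.Start(ctx)` change above, a minimal sketch of the pattern (assuming controller-runtime v0.7+, where `Start` takes a `context.Context`; this is not the actual connect command code):

```go
// Minimal sketch of running a manager (and the webhook it serves) off a
// signal-derived context, so no separate signal handling is needed.
package main

import (
	ctrl "sigs.k8s.io/controller-runtime"
)

func main() {
	mgr, err := ctrl.NewManager(ctrl.GetConfigOrDie(), ctrl.Options{})
	if err != nil {
		panic(err)
	}

	// SetupSignalHandler returns a context cancelled on SIGINT/SIGTERM;
	// Start blocks until that context is done, then shuts down gracefully.
	if err := mgr.Start(ctrl.SetupSignalHandler()); err != nil {
		panic(err)
	}
}
```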
Note to reviewers: I'm not 150% certain that test coverage couldn't be improved. We have existing test coverage for the deregister path, and it works with acceptance tests. Suggestions?
Checklist: