-
Notifications
You must be signed in to change notification settings - Fork 40.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Remove last endpoint for kubernetes Service during graceful shutdown of final kube-apiserver #116685
Remove last endpoint for kubernetes Service during graceful shutdown of final kube-apiserver #116685
Conversation
Hi @czybjtu. Thanks for your PR. I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
fix it according to your suggestion, please take a look when you have time. @aojea |
b99e83c
to
3600d59
Compare
@@ -503,6 +503,16 @@ func TestLeaseRemoveEndpoints(t *testing.T) { | |||
expectUpdate: makeEndpointsArray("foo", []string{"4.3.2.2", "4.3.2.3", "4.3.2.4"}, []corev1.EndpointPort{{Name: "foo", Port: 8080, Protocol: "TCP"}}), | |||
expectLeases: []string{"4.3.2.2", "4.3.2.3", "4.3.2.4"}, | |||
}, | |||
{ | |||
testName: "the last apiserver was shutdown", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
testName: "the last apiserver was shutdown", | |
testName: "the last API server was shut down cleanly", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💭 if we know that this is the last kube-apiserver in a cluster, which should we do:
- remove the endpoint(s) representing us from the relevant EndpointSlice(s)
- delete those EndpointSlice(s) (the first API server to start up would need to make a new one)
From an architectural / philosophical point of view, I'm not sure which feels more appropriate.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- remove the endpoint(s) representing us from the relevant EndpointSlice(s)
- delete those EndpointSlice(s) (the first API server to start up would need to make a new one)
the former, because you can never know if your are really the last one, or you are restarting, or .... I think that each apiserver should modify only its own IP, that is the one he is authoritative
This change is visible to end users. We should provide a release note. |
/retitle Remove last endpoint for kubernetes Service during graceful shutdown of final kube-apiserver |
r.StopReconciling() | ||
err = r.RemoveEndpoints(test.serviceName, netutils.ParseIPSloppy(test.ip), test.endpointPorts) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
does this no affect the other tests?
we may add more test cases to exercise the races you mentioned during the review
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
r.StopReconciling()
has no side effect to current tests because the precondition for executing TestLeaseRemoveEndpoints
function is that the apiserver has already been shutdown. It's better to add test case for the scenario of apiserver start up.
we may add more test cases to exercise the races
Yes, I'll do this later.
/lgtm cancel the tests can be improved, also I couldn't look deeper to see if modifying the existing one can cause issues |
36ccb20
to
35519b1
Compare
35519b1
to
e567490
Compare
/retest-required |
/triage accepted |
kindly ping @aojea |
yeah, if you don't ping me I would nto remember sorry, let me check it |
/lgtm |
LGTM label has been added. Git tree hash: 0394723b279d416652ed5f1ee1d73fc921ceee6f
|
@aojea I have added release note - PTAL. Overall this looks good - thanks for making this happen. /lgtm |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: czybjtu, wojtek-t The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
What type of PR is this?
/kind bug
What this PR does / why we need it:
The IP of the last one apiserver isn't removed from k8s svc endpoints on shutdown. Clients (using in-cluster API mode) will try to continue talking/connecting to that instance even after the apiserver is dead.
Which issue(s) this PR fixes:
Fixes #115804
Special notes for your reviewer:
Does this PR introduce a user-facing change?
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.: