-
Notifications
You must be signed in to change notification settings - Fork 209
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Delete nodes if the GCE instance id changes and pods bound to the deleted node #368
Conversation
Welcome @CoderSherlock! |
Hi @CoderSherlock. Thanks for your PR. I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
/ok-to-test |
a746517
to
2523280
Compare
/retest |
2523280
to
cb5cc03
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for working on this! Left a few minor comments
32b5aab
to
1881a60
Compare
0347d44
to
1881a60
Compare
54801a6
to
0e3c2a4
Compare
0e3c2a4
to
3ccabe0
Compare
3ccabe0
to
525eaee
Compare
…the deleted nodes
525eaee
to
d132a56
Compare
@@ -844,6 +861,22 @@ func shouldDeleteNode(ctx *controllerContext, node *v1.Node, getInstance func(*c | |||
klog.Errorf("Error retrieving instance %q: %v", node.Name, err) | |||
return false, err | |||
} | |||
|
|||
if inst != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can inst be nil if err was not nil, is this check unnecessary?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point! Didn't see any possible where inst and err could be both nil... Except in one of the unit test, I have tweaked that test case and mentioned you there. Thanks for addressing this.
08168c6
to
63877ad
Compare
Added extra pod deletion to avoid old pods when pod gc controller not active Added testcase to test if a node should be deleted if the GCE instance ID changes
63877ad
to
dea9e67
Compare
@@ -1189,6 +1189,7 @@ func TestShouldDeleteNode(t *testing.T) { | |||
Name: "node-test", | |||
}, | |||
}, | |||
instance: &compute.Instance{}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@vinayakankugoyal Added an empty instance in this mock test case, without this instance, inst and err are both nil in unit test.
@@ -844,6 +861,22 @@ func shouldDeleteNode(ctx *controllerContext, node *v1.Node, getInstance func(*c | |||
klog.Errorf("Error retrieving instance %q: %v", node.Name, err) | |||
return false, err | |||
} | |||
|
|||
if inst != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point! Didn't see any possible where inst and err could be both nil... Except in one of the unit test, I have tweaked that test case and mentioned you there. Thanks for addressing this.
/lgtm |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: CoderSherlock, vinayakankugoyal The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
After node's preemption, pods bound to the deleted node may not finish termination and will be in failed phase stays in the pod list. Especially in preemptible VMs, graceful shutdown (30 seconds) can meet the requirement to clean pods on time.
@bobbypage and I worked on cleaning pods bound to the old node when the new node is registered. Our code checks nodes with obsolete GCE instance id, list all pods bound to it, and delete these pods.