-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
✨ Add MachineDrainRule "WaitCompleted" #11545
base: main
Are you sure you want to change the base?
Conversation
bab1c1b
to
211839e
Compare
@@ -498,6 +507,10 @@ func (r EvictionResult) ConditionMessage(nodeDrainStartTime *metav1.Time) string | |||
conditionMessage = fmt.Sprintf("%s\nAfter above Pods have been removed from the Node, the following Pods will be evicted: %s", | |||
conditionMessage, PodListToString(r.PodsToTriggerEvictionLater, 3)) | |||
} | |||
if len(r.PodsToWaitCompleted) > 0 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@fabriziopandini Is this something that should be specifically handled on higher levels when the condition bubbles up? (like we handled PDB's etc.)
(could also be a follow-up PR)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, you are correct, we have to handle the new use case in:
-
cluster-api/internal/controllers/machine/machine_controller_status.go
Lines 702 to 716 in d8fd3b5
if deletingCondition.Reason == clusterv1.MachineDeletingDrainingNodeV1Beta2Reason && time.Since(machine.Status.Deletion.NodeDrainStartTime.Time) > 5*time.Minute { | |
delayReasons := []string{} | |
if strings.Contains(deletingCondition.Message, "cannot evict pod as it would violate the pod's disruption budget.") { | |
delayReasons = append(delayReasons, "PodDisruptionBudgets") | |
} | |
if strings.Contains(deletingCondition.Message, "deletionTimestamp set, but still not removed from the Node") { | |
delayReasons = append(delayReasons, "Pods not terminating") | |
} | |
if strings.Contains(deletingCondition.Message, "failed to evict Pod") { | |
delayReasons = append(delayReasons, "Pod eviction errors") | |
} | |
if len(delayReasons) > 0 { | |
msg += fmt.Sprintf(", delay likely due to %s", strings.Join(delayReasons, ", ")) | |
} | |
} |
cluster-api/internal/controllers/machineset/machineset_controller_status.go
Lines 442 to 462 in c333db6
for _, machine := range machines { if !machine.GetDeletionTimestamp().IsZero() && time.Since(machine.GetDeletionTimestamp().Time) > time.Minute*15 { machineNames = append(machineNames, machine.GetName()) deletingCondition := v1beta2conditions.Get(machine, clusterv1.MachineDeletingV1Beta2Condition) if deletingCondition != nil && deletingCondition.Status == metav1.ConditionTrue && deletingCondition.Reason == clusterv1.MachineDeletingDrainingNodeV1Beta2Reason && machine.Status.Deletion != nil && time.Since(machine.Status.Deletion.NodeDrainStartTime.Time) > 5*time.Minute { if strings.Contains(deletingCondition.Message, "cannot evict pod as it would violate the pod's disruption budget.") { delayReasons.Insert("PodDisruptionBudgets") } if strings.Contains(deletingCondition.Message, "deletionTimestamp set, but still not removed from the Node") { delayReasons.Insert("Pods not terminating") } if strings.Contains(deletingCondition.Message, "failed to evict Pod") { delayReasons.Insert("Pod eviction errors") } } } }
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@fabriziopandini This PR or follow-up? I lean towards follow-up, mostly just to not expand the scope of this PR further
211839e
to
533803e
Compare
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
8826d5b
to
e22bc4d
Compare
/test ? |
@vincepri: The following commands are available to trigger required jobs:
The following commands are available to trigger optional jobs:
Use
In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
/test pull-cluster-api-e2e-main |
e22bc4d
to
1ebe3c2
Compare
/test pull-cluster-api-e2e-main |
1ebe3c2
to
ba555de
Compare
/test pull-cluster-api-e2e-main |
ba555de
to
6650be9
Compare
Signed-off-by: Vince Prignano <vince@prigna.com>
6650be9
to
dbf82f8
Compare
/test pull-cluster-api-e2e-main |
@vincepri: The following test failed, say
Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
// "Drain" means that the Pods to which this MachineDrainRule applies will be drained. | ||
// If behavior is set to "Drain" the order in which Pods are drained can be configured | ||
// with the order field. When draining Pods of a Node the Pods will be grouped by order | ||
// and one group after another will be drained (by increasing order). Cluster API will | ||
// wait until all Pods of a group are terminated / removed from the Node before starting | ||
// with the next group. | ||
// "Skip" means that the Pods to which this MachineDrainRule applies will be skipped during drain. | ||
// "WaitCompleted" means that the pods to which this MachineDrainRule applies will never be evicted |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we add a sentence that we assume Order 0 and that Order cannot be configured if behavior is WaitCompleted?
@@ -147,8 +147,8 @@ metadata: | |||
namespace: default | |||
spec: | |||
drain: | |||
# behavior can be Drain or Skip | |||
behavior: Drain | |||
# behavior can be Drain, Skip, or WaitCompleted |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's please add an additional user story above that covers WaitCompleted
@@ -259,6 +260,8 @@ The following describes the new algorithm that decides which Pods should be drai | |||
* \=\> use `behavior: Skip` | |||
* If the Pod has `cluster.x-k8s.io/drain: skip` | |||
* \=\> use `behavior: Skip` | |||
* If the Pod has `cluster.x-k8s.io/drain: wait-completed` | |||
* \=\> use `behavior: WaitCompleted` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
* \=\> use `behavior: WaitCompleted` | |
* \=\> use `behavior: WaitCompleted` and `order: 0` |
@@ -498,6 +507,10 @@ func (r EvictionResult) ConditionMessage(nodeDrainStartTime *metav1.Time) string | |||
conditionMessage = fmt.Sprintf("%s\nAfter above Pods have been removed from the Node, the following Pods will be evicted: %s", | |||
conditionMessage, PodListToString(r.PodsToTriggerEvictionLater, 3)) | |||
} | |||
if len(r.PodsToWaitCompleted) > 0 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@fabriziopandini This PR or follow-up? I lean towards follow-up, mostly just to not expand the scope of this PR further
@@ -498,6 +512,10 @@ func (r EvictionResult) ConditionMessage(nodeDrainStartTime *metav1.Time) string | |||
conditionMessage = fmt.Sprintf("%s\nAfter above Pods have been removed from the Node, the following Pods will be evicted: %s", | |||
conditionMessage, PodListToString(r.PodsToTriggerEvictionLater, 3)) | |||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should move the if for PodsToWaitCompleted above the one for PodsToTriggerEvictionLater
The message for PodsToTriggerEvictionLater contains "After above Pods ..." and this only makes sense if a potential "Waiting for the following Pods..." message is actually above
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm okay this is a bit more complicated.
The idea is basicallly that this condition message surfaces where the drain is currently stuck (and which Pods we are going to evict next).
The problem with the current behavior is that "WaitCompleted"-Pods are sometimes blocking further progress and sometimes they don't (it depends on the order of the Pods that we have to evict).
I would suggest to treat WaitCompleted Pods similar to PodsToTriggerEviction.
This means splitting them up into podsToWaitCompletedNow
and podsToWaitCompletedLater
in EvictPods
(based on if ptr.Deref(pod.Status.DrainOrder, 0) == minDrainOrder
)
I would then also update the log statement in l.303 accordingly.
Once that is done we can also write the condition message here accordingly. I would suggest:
if len(r.PodsToWaitCompletedNow) > 0 {
kind := "Pod"
if len(r.PodsToWaitCompletedNow) > 1 {
kind = "Pods"
}
// Note: the code computing stale warning for the machine deleting condition is making assumptions on the format/content of this message.
// Same applies for other conditions where deleting is involved, e.g. MachineSet's Deleting and ScalingDown condition.
conditionMessage = fmt.Sprintf("%s\n* %s %s: waiting for completion",
conditionMessage, kind, PodListToString(r.PodsDeletionTimestampSet, 3))
}
if len(r.PodsToTriggerEvictionLater) > 0 {
conditionMessage = fmt.Sprintf("%s\nAfter above Pods have been removed from the Node, the following Pods will be evicted: %s",
conditionMessage, PodListToString(r.PodsToTriggerEvictionLater, 3))
}
if len(r.PodsToWaitCompletedLater) > 0 {
conditionMessage = fmt.Sprintf("%s\nAfter above Pods have been removed from the Node, waiting for the following Pods to complete without eviction: %s",
conditionMessage, PodListToString(r.PodsToWaitCompletedLater, 3))
}
Namespace: "unevictable-workload", | ||
NodeSelector: map[string]string{nodeOwnerLabelKey: "KubeadmControlPlane-" + controlplane.Name}, | ||
ModifyDeployment: func(deployment *appsv1.Deployment) { | ||
deployment.Spec.Template.Labels["cluster.x-k8s.io/drain"] = string(clusterv1.MachineDrainRuleDrainBehaviorWaitCompleted) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
deployment.Spec.Template.Labels["cluster.x-k8s.io/drain"] = string(clusterv1.MachineDrainRuleDrainBehaviorWaitCompleted) | |
deployment.Spec.Template.Labels["cluster.x-k8s.io/drain"] = "wait-completed" |
nit: I know both works, but maybe let's test the one that we document in the API (same in l.256)
@@ -207,7 +207,7 @@ func NodeDrainTimeoutSpec(ctx context.Context, inputGetter func() NodeDrainTimeo | |||
} | |||
|
|||
By("Deploy Deployment with unevictable Pods on control plane and MachineDeployment Nodes.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please update the test docs in l.79++
This should be
// NodeDrainTimeoutSpec goes through the following steps:
// * Create cluster with 3 CP & 1 worker Machine
// * Ensure Node label is set & NodeDrainTimeout is set to 0 (wait forever)
// * Deploy MachineDrainRules
// * Deploy Deployment with unevictable Pods on CP & MD Nodes
// * Deploy Deployment with unevictable Pods with `wait-completed` label on CP & MD Nodes
// * Deploy Deployment with evictable Pods with finalizer on CP & MD Nodes
// * Deploy additional resources if defined in input
// * Trigger Node drain by scaling down the control plane to 1 and MachineDeployments to 0
// * Get draining control plane and MachineDeployment Machines
// * Verify drain of Deployments with order 1
// * Verify drain of Deployments with order 5
// * Verify skipped Pods are still there and don't have a deletionTimestamp
// * Verify wait-completed Pods are still there and don't have a deletionTimestamp
// * Verify Node drains for control plane and MachineDeployment Machines are blocked (by PDBs and WaitCompleted Pods)
// * Set NodeDrainTimeout to 1s to unblock Node drain
// * Verify machine deletion is blocked by waiting for volume detachment (only if VerifyNodeVolumeDetach is enabled)
// * Unblocks waiting for volume detachment (only if VerifyNodeVolumeDetach is enabled)
// * Verify scale down succeeded because Node drains were unblocked.
(we'll have to change the order values, see comment below)
Expect(skippedMDPods.Items).To(HaveLen(1)) | ||
Expect(skippedMDPods.Items[0].DeletionTimestamp.IsZero()).To(BeTrue()) | ||
} | ||
|
||
By("Verify Node drains for control plane and MachineDeployment Machines are blocked (only by PDBs)") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By("Verify Node drains for control plane and MachineDeployment Machines are blocked (only by PDBs)") | |
By("Verify Node drains for control plane and MachineDeployment Machines are blocked (by PDBs and WaitCompleted Pods)") |
@@ -233,6 +233,31 @@ func NodeDrainTimeoutSpec(ctx context.Context, inputGetter func() NodeDrainTimeo | |||
WaitForDeploymentAvailableInterval: input.E2EConfig.GetIntervals(specName, "wait-deployment-available"), | |||
}) | |||
} | |||
By("Deploy an unevictable Deployment with `wait-completed` label.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think in general the test can work roughly as is.
But as we are now assuming order 0 for WaitCompleted Pods we have to change the order of some other Pods to make sure they can be drained (currently the drain of these Pods is blocked by the waitcompleted Pods).
tl;dr I think the Pods with drain order 1 should get order -5, the Pods with drain order 5 should get order -1
Then all of these Pods can be evicted / drained
|
||
// DeployPodAndWait will deploy a Deployment on a ControlPlane or MachineDeployment. | ||
func DeployPodAndWait(ctx context.Context, input DeployPodAndWaitInput) { | ||
Expect(input.DeploymentName).ToNot(BeNil(), "Need a deployment name in DeployUnevictablePod") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Expect(input.DeploymentName).ToNot(BeNil(), "Need a deployment name in DeployUnevictablePod") | |
Expect(input.DeploymentName).ToNot(BeNil(), "Need a deployment name in DeployPodAndWait") |
nit, same in the few lines below
What this PR does / why we need it:
This PR adds the ability for drain to wait for the completion of specific pods. This is useful in scenario where drain is either handled outside the context of
kubectl drain
after a Node is cordoned, or for long running batch Jobs that should be allowed to terminate on their own.Which issue(s) this PR fixes (optional, in
fixes #<issue number>(, fixes #<issue_number>, ...)
format, will close the issue(s) when PR gets merged):Fixes #