DeletionTimeStamp not set for some evicted pods #54525
@kubernetes/sig-api-machinery-misc |
@patrickshan: Reiterating the mentions to trigger a notification: In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
/sig node |
/assign |
Most likely this is sig-node. |
Deletion timestamp is set by the apiserver; it cannot be set by clients. |
Are they actually evicted/deleted or do they just have failed status? |
They are evicted and then marked with "Failed" status. For pods created through a DaemonSet, the DeletionTimeStamp gets set after the "Failed" status is set. But for pods created through a Deployment, the DeletionTimeStamp just keeps its zero value and never gets updated. |
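For readers following along, here's a minimal client-go sketch of the distinction being made here: eviction only changes the pod status, while DeletionTimestamp changes only when some client sends a DELETE. The pod name "toolbox-xyz" and namespace "default" are placeholders, and the Get signature assumes a recent client-go version.

```go
package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Assumes a reachable kubeconfig at the default location.
	config, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(config)

	// "toolbox-xyz" is a placeholder pod name.
	pod, err := client.CoreV1().Pods("default").Get(context.TODO(), "toolbox-xyz", metav1.GetOptions{})
	if err != nil {
		panic(err)
	}

	// An evicted pod is merely marked Failed with reason "Evicted"...
	fmt.Println("phase:", pod.Status.Phase, "reason:", pod.Status.Reason)
	// ...while DeletionTimestamp stays nil until some client issues a DELETE
	// against the apiserver (kubelet eviction alone never sets it).
	fmt.Println("being deleted:", pod.DeletionTimestamp != nil)
}
```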
That means that no one deleted them. The ReplicaSet controller is
responsible for performing that deletion.
On Oct 26, 2017, at 9:33 PM, Patrick Shan <notifications@github.com> wrote:
They are evicted and marked with "Failed" status first. For pods created
through a DaemonSet, the DeletionTimeStamp gets set after the "Failed" status
is set. But for pods created through a Deployment, the DeletionTimeStamp just
keeps its zero value and is never set.
|
I reproduced this. But I found the pods were successfully evicted and deleted only from |
I see the kubelet sync loop constructs a pod status like what you describe if an internal module decides the pod should be evicted: kubernetes/pkg/kubelet/kubelet_pods.go Lines 1293 to 1305 in b00c15f
The kubelet then syncs status back to the API server: kubernetes/pkg/kubelet/status/status_manager.go Lines 437 to 488 in b00c15f
But unless the pod's deletion timestamp is already set, the kubelet won't delete the pod (the gate is paraphrased in the sketch after this comment): kubernetes/pkg/kubelet/status/status_manager.go Lines 504 to 509 in b00c15f
@kubernetes/sig-node-bugs It doesn't seem like the kubelet does a complete job of evicting the pod from the API's perspective. Would you expect the kubelet to delete the pod directly in that case, or to still go through posting a pod eviction (should the pod disruption budget be honored in cases where the kubelet is out of resources)? |
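Paraphrasing the gate cited above as a simplified sketch (not the verbatim kubelet source; the real check also covers mirror pods and uses a pluggable deletion-safety interface):

```go
package statussketch

import v1 "k8s.io/api/core/v1"

// canBeDeleted paraphrases the status manager's gate: the kubelet only
// issues the final DELETE for a pod when something else has already set
// DeletionTimestamp (i.e., graceful deletion is underway) AND the pod's
// resources on the node have been reclaimed. A pod that is merely
// evicted has status Failed but a nil DeletionTimestamp, so it stops at
// the first check and its API object lingers.
func canBeDeleted(pod *v1.Pod, resourcesReclaimed bool) bool {
	if pod.DeletionTimestamp == nil {
		return false
	}
	return resourcesReclaimed
}
```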
I think this is intentional. AFAIK, kubelet's pod eviction includes failing the pod (i.e., setting the pod status) and reclaiming the resources used by the pod on the node. There is no "deleting the pod from the apiserver" involved in the eviction. Users/controllers can check the pod status to know what happened to the pod if needed. |
Yes, this is intentional. In order for evicted pods to be inspected after eviction, we do not remove the pod API object. Otherwise it would appear that the pod simply disappeared. |
Issues go stale after 90d of inactivity. If this issue is safe to close now, please do so with /close. Send feedback to sig-testing, kubernetes/test-infra and/or fejta. |
If the controller that created the evicted pod is scaled down, it should kill those evicted pods before killing any others, right? Most workload controllers don't do that today.
The DaemonSet controller actively deletes failed pods (#40330) to ensure that a DaemonSet can recover from transient errors (#36482). Evicted DaemonSet pods get killed just because they're also failed pods. /remove-lifecycle stale |
For something like StatefulSet, it's actually necessary to immediately delete any Pods evicted by kubelet, so the Pod name can be reused. As @janetkuo also mentioned, DaemonSet does this as well. For such controllers, you're thus not gaining anything from kubelet leaving the Pod record. Even for something like ReplicaSet, it probably makes the most sense for the controller to delete Pods evicted by kubelet (though it doesn't do that now, see #60162) to avoid carrying along Failed Pods indefinitely. So I would argue that in pretty much all cases, Pods failed by kubelet eviction should eventually be deleted by someone. If we can agree that some controller should delete them, the only question left is: which controller? I suggest that the Node controller makes the most sense: delete any Pods that kubelet has evicted. |
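A hedged sketch of what such a cleanup could look like, whichever component ends up owning it. `deleteEvictedPods` is a hypothetical function, not an existing controller; the key point it illustrates is that the phase can be filtered server-side but the reason cannot.

```go
package cleanupsketch

import (
	"context"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// deleteEvictedPods is a hypothetical reconciliation step: list Failed
// pods and delete the ones the kubelet evicted, so Failed API objects
// are not carried along indefinitely.
func deleteEvictedPods(ctx context.Context, client kubernetes.Interface, ns string) error {
	pods, err := client.CoreV1().Pods(ns).List(ctx, metav1.ListOptions{
		// status.phase is a supported field selector for pods.
		FieldSelector: "status.phase=Failed",
	})
	if err != nil {
		return err
	}
	for i := range pods.Items {
		pod := &pods.Items[i]
		// status.reason is not selectable server-side, so filter here.
		if pod.Status.Reason != "Evicted" {
			continue
		}
		if err := client.CoreV1().Pods(ns).Delete(ctx, pod.Name, metav1.DeleteOptions{}); err != nil {
			return err
		}
	}
	return nil
}
```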
Issues go stale after 90d of inactivity. If this issue is safe to close now, please do so with /close. Send feedback to sig-testing, kubernetes/test-infra and/or fejta. |
Stale issues rot after 30d of inactivity. If this issue is safe to close now, please do so with /close. Send feedback to sig-testing, kubernetes/test-infra and/or fejta. |
@schollii I suppose we don't document the list of things that do not happen during evictions. The out-of-resource documentation says "it terminates all of its containers and transitions its PodPhase to Failed". It doesn't explicitly call out that it does not set the deletion timestamp. Some googling says you can reference evicted pods with --field-selector=status.phase=Failed. |
@dashpole I saw the mentions of --field-selector=status.phase=Failed, but the problem there is that the "reason" field is what actually says "Evicted", so there could be failed pods that were not evicted. And you cannot select on status.reason; I tried. So we are left with grepping and awking the output of get pods -o wide. This needs fixing: e.g., make status.reason selectable, or add a phase called Evicted (though I doubt that's acceptable since it's not backwards compatible), or just add a command for listing evicted pods. |
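To illustrate the workaround being described, a small client-go sketch (assuming a default kubeconfig; namespace-wide listing via `metav1.NamespaceAll`): the phase filter runs server-side, while the "Evicted" reason has to be checked client-side.

```go
package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	config, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(config)

	// status.phase IS a supported pod field selector...
	pods, err := client.CoreV1().Pods(metav1.NamespaceAll).List(context.TODO(),
		metav1.ListOptions{FieldSelector: "status.phase=Failed"})
	if err != nil {
		panic(err)
	}
	// ...but status.reason is not, so "Evicted" must be filtered client-side.
	for _, p := range pods.Items {
		if p.Status.Reason == "Evicted" {
			fmt.Printf("%s/%s\n", p.Namespace, p.Name)
		}
	}
}
```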
Should we be explicit in setting the deletion timestamp? |
There's a "podgc" controller which deletes old pods; is it not triggering for evicted pods? How many do you accumulate? Why is it problematic?
I am not sure what the contract between kubelet, scheduler, and controller is for evictions. Which entity is supposed to delete the pod? I assume the pods are not deleted by kubelet so as to give a signal to the scheduler/controller about the lack of fit? |
Should Deployment check and delete evicted pods as well? |
Or should pod GC come in and cover this for other resources besides StatefulSet and DaemonSet? |
Just for anyone who is also interested in how failed-pod deletion is done in the StatefulSet controller (paraphrased in the sketch after this comment): kubernetes/pkg/controller/statefulset/stateful_set_control.go Lines 384 to 394 in c5759ab
|
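Paraphrased, the cited condition boils down to something like this (a simplified sketch, not the verbatim source; the real code also recreates the replacement pod in the same sync pass):

```go
package stssketch

import v1 "k8s.io/api/core/v1"

// isFailed mirrors the condition the StatefulSet controller checks before
// force-deleting a replica's pod: a Failed pod (which includes evicted
// pods) must be deleted from the apiserver so a replacement pod with the
// same ordinal name (e.g. web-0) can be created.
func isFailed(pod *v1.Pod) bool {
	return pod != nil && pod.Status.Phase == v1.PodFailed
}
```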
/triage accepted |
@matthyx: Please ensure the request meets the requirements listed here. If this request no longer meets these requirements, the label can be removed. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
According to StackOverflow:
|
This issue has not been updated in over 1 year, and should be re-triaged. You can:
- Confirm that this issue is still relevant with /triage accepted (org members only)
- Close this issue with /close
For more details on the triage process, see https://www.kubernetes.dev/docs/guide/issue-triage/ /remove-triage accepted |
This issue is currently awaiting triage. If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance. The triage/accepted label can be added by org members by writing /triage accepted in a comment. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
Is this a BUG REPORT or FEATURE REQUEST?:
/kind bug
What happened:
When a node starts to evict pods under disk pressure, the DeletionTimestamp for some evicted pods is not set properly and still has a zero value. It seems that pods created through a Deployment have this issue, while pods created through a DaemonSet have their DeletionTimestamp set properly.
What you expected to happen:
Pods created through a Deployment should also have their DeletionTimestamp set properly.
How to reproduce it (as minimally and precisely as possible):
Write an app to watch pod-related events from the apiserver (a minimal watcher sketch follows below). Deploy a Debian toolbox pod on one node using a Deployment. Put that node under disk pressure, e.g. by using more than 90% of the disk space, and then consume more disk space from inside the toolbox pod (you can install packages that use lots of disk space, such as gnome-core on Debian).
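A minimal sketch of such a watcher app, assuming a reachable kubeconfig at the default location (the "default" namespace is a placeholder; production code would use an informer instead of a raw watch):

```go
package main

import (
	"context"
	"fmt"

	v1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	config, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(config)

	// Watch pod events in the namespace where the toolbox Deployment runs.
	w, err := client.CoreV1().Pods("default").Watch(context.TODO(), metav1.ListOptions{})
	if err != nil {
		panic(err)
	}
	defer w.Stop()

	for ev := range w.ResultChan() {
		pod, ok := ev.Object.(*v1.Pod)
		if !ok {
			continue
		}
		// The fields at issue: on eviction, expect phase=Failed,
		// reason=Evicted, and (per this bug) a nil DeletionTimestamp.
		fmt.Printf("%s %s phase=%s reason=%q deletionTimestamp=%v\n",
			ev.Type, pod.Name, pod.Status.Phase, pod.Status.Reason, pod.DeletionTimestamp)
	}
}
```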
Anything else we need to know?:
You can find events related to the pod, and they only show the Phase updated to "Failed" without the DeletionTimestamp being set; it still has a zero value.
Environment:
- Kubernetes version (kubectl version): 1.8.1
- Kernel (uname -a): Linux ip-10-150-64-105 4.13.3-coreos #1 SMP Wed Sep 20 22:17:11 UTC 2017 x86_64 Intel(R) Xeon(R) CPU E5-2676 v3 @ 2.40GHz GenuineIntel GNU/Linux