Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Emit events on 'Failed' daemon pods #40720

Merged
merged 2 commits into from
Feb 2, 2017

Conversation

janetkuo
Copy link
Member

@janetkuo janetkuo commented Jan 31, 2017

Follow up #40330 @erictune @mikedanese @Kargakis @lukaszo @kubernetes/sig-apps-bugs

@janetkuo janetkuo added the release-note-none Denotes a PR that doesn't merit a release note. label Jan 31, 2017
@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Jan 31, 2017
@k8s-reviewable
Copy link

This change is Reviewable

@k8s-github-robot k8s-github-robot added the size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. label Jan 31, 2017
msg := fmt.Sprintf("Found failed daemon pod %s/%s on node %s, will try to kill it", pod.Namespace, node.Name, pod.Name)
glog.V(2).Infof(msg)
// Emit an event so that it's discoverable to users.
dsc.eventRecorder.Eventf(ds, v1.EventTypeWarning, "FailedDaemonPod", msg)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make sure this pod is not already marked for deletion (pod.DeletionTimestamp == nil) so that we won't end up with duplicate events.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch. Fixed

@k8s-github-robot k8s-github-robot added size/S Denotes a PR that changes 10-29 lines, ignoring generated files. and removed size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Feb 1, 2017
@janetkuo
Copy link
Member Author

janetkuo commented Feb 1, 2017

@Kargakis after adding the check for DeletionTimestamp I started thinking that maybe it's better to filter active pods (i.e. not failed, succeeded, or terminating) earlier when generating the node to daemon pods map (dsc.getNodesToDaemonPods). We may not even need to delete failed pods, since they won't be counted as running daemon pods, and a failed daemon pod won't block future daemon pod creation on the same node. WDYT?

@0xmichalis
Copy link
Contributor

@Kargakis after adding the check for DeletionTimestamp I started thinking that maybe it's better to filter active pods (i.e. not failed, succeeded, or terminating) earlier when generating the node to daemon pods map (dsc.getNodesToDaemonPods). We may not even need to delete failed pods, since they won't be counted as running daemon pods, and a failed daemon pod won't block future daemon pod creation on the same node. WDYT?

How are those going to be cleaned up?

@janetkuo
Copy link
Member Author

janetkuo commented Feb 1, 2017

How are those going to be cleaned up?

We can either not clean them up and let users to observe them and manually delete them, or we can clean them up for the users periodically (separate this clean up loop from the shouldSchedule-shouldContinueRunning switch case).

@mikedanese
Copy link
Member

/approve

@k8s-github-robot
Copy link

[APPROVALNOTIFIER] This PR is APPROVED

The following people have approved this PR: janetkuo, mikedanese

Needs approval from an approver in each of these OWNERS Files:

You can indicate your approval by writing /approve in a comment
You can cancel your approval by writing /approve cancel in a comment

@k8s-github-robot k8s-github-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Feb 1, 2017
@0xmichalis
Copy link
Contributor

We can either not clean them up and let users to observe them and manually delete them, or we can clean them up for the users periodically (separate this clean up loop from the shouldSchedule-shouldContinueRunning switch case).

I don't think letting admins clean up is an option and periodically killing them seems more complicated than what you have here. I am lgtming this, we can discuss further if you think what you have here is not good enough.

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Feb 2, 2017
@k8s-github-robot
Copy link

@k8s-bot test this [submit-queue is verifying that this PR is safe to merge]

@k8s-github-robot
Copy link

Automatic merge from submit-queue (batch tested with PRs 40556, 40720)

@k8s-github-robot k8s-github-robot merged commit 4ecd52b into kubernetes:master Feb 2, 2017
@mikedanese
Copy link
Member

Please squash and ask for squashes of fix-up commits.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. release-note-none Denotes a PR that doesn't merit a release note. size/S Denotes a PR that changes 10-29 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants