Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Removed containers are not always waiting #54593

Merged
merged 1 commit into from
Oct 26, 2017

Conversation

dashpole
Copy link
Contributor

fixes #54499
The issue was that a container that is removed (during pod deletion, for example), is assumed to be in a "waiting" state.
Instead, we should use the previous container state.
Fetching the most recent status is required to ensure that we accurately reflect the previous state. The status attached to a pod object is often stale.

I verified this by looking through the kubelet logs during a deletion, and verifying that the status updates do not transition from terminated -> pending.

cc @kubernetes/sig-node-bugs @sjenning @smarterclayton @derekwaynecarr @dchen1107

Fix an issue where pods were briefly transitioned to a "Pending" state during the deletion process.

@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/node Categorizes an issue or PR as relevant to SIG Node. kind/bug Categorizes issue or PR as related to a bug. size/S Denotes a PR that changes 10-29 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Oct 25, 2017
@dashpole
Copy link
Contributor Author

/assign @dchen1107 @sjenning

@dashpole
Copy link
Contributor Author

/unassign @feiskyer

@dashpole dashpole changed the title Removed containers are not waiting Removed containers are not always waiting Oct 25, 2017
@derekwaynecarr
Copy link
Member

commented on the issue where i meant the pr...

can we also do the following:
"it would be good if syncPod in status manager protected against an invalid transfer of phase..."

prior to calling updateStatus? if it detects an invalid state transition, panic?

@derekwaynecarr
Copy link
Member

/approve
/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Oct 26, 2017
@k8s-github-robot k8s-github-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Oct 26, 2017
@sjenning
Copy link
Contributor

confirmed fix, lgtm

@k8s-cherrypick-bot
Copy link

Removing label cherrypick-candidate because no release milestone was set. This is an invalid state and thus this PR is not being considered for cherry-pick to any release branch. Please add an appropriate release milestone and then re-add the label.

@k8s-github-robot
Copy link

Automatic merge from submit-queue (batch tested with PRs 54597, 54593, 54081, 54271, 54600). If you want to cherry-pick this change to another branch, please follow the instructions here.

k8s-github-robot pushed a commit that referenced this pull request Oct 26, 2017
Automatic merge from submit-queue (batch tested with PRs 54597, 54593, 54081, 54271, 54600). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

kubelet: check for illegal container state transition

supersedes #54530

Puts a state transition check in the kubelet status manager to detect and block illegal transitions; namely from terminated to non-terminated.

@smarterclayton @derekwaynecarr @dashpole @joelsmith @frobware

I confirmed that the reproducer in #54499 does not work with this check in place. The erroneous kubelet status update is rejected:

```
status_manager.go:301] Status update on pod default/test aborted: terminated container test-container attempted illegal transition to non-terminated state
```

After fix #54593, I do not see the message with the above mentioned reproducer.
@joelsmith
Copy link
Contributor

I think k8s-merge-robot misinterpreted @sjenning's intent and closed this when #54597 merged. Could you re-open this @dashpole?

@smarterclayton
Copy link
Contributor

can we also do the following: "it would be good if syncPod in status manager protected against an invalid transfer of phase..."

I would like to also potentially guard or flag an error on the API server. If we prevent the transition, we could break components in unexpected ways, but if we don't prevent the transition, we can break EVERY workload controller.

@sjenning
Copy link
Contributor

@smarterclayton

I would like to also potentially guard or flag an error on the API server. If we prevent the transition, we could break components in unexpected ways, but if we don't prevent the transition, we can break EVERY workload controller.

I don't have a problem reopening #54530 (actually have to create a new PR now because branch changed 😞)

Did you see @dchen1107 concern there?
#54530 (comment)

@derekwaynecarr
Copy link
Member

i think @dchen1107 concern was not that validation was bad, but it alone was not sufficient as the kubelet would still have been unhappy. now that we have the kubelet checking, the fix identified, adding another guard on the apiserver itself also seems fine to me.

openshift-merge-robot added a commit to openshift/origin that referenced this pull request Oct 26, 2017
Automatic merge from submit-queue.

UPSTREAM: 54593: Removed containers are not always waiting

xref kubernetes/kubernetes#54593

fixes #17011
@dchen1107
Copy link
Member

+1 on #54593 (comment) Once validation code is included, both API server and Kubelet will perform the same checking.

@dchen1107
Copy link
Member

/lgtm

@k8s-github-robot
Copy link

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: dashpole, dchen1107, derekwaynecarr

Associated issue: 54499

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these OWNERS Files:

You can indicate your approval by writing /approve in a comment
You can cancel your approval by writing /approve cancel in a comment

@k8s-github-robot
Copy link

Automatic merge from submit-queue (batch tested with PRs 54593, 54607, 54539, 54105). If you want to cherry-pick this change to another branch, please follow the instructions here.

@k8s-github-robot k8s-github-robot merged commit ccd1703 into kubernetes:master Oct 26, 2017
@dashpole dashpole deleted the fix_pending branch October 26, 2017 18:14
@jpbetz jpbetz added cherry-pick-approved Indicates a cherry-pick PR into a release branch has been approved by the release branch manager. and removed cherrypick-candidate labels Oct 30, 2017
k8s-github-robot pushed a commit that referenced this pull request Nov 2, 2017
…93-upstream-release-1.8

Automatic merge from submit-queue.

Automated cherry pick of #54593

Cherry pick of #54593 on release-1.8.

#54593: fix #54499. Removed containers are not waiting

```release-note
Fix an issue where pods were briefly transitioned to a "Pending" state during the deletion process.
```
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cherry-pick-approved Indicates a cherry-pick PR into a release branch has been approved by the release branch manager. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/bug Categorizes issue or PR as related to a bug. lgtm "Looks good to me", indicates that a PR is ready to be merged. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/node Categorizes an issue or PR as relevant to SIG Node. size/S Denotes a PR that changes 10-29 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Pods moving from Succeeded to Pending