Remove graceful node shutdown e2e job #24249

Merged

bobbypage
Member

The graceful shutdown feature is now beta and its test runs under the
existing node-kubelet-serial job. As a result, the dedicated graceful
node shutdown e2e job is no longer necessary. Follow-up to
#24154

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Nov 4, 2021
@bobbypage
Member Author

/assign @SergeyKanzhelev

@k8s-ci-robot k8s-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. area/config Issues or PRs related to code in /config area/jobs sig/node Categorizes an issue or PR as relevant to SIG Node. sig/testing Categorizes an issue or PR as relevant to SIG Testing. labels Nov 4, 2021
@k8s-ci-robot k8s-ci-robot requested review from karan and MHBauer November 4, 2021 21:35
@bobbypage
Member Author

/cc @wzshiming

@SergeyKanzhelev
Member

Do we know how disruptive the dbus notification is for the kubelet? Is the expectation that the kubelet must keep working after a cancelled shutdown? Might it affect tests that run after it? Should we clean up after the test passes, for example by restarting the kubelet?

@SergeyKanzhelev
Member

/triage accepted
/priority backlog
/kind cleanup

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. priority/backlog Higher priority than priority/awaiting-more-evidence. kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. labels Nov 4, 2021
@bobbypage
Member Author

bobbypage commented Nov 4, 2021

Do we know how disruptive the dbus notification is for the kubelet?

The dbus notification will trigger the kubelet to shut down all of the pods. However, since the test runs serially, only the graceful shutdown test should receive the signal, so only that test is affected.

Is the expectation that the kubelet must keep working after a cancelled shutdown?

Yes, that is the expectation. After a cancelled shutdown we expect the kubelet to go back to NodeReady, and the test waits for that condition to be true.

https://github.com/kubernetes/kubernetes/blob/508e67937e3644ae7a844bf572cde5ef29543690/test/e2e_node/node_shutdown_linux_test.go#L173-L176

Might it affect tests that run after it? Should we clean up after the test passes, for example by restarting the kubelet?

I don't expect it to affect tests running after it, assuming the test passes and the node gets back into the Ready condition. In fact, it's running in serial right now and everything seems to be fine. https://testgrid.k8s.io/sig-node-kubelet#node-kubelet-serial

The one consideration is that if, for some reason, we sent the dbus signal and the node went to NotReady and did not recover as expected, the kubelet would be left in a strange state. So, to be safe, it may make sense to always restart the kubelet at the end of the test. Perhaps it makes sense for every serial test to do that, so every serial test can start with a "fresh" kubelet :)

Member

@SergeyKanzhelev SergeyKanzhelev left a comment


We may want to universally restart the kubelet after each serial test. Let's not block this PR on it.

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Nov 4, 2021
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: bobbypage, SergeyKanzhelev


@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Nov 4, 2021
@k8s-ci-robot k8s-ci-robot merged commit 95c9925 into kubernetes:master Nov 4, 2021
@k8s-ci-robot k8s-ci-robot added this to the v1.23 milestone Nov 4, 2021
@k8s-ci-robot
Contributor

@bobbypage: Updated the job-config configmap in namespace default at cluster test-infra-trusted using the following files:

  • key node-kubelet.yaml using file config/jobs/kubernetes/sig-node/node-kubelet.yaml

