[v0.14.x] Referencing the drainTimeout value in the NodeDrainer daemonset #1733

HarryStericker · 2019-09-11T13:44:10Z

In a previous PR (#1722) I attempted to stop the node drainer from being scheduled onto cordoned nodes once the drain timeout (300s/5m) had elapsed. This matters to me because otherwise, after those 5 minutes, nodeDrainer tries to redeploy itself onto the node and goes into a restart loop, as kubelet is preventing anything from being scheduled there. When this happens our alerts go off.

This proved more difficult than planned, and I cannot think of any way to do it other than placing a taint on all nodes that the daemonset does not tolerate. IMO this is not feasible and far too heavyweight.

We already have drainTimeout specified in cluster.yml, so it makes sense to use this value in the drain command as well. This way I can increase the timeout, allowing the node to drain and terminate within the allotted time.
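
To illustrate the idea, here is a minimal sketch of deriving the `--timeout` flag of `kubectl drain` from a configured value instead of hard-coding it. It is not the actual kube-aws template: the function name `build_drain_command`, the `config` dict, the node name, and the assumption that the value is expressed in minutes are all hypothetical and only there for illustration.

```python
import shlex
from typing import List


def build_drain_command(node_name: str, drain_timeout_minutes: int) -> List[str]:
    """Build a kubectl drain invocation whose --timeout comes from config.

    Assumption: the configured timeout is a whole number of minutes.
    """
    if drain_timeout_minutes <= 0:
        raise ValueError("drain timeout must be a positive number of minutes")
    return [
        "kubectl",
        "drain",
        node_name,
        "--ignore-daemonsets",
        "--force",
        # Pass the configured value through rather than a fixed 300s.
        f"--timeout={drain_timeout_minutes * 60}s",
    ]


# Hypothetical config dict standing in for the drainTimeout entry in cluster.yml.
config = {"drainTimeout": 10}
command = build_drain_command("ip-10-0-0-1.ec2.internal", config["drainTimeout"])
print(shlex.join(command))
```

With the value read from one place, raising drainTimeout in the cluster config lengthens the drain itself, rather than only the surrounding wait.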

k8s-ci-robot · 2019-09-11T13:44:17Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
To complete the pull request process, please assign davidmccormick
You can assign the PR to them by writing /assign @davidmccormick in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

codecov-io · 2019-09-11T14:22:56Z

Codecov Report

Merging #1733 into v0.14.x will not change coverage.
The diff coverage is n/a.

@@           Coverage Diff            @@
##           v0.14.x    #1733   +/-   ##
========================================
  Coverage    25.28%   25.28%           
========================================
  Files           98       98           
  Lines         5087     5087           
========================================
  Hits          1286     1286           
  Misses        3659     3659           
  Partials       142      142

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 3c850d9...ac566f0. Read the comment docs.

dominicgunn · 2019-09-12T13:20:42Z

/lgtm

davidmccormick · 2019-09-18T15:39:00Z

Many thanks for updating both the 0.13.x and 0.14.x branches!

[v0.14.x] Utilizing the parameterized drain timeout value

ac566f0

k8s-ci-robot requested review from cknowles and davidmccormick September 11, 2019 13:44

k8s-ci-robot added the cncf-cla: yes and size/XS labels Sep 11, 2019

dominicgunn added this to the v0.14.2 milestone Sep 12, 2019

k8s-ci-robot assigned dominicgunn Sep 12, 2019

k8s-ci-robot added the lgtm label Sep 12, 2019

HarryStericker changed the title ~~[v0.14.x] Utilizing the parameterized drain timeout value~~ [v0.14.x] Referencing the drainTimeout value in the NodeDrainer daemonset Sep 12, 2019

davidmccormick merged commit 7ae7d36 into kubernetes-retired:v0.14.x Sep 18, 2019
