
NodeController should add NoSchedule taints and we should get rid of getNodeConditionPredicate() #42001

Closed
7 tasks done
davidopp opened this issue Feb 23, 2017 · 31 comments · Fixed by #49870 or #49932
Labels
lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. sig/cluster-lifecycle Categorizes an issue or PR as relevant to SIG Cluster Lifecycle. sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling.

Comments

@davidopp
Member

davidopp commented Feb 23, 2017

getNodeConditionPredicate() in plugin/pkg/scheduler/factory/factory.go makes our code hard to understand because it hides the node-condition-based filtering in the node lister, which is totally non-obvious.

We should get rid of this function and have NodeController add NoSchedule taints for these situations instead. (Alas, we might not actually be able to get rid of getNodeConditionPredicate() completely until we get rid of the Unschedulable field of PodSpec, but at least we can get rid of all the other code in this function.) If there are pods that should still be able to schedule in any of these situations (e.g. DaemonSet pods?) we should add tolerations in admission control for them (e.g. see pkg/controller/daemon/daemoncontroller.go).
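For illustration, a minimal sketch (not NodeController's actual code) of the kind of condition-to-taint mapping being proposed here; the taint keys follow the `node.kubernetes.io/*` convention the later PRs adopted, and the import path is an assumption:

```go
package sketch

import v1 "k8s.io/api/core/v1"

// Illustrative only: map node conditions to the NoSchedule taint keys
// the later PRs introduced (an OutOfDisk condition also existed then).
var conditionTaints = map[v1.NodeConditionType]string{
	v1.NodeMemoryPressure:     "node.kubernetes.io/memory-pressure",
	v1.NodeDiskPressure:       "node.kubernetes.io/disk-pressure",
	v1.NodeNetworkUnavailable: "node.kubernetes.io/network-unavailable",
}

// taintsForConditions returns the NoSchedule taints a controller could
// apply for every node condition that is currently True.
func taintsForConditions(node *v1.Node) []v1.Taint {
	var taints []v1.Taint
	for _, cond := range node.Status.Conditions {
		if key, ok := conditionTaints[cond.Type]; ok && cond.Status == v1.ConditionTrue {
			taints = append(taints, v1.Taint{Key: key, Effect: v1.TaintEffectNoSchedule})
		}
	}
	return taints
}
```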

cc/ @kubernetes/sig-scheduling-misc @kubernetes/sig-cluster-lifecycle-misc
cc/ @gmarek @kevin-wangzefeng

Sub-tasks according to the design doc:

@davidopp davidopp added the sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling. label Feb 23, 2017
@davidopp davidopp added the sig/cluster-lifecycle Categorizes an issue or PR as relevant to SIG Cluster Lifecycle. label Feb 23, 2017
@resouer
Contributor

resouer commented Feb 25, 2017

until we get rid of the Unschedulable field of PodSpec

I think you are referring to the Unschedulable field of NodeSpec?

@resouer resouer self-assigned this Feb 25, 2017
@davidopp
Member Author

I think you are referring to the Unschedulable field of NodeSpec?

Yes sorry, that was a typo.

@gmarek
Contributor

gmarek commented Feb 27, 2017

@resouer - please CC me on all PRs

@bgrant0607
Member

See also #29178

@davidopp
Member Author

davidopp commented Apr 9, 2017

Note that memory and disk pressure are checked in separate predicates, not in the NodeConditionPredicate. These should be represented as NoSchedule taints too.

```go
func CheckNodeMemoryPressurePredicate(pod *v1.Pod, meta interface{}, nodeInfo *schedulercache.NodeInfo) (bool, []algorithm.PredicateFailureReason, error) {
```
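For context, a condensed sketch of what that predicate did at the time (approximate; `isPodBestEffort` and `ErrNodeUnderMemoryPressure` are the scheduler package's own helpers, not new names):

```go
// Condensed, approximate sketch: BestEffort pods are rejected while the
// node reports MemoryPressure=True; everything else passes.
func CheckNodeMemoryPressurePredicate(pod *v1.Pod, meta interface{}, nodeInfo *schedulercache.NodeInfo) (bool, []algorithm.PredicateFailureReason, error) {
	// Only BestEffort pods are subject to the memory-pressure check.
	if !isPodBestEffort(pod) {
		return true, nil, nil
	}
	// Reject while the node currently reports memory pressure.
	if nodeInfo.MemoryPressureCondition() == v1.ConditionTrue {
		return false, []algorithm.PredicateFailureReason{ErrNodeUnderMemoryPressure}, nil
	}
	return true, nil, nil
}
```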

@davidopp
Member Author

Setting NoSchedule taints corresponding to the node conditions that today cause pods not to schedule onto a node, should also allow us to use tolerations rather than special-case code in the DaemonSet controller (and thereby make it easier to have DaemonSet pods be scheduled by the default scheduler.)
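A minimal sketch of what that admission-time mutation could look like (the helper name is hypothetical; the taint key is the one the later PRs adopted):

```go
// Hypothetical helper: let a DaemonSet pod tolerate the memory-pressure
// NoSchedule taint so condition taints don't block its scheduling.
func addMemoryPressureToleration(pod *v1.Pod) {
	pod.Spec.Tolerations = append(pod.Spec.Tolerations, v1.Toleration{
		Key:      "node.kubernetes.io/memory-pressure",
		Operator: v1.TolerationOpExists,
		Effect:   v1.TaintEffectNoSchedule,
	})
}
```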

@davidopp
Member Author

ref/ #42002

@k82cn
Member

k82cn commented Jul 18, 2017

Sub-tasks according to the design doc, I'll create PR for them accordingly:

The task list was moved to the issue description here.

@wanghaoran1988
Contributor

wanghaoran1988 commented Jul 28, 2017

/cc @mdshuai

@resouer resouer removed their assignment Aug 1, 2017
k8s-github-robot pushed a commit that referenced this issue Aug 2, 2017
Automatic merge from submit-queue (batch tested with PRs 49870, 49416, 49872, 49892, 49908)

Renamed zoneNotReadyOrUnreachableTainer to zoneNoExecuteTainer.

**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: partially fixes #42001 

**Release note**:

```release-note
None
```
@k82cn
Member

k82cn commented Aug 2, 2017

/reopen

@k8s-ci-robot
Contributor

@k82cn: you can't re-open an issue/PR unless you authored it or you are assigned to it.

In response to this:

/reopen

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k82cn
Member

k82cn commented Aug 2, 2017

/assign

@gmarek
Contributor

gmarek commented Aug 4, 2017

We won't, but it'll take time for us to get there. We'll roll out tainting as alpha (obviously), later move it to beta, and GA in 1.10 at the earliest. We can't remove the condition-checking logic before that.

dsc is a daemon set, right? New taints are NoSchedule only, so running Pods won't be affected (at least on the control plane level). Kubelet can still decide to evict them based on problems it observes.

@yastij
Member

yastij commented Aug 4, 2017

@gmarek - seems logical for the timeline.

From @k82cn's sub-tasks

In DaemonSet, update DaemonSetController to tolerate new taints

Since the new taints are NoSchedule only, DaemonSets shouldn't have tolerations for these?

It should instead respect the new taints when applied and not schedule on these nodes, or am I missing something? @lukaszo

@k82cn
Member

k82cn commented Aug 4, 2017

The task for DS in my mind:

  1. For all DS pods, it should tolerate MemoryPressure & DiskPressure
  2. For critical DS pods, it should tolerate OutOfDisk

And if the taint by condition feature is enabled, we should check OutOfDisk by taints instead of predicates; the refactor is trying to make this simpler :).
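A rough sketch of that gating (gate and import names as used by the PRs referenced in this thread; treat the wrapper as approximate):

```go
import (
	utilfeature "k8s.io/apiserver/pkg/util/feature"
	"k8s.io/kubernetes/pkg/features"
)

// useConditionTaints reports whether to rely on condition taints and
// skip the legacy node-condition predicates (e.g. the OutOfDisk check).
func useConditionTaints() bool {
	return utilfeature.DefaultFeatureGate.Enabled(features.TaintNodesByCondition)
}
```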

@yastij
Member

yastij commented Aug 4, 2017

@k82cn - You're right, I'll send a PR after yours get merged :)

@jamiehannaford
Contributor

@k82cn Do we have a to-do list so folks can help out with kubernetes/community#819? I'd like to help when I get some time.

@gmarek I have a question about the timeline. If alpha makes it into v1.8, will I be able to schedule onto NotReady nodes (no CNI) if I reference the alpha tolerations? Or do you mean it won't be functional at all until v1.10?

@gmarek
Contributor

gmarek commented Aug 4, 2017

It means you'll need to flip the feature gate flag for it to work.
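(Concretely, that flip would look something like passing `--feature-gates=TaintNodesByCondition=true` to the scheduler and controller manager; the gate name is the one used by the PRs referenced in this thread, with the alpha default being off.)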

k8s-github-robot pushed a commit that referenced this issue Aug 4, 2017
Automatic merge from submit-queue (batch tested with PRs 50119, 48366, 47181, 41611, 49547)

Task 0: Added node taints labels and feature flags

**What this PR does / why we need it**:
Added node taint const for node condition.

**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: part of #42001

**Release note**:
```release-note
None
```
@k82cn
Member

k82cn commented Aug 4, 2017

@jamiehannaford , here is the task list: #42001 (comment)

I'm working on related PRs for them; please ping me when you got time :).

@derekwaynecarr
Member

@k82cn - I anticipate we will also add a CPUPressure condition, which would also mean that no BestEffort pod can be scheduled to that node if the kubelet is running with a static CPU assignment policy.

/cc @sjenning

@sjenning sjenning mentioned this issue Aug 9, 2017
6 tasks
k8s-github-robot pushed a commit that referenced this issue Aug 11, 2017
Automatic merge from submit-queue

Task 2: Added toleration to DaemonSet pods for node condition taints

**What this PR does / why we need it**:
If TaintByCondition was enabled, added toleration to DaemonSet pods for node condition taints.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: part of #42001 

**Release note**:
```release-note
None
```
k8s-github-robot pushed a commit that referenced this issue Aug 14, 2017
Automatic merge from submit-queue

Task 3: Add MemoryPressure toleration for no BestEffort pod.

**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: part of #42001 

**Release note**:
```release-note
After 1.8, admission controller will add 'MemoryPressure' toleration to Guaranteed and Burstable pods.
```
hh pushed a commit to ii/kubernetes that referenced this issue Aug 30, 2017
Automatic merge from submit-queue (batch tested with PRs 51228, 50185, 50940, 51544, 51543)

Task 4: Ignored node condition predicates if TaintsByCondition enabled.

**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: part of kubernetes#42001 

**Release note**:
```release-note
None
```
k8s-github-robot pushed a commit that referenced this issue Sep 1, 2017
Automatic merge from submit-queue (batch tested with PRs 51574, 51534, 49257, 44680, 48836)

Task 1: Tainted node by condition.

**What this PR does / why we need it**:
Tainted node by condition for MemoryPressure, OutOfDisk and so on.

**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: part of #42001 

**Release note**:
```release-note
Tainted nodes by conditions as follows:
  * 'node.kubernetes.io/network-unavailable=:NoSchedule' if NetworkUnavailable is true
  * 'node.kubernetes.io/disk-pressure=:NoSchedule' if DiskPressure is true
  * 'node.kubernetes.io/memory-pressure=:NoSchedule' if MemoryPressure is true
  * 'node.kubernetes.io/out-of-disk=:NoSchedule' if OutOfDisk is true
```
k8s-github-robot pushed a commit that referenced this issue Oct 3, 2017
Automatic merge from submit-queue (batch tested with PRs 52723, 53271). If you want to cherry-pick this change to another branch, please follow the instructions [here](https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md).

Apply algorithm in scheduler by feature gates.

**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: part of #42001

**Release note**:
```release-note
Apply algorithm in scheduler by feature gates.
```
k8s-github-robot pushed a commit that referenced this issue Oct 6, 2017
Automatic merge from submit-queue (batch tested with PRs 53278, 53184). If you want to cherry-pick this change to another branch, please follow the instructions [here](https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md).

Added integration test for TaintNodeByCondition.

**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: part of #42001 

**Release note**:

```release-note
Added integration test for TaintNodeByCondition.
```
@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

Prevent issues from auto-closing with an /lifecycle frozen comment.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or @fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 6, 2018
@gmarek
Contributor

gmarek commented Jan 9, 2018

I think it's done (as alpha). @davidopp - do we want to keep this issue open?

@bsalamat
Member

We can close this.
