Working nodeFit feature
RyanDevlin committed Apr 30, 2021
1 parent 161f66a commit a5d298f
Showing 25 changed files with 1,220 additions and 139 deletions.
32 changes: 32 additions & 0 deletions README.md
@@ -157,6 +157,7 @@ should include `ReplicaSet` to have pods created by Deployments excluded.
|`namespaces`|(see [namespace filtering](#namespace-filtering))|
|`thresholdPriority`|int (see [priority filtering](#priority-filtering))|
|`thresholdPriorityClassName`|string (see [priority filtering](#priority-filtering))|
|`nodeFit`|bool (see [node fit filtering](#node-fit-filtering))|

**Example:**
```yaml
@@ -204,6 +205,7 @@ strategy evicts pods from `overutilized nodes` (those with usage above `targetTh
|`numberOfNodes`|int|
|`thresholdPriority`|int (see [priority filtering](#priority-filtering))|
|`thresholdPriorityClassName`|string (see [priority filtering](#priority-filtering))|
|`nodeFit`|bool (see [node fit filtering](#node-fit-filtering))|

**Example:**

@@ -253,6 +255,7 @@ node.
|`thresholdPriorityClassName`|string (see [priority filtering](#priority-filtering))|
|`namespaces`|(see [namespace filtering](#namespace-filtering))|
|`labelSelector`|(see [label filtering](#label-filtering))|
|`nodeFit`|bool (see [node fit filtering](#node-fit-filtering))|

**Example:**

@@ -291,6 +294,7 @@ podA gets evicted from nodeA.
|`thresholdPriorityClassName`|string (see [priority filtering](#priority-filtering))|
|`namespaces`|(see [namespace filtering](#namespace-filtering))|
|`labelSelector`|(see [label filtering](#label-filtering))|
|`nodeFit`|bool (see [node fit filtering](#node-fit-filtering))|

**Example:**

@@ -320,6 +324,7 @@ and will be evicted.
|`thresholdPriorityClassName`|string (see [priority filtering](#priority-filtering))|
|`namespaces`|(see [namespace filtering](#namespace-filtering))|
|`labelSelector`|(see [label filtering](#label-filtering))|
|`nodeFit`|bool (see [node fit filtering](#node-fit-filtering))|

**Example:**

@@ -348,6 +353,7 @@ include soft constraints.
|`thresholdPriority`|int (see [priority filtering](#priority-filtering))|
|`thresholdPriorityClassName`|string (see [priority filtering](#priority-filtering))|
|`namespaces`|(see [namespace filtering](#namespace-filtering))|
|`nodeFit`|bool (see [node fit filtering](#node-fit-filtering))|

**Example:**

@@ -379,6 +385,7 @@ which determines whether init container restarts should be factored into that ca
|`thresholdPriority`|int (see [priority filtering](#priority-filtering))|
|`thresholdPriorityClassName`|string (see [priority filtering](#priority-filtering))|
|`namespaces`|(see [namespace filtering](#namespace-filtering))|
|`nodeFit`|bool (see [node fit filtering](#node-fit-filtering))|

**Example:**

@@ -411,6 +418,7 @@ to `Running` and `Pending`.
|`thresholdPriorityClassName`|string (see [priority filtering](#priority-filtering))|
|`namespaces`|(see [namespace filtering](#namespace-filtering))|
|`labelSelector`|(see [label filtering](#label-filtering))|
|`nodeFit`|bool (see [node fit filtering](#node-fit-filtering))|

**Example:**

@@ -551,6 +559,30 @@ strategies:
- {key: environment, operator: NotIn, values: [dev]}
```


### Node Fit filtering

All strategies accept a boolean `nodeFit` parameter. When it is set to `true`, the descheduler checks whether each pod that meets the eviction criteria would fit on another node before evicting it. A pod that cannot be rescheduled onto another node is not evicted. The following criteria are currently considered when `nodeFit` is `true`:
- A `nodeSelector` on the pod
- Any `Tolerations` on the pod and any `Taints` on the other nodes
- `nodeAffinity` on the pod
- Whether any of the other nodes are marked as `unschedulable`

E.g.

```yaml
apiVersion: "descheduler/v1alpha1"
kind: "DeschedulerPolicy"
strategies:
  "PodLifeTime":
    enabled: true
    params:
      podLifeTime:
        maxPodLifeTimeSeconds: 86400
      nodeFit: true
```
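
The criteria listed above can be sketched roughly as follows. This is a simplified illustration only: `Node`, `Pod`, `Taint`, and `Toleration` are hypothetical stand-in types rather than the Kubernetes API types, `nodeAffinity` and taint effects/operators are left out for brevity, and the helper names are assumptions, not the descheduler's actual implementation.

```go
package main

import "fmt"

// Simplified, hypothetical stand-ins for the Kubernetes types.
type Taint struct{ Key, Value string }

type Toleration struct{ Key, Value string }

type Node struct {
	Name          string
	Labels        map[string]string
	Taints        []Taint
	Unschedulable bool
}

type Pod struct {
	NodeName     string // node the pod currently runs on
	NodeSelector map[string]string
	Tolerations  []Toleration
}

// matchesSelector reports whether the node carries every label in the pod's nodeSelector.
func matchesSelector(p Pod, n Node) bool {
	for k, want := range p.NodeSelector {
		if got, ok := n.Labels[k]; !ok || got != want {
			return false
		}
	}
	return true
}

// toleratesTaints reports whether every taint on the node is tolerated by the pod.
func toleratesTaints(p Pod, n Node) bool {
	for _, t := range n.Taints {
		tolerated := false
		for _, tol := range p.Tolerations {
			if tol.Key == t.Key && tol.Value == t.Value {
				tolerated = true
				break
			}
		}
		if !tolerated {
			return false
		}
	}
	return true
}

// podFitsAnyOtherNode returns true if at least one node other than the pod's
// current node is schedulable, matches the nodeSelector, and has only tolerated taints.
func podFitsAnyOtherNode(p Pod, nodes []Node) bool {
	for _, n := range nodes {
		if n.Name == p.NodeName || n.Unschedulable {
			continue
		}
		if matchesSelector(p, n) && toleratesTaints(p, n) {
			return true
		}
	}
	return false
}

func main() {
	nodes := []Node{
		{Name: "node-a", Labels: map[string]string{"disk": "ssd"}},
		{Name: "node-b", Labels: map[string]string{"disk": "ssd"}, Unschedulable: true},
		{Name: "node-c", Labels: map[string]string{"disk": "hdd"}},
	}
	pod := Pod{NodeName: "node-a", NodeSelector: map[string]string{"disk": "ssd"}}
	fmt.Println(podFitsAnyOtherNode(pod, nodes))
}
```

In this example the pod runs on `node-a`, `node-b` is unschedulable, and `node-c` lacks the required label, so the check reports no fit and the pod would not be evicted when `nodeFit` is `true`.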


## Pod Evictions

When the descheduler decides to evict pods from a node, it employs the following general mechanism:
Binary file added kind
Binary file not shown.
1 change: 1 addition & 0 deletions pkg/api/types.go
@@ -80,6 +80,7 @@ type StrategyParameters struct {
	ThresholdPriority          *int32
	ThresholdPriorityClassName string
	LabelSelector              *metav1.LabelSelector
	NodeFit                    bool
}

type Percentage float64
1 change: 1 addition & 0 deletions pkg/api/v1alpha1/types.go
@@ -78,6 +78,7 @@ type StrategyParameters struct {
	ThresholdPriority          *int32                `json:"thresholdPriority"`
	ThresholdPriorityClassName string                `json:"thresholdPriorityClassName"`
	LabelSelector              *metav1.LabelSelector `json:"labelSelector"`
	NodeFit                    bool                  `json:"nodeFit"`
}

type Percentage float64
23 changes: 23 additions & 0 deletions pkg/descheduler/evictions/evictions.go
@@ -32,6 +32,7 @@ import (
	"k8s.io/client-go/tools/record"
	"k8s.io/klog/v2"
	"sigs.k8s.io/descheduler/metrics"
	nodeutil "sigs.k8s.io/descheduler/pkg/descheduler/node"
	podutil "sigs.k8s.io/descheduler/pkg/descheduler/pod"
	"sigs.k8s.io/descheduler/pkg/utils"

@@ -47,6 +48,7 @@ type nodePodEvictedCount map[*v1.Node]int

type PodEvictor struct {
	client                clientset.Interface
	nodes                 []*v1.Node
	policyGroupVersion    string
	dryRun                bool
	maxPodsToEvictPerNode int
@@ -74,6 +76,7 @@ func NewPodEvictor(

	return &PodEvictor{
		client:                client,
		nodes:                 nodes,
		policyGroupVersion:    policyGroupVersion,
		dryRun:                dryRun,
		maxPodsToEvictPerNode: maxPodsToEvictPerNode,
@@ -164,6 +167,7 @@ func evictPod(ctx context.Context, client clientset.Interface, pod *v1.Pod, poli

type Options struct {
	priority *int32
	nodeFit  *bool
}

// WithPriorityThreshold sets a threshold for pod's priority class.
@@ -175,6 +179,17 @@ func WithPriorityThreshold(priority int32) func(opts *Options) {
	}
}

// WithNodeFit sets whether to consider taints, node selectors,
// and node affinity when evicting. A pod whose tolerations, node selectors,
// and affinity match a node other than the one it is currently running on
// is evictable.
func WithNodeFit(nodeFit bool) func(opts *Options) {
	return func(opts *Options) {
		// nodeFit is a per-call copy, so taking its address is safe.
		opts.nodeFit = &nodeFit
	}
}

type constraint func(pod *v1.Pod) error

type evictable struct {
@@ -225,6 +240,14 @@ func (pe *PodEvictor) Evictable(opts ...func(opts *Options)) *evictable {
			return nil
		})
	}
	if options.nodeFit != nil && *options.nodeFit {
		ev.constraints = append(ev.constraints, func(pod *v1.Pod) error {
			if !nodeutil.PodFitsAnyOtherNode(pod, pe.nodes) {
				return fmt.Errorf("pod does not fit on any other node because of nodeSelector(s), Taint(s), or nodes marked as unschedulable")
			}
			return nil
		})
	}

	return ev
}
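
To illustrate how an option such as `WithNodeFit` feeds into the constraint list, here is a minimal, self-contained sketch of the same functional-options pattern. The `Pod` type, the `fits` callback, and the `newEvictable`/`isEvictable` helpers are hypothetical stand-ins chosen for this example; they are not the descheduler's real types or API.

```go
package main

import (
	"errors"
	"fmt"
)

// Pod is a hypothetical stand-in for *v1.Pod.
type Pod struct{ Name string }

type options struct {
	nodeFit *bool // nil means the caller did not set the option
}

type option func(*options)

// withNodeFit mirrors the shape of WithNodeFit: it records the flag on the options.
func withNodeFit(nodeFit bool) option {
	return func(o *options) { o.nodeFit = &nodeFit }
}

type constraint func(Pod) error

type evictable struct{ constraints []constraint }

// newEvictable applies the options and registers a node-fit constraint only
// when the option is set and true. fits is an assumed callback standing in
// for a real node-fit check.
func newEvictable(fits func(Pod) bool, opts ...option) *evictable {
	o := &options{}
	for _, apply := range opts {
		apply(o)
	}
	ev := &evictable{}
	if o.nodeFit != nil && *o.nodeFit {
		ev.constraints = append(ev.constraints, func(p Pod) error {
			if !fits(p) {
				return errors.New("pod does not fit on any other node")
			}
			return nil
		})
	}
	return ev
}

// isEvictable returns true only if every registered constraint passes.
func (ev *evictable) isEvictable(p Pod) bool {
	for _, c := range ev.constraints {
		if err := c(p); err != nil {
			return false
		}
	}
	return true
}

func main() {
	neverFits := func(Pod) bool { return false }
	pod := Pod{Name: "web-0"}
	fmt.Println(newEvictable(neverFits).isEvictable(pod))
	fmt.Println(newEvictable(neverFits, withNodeFit(true)).isEvictable(pod))
}
```

Without the option no node-fit constraint is registered, so the pod stays evictable; with `withNodeFit(true)` the failing fit check blocks eviction. Keeping the flag as a `*bool` lets the code distinguish "not set" from an explicit `false`.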