RemovePodsViolatingNodeAffinity should not rely on node capacity when checking if a pod fits on its current node #873

Closed
seleznev opened this issue Jul 7, 2022 · 6 comments · Fixed by #916

@seleznev

seleznev commented Jul 7, 2022

What version of descheduler are you using?

descheduler version: 0.24.1

Does this issue reproduce with the latest release?

Yes.

Which descheduler CLI options are you using?

- --policy-config-file
- /policy-dir/policy.yaml
- --v
- "4"

Please provide a copy of your descheduler policy config file

apiVersion: "descheduler/v1alpha1"
kind: "DeschedulerPolicy"
evictLocalStoragePods: true
ignorePvcPods: true
maxNoOfPodsToEvictPerNamespace: 1
maxNoOfPodsToEvictPerNode: 1
strategies:
  LowNodeUtilization:
    enabled: false
    params:
      nodeResourceUtilizationThresholds:
        targetThresholds:
          cpu: 50
          memory: 50
          pods: 50
        thresholds:
          cpu: 20
          memory: 20
          pods: 20
  RemoveDuplicates:
    enabled: false
  RemovePodsViolatingInterPodAntiAffinity:
    enabled: false
  RemovePodsViolatingNodeAffinity:
    enabled: true
    params:
      nodeAffinityType:
      - requiredDuringSchedulingIgnoredDuringExecution
  RemovePodsViolatingNodeTaints:
    enabled: false

What k8s version are you using (kubectl version)?

kubectl version Output
$ kubectl version
Client Version: version.Info{Major:"1", Minor:"20", GitVersion:"v1.20.14", GitCommit:"57a3aa3f13699cf3db9c52d228c18db94fa81876", GitTreeState:"clean", BuildDate:"2021-12-15T14:52:33Z", GoVersion:"go1.15.15", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"20", GitVersion:"v1.20.14", GitCommit:"57a3aa3f13699cf3db9c52d228c18db94fa81876", GitTreeState:"clean", BuildDate:"2021-12-15T14:47:10Z", GoVersion:"go1.15.15", Compiler:"gc", Platform:"linux/amd64"}

What did you do?

Steps to reproduce:

  1. Create 2 k8s nodes with the same labels and the same capacity.
  2. Create a Deployment with a matching NodeAffinity, replicas: 1, and requests >50% of a node's capacity (so two pods cannot be scheduled on the same node).
  3. Run the descheduler with the RemovePodsViolatingNodeAffinity strategy enabled.

What did you expect to see?

The descheduler does nothing.

What did you see instead?

The descheduler evicts the pod every iteration (because it fits on the other node, but not on its current one).

First iteration (I removed irrelevant messages about other pods and nodes):

I0707 07:15:07.523600       1 node_affinity.go:91] "Processing node" node="node1"
I0707 07:15:07.524743       1 node.go:166] "Pod does not fit on node" pod="test/app-69d6bbd5b4-bzh9j" node="node1"
I0707 07:15:07.524768       1 node.go:168] "insufficient cpu"
I0707 07:15:07.525726       1 node.go:163] "Pod fits on node" pod="test/app-69d6bbd5b4-bzh9j" node="node2"
I0707 07:15:07.525953       1 node_affinity.go:108] "Evicting pod" pod="test/app-69d6bbd5b4-bzh9j"
I0707 07:15:07.565879       1 evictions.go:163] "Evicted pod" pod="test/app-69d6bbd5b4-bzh9j" reason="NodeAffinity" strategy="NodeAffinity" node="node1"
I0707 07:15:07.566870       1 event.go:294] "Event occurred" object="test/app-69d6bbd5b4-bzh9j" fieldPath="" kind="Pod" apiVersion="v1" type="Normal" reason="Descheduled" message="pod evicted by sigs.k8s.io/deschedulerNodeAffinity"

Second:

I0707 07:20:07.464621       1 node_affinity.go:91] "Processing node" node="node2"
I0707 07:20:07.465559       1 node.go:183] "Pod does not fit on node" pod="test/app-69d6bbd5b4-4msq7" node="node2"
I0707 07:20:07.465597       1 node.go:185] "insufficient cpu"
I0707 07:20:07.467974       1 node.go:163] "Pod fits on node" pod="test/app-69d6bbd5b4-4msq7" node="node1"
I0707 07:20:07.468864       1 node_affinity.go:108] "Evicting pod" pod="test/app-69d6bbd5b4-4msq7"
I0707 07:20:07.545268       1 evictions.go:163] "Evicted pod" pod="test/app-69d6bbd5b4-4msq7" reason="NodeAffinity" strategy="NodeAffinity" node="node2"
I0707 07:20:07.546559       1 event.go:294] "Event occurred" object="test/app-69d6bbd5b4-4msq7" fieldPath="" kind="Pod" apiVersion="v1" type="Normal" reason="Descheduled" message="pod evicted by sigs.k8s.io/deschedulerNodeAffinity"
@seleznev added the kind/bug label on Jul 7, 2022
@JaneLiuL
Member

JaneLiuL commented Jul 7, 2022

Hi @seleznev, thanks for raising this issue.
It seems that RemovePodsViolatingNodeAffinity needs to calculate whether the pod fits on another node before it starts to evict.
So each time before we evict, we need to figure out whether there is another node the pod could be scheduled onto.

@damemi
Contributor

damemi commented Jul 7, 2022

@JaneLiuL I think @seleznev is actually reporting the opposite. It seems that RemovePodsViolatingNodeAffinity is evicting pods that do not violate the NodeAffinity solely because they don't fit on their current node (for whatever reason). The bug here is that pods should not be evicted just because their requests don't fit on their current node.

Pretty sure it's doing this because:

  1. RemovePodsViolatingNodeAffinity calls !PodFitsOnCurrentNode(), intending to check whether the pod no longer fits based on its node selectors, which is what that function used to do. However,
  2. The NodeFit feature implementation changed this function to call NodeFit().
  3. Now, calling NodeFit() checks the node selector and resource requests, failing on either.

So the NodeFit implementation needs to somehow decouple the node selector check from the fitsRequest check. We should also add a test for this. cc @RyanDevlin @ingvagabund
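
For illustration, here is a minimal sketch of what a selector/affinity-only check could look like once it is decoupled from the request accounting. The function name PodMatchesRequiredNodeAffinity is hypothetical (it is not the descheduler's actual API); the only real dependency assumed is the upstream k8s.io/component-helpers nodeaffinity helper:

package node

import (
	v1 "k8s.io/api/core/v1"
	"k8s.io/component-helpers/scheduling/corev1/nodeaffinity"
)

// PodMatchesRequiredNodeAffinity (hypothetical name) reports whether the pod's
// required node selector / node affinity terms match the given node. It
// deliberately ignores resource requests, so an affinity-only strategy can use
// it without also pulling in the fitsRequest check.
func PodMatchesRequiredNodeAffinity(pod *v1.Pod, node *v1.Node) bool {
	match, err := nodeaffinity.GetRequiredNodeAffinity(pod).Match(node)
	return err == nil && match
}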

@JaneLiuL
Member

JaneLiuL commented Jul 8, 2022

Thanks @damemi, that looks clear to me now. I would like to take this and add some tests for it.

@uniemimu

2. The NodeFit feature implementation [changed this function to call NodeFit()](https://github.com/kubernetes-sigs/descheduler/commit/16eb9063b6525bd7fc3cfd57b506ce2843818804#diff-fc4033754cd376d78b09ff84a934296f26e774702505885309268f816ba3817cR178-R179).

And there lies the problem and the cause of many issues. The same sort of baked-in NodeFit() call also exists in the PodFitsAnyNode function. This means that the strategies (or plugins) which use these functions in the filtering phase have de facto hard-coded NodeFit functionality that cannot be turned off. That creates even more issues, since NodeFit does have a controlling configuration boolean, which would imply it could be turned off, but it can't: that boolean only takes effect during the actual eviction step. This is unfortunate, as it prevents using these strategies (or plugins) for use cases where getting an eviction done is more important than avoiding it entirely.

If those NodeFit() calls weren't hard-coded into these functions, meaning NodeFit() were only called when it is configured to be enabled, then issue #845 would be fixed and issue #863 would at least have a workaround of disabling NodeFit.

Issue #640 probably requires a bit more, but it is very much related; removing the PodFitsAnyNode check and relying on the configurable boolean that controls the eviction-step NodeFit() call is part of the offered PR.
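
To make that gating concrete, here is a rough sketch of a PodFitsAnyNode-style helper whose NodeFit() call is controlled by the policy's boolean. The name, signature, and the nodeFitEnabled parameter are illustrative rather than the project's actual API; it assumes NodeFit(nodeIndexer, pod, node) returns a slice of errors, as in the linked node.go:

package node

import (
	v1 "k8s.io/api/core/v1"

	podutil "sigs.k8s.io/descheduler/pkg/descheduler/pod"
)

// podFitsAnyOtherNode (illustrative) reports whether the pod could land on some
// node other than the one it is currently assigned to.
func podFitsAnyOtherNode(nodeIndexer podutil.GetPodsAssignedToNodeFunc, pod *v1.Pod, nodes []*v1.Node, nodeFitEnabled bool) bool {
	for _, node := range nodes {
		// The pod's own node is never a relocation target, so its free
		// capacity is irrelevant here.
		if node.Name == pod.Spec.NodeName {
			continue
		}
		// With nodeFit disabled in the policy, do not second-guess the
		// scheduler: treat any other node as an acceptable destination.
		if !nodeFitEnabled {
			return true
		}
		// With nodeFit enabled, require the full check to pass.
		if len(NodeFit(nodeIndexer, pod, node)) == 0 {
			return true
		}
	}
	return false
}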

@ingvagabund
Contributor

ingvagabund commented Aug 16, 2022

This is definitely a bug in https://github.com/kubernetes-sigs/descheduler/blob/master/pkg/descheduler/node/node.go#L122. Invoking fitsRequest needs to be made conditional:

if pod.Spec.NodeName != node.Name {
	// Check if the pod can fit on a node based off its requests
	ok, reqErrors := fitsRequest(nodeIndexer, pod, node)
	if !ok {
		errors = append(errors, reqErrors...)
	}
}

There's no reason to check whether a pod fits within the resources of the node it is already assigned to.
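
For context, the guarded section would sit inside NodeFit() roughly like this; only the pod.Spec.NodeName comparison and the fitsRequest call come from the snippet above, the rest is a sketch with the other checks elided:

// Illustrative shape of NodeFit with the guard in place; the node selector,
// taint and schedulability checks the real function performs are elided here.
func NodeFit(nodeIndexer podutil.GetPodsAssignedToNodeFunc, pod *v1.Pod, node *v1.Node) []error {
	var errors []error

	// ... node selector / affinity, taint and unschedulable checks elided ...

	// Resource requests only matter when evaluating a node the pod is NOT
	// already running on; "insufficient cpu" on the pod's own node is expected
	// and is not a reason to evict it.
	if pod.Spec.NodeName != node.Name {
		ok, reqErrors := fitsRequest(nodeIndexer, pod, node)
		if !ok {
			errors = append(errors, reqErrors...)
		}
	}

	return errors
}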

@ingvagabund
Contributor

Given this issue is not about decoupling NodeFit, I am dropping the generic approach comment into #640 (comment) instead; please follow the discussion there.
