
RemovePodsViolatingTopologySpreadConstraint includeSoftConstraints: true not evicting #564

Closed
jsalatiel opened this issue May 9, 2021 · 8 comments · Fixed by #565
Labels
kind/bug Categorizes issue or PR as related to a bug.

Comments

@jsalatiel

What version of descheduler are you using?

descheduler version

  • image: k8s.gcr.io/descheduler/descheduler:v0.20.0
  • helm.sh/chart: descheduler-0.20.0

Does this issue reproduce with the latest release?
Yes.

Please provide a copy of your descheduler policy config file

  policy.yaml: |
    apiVersion: "descheduler/v1alpha1"
    kind: "DeschedulerPolicy"
    strategies:
      LowNodeUtilization:
        enabled: false
        params:
          nodeResourceUtilizationThresholds:
            targetThresholds:
              cpu: 50
              memory: 50
              pods: 50
            thresholds:
              cpu: 20
              memory: 20
              pods: 20
      RemoveDuplicates:
        enabled: true
      RemovePodsViolatingInterPodAntiAffinity:
        enabled: true
      RemovePodsViolatingNodeAffinity:
        enabled: true
        params:
          nodeAffinityType:
          - requiredDuringSchedulingIgnoredDuringExecution
      RemovePodsViolatingNodeTaints:
        enabled: true
      RemovePodsViolatingTopologySpreadConstraint:
        enabled: true
        params:
          includeSoftConstraints: true

What k8s version are you using (kubectl version)?

$ kubectl version
Client Version: version.Info{Major:"1", Minor:"19", GitVersion:"v1.19.9", GitCommit:"9dd794e454ac32d97cde41ae10be801ae98f75df", GitTreeState:"clean", BuildDate:"2021-03-18T01:09:28Z", GoVersion:"go1.15.8", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"19", GitVersion:"v1.19.9", GitCommit:"9dd794e454ac32d97cde41ae10be801ae98f75df", GitTreeState:"clean", BuildDate:"2021-03-18T01:00:06Z", GoVersion:"go1.15.8", Compiler:"gc", Platform:"linux/amd64"}

What did you do?
I have created a deployment with the following topologySpread

      topologySpreadConstraints:
      - labelSelector:
          matchLabels:
            app: whoami
        maxSkew: 1
        topologyKey: topology.kubernetes.io/zone
        whenUnsatisfiable: ScheduleAnyway
      - labelSelector:
          matchLabels:
            app: whoami
        maxSkew: 1
        topologyKey: kubernetes.io/hostname
        whenUnsatisfiable: DoNotSchedule

I have 6 nodes in 2 zones, 3 nodes per zone. The deployment has 4 replicas.
At the start, the topologySpreadConstraints split the pods evenly across the zones: 2 pods per zone, on 4 different nodes.
If I drain all nodes from zone1, then since the zone policy is ScheduleAnyway, one of the pods is scheduled in zone2 and the other stays pending (due to the DoNotSchedule policy on the hostname constraint).
After uncordoning the nodes in zone1, the pending pod is immediately scheduled, as expected.
So now I have:
3 pods running in zone2
1 pod running in zone1
thus the zone skew is 3 - 1 = 2, which exceeds maxSkew: 1

What did you expect to see?
When I run the descheduler job, I expect one of the pods from zone2 to be evicted and rescheduled to zone1.

What did you see instead?
Nothing happens. The descheduler logs do not show anything related to the ScheduleAnyway constraint. Isn't that what includeSoftConstraints should do?

I0509 19:00:18.131606       1 topologyspreadconstraint.go:109] "Processing namespaces for topology spread constraints"
I0509 19:00:18.327425       1 request.go:591] Throttling request took 195.699934ms, request: GET:https://10.237.0.1:443/api/v1/namespaces/default/pods
I0509 19:00:18.334047       1 topologyspreadconstraint.go:183] "Skipping topology constraint because it is already balanced" constraint={MaxSkew:1 TopologyKey:kubernetes.io/hostname WhenUnsatisfiable:DoNotSchedule LabelSelector:&LabelSelector{MatchLabels:map[string]string{app: whoami,role: frontend,},MatchExpressions:[]LabelSelectorRequirement{},}}
I0509 19:00:18.334095       1 topologyspreadconstraint.go:183] "Skipping topology constraint because it is already balanced" constraint={MaxSkew:1 TopologyKey:kubernetes.io/hostname WhenUnsatisfiable:DoNotSchedule LabelSelector:&LabelSelector{MatchLabels:map[string]string{app: whoami,role: frontend,},MatchExpressions:[]LabelSelectorRequirement{},}}
I0509 19:00:18.334130       1 topologyspreadconstraint.go:183] "Skipping topology constraint because it is already balanced" constraint={MaxSkew:1 TopologyKey:kubernetes.io/hostname WhenUnsatisfiable:DoNotSchedule LabelSelector:&LabelSelector{MatchLabels:map[string]string{app: whoami,role: frontend,},MatchExpressions:[]LabelSelectorRequirement{},}}
I0509 19:00:18.334147       1 topologyspreadconstraint.go:183] "Skipping topology constraint because it is already balanced" constraint={MaxSkew:1 TopologyKey:kubernetes.io/hostname WhenUnsatisfiable:DoNotSchedule LabelSelector:&LabelSelector{MatchLabels:map[string]string{app: whoami,role: frontend,},MatchExpressions:[]LabelSelectorRequirement{},}}

@jsalatiel added the kind/bug label on May 9, 2021
@damemi
Contributor

damemi commented May 9, 2021

Hi,
The includeSoftConstraints parameter is not available in the v0.20.0 image. It will be included in the v0.21.0 release, which will be published soon. In the meantime, you could try it by building your own image from the master branch, which should resolve your issue.
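For reference, one way to build such an image is roughly the following (a sketch only; the make target and the registry handling are assumptions on my part, so check the project README for the authoritative build steps):

  # Clone the project and build a local image from the master branch
  git clone https://github.com/kubernetes-sigs/descheduler.git
  cd descheduler
  make image
  # Push the resulting image to a registry your cluster can pull from,
  # then point the descheduler Deployment/CronJob image field at that tag.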

@jsalatiel
Author

Thanks. I will wait for the next release.

@jsalatiel
Author

I was able to compile it from master, but I am still getting the same behaviour.

Descheduler version {Major:0 Minor:20+ GitVersion:v20210509-v0.20.0-110-g2a3529c54 GitBranch:master GitSha1:2a3529c5431c528c1e727caffb2485145b259f4d BuildDate:2021-05-09T19:09:14-0300 GoVersion:go1.16.4 Compiler:gc Platform:linux/amd64}

@damemi
Contributor

damemi commented May 10, 2021

Thanks for taking the time to test that on master, @jsalatiel.
I have opened #565, which adds your scenario as a test case (it looks like we were also missing test cases for singular soft constraints). I do see that test failing too, so there is indeed a bug here.

@damemi
Contributor

damemi commented May 10, 2021

@jsalatiel, sorry, I had a mistake in the test that was causing it to fail. I updated the test to match your scenario and it is now passing.

The test does the following:

  • Create 6 nodes: 3 labeled with zoneA and 3 with zoneB
  • Create 4 pods: 1 on each of the zoneA nodes and the last on a zoneB node
  • Each pod has these constraints:
      topologySpreadConstraints:
      - labelSelector:
          matchLabels:
            app: whoami
        maxSkew: 1
        topologyKey: topology.kubernetes.io/zone
        whenUnsatisfiable: ScheduleAnyway
      - labelSelector:
          matchLabels:
            app: whoami
        maxSkew: 1
        topologyKey: kubernetes.io/hostname
        whenUnsatisfiable: DoNotSchedule
  • Enable soft constraint checking
  • Expect 1 eviction

When I add logging, I see that the pod evicted is from zoneA, so this does appear to be working and the zones are balanced. Is there any other information you can provide, such as the node yamls?

@damemi
Contributor

damemi commented May 10, 2021

If you could also provide the descheduler logs from when you ran it on master (with --v=4, if possible), that would help us figure out why you are still hitting this.
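For example, here is a sketch of how the verbosity could be raised on the descheduler container (the args below are illustrative; the policy path should match whatever your own Deployment/CronJob spec mounts):

  args:
    - "--policy-config-file=/policy-dir/policy.yaml"
    - "--v=4"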

@jsalatiel
Author

Hi @damemi, after enabling debug logging I noticed that the problem is that all my pods mount /etc/localtime from the host.

I0510 19:07:59.089224 1 evictions.go:260] "Pod lacks an eviction annotation and fails the following checks" pod="default/whoami-65fdc87f6d-c5n2t" checks="pod has local storage and descheduler is not configured with evictLocalStoragePods"
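For anyone hitting the same check, here is a minimal policy sketch that opts into evicting such pods (assuming the top-level evictLocalStoragePods field of the v1alpha1 DeschedulerPolicy; only the relevant strategy is shown):

  apiVersion: "descheduler/v1alpha1"
  kind: "DeschedulerPolicy"
  # Allow evicting pods that use local storage (e.g. hostPath or emptyDir volumes)
  evictLocalStoragePods: true
  strategies:
    RemovePodsViolatingTopologySpreadConstraint:
      enabled: true
      params:
        includeSoftConstraints: true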

Thanks for the help, and sorry for taking your time.

@damemi
Contributor

damemi commented May 10, 2021

No problem! Glad we solved it :)
