
RemovePodsViolatingTopologySpreadConstraint includeSoftConstraints: true not evicting #564

Closed
jsalatiel opened this issue May 9, 2021 · 8 comments · Fixed by #565
Labels
kind/bug Categorizes issue or PR as related to a bug.

Comments

@jsalatiel

What version of descheduler are you using?

descheduler version

  • image: k8s.gcr.io/descheduler/descheduler:v0.20.0
  • helm.sh/chart: descheduler-0.20.0

Does this issue reproduce with the latest release?
Yes.

Please provide a copy of your descheduler policy config file

  policy.yaml: |
    apiVersion: "descheduler/v1alpha1"
    kind: "DeschedulerPolicy"
    strategies:
      LowNodeUtilization:
        enabled: false
        params:
          nodeResourceUtilizationThresholds:
            targetThresholds:
              cpu: 50
              memory: 50
              pods: 50
            thresholds:
              cpu: 20
              memory: 20
              pods: 20
      RemoveDuplicates:
        enabled: true
      RemovePodsViolatingInterPodAntiAffinity:
        enabled: true
      RemovePodsViolatingNodeAffinity:
        enabled: true
        params:
          nodeAffinityType:
          - requiredDuringSchedulingIgnoredDuringExecution
      RemovePodsViolatingNodeTaints:
        enabled: true
      RemovePodsViolatingTopologySpreadConstraint:
        enabled: true
        params:
          includeSoftConstraints: true

What k8s version are you using (kubectl version)?

$ kubectl version
Client Version: version.Info{Major:"1", Minor:"19", GitVersion:"v1.19.9", GitCommit:"9dd794e454ac32d97cde41ae10be801ae98f75df", GitTreeState:"clean", BuildDate:"2021-03-18T01:09:28Z", GoVersion:"go1.15.8", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"19", GitVersion:"v1.19.9", GitCommit:"9dd794e454ac32d97cde41ae10be801ae98f75df", GitTreeState:"clean", BuildDate:"2021-03-18T01:00:06Z", GoVersion:"go1.15.8", Compiler:"gc", Platform:"linux/amd64"}

What did you do?
I have created a deployment with the following topologySpread

      topologySpreadConstraints:
      - labelSelector:
          matchLabels:
            app: whoami
        maxSkew: 1
        topologyKey: topology.kubernetes.io/zone
        whenUnsatisfiable: ScheduleAnyway
      - labelSelector:
          matchLabels:
            app: whoami
        maxSkew: 1
        topologyKey: kubernetes.io/hostname
        whenUnsatisfiable: DoNotSchedule

I have 6 nodes in 2 zones, 3 nodes per zone. The deployment has 4 replicas.
At the start, the topologySpreadConstraints split the pods evenly across the zones: 2 pods per zone, on 4 different nodes.
If I drain all nodes from zone1, then since the zone policy is ScheduleAnyway, one of the pods is scheduled in zone2 and the other stays pending (due to the DoNotSchedule policy on the hostname constraint).
After uncordoning the nodes in zone1, the pending pod is immediately scheduled, as expected.
So now I have:
3 pods running in zone2
1 pod running in zone1
thus the zone skew is 3 - 1 = 2, which exceeds maxSkew: 1

What did you expect to see?
When I run the descheduler job, I expect one of the pods from zone2 to be evicted and rescheduled to zone1.

What did you see instead?
Nothing happens. The descheduler logs do not show anything related to the ScheduleAnyway constraint. Isn't that what includeSoftConstraints should do?

I0509 19:00:18.131606       1 topologyspreadconstraint.go:109] "Processing namespaces for topology spread constraints"
I0509 19:00:18.327425       1 request.go:591] Throttling request took 195.699934ms, request: GET:https://10.237.0.1:443/api/v1/namespaces/default/pods
I0509 19:00:18.334047       1 topologyspreadconstraint.go:183] "Skipping topology constraint because it is already balanced" constraint={MaxSkew:1 TopologyKey:kubernetes.io/hostname WhenUnsatisfiable:DoNotSchedule LabelSelector:&LabelSelector{MatchLabels:map[string]string{app: whoami,role: frontend,},MatchExpressions:[]LabelSelectorRequirement{},}}
I0509 19:00:18.334095       1 topologyspreadconstraint.go:183] "Skipping topology constraint because it is already balanced" constraint={MaxSkew:1 TopologyKey:kubernetes.io/hostname WhenUnsatisfiable:DoNotSchedule LabelSelector:&LabelSelector{MatchLabels:map[string]string{app: whoami,role: frontend,},MatchExpressions:[]LabelSelectorRequirement{},}}
I0509 19:00:18.334130       1 topologyspreadconstraint.go:183] "Skipping topology constraint because it is already balanced" constraint={MaxSkew:1 TopologyKey:kubernetes.io/hostname WhenUnsatisfiable:DoNotSchedule LabelSelector:&LabelSelector{MatchLabels:map[string]string{app: whoami,role: frontend,},MatchExpressions:[]LabelSelectorRequirement{},}}
I0509 19:00:18.334147       1 topologyspreadconstraint.go:183] "Skipping topology constraint because it is already balanced" constraint={MaxSkew:1 TopologyKey:kubernetes.io/hostname WhenUnsatisfiable:DoNotSchedule LabelSelector:&LabelSelector{MatchLabels:map[string]string{app: whoami,role: frontend,},MatchExpressions:[]LabelSelectorRequirement{},}}

@jsalatiel added the kind/bug label on May 9, 2021
@damemi
Contributor

damemi commented May 9, 2021

Hi,
The includeSoftConstraints parameter is not available in the v0.20.0 image. It will be included in the v0.21.0 release, which will be published soon. In the meantime, you could try it by building your own image from the master branch, which should resolve your issue.
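For reference, one way to build such an image is roughly the following (a sketch only; the make target and the registry handling are assumptions on my part, so check the project README for the authoritative build steps):

  # Clone the project and build a local image from the master branch
  git clone https://github.com/kubernetes-sigs/descheduler.git
  cd descheduler
  make image
  # Push the resulting image to a registry your cluster can pull from,
  # then point the descheduler Deployment/CronJob image field at that tag.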

@jsalatiel
Author

Thanks. I will wait for the next release.

@jsalatiel
Author

I was able to compile it from master, but I am still getting the same behaviour.

Descheduler version {Major:0 Minor:20+ GitVersion:v20210509-v0.20.0-110-g2a3529c54 GitBranch:master GitSha1:2a3529c5431c528c1e727caffb2485145b259f4d BuildDate:2021-05-09T19:09:14-0300 GoVersion:go1.16.4 Compiler:gc Platform:linux/amd64}

@damemi
Contributor

damemi commented May 10, 2021

Thanks for taking the time to test that on master, @jsalatiel.
I have opened #565, which adds your scenario as a test case (it looks like we were also missing test cases for singular soft constraints). I do see that test failing too, so there is indeed a bug here.

@damemi
Contributor

damemi commented May 10, 2021

@jsalatiel, sorry, I had a mistake in the test that was causing it to fail. I updated the test to match your scenario and it is now passing.

The test does the following:

  • Create 6 nodes: 3 labeled with zoneA and 3 with zoneB
  • Create 4 pods: 1 on each of the zoneA nodes and the last on a zoneB node
  • Each pod has these constraints:
      topologySpreadConstraints:
      - labelSelector:
          matchLabels:
            app: whoami
        maxSkew: 1
        topologyKey: topology.kubernetes.io/zone
        whenUnsatisfiable: ScheduleAnyway
      - labelSelector:
          matchLabels:
            app: whoami
        maxSkew: 1
        topologyKey: kubernetes.io/hostname
        whenUnsatisfiable: DoNotSchedule
  • Enable soft constraint checking
  • Expect 1 eviction

When I add logging, I see that the pod evicted is from zoneA, so this does appear to be working and the zones are balanced. Is there any other information you can provide, such as the node yamls?

@damemi
Contributor

damemi commented May 10, 2021

If you could also provide the descheduler logs from when you ran it on master (with --v=4, if possible), that would help us figure out why you are still hitting this.
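For example, here is a sketch of how the verbosity could be raised on the descheduler container (the args below are illustrative; the policy path should match whatever your own Deployment/CronJob spec mounts):

  args:
    - "--policy-config-file=/policy-dir/policy.yaml"
    - "--v=4"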

@jsalatiel
Author

Hi @damemi, after enabling debug logging I noticed that the problem is that all my pods mount /etc/localtime from the host.

I0510 19:07:59.089224 1 evictions.go:260] "Pod lacks an eviction annotation and fails the following checks" pod="default/whoami-65fdc87f6d-c5n2t" checks="pod has local storage and descheduler is not configured with evictLocalStoragePods"
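For anyone hitting the same check, here is a minimal policy sketch that opts into evicting such pods (assuming the top-level evictLocalStoragePods field of the v1alpha1 DeschedulerPolicy; only the relevant strategy is shown):

  apiVersion: "descheduler/v1alpha1"
  kind: "DeschedulerPolicy"
  # Allow evicting pods that use local storage (e.g. hostPath or emptyDir volumes)
  evictLocalStoragePods: true
  strategies:
    RemovePodsViolatingTopologySpreadConstraint:
      enabled: true
      params:
        includeSoftConstraints: true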

Thanks for the help, and sorry for taking your time.

@damemi
Contributor

damemi commented May 10, 2021

No problem! Glad we solved it :)
