
TopologySpreadConstraint does not balance domains on first run #601

Closed
a7i opened this issue Jul 13, 2021 · 3 comments

a7i commented Jul 13, 2021

What version of descheduler are you using?

descheduler version: v0.21.0

Does this issue reproduce with the latest release?
Yes

Which descheduler CLI options are you using?
--logging-format text --policy-config-file /policy-dir/policy.yaml --v 4

Please provide a copy of your descheduler policy config file

    apiVersion: "descheduler/v1alpha1"
    kind: "DeschedulerPolicy"
    evictLocalStoragePods: true
    strategies:
      RemovePodsViolatingTopologySpreadConstraint:
        enabled: true
        params:
          includeSoftConstraints: true
          labelSelector:
            matchExpressions:
            - {key: descheduler.k8s.io/opt-in, operator: In, values: ["true"]}

What k8s version are you using (kubectl version)?

$ kubectl version
Client Version: version.Info{Major:"1", Minor:"21", GitVersion:"v1.21.1", GitCommit:"5e58841cce77d4bc13713ad2b91fa0d961e69192", GitTreeState:"clean", BuildDate:"2021-05-12T14:11:29Z", GoVersion:"go1.16.3", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"20", GitVersion:"v1.20.8", GitCommit:"5575935422cc1cf5169dfc8847cb587aa47bac5a", GitTreeState:"clean", BuildDate:"2021-06-16T12:53:07Z", GoVersion:"go1.15.13", Compiler:"gc", Platform:"linux/amd64"}

What did you do?
I have a Deployment with 15 replicas and the following topologySpreadConstraint:

      topologySpreadConstraints:
        - maxSkew: 1
          topologyKey: topology.kubernetes.io/zone
          whenUnsatisfiable: ScheduleAnyway
          labelSelector:
            matchLabels:

I simulate an AZ failure and end up with [8, 7, 0] Pods across 3 AZs. After AZ recovery, I would expect the descheduler to rebalance the domains to [5, 5, 5], but it takes two runs to reach the ideal state.
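
For reference, the arithmetic behind that expectation (a back-of-the-envelope sketch, not descheduler's actual balancing code): 15 replicas across 3 zones gives an ideal of 5 Pods per zone, so going from [8, 7, 0] to [5, 5, 5] needs 3 + 2 = 5 evictions.

    // Rough calculation only; this is not descheduler's implementation.
    package main

    import "fmt"

    func main() {
        domains := []int{8, 7, 0} // Pods per zone right after AZ recovery
        total := 0
        for _, n := range domains {
            total += n
        }
        ideal := total / len(domains) // 15 / 3 = 5

        evictions := 0
        for _, n := range domains {
            if n > ideal {
                evictions += n - ideal // 3 from the first zone + 2 from the second
            }
        }
        fmt.Println(evictions) // 5
    }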

You can simulate this via a TestCase in the unit test for TopologySpreadConstraint:

    {
        name: "3 domains size [8 7 0], maxSkew=1, should move 5 to get [5 5 5]",
        nodes: []*v1.Node{
            test.BuildTestNode("n1", 2000, 3000, 10, func(n *v1.Node) { n.Labels["zone"] = "zoneA" }),
            test.BuildTestNode("n2", 2000, 3000, 10, func(n *v1.Node) { n.Labels["zone"] = "zoneB" }),
            test.BuildTestNode("n3", 2000, 3000, 10, func(n *v1.Node) { n.Labels["zone"] = "zoneC" }),
        },
        pods: createTestPods([]testPodList{
            {
                count:       8,
                node:        "n1",
                labels:      map[string]string{"foo": "bar"},
                constraints: getDefaultTopologyConstraints(1),
            },
            {
                count:       7,
                node:        "n2",
                labels:      map[string]string{"foo": "bar"},
                constraints: getDefaultTopologyConstraints(1),
            },
        }),
        expectedEvictedCount: 5,
        strategy:             api.DeschedulerStrategy{},
        namespaces:           []string{"ns1"},
    },
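
If it helps, this table-driven test can be run on its own with the standard Go tooling; the package path and test name below assume the v0.21.0 layout (pkg/descheduler/strategies, TestTopologySpreadConstraint) and may differ in other releases:

    go test ./pkg/descheduler/strategies/ -run TopologySpreadConstraint -v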

What did you expect to see?
After AZ recovery, I would expect 5 evictions to get to [5, 5, 5].

What did you see instead?
After AZ recovery, I saw 3 evictions.

@a7i a7i added the kind/bug Categorizes issue or PR as related to a bug. label Jul 13, 2021

a7i commented Jul 14, 2021

/assign

@seanmalloy
Member

Looks like this was fixed in #602. @a7i please feel free to reopen if you feel I've closed this issue prematurely.

/close

@k8s-ci-robot
Contributor

@seanmalloy: Closing this issue.

In response to this:

Looks like this was fixed in #602. @a7i please feel free to reopen if you feel I've closed this issue prematurely.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
