
TopologySpreadConstraint does not balance domains on first run #601

Closed
a7i opened this issue Jul 13, 2021 · 3 comments

a7i commented Jul 13, 2021

What version of descheduler are you using?

descheduler version: v0.21.0

Does this issue reproduce with the latest release?
Yes

Which descheduler CLI options are you using?
--logging-format text --policy-config-file /policy-dir/policy.yaml --v 4

Please provide a copy of your descheduler policy config file

    apiVersion: "descheduler/v1alpha1"
    kind: "DeschedulerPolicy"
    evictLocalStoragePods: true
    strategies:
      RemovePodsViolatingTopologySpreadConstraint:
        enabled: true
        params:
          includeSoftConstraints: true
          labelSelector:
            matchExpressions:
            - {key: descheduler.k8s.io/opt-in, operator: In, values: ["true"]}

What k8s version are you using (kubectl version)?

$ kubectl version
Client Version: version.Info{Major:"1", Minor:"21", GitVersion:"v1.21.1", GitCommit:"5e58841cce77d4bc13713ad2b91fa0d961e69192", GitTreeState:"clean", BuildDate:"2021-05-12T14:11:29Z", GoVersion:"go1.16.3", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"20", GitVersion:"v1.20.8", GitCommit:"5575935422cc1cf5169dfc8847cb587aa47bac5a", GitTreeState:"clean", BuildDate:"2021-06-16T12:53:07Z", GoVersion:"go1.15.13", Compiler:"gc", Platform:"linux/amd64"}

What did you do?
I have a Deployment with 15 replicas and the following topologySpreadConstraint:

      topologySpreadConstraints:
        - maxSkew: 1
          topologyKey: topology.kubernetes.io/zone
          whenUnsatisfiable: ScheduleAnyway
          labelSelector:
            matchLabels:

I simulate an AZ failure and end up with [8, 7, 0] Pods across 3 AZs. After AZ recovery, I would expect the descheduler to rebalance the domains to [5, 5, 5], but it takes two runs to reach the ideal state.
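
For reference, the arithmetic behind that expectation (a back-of-the-envelope sketch, not descheduler's actual balancing code): 15 replicas across 3 zones gives an ideal of 5 Pods per zone, so going from [8, 7, 0] to [5, 5, 5] needs 3 + 2 = 5 evictions.

    // Rough calculation only; this is not descheduler's implementation.
    package main

    import "fmt"

    func main() {
        domains := []int{8, 7, 0} // Pods per zone right after AZ recovery
        total := 0
        for _, n := range domains {
            total += n
        }
        ideal := total / len(domains) // 15 / 3 = 5

        evictions := 0
        for _, n := range domains {
            if n > ideal {
                evictions += n - ideal // 3 from the first zone + 2 from the second
            }
        }
        fmt.Println(evictions) // 5
    }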

You can simulate this via a TestCase in the unit test for TopologySpreadConstraint:

    {
        name: "3 domains size [8 7 0], maxSkew=1, should move 5 to get [5 5 5]",
        nodes: []*v1.Node{
            test.BuildTestNode("n1", 2000, 3000, 10, func(n *v1.Node) { n.Labels["zone"] = "zoneA" }),
            test.BuildTestNode("n2", 2000, 3000, 10, func(n *v1.Node) { n.Labels["zone"] = "zoneB" }),
            test.BuildTestNode("n3", 2000, 3000, 10, func(n *v1.Node) { n.Labels["zone"] = "zoneC" }),
        },
        pods: createTestPods([]testPodList{
            {
                count:       8,
                node:        "n1",
                labels:      map[string]string{"foo": "bar"},
                constraints: getDefaultTopologyConstraints(1),
            },
            {
                count:       7,
                node:        "n2",
                labels:      map[string]string{"foo": "bar"},
                constraints: getDefaultTopologyConstraints(1),
            },
        }),
        expectedEvictedCount: 5,
        strategy:             api.DeschedulerStrategy{},
        namespaces:           []string{"ns1"},
    },
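
If it helps, this table-driven test can be run on its own with the standard Go tooling; the package path and test name below assume the v0.21.0 layout (pkg/descheduler/strategies, TestTopologySpreadConstraint) and may differ in other releases:

    go test ./pkg/descheduler/strategies/ -run TopologySpreadConstraint -v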

What did you expect to see?
After AZ recovery, I would expect 5 evictions to get to [5, 5, 5].

What did you see instead?
After AZ recovery, I saw 3 evictions.

@a7i a7i added the kind/bug Categorizes issue or PR as related to a bug. label Jul 13, 2021

a7i commented Jul 14, 2021

/assign

@seanmalloy
Member

Looks like this was fixed in #602. @a7i please feel free to reopen if you feel I've closed this issue prematurely.

/close

@k8s-ci-robot
Contributor

@seanmalloy: Closing this issue.

In response to this:

Looks like this was fixed in #602. @a7i please feel free to reopen if you feel I've closed this issue prematurely.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
