panic: close of closed channel #741

Closed
amadeusjposchmann opened this issue Feb 25, 2022 · 2 comments
Labels
kind/bug Categorizes issue or PR as related to a bug.

Comments

@amadeusjposchmann

What version of descheduler are you using?

descheduler version: 0.23.1

Does this issue reproduce with the latest release?
Yes

Which descheduler CLI options are you using?
Defaults from Helm chart

Please provide a copy of your descheduler policy config file
Defaults from Helm chart:

  strategies:
    RemoveDuplicates:
      enabled: true
    RemovePodsViolatingNodeTaints:
      enabled: true
    RemovePodsViolatingNodeAffinity:
      enabled: true
      params:
        nodeAffinityType:
        - requiredDuringSchedulingIgnoredDuringExecution
    RemovePodsViolatingInterPodAntiAffinity:
      enabled: true
    LowNodeUtilization:
      enabled: true
      params:
        nodeResourceUtilizationThresholds:
          thresholds:
            cpu: 20
            memory: 20
            pods: 20
          targetThresholds:
            cpu: 50
            memory: 50
            pods: 50

What k8s version are you using (kubectl version)?

kubectl version Output
$ kubectl version
Client Version: version.Info{Major:"1", Minor:"23", GitVersion:"v1.23.4", GitCommit:"e6c093d87ea4cbb530a7b2ae91e54c0842d8308a", GitTreeState:"clean", BuildDate:"2022-02-16T12:30:48Z", GoVersion:"go1.17.6", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"21", GitVersion:"v1.21.9", GitCommit:"88ccc7584c4460cf46e4180be83e7c4ab7a8ffb4", GitTreeState:"clean", BuildDate:"2022-01-30T02:57:18Z", GoVersion:"go1.16.12", Compiler:"gc", Platform:"linux/amd64"}

What did you do?
Installed the Helm chart without any custom values.

What did you expect to see?
Pods finishing in the "Succeeded" state.

What did you see instead?
A mixture of "Succeeded" and "Failed" pods. The failing pods panic with:

panic: close of closed channel

goroutine 90 [running]:
sigs.k8s.io/descheduler/pkg/descheduler.Run.func1()
	/go/src/sigs.k8s.io/descheduler/pkg/descheduler/descheduler.go:76 +0x3c
created by sigs.k8s.io/descheduler/pkg/descheduler.Run
	/go/src/sigs.k8s.io/descheduler/pkg/descheduler/descheduler.go:74 +0x165

Feels like a race condition: some pods succeed, some fail. Full log below.
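
For context, "close of closed channel" is the panic the Go runtime raises when `close` is called a second time on the same channel, which is exactly what a shutdown race between goroutines produces. A minimal, hypothetical sketch (not descheduler code) that triggers the same panic:

```go
// Hypothetical reproduction: two goroutines race to close the same stop
// channel. Whichever one closes second triggers
// "panic: close of closed channel".
package main

import "sync"

func main() {
	stopCh := make(chan struct{})

	var wg sync.WaitGroup
	for i := 0; i < 2; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			close(stopCh) // the second of these two calls panics
		}()
	}
	wg.Wait()
}
```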

Full log (private pod names redacted)
I0225 10:38:01.989179       1 named_certificates.go:53] "Loaded SNI cert" index=0 certName="self-signed loopback" certDetail="\"apiserver-loopback-client@1645785481\" [serving] validServingFor=[apiserver-loopback-client] issuer=\"apiserver-loopback-client-ca@1645785481\" (2022-02-25 09:38:01 +0000 UTC to 2023-02-25 09:38:01 +0000 UTC (now=2022-02-25 10:38:01.989142099 +0000 UTC))"
I0225 10:38:01.989228       1 secure_serving.go:200] Serving securely on [::]:10258
I0225 10:38:01.989437       1 tlsconfig.go:240] "Starting DynamicServingCertificateController"
I0225 10:38:02.053976       1 reflector.go:219] Starting reflector *v1.Pod (0s) from k8s.io/client-go/informers/factory.go:134
I0225 10:38:02.053993       1 reflector.go:255] Listing and watching *v1.Pod from k8s.io/client-go/informers/factory.go:134
I0225 10:38:02.054077       1 reflector.go:219] Starting reflector *v1.PriorityClass (0s) from k8s.io/client-go/informers/factory.go:134
I0225 10:38:02.054082       1 reflector.go:255] Listing and watching *v1.PriorityClass from k8s.io/client-go/informers/factory.go:134
I0225 10:38:02.053978       1 reflector.go:219] Starting reflector *v1.Namespace (0s) from k8s.io/client-go/informers/factory.go:134
I0225 10:38:02.054130       1 reflector.go:255] Listing and watching *v1.Namespace from k8s.io/client-go/informers/factory.go:134
I0225 10:38:02.154082       1 node.go:46] "Node lister returned empty list, now fetch directly"
I0225 10:38:02.168948       1 descheduler.go:266] Building a pod evictor
I0225 10:38:02.169159       1 nodeutilization.go:167] "Node is overutilized" node="aks-default-77236376-vmss000000" usage=map[cpu:1290m memory:1214Mi pods:23] usagePercentage=map[cpu:67.89473684210526 memory:22.63010614415712 pods:20.90909090909091]
I0225 10:38:02.169183       1 nodeutilization.go:167] "Node is overutilized" node="aks-default-77236376-vmss000001" usage=map[cpu:1304m memory:1395Mi pods:24] usagePercentage=map[cpu:68.63157894736842 memory:26.004117027264567 pods:21.818181818181817]
I0225 10:38:02.169202       1 nodeutilization.go:167] "Node is overutilized" node="aks-default-77236376-vmss000002" usage=map[cpu:1560m memory:1276Mi pods:18] usagePercentage=map[cpu:82.10526315789474 memory:23.78584467870221 pods:16.363636363636363]
I0225 10:38:02.169219       1 lownodeutilization.go:101] "Criteria for a node under utilization" CPU=20 Mem=20 Pods=20
I0225 10:38:02.169228       1 lownodeutilization.go:102] "Number of underutilized nodes" totalNumber=0
I0225 10:38:02.169238       1 lownodeutilization.go:115] "Criteria for a node above target utilization" CPU=50 Mem=50 Pods=50
I0225 10:38:02.169245       1 lownodeutilization.go:116] "Number of overutilized nodes" totalNumber=3
I0225 10:38:02.169254       1 lownodeutilization.go:119] "No node is underutilized, nothing to do here, you might tune your thresholds further"
I0225 10:38:02.169266       1 duplicates.go:110] "Processing node" node="aks-default-77236376-vmss000000"
I0225 10:38:02.169369       1 duplicates.go:110] "Processing node" node="aks-default-77236376-vmss000001"
I0225 10:38:02.169478       1 duplicates.go:110] "Processing node" node="aks-default-77236376-vmss000002"
I0225 10:38:02.169549       1 pod_antiaffinity.go:92] "Processing node" node="aks-default-77236376-vmss000000"
I0225 10:38:02.169695       1 pod_antiaffinity.go:92] "Processing node" node="aks-default-77236376-vmss000001"
I0225 10:38:02.169827       1 pod_antiaffinity.go:92] "Processing node" node="aks-default-77236376-vmss000002"
I0225 10:38:02.169911       1 node_affinity.go:86] "Executing for nodeAffinityType" nodeAffinity="requiredDuringSchedulingIgnoredDuringExecution"
I0225 10:38:02.169922       1 node_affinity.go:91] "Processing node" node="aks-default-77236376-vmss000000"
I0225 10:38:02.169946       1 node.go:169] "Pod fits on node" pod="uat/-5bc45656c-7nfpd" node="aks-default-77236376-vmss000000"
I0225 10:38:02.169957       1 node.go:169] "Pod fits on node" pod="uat/-5dd64cc47f-2j4dj" node="aks-default-77236376-vmss000000"
I0225 10:38:02.169970       1 node.go:169] "Pod fits on node" pod="uat/-76b76d5b84-cw7n2" node="aks-default-77236376-vmss000000"
I0225 10:38:02.169980       1 node.go:169] "Pod fits on node" pod="uat/-5b4976477f-tjl5w" node="aks-default-77236376-vmss000000"
I0225 10:38:02.169993       1 node.go:169] "Pod fits on node" pod="uat/-7df8bc57df-6zr7r" node="aks-default-77236376-vmss000000"
I0225 10:38:02.170000       1 node.go:169] "Pod fits on node" pod="uat/-644cd554bc-ft65m" node="aks-default-77236376-vmss000000"
I0225 10:38:02.170008       1 node.go:169] "Pod fits on node" pod="uat/-5d5bdbbd66-b9zzl" node="aks-default-77236376-vmss000000"
I0225 10:38:02.170041       1 node.go:169] "Pod fits on node" pod="uat/-78669c5f76-7z7db" node="aks-default-77236376-vmss000000"
I0225 10:38:02.170054       1 node.go:169] "Pod fits on node" pod="uat/-755d6d849d-zslp7" node="aks-default-77236376-vmss000000"
I0225 10:38:02.170059       1 node.go:169] "Pod fits on node" pod="cert-manager/cert-manager-cainjector-7d55bf8f78-kzpjf" node="aks-default-77236376-vmss000000"
I0225 10:38:02.170067       1 node_affinity.go:91] "Processing node" node="aks-default-77236376-vmss000001"
I0225 10:38:02.170075       1 node.go:169] "Pod fits on node" pod="uat/-5d5bdbbd66-2tk78" node="aks-default-77236376-vmss000001"
I0225 10:38:02.170082       1 node.go:169] "Pod fits on node" pod="uat/-5dd64cc47f-vmvn4" node="aks-default-77236376-vmss000001"
I0225 10:38:02.170093       1 node.go:169] "Pod fits on node" pod="uat/-644cd554bc-bwfd8" node="aks-default-77236376-vmss000001"
I0225 10:38:02.170101       1 node.go:169] "Pod fits on node" pod="cert-manager/cert-manager-6d6bb4f487-8r7l4" node="aks-default-77236376-vmss000001"
I0225 10:38:02.170118       1 node.go:169] "Pod fits on node" pod="uat/-5bc45656c-dqd5l" node="aks-default-77236376-vmss000001"
I0225 10:38:02.170123       1 node.go:169] "Pod fits on node" pod="uat/-7df8bc57df-vx6g9" node="aks-default-77236376-vmss000001"
I0225 10:38:02.170134       1 node.go:169] "Pod fits on node" pod="uat/-76b76d5b84-wtwr7" node="aks-default-77236376-vmss000001"
I0225 10:38:02.170140       1 node.go:169] "Pod fits on node" pod="cert-manager/cert-manager-webhook-577f77586f-9nj5b" node="aks-default-77236376-vmss000001"
I0225 10:38:02.170145       1 node.go:169] "Pod fits on node" pod="prometheus/prometheus-kube-prometheus-operator-788dfbdccc-hd7mv" node="aks-default-77236376-vmss000001"
I0225 10:38:02.170161       1 node.go:169] "Pod fits on node" pod="uat/-78669c5f76-jfv85" node="aks-default-77236376-vmss000001"
I0225 10:38:02.170175       1 node.go:169] "Pod fits on node" pod="prometheus/alertmanager-prometheus-kube-prometheus-alertmanager-0" node="aks-default-77236376-vmss000001"
I0225 10:38:02.170195       1 node.go:169] "Pod fits on node" pod="uat/-755d6d849d-fkmdj" node="aks-default-77236376-vmss000001"
I0225 10:38:02.170200       1 node.go:169] "Pod fits on node" pod="prometheus/prometheus-kube-state-metrics-57c988498f-45b9s" node="aks-default-77236376-vmss000001"
I0225 10:38:02.170205       1 node.go:169] "Pod fits on node" pod="uat/-5b4976477f-tss9m" node="aks-default-77236376-vmss000001"
I0225 10:38:02.170209       1 node_affinity.go:91] "Processing node" node="aks-default-77236376-vmss000002"
I0225 10:38:02.170216       1 node.go:169] "Pod fits on node" pod="uat/-5d5bdbbd66-6jddv" node="aks-default-77236376-vmss000002"
I0225 10:38:02.170221       1 node.go:169] "Pod fits on node" pod="uat/-5b4976477f-s5ths" node="aks-default-77236376-vmss000002"
I0225 10:38:02.170226       1 node.go:169] "Pod fits on node" pod="uat/-644cd554bc-8d4lp" node="aks-default-77236376-vmss000002"
I0225 10:38:02.170233       1 node.go:169] "Pod fits on node" pod="uat/-7df8bc57df-hjnw5" node="aks-default-77236376-vmss000002"
I0225 10:38:02.170268       1 node.go:169] "Pod fits on node" pod="uat/-5bc45656c-xxrsf" node="aks-default-77236376-vmss000002"
I0225 10:38:02.170273       1 node.go:169] "Pod fits on node" pod="uat/-76b76d5b84-6lprk" node="aks-default-77236376-vmss000002"
I0225 10:38:02.170279       1 node.go:169] "Pod fits on node" pod="uat/-78669c5f76-mgd75" node="aks-default-77236376-vmss000002"
I0225 10:38:02.170284       1 node.go:169] "Pod fits on node" pod="uat/-755d6d849d-4t97x" node="aks-default-77236376-vmss000002"
I0225 10:38:02.170290       1 node.go:169] "Pod fits on node" pod="uat/-5dd64cc47f-n7nz5" node="aks-default-77236376-vmss000002"
I0225 10:38:02.170296       1 node_taint.go:93] "Processing node" node="aks-default-77236376-vmss000000"
I0225 10:38:02.170351       1 node_taint.go:93] "Processing node" node="aks-default-77236376-vmss000001"
I0225 10:38:02.170394       1 node_taint.go:93] "Processing node" node="aks-default-77236376-vmss000002"
I0225 10:38:02.170456       1 descheduler.go:291] "Number of evicted pods" totalEvicted=0
I0225 10:38:02.170538       1 reflector.go:225] Stopping reflector *v1.Pod (0s) from k8s.io/client-go/informers/factory.go:134
I0225 10:38:02.170590       1 reflector.go:225] Stopping reflector *v1.Namespace (0s) from k8s.io/client-go/informers/factory.go:134
I0225 10:38:02.170618       1 reflector.go:225] Stopping reflector *v1.PriorityClass (0s) from k8s.io/client-go/informers/factory.go:134
panic: close of closed channel

goroutine 90 [running]:
sigs.k8s.io/descheduler/pkg/descheduler.Run.func1()
/go/src/sigs.k8s.io/descheduler/pkg/descheduler/descheduler.go:76 +0x3c
created by sigs.k8s.io/descheduler/pkg/descheduler.Run
/go/src/sigs.k8s.io/descheduler/pkg/descheduler/descheduler.go:74 +0x165

amadeusjposchmann added the kind/bug label on Feb 25, 2022
@damemi
Contributor

damemi commented Feb 25, 2022

This seems like a duplicate of #728, which was fixed by #731. We will publish a patch release with this fix soon; sorry for the inconvenience.

In the meantime, the workaround is to run the descheduler as a Deployment instead of a CronJob (if possible for your environment).

Closing this issue as a duplicate, please feel free to reopen if you have more to discuss regarding it. Thanks!
/close
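
For reference, the usual way to make this kind of shutdown race-safe in Go is to guard the `close` with `sync.Once`, so a second caller becomes a no-op instead of panicking. A minimal sketch of that pattern (illustrative only, not necessarily the exact change made in #731):

```go
// Illustrative pattern, not descheduler code: sync.Once makes Stop safe to
// call from any number of goroutines; only the first call closes the channel.
package main

import "sync"

type stopper struct {
	once   sync.Once
	stopCh chan struct{}
}

func newStopper() *stopper {
	return &stopper{stopCh: make(chan struct{})}
}

// Stop closes the stop channel exactly once, no matter how often it is called.
func (s *stopper) Stop() {
	s.once.Do(func() { close(s.stopCh) })
}

func main() {
	s := newStopper()
	go s.Stop()
	go s.Stop() // safe: no-op after the first close
	<-s.stopCh  // unblocks once the channel is closed
}
```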

@k8s-ci-robot
Contributor

@damemi: Closing this issue.

In response to this:

This seems like a duplicate of #728, which was fixed by #731. We will publish a patch release with this fix soon; sorry for the inconvenience.

In the meantime, the workaround is to run the descheduler as a Deployment instead of a CronJob (if possible for your environment).

Closing this issue as a duplicate, please feel free to reopen if you have more to discuss regarding it. Thanks!
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
