Descheduler | Log Data Evicted Pod Totals - Specific Strategy & Total of all Strategies #501

Closed
martinwoods opened this issue Feb 18, 2021 · 4 comments · Fixed by #502
Labels
kind/bug Categorizes issue or PR as related to a bug.

Comments

@martinwoods

martinwoods commented Feb 18, 2021

Is your feature request related to a problem? Please describe.

Evicted pods - we were testing the descheduler in an EKS cluster running stateful pods with a node affinity of "preferredDuringSchedulingIgnoredDuringExecution".

We had the following evictions early in the log data for LowNodeUtilization, which is fine, so 2 pods were evicted in this case:

I0211 15:05:03.181180       1 evictions.go:117] Evicted pod: "???-dev-mongodb-secondary-1" in namespace "???-dev" (LowNodeUtilization)
I0211 15:05:03.181497       1 lownodeutilization.go:302] Evicted pod: "???-dev-mongodb-secondary-1"
I0211 15:05:03.181525       1 lownodeutilization.go:315] updated node usage: api.ResourceThresholds{"cpu":54.85714285714286, "memory":44.85913009880116, "pods":17.142857142857142}
I0211 15:05:03.181602       1 lownodeutilization.go:267] 1 pods evicted from node "ip-??-??-8-78.us-west-2.compute.internal" with usage map[cpu:54.85714285714286 memory:44.85913009880116 pods:17.142857142857142]
I0211 15:05:03.181712       1 lownodeutilization.go:252] evicting pods from node "ip-??-??-91-42.us-west-2.compute.internal" with usage: api.ResourceThresholds{"cpu":66.28571428571429, "memory":59.13248967569243, "pods":17.142857142857142}
I0211 15:05:03.181922       1 lownodeutilization.go:255] allPods:6, nonRemovablePods:5, removablePods:1
I0211 15:05:03.181984       1 lownodeutilization.go:262] evicting pods based on priority, if they have same priority, they'll be evicted based on QoS tiers
I0211 15:05:03.182281       1 event.go:282] Event(v1.ObjectReference{Kind:"Pod", Namespace:"???-dev", Name:"???-dev-mongodb-secondary-1", UID:"5d7a3627-5740-466c-96f2-57a7ce0c8007", APIVersion:"v1", ResourceVersion:"2482168", FieldPath:""}): type: 'Normal' reason: 'Descheduled' pod evicted by sigs.k8s.io/descheduler (LowNodeUtilization)
I0211 15:05:03.200648       1 evictions.go:117] Evicted pod: "???-dev-mongodb-primary-0" in namespace "???-dev" (LowNodeUtilization)
I0211 15:05:03.200923       1 lownodeutilization.go:302] Evicted pod: "???-dev-mongodb-primary-0"
I0211 15:05:03.200969       1 lownodeutilization.go:315] updated node usage: api.ResourceThresholds{"cpu":49.14285714285715, "memory":40.78102736254651, "pods":14.285714285714285}
I0211 15:05:03.201056       1 lownodeutilization.go:267] 1 pods evicted from node "ip-??-??-91-42.us-west-2.compute.internal" with usage map[cpu:49.14285714285715 memory:40.78102736254651 pods:14.285714285714285]
I0211 15:05:03.201206       1 lownodeutilization.go:139] Total number of pods evicted: 2

But later in the log data we see an eviction count reported for the RemovePodsViolatingNodeAffinity strategy, and again it states that there are 2 evictions:

I0211 15:05:05.257202       1 node_affinity.go:70] Executing for nodeAffinityType: requiredDuringSchedulingIgnoredDuringExecution
I0211 15:05:05.257339       1 node_affinity.go:75] Processing node: "ip-??-??-15-12.us-west-2.compute.internal"
I0211 15:05:05.443666       1 request.go:581] Throttling request took 186.209023ms, request: GET:https://0.1.0.0:123/api/v1/pods?fieldSelector=spec.nodeName%3Dip-??-??-15-12.us-west-2.compute.internal%2Cstatus.phase%21%3DFailed%2Cstatus.phase%21%3DSucceeded
I0211 15:05:05.451940       1 node_affinity.go:75] Processing node: "ip-??-??-28-65.us-west-2.compute.internal"
I0211 15:05:05.643616       1 request.go:581] Throttling request took 191.549799ms, request: GET:https://0.1.0.0:123/api/v1/pods?fieldSelector=spec.nodeName%3Dip-??-??-28-65.us-west-2.compute.internal%2Cstatus.phase%21%3DFailed%2Cstatus.phase%21%3DSucceeded
I0211 15:05:05.654971       1 node.go:137] Pod name-env-ambassador-f859b7448-c8jnf fits on node ip-??-??-28-65.us-west-2.compute.internal
I0211 15:05:05.655004       1 node.go:137] Pod tiller-deploy-987c5d4b4-z9gnh fits on node ip-??-??-28-65.us-west-2.compute.internal
I0211 15:05:05.655059       1 node.go:137] Pod tiller-deploy-7848bd544b-nb4sf fits on node ip-??-??-28-65.us-west-2.compute.internal
I0211 15:05:05.655404       1 node.go:137] Pod tiller-deploy-5686b45c86-6zbzq fits on node ip-??-??-28-65.us-west-2.compute.internal
I0211 15:05:05.655431       1 node_affinity.go:75] Processing node: "ip-??-??-32-197.us-west-2.compute.internal"
I0211 15:05:05.843622       1 request.go:581] Throttling request took 188.067107ms, request: GET:https://0.1.0.0:123/api/v1/pods?fieldSelector=spec.nodeName%3Dip-??-??-32-197.us-west-2.compute.internal%2Cstatus.phase%21%3DFailed%2Cstatus.phase%21%3DSucceeded
I0211 15:05:05.851515       1 node_affinity.go:75] Processing node: "ip-??-??-38-203.us-west-2.compute.internal"
I0211 15:05:06.043641       1 request.go:581] Throttling request took 191.968145ms, request: GET:https://0.1.0.0:123/api/v1/pods?fieldSelector=spec.nodeName%3Dip-??-??-38-203.us-west-2.compute.internal%2Cstatus.phase%21%3DFailed%2Cstatus.phase%21%3DSucceeded
I0211 15:05:06.056948       1 node.go:137] Pod name-env-ambassador-f859b7448-99kfh fits on node ip-??-??-38-203.us-west-2.compute.internal
I0211 15:05:06.057351       1 node.go:137] Pod name-env-ambassador-redis-88fcd86d8-5vfmh fits on node ip-??-??-38-203.us-west-2.compute.internal
I0211 15:05:06.057472       1 node.go:137] Pod name-env-mongodb-backup-mgob-0 fits on node ip-??-??-38-203.us-west-2.compute.internal
I0211 15:05:06.057562       1 node.go:137] Pod name-env-nginx-ingress-controller-867979b6df-gm9h5 fits on node ip-??-??-38-203.us-west-2.compute.internal
I0211 15:05:06.057656       1 node.go:137] Pod external-dns-66757d8c4f-t96jz fits on node ip-??-??-38-203.us-west-2.compute.internal
I0211 15:05:06.057759       1 node.go:137] Pod prometheus-kube-state-metrics-5d7c775cbc-lxphb fits on node ip-??-??-38-203.us-west-2.compute.internal
I0211 15:05:06.057848       1 node.go:137] Pod prometheus-server-7bf7dfc69b-vcp9c fits on node ip-??-??-38-203.us-west-2.compute.internal
I0211 15:05:06.057968       1 node_affinity.go:75] Processing node: "ip-??-??-41-72.us-west-2.compute.internal"
I0211 15:05:06.243615       1 request.go:581] Throttling request took 185.422714ms, request: GET:https://0.1.0.0:123/api/v1/pods?fieldSelector=spec.nodeName%3Dip-??-??-41-72.us-west-2.compute.internal%2Cstatus.phase%21%3DFailed%2Cstatus.phase%21%3DSucceeded
I0211 15:05:06.253468       1 node.go:137] Pod name-env-mongodb-arbiter-0 fits on node ip-??-??-41-72.us-west-2.compute.internal
I0211 15:05:06.253500       1 node.go:137] Pod name-env-mongodb-secondary-0 fits on node ip-??-??-41-72.us-west-2.compute.internal
I0211 15:05:06.253541       1 node_affinity.go:75] Processing node: "ip-??-??-68-232.us-west-2.compute.internal"
I0211 15:05:06.443609       1 request.go:581] Throttling request took 189.98173ms, request: GET:https://0.1.0.0:123/api/v1/pods?fieldSelector=spec.nodeName%3Dip-??-??-68-232.us-west-2.compute.internal%2Cstatus.phase%21%3DFailed%2Cstatus.phase%21%3DSucceeded
I0211 15:05:06.455219       1 node_affinity.go:75] Processing node: "ip-??-??-8-78.us-west-2.compute.internal"
I0211 15:05:06.643603       1 request.go:581] Throttling request took 188.175458ms, request: GET:https://0.1.0.0:123/api/v1/pods?fieldSelector=spec.nodeName%3Dip-??-??-8-78.us-west-2.compute.internal%2Cstatus.phase%21%3DFailed%2Cstatus.phase%21%3DSucceeded
I0211 15:05:06.653723       1 node.go:137] Pod name-env-mongodb-secondary-1 fits on node ip-??-??-8-78.us-west-2.compute.internal
I0211 15:05:06.653785       1 node_affinity.go:75] Processing node: "ip-??-??-83-142.us-west-2.compute.internal"
I0211 15:05:06.843632       1 request.go:581] Throttling request took 189.614987ms, request: GET:https://0.1.0.0:123/api/v1/pods?fieldSelector=spec.nodeName%3Dip-??-??-83-142.us-west-2.compute.internal%2Cstatus.phase%21%3DFailed%2Cstatus.phase%21%3DSucceeded
I0211 15:05:06.855780       1 node.go:137] Pod name-env-ambassador-f859b7448-8j6x9 fits on node ip-??-??-83-142.us-west-2.compute.internal
I0211 15:05:06.855837       1 node.go:137] Pod prometheus-alertmanager-7d95895fdb-z86d5 fits on node ip-??-??-83-142.us-west-2.compute.internal
I0211 15:05:06.855879       1 node.go:137] Pod prometheus-pushgateway-bc74885b9-9jfnw fits on node ip-??-??-83-142.us-west-2.compute.internal
I0211 15:05:06.855992       1 node_affinity.go:75] Processing node: "ip-??-??-91-42.us-west-2.compute.internal"
I0211 15:05:07.043614       1 request.go:581] Throttling request took 187.476659ms, request: GET:https://0.1.0.0:123/api/v1/pods?fieldSelector=spec.nodeName%3Dip-??-??-91-42.us-west-2.compute.internal%2Cstatus.phase%21%3DFailed%2Cstatus.phase%21%3DSucceeded
I0211 15:05:07.053614       1 node.go:137] Pod name-env-mongodb-primary-0 fits on node ip-??-??-91-42.us-west-2.compute.internal
I0211 15:05:07.053663       1 node_affinity.go:107] Evicted 2 pods

But looking at the log above, there does NOT seem to be any eviction performed by RemovePodsViolatingNodeAffinity (node_affinity): every pod is reported as "fits on node", and we are NOT using the affinity "requiredDuringSchedulingIgnoredDuringExecution".

So why does the last line say node_affinity.go:107] Evicted 2 pods?

Is it adding the number already evicted from the previous strategy LowNodeUtilization for a final total?

Describe the solution you'd like
If we could have something along the lines of an eviction total for:

  • Specific Strategy
  • Total of all Strategies

As mentioned by @damemi on slack:
"Node affinity does only check requiredDuringSchedulingIgnoredDuringExecution, but I believe all strategies share a common podEvictor, which may be logging the total number of pods evicted across all strategies."

Describe alternatives you've considered
N/A

What version of descheduler are you using?

descheduler version: 0.19.0

Additional context
As discussed on slack - https://kubernetes.slack.com/archives/C09TP78DV/p1613593842089500

@martinwoods martinwoods added the kind/feature Categorizes issue or PR as related to a new feature. label Feb 18, 2021
@damemi
Contributor

damemi commented Feb 18, 2021

/cc @ingvagabund
It looks like node affinity logs podEvictor.TotalEvicted(), but since podEvictor is shared, this would be referring to the total across all strategies, correct?
/kind bug
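
To illustrate the suspected behaviour, here is a minimal, self-contained sketch (the struct and the strategy calls are simplified stand-ins, not the descheduler's real code; only the TotalEvicted() name comes from the actual evictor):

```go
package main

import "fmt"

// PodEvictor is a simplified stand-in for the descheduler's shared evictor.
type PodEvictor struct {
	total int
}

// EvictPod records one eviction on the shared counter.
func (pe *PodEvictor) EvictPod(pod string) {
	pe.total++
}

// TotalEvicted returns the count accumulated across every strategy that
// used this evictor, not just the strategy currently logging it.
func (pe *PodEvictor) TotalEvicted() int {
	return pe.total
}

func main() {
	evictor := &PodEvictor{} // one evictor is shared by all strategies

	// LowNodeUtilization evicts two pods and logs the shared total.
	evictor.EvictPod("mongodb-secondary-1")
	evictor.EvictPod("mongodb-primary-0")
	fmt.Printf("LowNodeUtilization: Total number of pods evicted: %d\n", evictor.TotalEvicted())

	// RemovePodsViolatingNodeAffinity evicts nothing, but logging the shared
	// total still prints 2, matching the confusing "Evicted 2 pods" line.
	fmt.Printf("RemovePodsViolatingNodeAffinity: Evicted %d pods\n", evictor.TotalEvicted())
}
```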

@k8s-ci-robot k8s-ci-robot added the kind/bug Categorizes issue or PR as related to a bug. label Feb 18, 2021
@damemi damemi removed the kind/feature Categorizes issue or PR as related to a new feature. label Feb 18, 2021
@ingvagabund
Contributor

ingvagabund commented Feb 19, 2021

since podEvictor is shared this would be referring to the total across all strategies correct?

Yes, that's correct. We might extend the pod evictor to also collect "metrics" for individual strategies and runs and report something like:

Number of evicted pods (in this strategy): XX
Number of evicted pods in this run: XX
Total number of evicted pods in all strategies: XX
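
To make that concrete, a rough sketch of what per-strategy accounting in the evictor could look like (the type and method names below are hypothetical, not the descheduler's actual API):

```go
package main

import "fmt"

// strategyAwareEvictor is a hypothetical evictor that keeps a per-strategy
// count alongside the shared total.
type strategyAwareEvictor struct {
	total      int
	byStrategy map[string]int
}

func newStrategyAwareEvictor() *strategyAwareEvictor {
	return &strategyAwareEvictor{byStrategy: map[string]int{}}
}

// EvictPod records the eviction both globally and for the given strategy.
func (e *strategyAwareEvictor) EvictPod(pod, strategy string) {
	e.total++
	e.byStrategy[strategy]++
}

func (e *strategyAwareEvictor) EvictedByStrategy(strategy string) int {
	return e.byStrategy[strategy]
}

func (e *strategyAwareEvictor) TotalEvicted() int {
	return e.total
}

func main() {
	e := newStrategyAwareEvictor()
	e.EvictPod("mongodb-secondary-1", "LowNodeUtilization")
	e.EvictPod("mongodb-primary-0", "LowNodeUtilization")

	for _, s := range []string{"LowNodeUtilization", "RemovePodsViolatingNodeAffinity"} {
		fmt.Printf("Number of evicted pods in strategy %q: %d\n", s, e.EvictedByStrategy(s))
	}
	fmt.Printf("Total number of evicted pods in all strategies: %d\n", e.TotalEvicted())
}
```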

@damemi
Contributor

damemi commented Feb 19, 2021

For now, I opened #502 to move this log message out of the individual strategies and into the main loop
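
Roughly, the shape of that change (illustrative only; the loop and function signatures here are simplified sketches, not the code in #502):

```go
package main

import "fmt"

// evictor mimics the shared pod evictor; strategyFn stands in for a strategy.
type evictor struct{ total int }

func (e *evictor) TotalEvicted() int { return e.total }

type strategyFn func(e *evictor)

func main() {
	shared := &evictor{}
	strategies := map[string]strategyFn{
		"LowNodeUtilization":              func(e *evictor) { e.total += 2 }, // evicts 2 pods
		"RemovePodsViolatingNodeAffinity": func(e *evictor) {},               // evicts none
	}

	// Strategies no longer log the shared total themselves...
	for name, run := range strategies {
		run(shared)
		fmt.Printf("strategy %q finished\n", name)
	}

	// ...the descheduling loop reports it once per cycle instead.
	fmt.Printf("Total number of pods evicted: %d\n", shared.TotalEvicted())
}
```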

@damemi
Contributor

damemi commented Feb 19, 2021

I think the main issue here was closed, and I opened a new one, #503, to track possibly improving podEvictor
