
Descheduler | LowNodeUtilization Strategy - all nodes over utilized, no obvious message stating pods cannot be evicted #500

Closed
martinwoods opened this issue Feb 18, 2021 · 4 comments · Fixed by #504
Labels
kind/feature Categorizes issue or PR as related to a new feature.

Comments

@martinwoods

martinwoods commented Feb 18, 2021

Is your feature request related to a problem? Please describe.

It's not obvious from the container's log data that there are no nodes available to which pods can be evicted.

We have example log data where a number of nodes show "is over utilized with usage", but because all nodes are over utilized, NO pods are evicted:

    I0217 12:00:13.698734       1 lownodeutilization.go:200] Node "ip-??-??-??-??.eu-west-1.compute.internal" is over utilized with usage: api.ResourceThresholds{"cpu":91.42857142857143, "memory":41.426692748347975, "pods":86.20689655172414}
    I0217 12:00:13.698915       1 lownodeutilization.go:200] Node "ip-??-??-??-??.eu-west-1.compute.internal" is over utilized with usage: api.ResourceThresholds{"cpu":99.14285714285714, "memory":36.14741291652306, "pods":37.93103448275862}
    I0217 12:00:13.699248       1 lownodeutilization.go:200] Node "ip-??-??-??-??.eu-west-1.compute.internal" is over utilized with usage: api.ResourceThresholds{"cpu":83.42857142857143, "memory":63.22589876113411, "pods":37.142857142857146}

No pods being evicted is fine; looking at the cluster it makes sense, but initially it was NOT so clear.

Would it be wise to add a log message stating that this is what is happening here?

Describe the solution you'd like
As suggested by @damemi (discussed on the Kubernetes Slack channel #sig-scheduling), we could add the example text "All nodes overutilized, no evictions possible".

We might even be able to use that check to optimize the strategy and skip unnecessary work.
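
To make the suggestion concrete, here is a minimal sketch in Go (the helper name and parameters are assumptions, not the actual descheduler source) of what the extra message and early return could look like:

    // Hypothetical helper sketching the proposed logging; the name and
    // parameters are assumptions, not the descheduler's API.
    package nodeutilization

    import "k8s.io/klog/v2"

    // logWhenNothingToDo returns false when the strategy can stop early and logs why.
    // lowNodes = count of underutilized nodes, targetNodes = count of overutilized nodes.
    func logWhenNothingToDo(lowNodes, targetNodes, totalNodes int) bool {
        if lowNodes == 0 {
            if targetNodes == totalNodes {
                // Proposed extra message so it is obvious why no pod will be evicted.
                klog.V(1).Infof("All nodes are overutilized, no evictions possible")
            } else {
                klog.V(1).Infof("No node is underutilized, nothing to do here, you might tune your thresholds further")
            }
            return false
        }
        return true
    }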

Describe alternatives you've considered
N/A

What version of descheduler are you using?

descheduler version: 0.19.0

Additional context
As discussed here on slack - https://kubernetes.slack.com/archives/C09TP78DV/p1613593842089500

@martinwoods martinwoods added the kind/feature Categorizes issue or PR as related to a new feature. label Feb 18, 2021
@lixiang233
Contributor

@martinwoods It's supposed to log No node is underutilized, nothing to do here....... Can you check your logs again to make sure that there's no further message? FYI, we do a few checks here to avoid some unnecessary work.

@martinwoods
Author

@lixiang233 thanks for the reply

Yes, you're right, the message No node is underutilized, nothing to do here is logged, e.g.

    I0219 09:45:08.502255       1 lownodeutilization.go:105] No node is underutilized, nothing to do here, you might tune your thresholds further

I had seen this message before raising this issue, and I'll explain why I raised it even though the message exists.

Let's take my example YAML:

    LowNodeUtilization:
      enabled: true
      params:
        nodeResourceUtilizationThresholds:
          thresholds:
            cpu: 20
            memory: 20
            pods: 20
          targetThresholds:
            cpu: 70
            memory: 70
            pods: 75

As per the functionality, there are two types of thresholds: the first is thresholds and the second is targetThresholds.

From looking at the log, the message No node is underutilized, nothing to do here refers to the first threshold, thresholds, as in underutilized.

As you can see from the log below, the message is preceded by the settings for underutilization:

    I0219 09:45:08.502197       1 lownodeutilization.go:101] Criteria for a node under utilization: CPU: 20, Mem: 20, Pods: 20
    I0219 09:45:08.502255       1 lownodeutilization.go:105] No node is underutilized, nothing to do here, you might tune your thresholds further

The second threshold, targetThresholds, should have a different message, as it relates to overutilization, as in the settings below:

    targetThresholds:
        cpu: 70
        memory: 70
        pods: 75

As per the GitHub documentation here:

"There is another configurable threshold, targetThresholds, that is used to compute those potential nodes from where pods could be evicted.
If a node's usage is above targetThreshold for any (cpu, memory, or number of pods), the node is considered over utilized"
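
To illustrate my understanding, here is a rough sketch in Go (my own simplification with a percent-based usage map, not the descheduler's real code) of how the two threshold sets would classify a node:

    package main

    import "fmt"

    // usage maps a resource name to its utilization in percent.
    type usage map[string]float64

    // classify splits a node into one of three groups: below `thresholds` for every
    // resource => underutilized; above `targetThresholds` for any resource =>
    // overutilized; otherwise appropriately utilized.
    func classify(u, thresholds, targetThresholds usage) string {
        under := true
        over := false
        for res, t := range thresholds {
            if u[res] >= t {
                under = false // any resource at or above `thresholds` means not underutilized
            }
        }
        for res, t := range targetThresholds {
            if u[res] > t {
                over = true // any resource above `targetThresholds` means overutilized
            }
        }
        switch {
        case under:
            return "underutilized"
        case over:
            return "overutilized"
        default:
            return "appropriately utilized"
        }
    }

    func main() {
        thresholds := usage{"cpu": 20, "memory": 20, "pods": 20}
        targets := usage{"cpu": 70, "memory": 70, "pods": 75}
        node := usage{"cpu": 91.4, "memory": 41.4, "pods": 86.2} // first node from the log above
        fmt.Println(classify(node, thresholds, targets))        // prints "overutilized"
    }

With my example settings, every node in the log above falls into the overutilized group and none into the underutilized one.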

Is my understanding correct?

If it is, then again I don't think it's clear from the log data that the overutilized nodes have nowhere to evict pods to.

Look forward to your reply

Many thanks

@damemi
Contributor

damemi commented Feb 19, 2021

Perhaps the documentation could be more explicit about the fact that overutilized nodes only evict toward underutilized nodes. If that distinction is clear, I think the logs tell you everything you need to know.

@lixiang233
Contributor

lixiang233 commented Feb 20, 2021

@martinwoods In this strategy, nodes are divided into 3 types: underutilized, overutilized, and appropriately utilized. As @damemi said, pods can only be evicted from overutilized nodes to underutilized nodes, so we should make sure that both the number of underutilized nodes and the number of overutilized nodes are non-zero before evicting any pod. The number of underutilized nodes is checked first, so in your case you only got the message related to underutilization.

@damemi +1 for mentioning this in the documentation. I also noticed that the current logs can be optimized; I'll help fix this.
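
For illustration, a minimal sketch of that check ordering (assumed names, not the real descheduler functions):

    package nodeutilization

    // shouldEvict mirrors the ordering described above: the underutilized count is
    // examined first, so when every node is overutilized only the underutilization
    // message is ever reached.
    func shouldEvict(underutilized, overutilized int) bool {
        if underutilized == 0 {
            return false // checked first; the branch hit when all nodes are overutilized
        }
        if overutilized == 0 {
            return false // no source nodes to evict pods from
        }
        return true
    }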
