
Descheduler | LowNodeUtilization Strategy - all nodes over utilized, no obvious message stating pods cannot be evicted #500

Closed
martinwoods opened this issue Feb 18, 2021 · 4 comments · Fixed by #504
Labels
kind/feature Categorizes issue or PR as related to a new feature.

Comments

@martinwoods

martinwoods commented Feb 18, 2021

Is your feature request related to a problem? Please describe.

It's not obvious from the container's log data that there are no nodes available to which pods can be evicted.

We have example log data where a number of nodes show "is over utilized with usage", but because all nodes are over utilized, NO pods are evicted:

    I0217 12:00:13.698734       1 lownodeutilization.go:200] Node "ip-??-??-??-??.eu-west-1.compute.internal" is over utilized with usage: api.ResourceThresholds{"cpu":91.42857142857143, "memory":41.426692748347975, "pods":86.20689655172414}
    I0217 12:00:13.698915       1 lownodeutilization.go:200] Node "ip-??-??-??-??.eu-west-1.compute.internal" is over utilized with usage: api.ResourceThresholds{"cpu":99.14285714285714, "memory":36.14741291652306, "pods":37.93103448275862}
    I0217 12:00:13.699248       1 lownodeutilization.go:200] Node "ip-??-??-??-??.eu-west-1.compute.internal" is over utilized with usage: api.ResourceThresholds{"cpu":83.42857142857143, "memory":63.22589876113411, "pods":37.142857142857146}

No pods being evicted is fine; looking at the cluster it makes sense, but initially it was NOT so clear.

Would it be wise to add a log message stating that this is what is happening here?

Describe the solution you'd like
As suggested by @damemi (discussed on the Kubernetes Slack channel #sig-scheduling), we could add the example text "All nodes overutilized, no evictions possible".

We might even be able to use that check to optimize the strategy and skip unnecessary work.
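
To make the suggestion concrete, here is a minimal sketch in Go (the helper name and parameters are assumptions, not the actual descheduler source) of what the extra message and early return could look like:

    // Hypothetical helper sketching the proposed logging; the name and
    // parameters are assumptions, not the descheduler's API.
    package nodeutilization

    import "k8s.io/klog/v2"

    // logWhenNothingToDo returns false when the strategy can stop early and logs why.
    // lowNodes = count of underutilized nodes, targetNodes = count of overutilized nodes.
    func logWhenNothingToDo(lowNodes, targetNodes, totalNodes int) bool {
        if lowNodes == 0 {
            if targetNodes == totalNodes {
                // Proposed extra message so it is obvious why no pod will be evicted.
                klog.V(1).Infof("All nodes are overutilized, no evictions possible")
            } else {
                klog.V(1).Infof("No node is underutilized, nothing to do here, you might tune your thresholds further")
            }
            return false
        }
        return true
    }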

Describe alternatives you've considered
N/A

What version of descheduler are you using?

descheduler version: 0.19.0

Additional context
As discussed here on slack - https://kubernetes.slack.com/archives/C09TP78DV/p1613593842089500

@martinwoods martinwoods added the kind/feature Categorizes issue or PR as related to a new feature. label Feb 18, 2021
@lixiang233
Contributor

@martinwoods It's supposed to log No node is underutilized, nothing to do here....... Can you check your logs again to make sure that there's no further message? FYI, we do a few checks here to avoid some unnecessary work.

@martinwoods
Author

@lixiang233 thanks for the reply

Yes, you're right, the message No node is underutilized, nothing to do here is logged, e.g.

    I0219 09:45:08.502255       1 lownodeutilization.go:105] No node is underutilized, nothing to do here, you might tune your thresholds further

I had seen this message before raising this issue, and I'll explain why I raised it even though the message exists.

Let's take my example YAML:

    LowNodeUtilization:
      enabled: true
      params:
        nodeResourceUtilizationThresholds:
          thresholds:
            cpu: 20
            memory: 20
            pods: 20
          targetThresholds:
            cpu: 70
            memory: 70
            pods: 75

As per the functionality, there are two types of thresholds: the first is thresholds and the second is targetThresholds.

From looking at the log, the message No node is underutilized, nothing to do here refers to the first threshold, thresholds, as in underutilized.

As you can see from the log below, the message is preceded by the settings for underutilization:

    I0219 09:45:08.502197       1 lownodeutilization.go:101] Criteria for a node under utilization: CPU: 20, Mem: 20, Pods: 20
    I0219 09:45:08.502255       1 lownodeutilization.go:105] No node is underutilized, nothing to do here, you might tune your thresholds further

The second threshold, targetThresholds, should have a different message, as it relates to overutilization, as in the settings below:

    targetThresholds:
        cpu: 70
        memory: 70
        pods: 75

As per the GitHub documentation here:

"There is another configurable threshold, targetThresholds, that is used to compute those potential nodes from where pods could be evicted.
If a node's usage is above targetThreshold for any (cpu, memory, or number of pods), the node is considered over utilized"
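
To illustrate my understanding, here is a rough sketch in Go (my own simplification with a percent-based usage map, not the descheduler's real code) of how the two threshold sets would classify a node:

    package main

    import "fmt"

    // usage maps a resource name to its utilization in percent.
    type usage map[string]float64

    // classify splits a node into one of three groups: below `thresholds` for every
    // resource => underutilized; above `targetThresholds` for any resource =>
    // overutilized; otherwise appropriately utilized.
    func classify(u, thresholds, targetThresholds usage) string {
        under := true
        over := false
        for res, t := range thresholds {
            if u[res] >= t {
                under = false // any resource at or above `thresholds` means not underutilized
            }
        }
        for res, t := range targetThresholds {
            if u[res] > t {
                over = true // any resource above `targetThresholds` means overutilized
            }
        }
        switch {
        case under:
            return "underutilized"
        case over:
            return "overutilized"
        default:
            return "appropriately utilized"
        }
    }

    func main() {
        thresholds := usage{"cpu": 20, "memory": 20, "pods": 20}
        targets := usage{"cpu": 70, "memory": 70, "pods": 75}
        node := usage{"cpu": 91.4, "memory": 41.4, "pods": 86.2} // first node from the log above
        fmt.Println(classify(node, thresholds, targets))        // prints "overutilized"
    }

With my example settings, every node in the log above falls into the overutilized group and none into the underutilized one.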

Is my understanding correct?

If it is, then again I don't think it's clear from the log data that the overutilized nodes have nowhere to evict pods to.

Look forward to your reply

Many thanks

@damemi
Contributor

damemi commented Feb 19, 2021

Perhaps the documentation could be more explicit about the fact that overutilized nodes only evict toward underutilized nodes. If that distinction is clear, I think the logs tell you everything you need to know.

@lixiang233
Contributor

lixiang233 commented Feb 20, 2021

@martinwoods In this strategy, nodes are divided into 3 types: underutilized, overutilized, and appropriately utilized. As @damemi said, pods can only be evicted from overutilized nodes to underutilized nodes, so we should make sure that both the number of underutilized nodes and the number of overutilized nodes are non-zero before evicting any pod. The number of underutilized nodes is checked first, so in your case you only got the message related to underutilization.

@damemi +1 for mentioning this in the documentation. I also noticed that the current logs can be optimized; I'll help fix this.
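
For illustration, a minimal sketch of that check ordering (assumed names, not the real descheduler functions):

    package nodeutilization

    // shouldEvict mirrors the ordering described above: the underutilized count is
    // examined first, so when every node is overutilized only the underutilization
    // message is ever reached.
    func shouldEvict(underutilized, overutilized int) bool {
        if underutilized == 0 {
            return false // checked first; the branch hit when all nodes are overutilized
        }
        if overutilized == 0 {
            return false // no source nodes to evict pods from
        }
        return true
    }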
