Refactor low node utilization #384

Merged

ingvagabund
Contributor

@ingvagabund ingvagabund commented Aug 21, 2020

Each time a pod's resource consumption is converted into a fraction (a number in the interval (0, 1)), some precision is lost. Summing the resource consumption as absolute values instead allows the resource usage to be computed precisely (up to rounding losses in the Quantity type).

Storing resource consumption in a map of Quantities also simplifies the arguments of the functions used in the code.
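
The precision argument can be illustrated with a minimal sketch in plain Go. This is not the descheduler's code and it does not use the Quantity type: `podCPU`, `sumAsFractions`, and `sumAbsolute` are hypothetical names, and CPU is modeled as integer millicores purely for illustration.

```go
package main

import "fmt"

// podCPU holds hypothetical per-pod CPU requests in millicores.
// The real code works with resource.Quantity values; plain int64 is an
// assumption made to keep this sketch free of dependencies.
type podCPU []int64

// sumAsFractions converts every request into a fraction of node capacity
// before adding, so each step can carry a floating-point rounding error.
func sumAsFractions(pods podCPU, capacityMilli int64) float64 {
	total := 0.0
	for _, milli := range pods {
		total += float64(milli) / float64(capacityMilli)
	}
	return total
}

// sumAbsolute adds the requests as integers, so no precision is lost.
func sumAbsolute(pods podCPU) int64 {
	var total int64
	for _, milli := range pods {
		total += milli
	}
	return total
}

func main() {
	pods := podCPU{100, 200}
	capacity := int64(1000)
	threshold := int64(300) // 30% of capacity, expressed in millicores

	// 0.1 + 0.2 in float64 is slightly larger than 0.3,
	// so the fractional sum appears to exceed the threshold.
	fmt.Println(sumAsFractions(pods, capacity) > 0.3) // prints true
	// The integer sum is exactly 300 and does not exceed it.
	fmt.Println(sumAbsolute(pods) > threshold) // prints false
}
```

With fractions, a node sitting exactly at the threshold can be classified as above it; with absolute values the comparison is exact.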

@k8s-ci-robot k8s-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Aug 21, 2020
@k8s-ci-robot k8s-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Aug 21, 2020
@ingvagabund ingvagabund changed the title wip: Refactor low node utilization Refactor low node utilization Aug 24, 2020
@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Aug 24, 2020
@ingvagabund ingvagabund force-pushed the refactor-low-node-utilization branch 3 times, most recently from 041d3ce to e9c1975 Compare August 24, 2020 11:21
@ingvagabund
Contributor Author

@seanmalloy @damemi @lixiang233 PTAL

  • changes the utilization computation from fractions (percentages) to the absolute amount of resources that are left
  • refactors the code to make it more generic (so it is more reusable when implementing the HighNodeUtilization strategy)

@seanmalloy
Member

I don't see anything wrong with these changes. @lixiang233 please review again when you have some time. Thanks!

@seanmalloy
Member

/lgtm

@lixiang233 please review one more time when you get a chance. Thanks!

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Aug 26, 2020
@lixiang233
Contributor

@seanmalloy this change looks pretty good to me.

@seanmalloy
Member

/assign @damemi

@seanmalloy
Member

/kind cleanup

@k8s-ci-robot k8s-ci-robot added the kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. label Sep 2, 2020
@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Sep 2, 2020
@ingvagabund
Contributor Author

@seanmalloy done PTAL

@seanmalloy
Member

/lgtm
/assign @damemi

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Sep 9, 2020
Contributor

@damemi damemi left a comment

@ingvagabund could you add some more description to the commits/PR on what the goal is with this refactor, for future reference?

@ingvagabund
Contributor Author

> @ingvagabund could you add some more description to the commits/PR on what the goal is with this refactor, for future reference?

Description updated. Lemme know whether it's sufficient.

Contributor

@damemi damemi left a comment

This is definitely one of the more complex strategies and I'm thankful for the improvements to quality and readability @ingvagabund. Just a couple comments to make sure my understanding is correct

Comment on lines +104 to +113
func(node *v1.Node, usage NodeUsage) bool {
	if nodeutil.IsNodeUnschedulable(node) {
		klog.V(2).InfoS("Node is unschedulable, thus not considered as underutilized", "node", klog.KObj(node))
		return false
	}
	return isNodeWithLowUtilization(usage)
},
func(node *v1.Node, usage NodeUsage) bool {
	return isNodeAboveTargetUtilization(usage)
},
Contributor

Is it necessary to pass functions here? The checks seem simple enough that they could be hardcoded into classifyNodes, unless something else is going to reuse it with different filters.

Contributor Author

It's preparation for the HighNodeUtilization strategy, where the upper/lower threshold conditions are different.
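
The design being discussed, a classifier that takes its threshold checks as function arguments, can be sketched roughly as follows. This is an illustration only, not the code from the PR: `nodeUsage`, `nodeFilter`, and this `classifyNodes` signature are hypothetical stand-ins, and usage is reduced to a single millicore value.

```go
package main

import "fmt"

// nodeUsage is a hypothetical stand-in for the PR's NodeUsage type:
// usage and both thresholds are absolute millicore values.
type nodeUsage struct {
	name          string
	usedMilli     int64
	lowMilli      int64
	highMilli     int64
	unschedulable bool
}

// nodeFilter decides whether a node belongs to a group.
type nodeFilter func(n nodeUsage) bool

// classifyNodes splits nodes into two groups using caller-supplied filters.
// Nodes matching neither filter are left out of both groups.
func classifyNodes(nodes []nodeUsage, isLow, isHigh nodeFilter) (low, high []nodeUsage) {
	for _, n := range nodes {
		switch {
		case isLow(n):
			low = append(low, n)
		case isHigh(n):
			high = append(high, n)
		}
	}
	return low, high
}

func main() {
	nodes := []nodeUsage{
		{name: "a", usedMilli: 100, lowMilli: 200, highMilli: 800},
		{name: "b", usedMilli: 900, lowMilli: 200, highMilli: 800},
		{name: "c", usedMilli: 50, lowMilli: 200, highMilli: 800, unschedulable: true},
	}
	// A different strategy could pass different conditions here
	// without touching classifyNodes itself.
	low, high := classifyNodes(nodes,
		func(n nodeUsage) bool { return !n.unschedulable && n.usedMilli < n.lowMilli },
		func(n nodeUsage) bool { return n.usedMilli > n.highMilli },
	)
	fmt.Println(len(low), len(high)) // prints 1 1
}
```

Node "c" is below the low threshold but unschedulable, so it ends up in neither group, which mirrors the unschedulable check in the snippet under review.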

pkg/descheduler/strategies/lownodeutilization.go (outdated)
tj += value
}
}
ti := nodes[i].usage[v1.ResourceMemory].Value() + nodes[i].usage[v1.ResourceCPU].MilliValue() + nodes[i].usage[v1.ResourcePods].Value()
Contributor

Does it still make sense to add these together now that you're storing absolute values instead of percentages? For example, a node with 500 pods, 1 mem, 1 cpu will be ranked as higher usage than one with 10 pods, 100 mem, 100 cpu, right? I'm not sure that's a reasonable comparison.

Contributor Author

I kept the original computation. It might be better, though, to add relative units instead, e.g. pods/totalpods + cpu/totalcpu + memory/totalmemory, so that nodes with a higher percentage of consumed resources get higher priority than those with a lower percentage. As a feature, we might allow setting a weight for each resource, e.g. in case memory is more expensive than CPU.
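
The difference between the two orderings can be sketched in plain Go. This is only an illustration of the idea floated in this thread, not code from the PR: the `resources` type, `absoluteScore`, `relativeScore`, the capacities, and the weights are all assumptions, and the usage numbers are the ones from the review comment.

```go
package main

import "fmt"

// resources is a hypothetical triple of pod, CPU and memory amounts,
// used for usage, capacity and weights alike. Units are deliberately abstract.
type resources struct {
	pods, cpu, memory float64
}

// absoluteScore adds raw values together, so whichever resource
// has the largest numbers dominates the result.
func absoluteScore(used resources) float64 {
	return used.pods + used.cpu + used.memory
}

// relativeScore adds per-resource fractions of capacity, each scaled by a weight.
func relativeScore(used, capacity, weight resources) float64 {
	return weight.pods*used.pods/capacity.pods +
		weight.cpu*used.cpu/capacity.cpu +
		weight.memory*used.memory/capacity.memory
}

func main() {
	capacity := resources{pods: 1000, cpu: 200, memory: 200} // assumed capacities
	equal := resources{pods: 1, cpu: 1, memory: 1}           // equal weights
	a := resources{pods: 500, cpu: 1, memory: 1}
	b := resources{pods: 10, cpu: 100, memory: 100}

	// Raw sums: 502 for a versus 210 for b, so a ranks higher.
	fmt.Println(absoluteScore(a) > absoluteScore(b)) // prints true
	// Relative sums: about 0.51 for a versus about 1.01 for b, so b ranks higher.
	fmt.Println(relativeScore(a, capacity, equal) > relativeScore(b, capacity, equal)) // prints false
}
```

Raising a single weight, for instance `memory: 2`, would make memory pressure count twice as much in the ordering, which is the per-resource weighting suggested above.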

@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Sep 11, 2020
@seanmalloy
Member

@damemi please take a look when you have some time. I believe @ingvagabund has addressed all of your review feedback.

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Sep 12, 2020
Contributor

@damemi damemi left a comment

/approve
Thanks @ingvagabund !

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: damemi, ingvagabund

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Sep 14, 2020
@k8s-ci-robot k8s-ci-robot merged commit c9cfeb3 into kubernetes-sigs:master Sep 14, 2020
@ingvagabund ingvagabund deleted the refactor-low-node-utilization branch September 14, 2020 08:27
briend pushed a commit to briend/descheduler that referenced this pull request Feb 11, 2022
…node-utilization

Refactor low node utilization