
add gomaxprocs limit, return node fit error and pod QoS in advance #1423

Closed

Conversation

@fanhaouu (Contributor) commented Jun 2, 2024

This PR aims to address the following three issues:

  1. GOMAXPROCS defaults to the number of CPU cores on the machine. When GOMAXPROCS is higher than the number of cores actually available (for example, under a container CPU limit), the Go scheduler keeps switching OS threads, which degrades descheduler performance.
  2. Node fit currently runs many filter policies, similar to the filter plugins in the scheduler. The correct logic is: as soon as any check fails, terminate early. The current check logic is hard to follow and runs every filter check regardless, which is not only time-consuming but also unnecessary (https://github.com/kubernetes-sigs/descheduler/blob/master/pkg/descheduler/node/node.go#L107).
  3. If the pod's QoS class already has a value, return it early. Pods evicted by the descheduler are typically already scheduled onto nodes, so in most cases the QoS class is already present in the pod's status and there is no need to recompute it from scratch (see the sketch below).
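
The early return in point 3 amounts to something like the following sketch (illustrative, not the PR's actual diff; computeQOSFromSpec is a hypothetical, deliberately simplified stand-in for the existing full computation):

```go
package qos

import v1 "k8s.io/api/core/v1"

// GetPodQOS returns the pod's QoS class. For pods already admitted to a node,
// the API server has recorded the class in status, so the full computation is
// only needed as a fallback.
func GetPodQOS(pod *v1.Pod) v1.PodQOSClass {
	if pod.Status.QOSClass != "" {
		return pod.Status.QOSClass // already computed at admission time
	}
	return computeQOSFromSpec(pod)
}

// computeQOSFromSpec is a simplified stand-in: the real logic compares
// requests and limits across all containers to distinguish Guaranteed,
// Burstable, and BestEffort.
func computeQOSFromSpec(pod *v1.Pod) v1.PodQOSClass {
	for _, c := range pod.Spec.Containers {
		if len(c.Resources.Requests) > 0 || len(c.Resources.Limits) > 0 {
			return v1.PodQOSBurstable
		}
	}
	return v1.PodQOSBestEffort
}
```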

@k8s-ci-robot added the cncf-cla: yes, needs-ok-to-test, and size/XXL labels on Jun 2, 2024
@k8s-ci-robot (Contributor)

Hi @fanhaouu. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot (Contributor)

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign a7i for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@fanhaouu changed the title from "add gomaxprocs limt and return node fit error in advance" to "add gomaxprocs limit and return node fit error in advance" on Jun 2, 2024
@fanhaouu changed the title from "add gomaxprocs limit and return node fit error in advance" to "add gomaxprocs limit, return node fit error and pod QoS in advance" on Jun 3, 2024
@fanhaouu (Contributor, Author) commented Jun 7, 2024

@a7i, could you please help review this PR?

@a7i (Contributor) commented Jun 8, 2024

Hi @fanhaouu, great contribution. Going to copy some of the maintainers for feedback as well:
/cc @jklaw90 @ingvagabund @damemi

My feedback:

  • regarding point 1: For GOMAXPROCS, as far as I know, we don't run any goroutines in Descheduler. How does this help?

  • regarding point 2: The idea is to present all NodeFit predicates. The predicate check is not in any particular order, so returning the first one may not present the whole picture to the cluster operator (see the sketch after this list). In a cluster of 35k pods / 15k deployments, a Descheduler run takes a few (single-digit) seconds. Given that the shortest frequency can be a minute, I'm not convinced this optimization is worth it. What do you think?

  • regarding point 3: I like it, that's a great change!

  • overall: if you could split this into 3 PRs, I think it would make it easier to provide feedback and request changes.
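
For context, the aggregate-style check a7i describes looks roughly like this (illustrative names, assuming a flat predicate list rather than the actual code in pkg/descheduler/node/node.go):

```go
package nodefit

import (
	"fmt"

	v1 "k8s.io/api/core/v1"
	utilerrors "k8s.io/apimachinery/pkg/util/errors"
)

// predicate is an illustrative stand-in for one NodeFit check
// (taints, node selector, resource requests, and so on).
type predicate func(pod *v1.Pod, node *v1.Node) error

// nodeFitErrors runs every predicate and aggregates all failures, so the
// cluster operator sees the complete list of reasons a pod does not fit.
func nodeFitErrors(pod *v1.Pod, node *v1.Node, predicates []predicate) error {
	var errs []error
	for _, p := range predicates {
		if err := p(pod, node); err != nil {
			errs = append(errs, fmt.Errorf("pod %s does not fit node %s: %w", pod.Name, node.Name, err))
		}
	}
	return utilerrors.NewAggregate(errs) // nil when every predicate passes
}
```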

@fanhaouu (Contributor, Author) commented Jun 8, 2024, quoting @a7i:

Hi @fanhaouu great contribution. Going to copy some of the maintainers for feedback as well: /cc @jklaw90 @ingvagabund @damemi

My feedback:

  • regarding point 1: For GOMAXPROCS, as far as I know, we don't run any goroutines in Descheduler. How does this help?
  • regarding point 2: The idea is to present all NodeFit predicates. The predicate check is not in any particular order, so returning the first one may not present the whole picture to the cluster operator. In a cluster of 35k pods / 15k deployments, a Descheduler run takes a few (single digit) seconds. Given that the shortest frequency can be a minute, I'm not convinced this optimization is worth it. What do you think?
  • regarding point 3: I like it, that's a great change!
  • overall: if you could split this into 3 PRs, I think it would make it easier to provide feedback and request changes.

Thank you for your reply, @a7i.

Point 1: The descheduler does run some goroutines today, just not very many, so the GOMAXPROCS cap is optional. But the current Go runtime is not container-aware (the JVM, by contrast, has supported container environments for a long time), so I think it is better to add it. Our company's internal descheduler runs a lot of goroutines, and its performance improved significantly after adding this limit, which is why I kept this change in the PR.
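
One common way to apply such a cap is the go.uber.org/automaxprocs package, which lowers GOMAXPROCS to the container's cgroup CPU quota; a minimal sketch, not necessarily how this PR wires it up:

```go
package main

import (
	"log"

	"go.uber.org/automaxprocs/maxprocs"
)

func main() {
	// maxprocs.Set reads the cgroup CPU quota and lowers GOMAXPROCS to match,
	// so the Go scheduler stops spreading goroutines across more OS threads
	// than the container can actually run. Outside a CPU-limited container it
	// is a no-op.
	undo, err := maxprocs.Set(maxprocs.Logger(log.Printf))
	defer undo()
	if err != nil {
		log.Printf("failed to set GOMAXPROCS: %v", err)
	}
	// ... start the descheduler as usual ...
}
```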

Point 2: The default loop interval is 5 minutes, but it is configurable. The larger the cluster, the longer node fit takes, so a full loop can easily exceed 5 minutes; and what if the user sets the interval to 1 minute? Checking every predicate one by one is very unnecessary and time-consuming; we should do the same as the scheduler's filter plugins and stop at the first failure.
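
The early-return version being argued for mirrors how scheduler filter plugins short-circuit; reusing the illustrative predicate type from the sketch above:

```go
// nodeFitError returns as soon as any predicate fails, skipping the remaining
// checks; one failing predicate is enough to rule the node out.
func nodeFitError(pod *v1.Pod, node *v1.Node, predicates []predicate) error {
	for _, p := range predicates {
		if err := p(pod, node); err != nil {
			return err
		}
	}
	return nil
}
```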

In short, if we can make the program better and faster without affecting any functionality, I think it makes a lot of sense.

We are also developing a descheduler cache mechanism internally. Today the descheduler pulls the data it needs for filtering at policy run time, but a good part of that data could be cached in advance, just like the cache in the scheduler. Once it is running stably in production at our company, I will contribute it to the descheduler community as well, and we can review it together then.

@ingvagabund (Contributor)

As Amir mentioned, would you please break the PR into three separate PRs? Some of the suggested changes deserve a dedicated discussion. Wrt. NodeFit, I am in the process of composing a KEP: #1421. This sounds like a good use case to include in the proposal.

@fanhaouu (Contributor, Author) commented Jun 9, 2024

Hi maintainers, I have split this into 3 PRs; looking forward to your feedback:

  1. add GOMAXPROCS limit #1434
  2. return pod qos in advance #1435
  3. return node fit error in advance #1436

/cc @a7i @jklaw90 @ingvagabund @damemi

@k8s-ci-robot requested a review from a7i on June 9, 2024, 16:46
@k8s-ci-robot (Contributor)

PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot added the needs-rebase label on Jun 28, 2024
@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all PRs.

This bot triages PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the PR is closed

You can:

  • Mark this PR as fresh with /remove-lifecycle stale
  • Close this PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot added the lifecycle/stale label on Sep 26, 2024
@a7i (Contributor) commented Sep 26, 2024

/close

it's already split into 3 PRs

@k8s-ci-robot (Contributor)

@a7i: Closed this PR.

In response to this:

/close

it's already split into 3 PRs

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
