-
Notifications
You must be signed in to change notification settings - Fork 4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add filtering out DS pods from scale-up, refactor default pod list processor #5442
Merged
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
/lgtm |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: BigDarkClown, towca The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
rishabh-11
added a commit
to rishabh-11/autoscaler
that referenced
this pull request
May 8, 2023
* recommendation post processor for integer CPU * update flag cpu-integer-post-processor-enabled description * directly comparing recommendation with rescaling the values * Capping post-processor should always be the last one * add documentation for CPU integer post processor * fix typo in FAQ * Uncomment myself now that I am a Kubernetes org member. * aws: allow setting max retries from AWS_MAX_ATTEMPTS env var * Move PredicateChecker initialization before processors initialization * allow setting kuberentes client burst and qps to avoid rate limiting * State expectations around cloudprovider OWNERS * Stop asking PR creators to name modified components Rely on labels instead. * Cap logs logged by HintingSimulator. To avoid excesive logging of the HintingSimulator we should use klogx to cap the number of logged lines depending on the verbosity level. * Configurable difference ratios * fixing linting pipeline * fix: alicloud example yaml config * Mark VPA KEP 4902 as deprecated in favor of using PDB feature * Added the RBAC Permission to alicloud. * update clusterapi readme with table of contents this change will make navigating the readme easier for users. * add a note to clusterapi readme about ignored labels this change adds a section to the readme that provides advice for clusterapi users about which labels they might want to ignore when using the balance similar node groups flag on various cloud providers. * remove clusterapi nodegroupset processor as discussed with the cluster api community[0], the nodegroupset processor is being removed from the clusterapi provider implementation in favor of instructing our community on the use of the --balancing-ignore-label flag. due to the wide variety of provider infrastructures that clusterapi can be deployed on, we would prefer to not encode all of these labels in the autoscaler itself. see the linked recording for more information. [0] https://www.youtube.com/watch?v=jbhca_9oPuQ * gRPC expander: allow realistic server responses, and log errors Expander requests' payloads can be rather heavy under upscale pressure, as they're compounding all candidates options and unschedulable pods that could fit each options. Expander responses are a subset of the requests' payload items. We're allowing ourself to send arbitrary payload sizes (gRPC `defaultClientMaxSendMessageSize` is `math.MaxInt32`), but we're prone to drop expander servers responses to the floor, due to the `4MiB` `defaultClientMaxReceiveMessageSize`. The arbitrary 128MiB value is meant to be huge (enough to support eg. several dozen fat 1MiB pods) but not unlimited. Let me know if you'd rather see that turned to be a command line flag, or an other value. Also logging the possible gRPC call errors, as that of great help to diagnose that kind of issues. * Added link for RFC3986 in host.go * add an extra note to clusterapi readme about gpus this change adds a little more detail to ensure that users understand how to use the GPU label feature. * Rephrase error messages specific for GPUs. Some error messages related to custom resources assumes that the custom resource is always GPU. Rephrasing these error messages to clear up any confusion. * Add scale down candidates observer * Add previous scale down candidate sorting * OCI OKE autoscaller requires different set of permissions. Without this autoscaller is not working * Add os parameters to MigOsInfo interface * Cleanup version from osReserved calculator * Cleanup Version() method for gce Mig * Clean up DS utils: remove unused cluster snapshot and predicate checker * Add filtering out DS pods from scale-up, refactor default pod list processor * update vendor to v1.27.0-alpha.1 * return argument of framework.RunFilterPlugins has changed from a map[string]*Status to *Status. The func returns after the first failed plugin. Augmented schedulerbased.go to reflect this change. * update vendor to 1.26.3 * update go.mod --------- Co-authored-by: David Benque <david.benque@datadoghq.com> Co-authored-by: Junwon Kwon <snsxhdz@naver.com> Co-authored-by: jesse.millan <jesse.millan@oracle.com> Co-authored-by: Michael Grosser <michael@grosser.it> Co-authored-by: Kubernetes Prow Robot <k8s-ci-robot@users.noreply.github.com> Co-authored-by: Bartłomiej Wróblewski <bwroblewski@google.com> Co-authored-by: Daniel Kłobuszewski <danielmk@google.com> Co-authored-by: Daniel Gutowski <danielgutowski@google.com> Co-authored-by: bsoghigian <bsoghigian@microsoft.com> Co-authored-by: ZhenRan <zhenran.sz@gmail.com> Co-authored-by: Joachim Bartosik <jbartosik@google.com> Co-authored-by: shubham82 <shubham.kuchhal@india.nec.com> Co-authored-by: michael mccune <msm@opbstudios.com> Co-authored-by: Benjamin Pineau <benjamin.pineau@datadoghq.com> Co-authored-by: Hakan Bostan <hbostan@google.com> Co-authored-by: Yaroslava Serdiuk <yaroslava@google.com> Co-authored-by: janis.orlovs <janis@cwise.eu> Co-authored-by: Jayant Jain <jayantj@google.com>
rishabh-11
pushed a commit
to rishabh-11/autoscaler
that referenced
this pull request
May 8, 2023
Sync upto `kubernetes#5442` PR of upstream
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Labels
approved
Indicates a PR has been approved by an approver from all required OWNERS files.
area/cluster-autoscaler
cncf-cla: yes
Indicates the PR's author has signed the CNCF CLA.
kind/feature
Categorizes issue or PR as related to a new feature.
lgtm
"Looks good to me", indicates that a PR is ready to be merged.
size/L
Denotes a PR that changes 100-499 lines, ignoring generated files.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
What type of PR is this?
/kind feature
What this PR does / why we need it:
This PR improves the way the CA handles unschedulable DS pods. These pods are unhelpable from the scale-upside, as each of them can only get scheduled on a specific existing node. Currently these pods are passed to scale-up logic, which increases the computational complexity, even though we know it is unnecessary.
This change filters out pending DS pods, which solves the described issue. It additionally refactors the defaultPodListProcessor to make it more readable.
Does this PR introduce a user-facing change?