raise an alert when the descheduler is not correctly configured #3100

Merged: 1 commit merged into kubevirt:main from the check_descheduler_conf branch on Sep 26, 2024

Conversation

tiraboschi
Member

@tiraboschi tiraboschi commented Sep 17, 2024

What this PR does / why we need it:
The descheduler is an optional operator: it is not installed by default, nor as a dependency of HCO. When installed, it can work on a cluster with KubeVirt only if the devEnableEvictionsInBackground profileCustomization, which is disabled by default, is enabled.
HCO will check whether the descheduler is present and, if so, check its configuration.
If the descheduler is misconfigured for the KubeVirt use case, HCO will raise an alert to make the cluster admin aware.
HCO does not amend the descheduler configuration directly, since the descheduler is an external, independent operator and controlling it directly is not a safe practice (it could lead to infinite loops, with HCO fighting other operators, and so on).
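
As a rough illustration of the decision described above, here is a minimal Go sketch. It is not the code from this PR: the `ProfileCustomizations` and `DeschedulerConfig` types and the `IsMisconfiguredForKubeVirt` and `AlertState` helpers are hypothetical stand-ins, and the sketch assumes that an unset customization behaves like the disabled default.

```go
package main

import "fmt"

// ProfileCustomizations is a hypothetical, trimmed-down stand-in for the
// descheduler operator's profile customizations (not the real API type).
type ProfileCustomizations struct {
	DevEnableEvictionsInBackground bool
}

// DeschedulerConfig is a hypothetical view of the descheduler configuration.
// A nil *DeschedulerConfig models "the descheduler is not installed".
type DeschedulerConfig struct {
	ProfileCustomizations *ProfileCustomizations
}

// IsMisconfiguredForKubeVirt reports whether an installed descheduler lacks
// the devEnableEvictionsInBackground customization that KubeVirt needs.
func IsMisconfiguredForKubeVirt(cfg *DeschedulerConfig) bool {
	if cfg == nil {
		// Descheduler absent: nothing to check, so no alert.
		return false
	}
	pc := cfg.ProfileCustomizations
	// Assumption: unset customizations behave like the disabled default.
	return pc == nil || !pc.DevEnableEvictionsInBackground
}

// AlertState renders the decision as a human-readable line. A real operator
// would more likely expose it as a metric that an alert rule watches.
func AlertState(cfg *DeschedulerConfig) string {
	return fmt.Sprintf("descheduler misconfigured for KubeVirt: %t",
		IsMisconfiguredForKubeVirt(cfg))
}

func main() {
	enabled := &DeschedulerConfig{
		ProfileCustomizations: &ProfileCustomizations{DevEnableEvictionsInBackground: true},
	}
	fmt.Println(AlertState(nil))                  // descheduler not installed
	fmt.Println(AlertState(&DeschedulerConfig{})) // installed, default config
	fmt.Println(AlertState(enabled))              // installed, configured for KubeVirt
}
```

Note that the sketch only reads the configuration and reports on it; it never writes back, which mirrors the choice above to alert rather than amend the configuration of an external operator.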

Reviewer Checklist

Reviewers are supposed to review the PR for every aspect below one by one. To check an item means the PR is either "OK" or "Not Applicable" in terms of that item. All items are supposed to be checked before merging a PR.

  • PR Message
  • Commit Messages
  • How to test
  • Unit Tests
  • Functional Tests
  • User Documentation
  • Developer Documentation
  • Upgrade Scenario
  • Uninstallation Scenario
  • Backward Compatibility
  • Troubleshooting Friendly

Jira Ticket:

https://issues.redhat.com/browse/CNV-47165

Release note:

Raise an alert when the descheduler is not correctly configured for KubeVirt

@kubevirt-bot kubevirt-bot added the release-note, do-not-merge/work-in-progress, and dco-signoff: yes labels Sep 17, 2024
@coveralls
Collaborator

coveralls commented Sep 17, 2024

Pull Request Test Coverage Report for Build 11052150087

Details

  • 59 of 124 (47.58%) changed or added relevant lines in 3 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage decreased (-0.7%) to 85.046%

Changes Missing Coverage                            Covered Lines  Changed/Added Lines  %
pkg/util/cluster.go                                 38             41                   92.68%
controllers/descheduler/descheduler_controller.go   8              37                   21.62%
controllers/crd/crd_controller.go                   13             46                   28.26%

Totals
Change from base Build 11049543701: -0.7%
Covered Lines: 5397
Relevant Lines: 6346

💛 - Coveralls

@tiraboschi tiraboschi force-pushed the check_descheduler_conf branch 6 times, most recently from bb4642a to 6c93497 Compare September 18, 2024 14:22
@tiraboschi tiraboschi changed the title WIP: raise an alert when the descheduler is not correctly configured raise an alert when the descheduler is not correctly configured Sep 18, 2024
@kubevirt-bot kubevirt-bot removed the do-not-merge/work-in-progress label Sep 18, 2024
@tiraboschi tiraboschi force-pushed the check_descheduler_conf branch 2 times, most recently from 8392352 to fb50a15 Compare September 18, 2024 15:25
@avlitman
Collaborator

avlitman commented Sep 18, 2024

/lgtm

Of course we can do it later; these are just small nits.

@kubevirt-bot kubevirt-bot added the lgtm label Sep 18, 2024
@hco-bot
Collaborator

hco-bot commented Sep 18, 2024

hco-e2e-kv-smoke-gcp lane succeeded.
/override ci/prow/hco-e2e-kv-smoke-azure

@kubevirt-bot
Contributor

@hco-bot: Overrode contexts on behalf of hco-bot: ci/prow/hco-e2e-kv-smoke-azure

In response to this:

hco-e2e-kv-smoke-gcp lane succeeded.
/override ci/prow/hco-e2e-kv-smoke-azure

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@kubevirt-bot kubevirt-bot removed the lgtm label Sep 18, 2024
@tiraboschi
Member Author

The descheduler is an optional operator: it is not installed by default, nor as a dependency of HCO. When installed, it can work on a cluster with KubeVirt only if the devEnableEvictionsInBackground profileCustomization, which is disabled by default, is enabled.
HCO will check whether the descheduler is present and, if so, check its configuration.
If the descheduler is misconfigured for the KubeVirt use case, HCO will raise an alert to make the cluster admin aware.
HCO does not amend the descheduler configuration directly, since the descheduler is an external, independent operator and controlling it directly is not a safe practice (it could lead to infinite loops, with HCO fighting other operators, and so on).

done

Collaborator

@nunnatsa nunnatsa left a comment

/approve

@kubevirt-bot kubevirt-bot added the lgtm label Sep 26, 2024
@kubevirt-bot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: nunnatsa

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@kubevirt-bot kubevirt-bot added the approved label Sep 26, 2024
@kubevirt-bot kubevirt-bot removed the lgtm label Sep 26, 2024
The descheduler is an optional operator: it is not
installed by default, nor as a dependency of HCO.
When installed, it can work on a cluster with
KubeVirt only if the devEnableEvictionsInBackground
profileCustomization, which is disabled by default,
is enabled.
HCO will check whether the descheduler is present
and, if so, check its configuration.
If the descheduler is misconfigured for
the KubeVirt use case, HCO will raise
an alert to make the cluster admin aware.
HCO does not amend the descheduler configuration
directly, since the descheduler is an external,
independent operator and controlling it directly
is not a safe practice (it could lead to infinite
loops, with HCO fighting other operators, and so on).

Signed-off-by: Simone Tiraboschi <stirabos@redhat.com>

sonarcloud bot commented Sep 26, 2024

@kubevirt-bot kubevirt-bot added the lgtm label Sep 26, 2024
@hco-bot
Collaborator

hco-bot commented Sep 26, 2024

hco-e2e-upgrade-prev-operator-sdk-azure lane succeeded.
/override ci/prow/hco-e2e-upgrade-prev-operator-sdk-aws
hco-e2e-operator-sdk-azure lane succeeded.
/override ci/prow/hco-e2e-operator-sdk-gcp
hco-e2e-operator-sdk-azure lane succeeded.
/override ci/prow/hco-e2e-operator-sdk-aws

@kubevirt-bot
Contributor

@hco-bot: Overrode contexts on behalf of hco-bot: ci/prow/hco-e2e-operator-sdk-aws, ci/prow/hco-e2e-operator-sdk-gcp, ci/prow/hco-e2e-upgrade-prev-operator-sdk-aws

In response to this:

hco-e2e-upgrade-prev-operator-sdk-azure lane succeeded.
/override ci/prow/hco-e2e-upgrade-prev-operator-sdk-aws
hco-e2e-operator-sdk-azure lane succeeded.
/override ci/prow/hco-e2e-operator-sdk-gcp
hco-e2e-operator-sdk-azure lane succeeded.
/override ci/prow/hco-e2e-operator-sdk-aws

@tiraboschi
Member Author

tiraboschi commented Sep 26, 2024

Ignoring the slightly reduced coverage, which is due to infra code for the two additional controllers
/override coverage/coveralls

@kubevirt-bot
Contributor

@tiraboschi: Overrode contexts on behalf of tiraboschi: coverage/coveralls

In response to this:

Ignoring the slightly reduced coverage, which is due to infra code for the two additional controllers
/override coverage/coveralls

@hco-bot
Collaborator

hco-bot commented Sep 26, 2024

hco-e2e-upgrade-prev-operator-sdk-sno-azure lane succeeded.
/override ci/prow/hco-e2e-upgrade-prev-operator-sdk-sno-aws

@kubevirt-bot
Contributor

@hco-bot: Overrode contexts on behalf of hco-bot: ci/prow/hco-e2e-upgrade-prev-operator-sdk-sno-aws

In response to this:

hco-e2e-upgrade-prev-operator-sdk-sno-azure lane succeeded.
/override ci/prow/hco-e2e-upgrade-prev-operator-sdk-sno-aws

@tiraboschi
Member Author

/retest


openshift-ci bot commented Sep 26, 2024

@tiraboschi: The following tests failed. Say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name                                           Commit   Details  Required  Rerun command
ci/prow/hco-e2e-operator-sdk-gcp                    9a33f14  link     true      /test hco-e2e-operator-sdk-gcp
ci/prow/hco-e2e-upgrade-prev-operator-sdk-sno-aws   9a33f14  link     false     /test hco-e2e-upgrade-prev-operator-sdk-sno-aws
ci/prow/hco-e2e-kv-smoke-azure                      9a33f14  link     true      /test hco-e2e-kv-smoke-azure

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@hco-bot
Collaborator

hco-bot commented Sep 26, 2024

hco-e2e-kv-smoke-gcp lane succeeded.
/override ci/prow/hco-e2e-kv-smoke-azure

@kubevirt-bot
Contributor

@hco-bot: Overrode contexts on behalf of hco-bot: ci/prow/hco-e2e-kv-smoke-azure

In response to this:

hco-e2e-kv-smoke-gcp lane succeeded.
/override ci/prow/hco-e2e-kv-smoke-azure

@hco-bot
Collaborator

hco-bot commented Sep 26, 2024

hco-e2e-upgrade-operator-sdk-aws lane succeeded.
/override ci/prow/hco-e2e-upgrade-operator-sdk-azure
hco-e2e-upgrade-operator-sdk-sno-aws lane succeeded.
/override ci/prow/hco-e2e-upgrade-operator-sdk-sno-azure

@kubevirt-bot
Contributor

@hco-bot: Overrode contexts on behalf of hco-bot: ci/prow/hco-e2e-upgrade-operator-sdk-azure, ci/prow/hco-e2e-upgrade-operator-sdk-sno-azure

In response to this:

hco-e2e-upgrade-operator-sdk-aws lane succeeded.
/override ci/prow/hco-e2e-upgrade-operator-sdk-azure
hco-e2e-upgrade-operator-sdk-sno-aws lane succeeded.
/override ci/prow/hco-e2e-upgrade-operator-sdk-sno-azure

@kubevirt-bot kubevirt-bot merged commit ccdef4e into kubevirt:main Sep 26, 2024
31 checks passed
@tiraboschi tiraboschi deleted the check_descheduler_conf branch September 26, 2024 21:21
@tiraboschi
Member Author

/cherry-pick release-1.13

@kubevirt-bot
Contributor

@tiraboschi: new pull request created: #3118

In response to this:

/cherry-pick release-1.13

Labels
  • approved (indicates a PR has been approved by an approver from all required OWNERS files)
  • dco-signoff: yes (indicates the PR's author has DCO signed all their commits)
  • lgtm (indicates that a PR is ready to be merged)
  • release-note (denotes a PR that will be considered when it comes time to generate release notes)
  • size/XXL
7 participants