Add HCOMisconfiguredDescheduler runbook #266

Merged 1 commit on Sep 19, 2024

@tiraboschi tiraboschi commented Sep 18, 2024

What this PR does / why we need it:
Add a runbook for HCOMisconfiguredDescheduler

A descheduler is a Kubernetes application that causes the control plane to rearrange workloads in a better way.
It runs at a predefined interval and goes back to sleep after it has done its job.

The descheduler uses the Kubernetes eviction API to evict pods, and receives feedback from the API server on whether each eviction request was granted.
KubeVirt, on the other hand, handles eviction requests in a custom way in order to keep the VM alive and trigger a live migration, and a live migration takes time.
So from the descheduler's point of view the virt-launcher pods fail to be evicted, while they are actually migrating to another node in the background.
The descheduler notes the failure to evict the virt-launcher pod and keeps trying to evict other pods, which typically ends with it attempting to evict substantially all virt-launcher pods from the node and triggering a migration storm.
In other words, the way KubeVirt handles eviction requests causes the descheduler to make wrong decisions and take wrong actions that could destabilize the cluster.
As a result, using the descheduler operator with the LowNodeUtilization strategy to migrate VMs in this way leads to unstable, oscillatory behavior.
To correctly handle the special case of an evicted VM pod triggering a live migration to another node, the Kube Descheduler Operator introduced a profileCustomizations option named devEnableEvictionsInBackground,
which is currently considered an alpha feature on the Kube Descheduler Operator side.
To prevent unexpected behaviours, if the Kube Descheduler Operator is installed and configured alongside HCO, HCO will check its configuration for the presence of the devEnableEvictionsInBackground profileCustomizations setting and, if needed,
suggest that the cluster admin fix the Kube Descheduler Operator configuration via an alert and its linked runbook.
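
As a rough illustration of the check described above, the following Python sketch decides whether a KubeDescheduler object would be considered misconfigured. It is not HCO's actual implementation (HCO is written in Go); the function name `is_descheduler_misconfigured` and the sample objects, including the profile name, are assumptions made only for this example.

```python
from typing import Any, Mapping

# Field named in the description above.
CUSTOMIZATION = "devEnableEvictionsInBackground"


def is_descheduler_misconfigured(kube_descheduler: Mapping[str, Any]) -> bool:
    """Return True when spec.profileCustomizations lacks the customization.

    Hypothetical helper: it only mirrors the idea of the check, not HCO's code.
    """
    spec = kube_descheduler.get("spec") or {}
    customizations = spec.get("profileCustomizations") or {}
    # Only an explicit boolean true counts as correctly configured.
    return customizations.get(CUSTOMIZATION) is not True


# Illustrative objects; the profile name is an assumption for the example.
without_flag = {
    "kind": "KubeDescheduler",
    "spec": {"profiles": ["LifecycleAndUtilization"]},
}
with_flag = {
    "kind": "KubeDescheduler",
    "spec": {"profileCustomizations": {CUSTOMIZATION: True}},
}

print(is_descheduler_misconfigured(without_flag))  # the case the alert targets
print(is_descheduler_misconfigured(with_flag))
```

Treating a missing, null, or false value the same way keeps the sketch conservative: anything short of an explicit `true` is reported as misconfigured.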

To make the configuration of the Kube Descheduler Operator suitable for the KubeVirt use case as well,
something like:

apiVersion: operator.openshift.io/v1
kind: KubeDescheduler
metadata:
  name: cluster
  namespace: openshift-kube-descheduler-operator
spec:
  profileCustomizations:
    devEnableEvictionsInBackground: true

should be merged into its configuration.
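
One way to read "merged" here is as a JSON Merge Patch (RFC 7386 semantics) applied on top of the existing KubeDescheduler object, so that settings already present are left alone. The Python sketch below illustrates that idea only; the `merge_patch` helper and the values under `spec` in `existing` (interval and profile) are assumptions for the example, not taken from a real cluster.

```python
import copy
from typing import Any


def merge_patch(target: Any, patch: Any) -> Any:
    """Apply a JSON Merge Patch to target and return a new object."""
    if not isinstance(patch, dict):
        # Scalars and lists replace the target value wholesale.
        return copy.deepcopy(patch)
    result = copy.deepcopy(target) if isinstance(target, dict) else {}
    for key, value in patch.items():
        if value is None:
            result.pop(key, None)  # a null value removes the key
        else:
            result[key] = merge_patch(result.get(key), value)
    return result


# Illustrative existing configuration (spec values are assumptions).
existing = {
    "apiVersion": "operator.openshift.io/v1",
    "kind": "KubeDescheduler",
    "metadata": {
        "name": "cluster",
        "namespace": "openshift-kube-descheduler-operator",
    },
    "spec": {
        "deschedulingIntervalSeconds": 3600,
        "profiles": ["LifecycleAndUtilization"],
    },
}

# The fragment from the PR description, expressed as a patch.
patch = {"spec": {"profileCustomizations": {"devEnableEvictionsInBackground": True}}}

merged = merge_patch(existing, patch)
print(merged["spec"])
```

The existing `profiles` and interval survive the merge, and only the `profileCustomizations` key is added.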

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes https://issues.redhat.com/browse/CNV-48734

Special notes for your reviewer:
It's a runbook for kubevirt/hyperconverged-cluster-operator#3100

Checklist

This checklist is not enforcing, but it's a reminder of items that could be relevant to every PR.
Approvers are expected to review this list.

Release note:

Add HCOMisconfiguredDescheduler runbook

@kubevirt-bot kubevirt-bot added the dco-signoff: yes Indicates the PR's author has DCO signed all their commits. label Sep 18, 2024
@tiraboschi tiraboschi force-pushed the add_HCOMisconfiguredDescheduler branch 3 times, most recently from 551ea83 to bb727cf Compare September 18, 2024 21:11
<!--USstart-->
If you cannot resolve the issue, see the following resources:

- [OKD Help](https://www.okd.io/help/)
Collaborator

page not found

Member Author

oh, it moved to: https://okd.io/docs/community/help.

Fixing it only here for now.
But we have to fix it (in another PR) in many other runbooks:
docs/deprecated_runbooks/KubeMacPoolDown.md:- OKD Help
docs/deprecated_runbooks/KubevirtHyperconvergedClusterOperatorNMOInUseAlert.md:- OKD Help
docs/deprecated_runbooks/VirtualMachineCRCErrors.md:- OKD Help
docs/runbooks/CDIDataImportCronOutdated.md:- OKD Help
docs/runbooks/CDIDataVolumeUnusualRestartCount.md:- OKD Help
docs/runbooks/CDIDefaultStorageClassDegraded.md:- OKD Help
docs/runbooks/CDIMultipleDefaultVirtStorageClasses.md:- OKD Help
docs/runbooks/CDINoDefaultStorageClass.md:- OKD Help
docs/runbooks/CDINotReady.md:- OKD Help
docs/runbooks/CDIOperatorDown.md:- OKD Help
docs/runbooks/CDIStorageProfilesIncomplete.md:- OKD Help
docs/runbooks/CnaoDown.md:- OKD Help
docs/runbooks/HPPNotReady.md:- OKD Help
docs/runbooks/HPPOperatorDown.md:- OKD Help
docs/runbooks/HPPSharingPoolPathWithOS.md:- OKD Help
docs/runbooks/KubeVirtDeprecatedAPIRequested.md:- OKD Help
docs/runbooks/KubeVirtNoAvailableNodesToRunVMs.md:- OKD Help
docs/runbooks/KubeVirtVMIExcessiveMigrations.md:- OKD Help
docs/runbooks/KubemacpoolDown.md:- OKD Help
docs/runbooks/LowReadyVirtControllersCount.md:- OKD Help
docs/runbooks/LowReadyVirtOperatorsCount.md:- OKD Help
docs/runbooks/LowVirtAPICount.md:- OKD Help
docs/runbooks/LowVirtControllersCount.md:- OKD Help
docs/runbooks/LowVirtOperatorCount.md:- OKD Help
docs/runbooks/NetworkAddonsConfigNotReady.md:- OKD Help
docs/runbooks/NoLeadingVirtOperator.md:- OKD Help
docs/runbooks/NoReadyVirtController.md:- OKD Help
docs/runbooks/NoReadyVirtOperator.md:- OKD Help
docs/runbooks/OrphanedVirtualMachineInstances.md:- OKD Help
docs/runbooks/OutdatedVirtualMachineInstanceWorkloads.md:- OKD Help
docs/runbooks/SSPDown.md:- OKD Help
docs/runbooks/SSPFailingToReconcile.md:- OKD Help
docs/runbooks/SSPHighRateRejectedVms.md:- OKD Help
docs/runbooks/SSPOperatorDown.md:- OKD Help
docs/runbooks/SSPTemplateValidatorDown.md:- OKD Help
docs/runbooks/VMStorageClassWarning.md:- OKD Help
docs/runbooks/VirtAPIDown.md:- OKD Help
docs/runbooks/VirtApiRESTErrorsBurst.md:- OKD Help
docs/runbooks/VirtApiRESTErrorsHigh.md:- OKD Help
docs/runbooks/VirtControllerDown.md:- OKD Help
docs/runbooks/VirtControllerRESTErrorsBurst.md:- OKD Help
docs/runbooks/VirtControllerRESTErrorsHigh.md:- OKD Help
docs/runbooks/VirtHandlerRESTErrorsBurst.md:- OKD Help
docs/runbooks/VirtHandlerRESTErrorsHigh.md:- OKD Help
docs/runbooks/VirtOperatorDown.md:- OKD Help
docs/runbooks/VirtOperatorRESTErrorsBurst.md:- OKD Help
docs/runbooks/VirtOperatorRESTErrorsHigh.md:- OKD Help
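
A follow-up PR along these lines could be prepared with a small script. The sketch below is only a suggestion: the helper names are hypothetical, the old URL is the one from the reviewed snippet, and the new URL is taken from the comment above without its sentence-final period, so it should be verified before use.

```python
from pathlib import Path
from typing import List

OLD_URL = "https://www.okd.io/help/"
NEW_URL = "https://okd.io/docs/community/help"  # assumption: verify before use


def fix_help_link(text: str, old: str = OLD_URL, new: str = NEW_URL) -> str:
    """Return text with every occurrence of the old help URL replaced."""
    return text.replace(old, new)


def fix_runbooks(root: Path) -> List[Path]:
    """Rewrite Markdown files under root in place; return the changed files."""
    changed = []
    for path in sorted(root.rglob("*.md")):
        original = path.read_text(encoding="utf-8")
        updated = fix_help_link(original)
        if updated != original:
            path.write_text(updated, encoding="utf-8")
            changed.append(path)
    return changed


print(fix_help_link("- [OKD Help](https://www.okd.io/help/)"))
```

Calling `fix_runbooks(Path("docs"))` from the repository root would cover both `docs/runbooks` and `docs/deprecated_runbooks`.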

Collaborator

@tiraboschi is it possible to add a redirect?

Member Author

Not really sure, we should try asking on https://github.com/okd-project/okd-web/

Signed-off-by: Simone Tiraboschi <stirabos@redhat.com>
@tiraboschi tiraboschi force-pushed the add_HCOMisconfiguredDescheduler branch from bb727cf to bff087e Compare September 19, 2024 08:47
@sradco sradco merged commit 84e3a01 into kubevirt:main Sep 19, 2024
2 checks passed
github-actions bot pushed a commit that referenced this pull request Sep 19, 2024
…nfiguredDescheduler

Add HCOMisconfiguredDescheduler runbook
Labels
dco-signoff: yes Indicates the PR's author has DCO signed all their commits. size/M
4 participants