Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sync CNV runbook HCOMisconfiguredDescheduler.md (Updated at 2024-09-19 09:04:21 +0000 UTC) #213

Conversation

hco-bot
Copy link
Contributor

@hco-bot hco-bot commented Sep 20, 2024

This is an automated PR by 'tools/openshift-virtualization-operator/runbook-sync'.

CNV runbook 'HCOMisconfiguredDescheduler.md' was updated in upstream https://github.com/kubevirt/monitoring at 2024-09-19 09:04:21 +0000 UTC.
This PR syncs the runbook in this repository to contain all new added changes.

/cc @machadovilaca

Copy link
Contributor

openshift-ci bot commented Sep 20, 2024

Hi @hco-bot. Thanks for your PR.

I'm waiting for a openshift member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@openshift-ci openshift-ci bot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Sep 20, 2024
@sradco
Copy link
Contributor

sradco commented Oct 31, 2024

@jherrman Hi, This PR needs a review please.
@machadovilaca can we add @jherrman as a reviewer to these PRs.
@jherrman would that be correct ?

@sradco
Copy link
Contributor

sradco commented Oct 31, 2024

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Oct 31, 2024
Copy link
Contributor

openshift-ci bot commented Oct 31, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: hco-bot, sradco

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Oct 31, 2024
Copy link
Contributor

openshift-ci bot commented Oct 31, 2024

@hco-bot: all tests passed!

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@openshift-merge-bot openshift-merge-bot bot merged commit 3197431 into openshift:master Oct 31, 2024
2 checks passed
Copy link

@jherrman jherrman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a couple of suggestions for better clarity, consistency, and a bit of grammar :-)


## Meaning

A Descheduler is a OpenShift Container Platform application that causes the

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

-> an OpenShift...

Also probably lower-case case d and italics for descheduler


The descheduler uses the OpenShift Container Platform eviction API to evict
pods, and receives
feedback from `kube-api` whether the eviction request was granted or not.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

-> ...from kube-api on whether...

Comment on lines +12 to +15
On the other side, in order to keep VM live and trigger live-migration,
OpenShift Virtualization handles eviction requests in a custom way and
unfortunately
a live migration takes time.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

-> In contrast, to keep a VM live and trigger a live migration, OpenShift Virtualization handles eviction requests in a custom manner, and a live migration takes time to perform.

Comment on lines +16 to +20
So from the descheduler's point of view, `virt-launcher` pods fail to be
evicted, but they actually migrating to another node in background.
So the way OpenShift Virtualization handles eviction requests causes the
descheduler to
make wrong decisions and take wrong actions that could destabilize the cluster.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

-> Therefore, when a virt-launcher pod is migrating to another node in the background, the descheduler detects this as a pod that failed to be evicted. As a consequence, the manner in which OpenShift Virtualization handles eviction requests causes the descheduler to make incorrect decisions and take incorrect actions that might destabilize the cluster.

descheduler to
make wrong decisions and take wrong actions that could destabilize the cluster.

To correctly handle the special case of `VM` pod evicted triggering a live

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

-> ... the special case of an evicted VM pod triggering...

$ oc get -n openshift-kube-descheduler-operator KubeDescheduler cluster -o yaml
```

looking for:

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

-> 2. Search for the following lines in the CR:

Comment on lines +50 to +51
If not there, the `Kube Descheduler Operator` is not correctly configured
to work alongside OpenShift Virtualization.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

-> If these lines are not present, the...

Comment on lines +55 to +62
Set:
```yaml
spec:
profileCustomizations:
devEnableEvictionsInBackground: true
```
on the CR for the `Kube Descheduler Operator` or
remove the `Kube Descheduler Operator`.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

-> Do one of the following:

  • Add the following lines to the CR for Kube Descheduler Operator:

     spec:
        profileCustomizations:
           devEnableEvictionsInBackground: true
  • Remove the Kube Descheduler Operator

on the CR for the `Kube Descheduler Operator` or
remove the `Kube Descheduler Operator`.

Please notice that `EvictionsInBackground` is an alpha feature,

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

-> Note that devEnableEvictionsInBackground...

Comment on lines +65 to +66
and it's subject to change and currently provided as an
experimental feature.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

-> and as such, it is provided as an experimental feature and is subject to change.

jherrman added a commit to jherrman/monitoring that referenced this pull request Nov 1, 2024
Based on openshift/runbooks#213

Signed-off-by: Jiri Herrmann <jherrman@redhat.com>
@jherrman
Copy link

jherrman commented Nov 1, 2024

Created an upstream PR for the changes, as Shirly suggested: kubevirt/monitoring#273

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants