Multiple alerts for HA on VM and host #10367

Open
shwstppr opened this issue Feb 11, 2025 · 5 comments

@shwstppr
Contributor

shwstppr commented Feb 11, 2025

problem

HA jobs for a VM are not synchronized when its host disconnects or goes down. CloudStack creates multiple HA jobs for the same VM when the host goes into the Down state.
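The report does not say where the duplicate jobs originate. The sketch below illustrates the kind of per-VM guard that would prevent them, assuming HA work can be keyed by VM id. The class and its methods (HaJobDeduplicator, tryScheduleHa, markHaComplete) are hypothetical names for illustration, not CloudStack APIs. Only the first scheduling request for a VM is allowed through until that job completes.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical illustration only: not CloudStack's actual HA code.
// Allows at most one pending HA job per VM id within a single JVM.
public class HaJobDeduplicator {
    // Thread-safe set of VM ids that already have an HA job in progress.
    private final Set<Long> pendingVmIds = ConcurrentHashMap.newKeySet();

    /** Returns true only for the first caller; later callers see false until completion. */
    public boolean tryScheduleHa(long vmId) {
        return pendingVmIds.add(vmId);
    }

    /** Call when the HA job for this VM finishes, whether it succeeded or failed. */
    public void markHaComplete(long vmId) {
        pendingVmIds.remove(vmId);
    }

    public static void main(String[] args) {
        HaJobDeduplicator dedup = new HaJobDeduplicator();
        long vmId = 42L;
        // Two host-down events both try to schedule HA for the same VM.
        System.out.println(dedup.tryScheduleHa(vmId)); // true
        System.out.println(dedup.tryScheduleHa(vmId)); // false (duplicate suppressed)
        dedup.markHaComplete(vmId);
        System.out.println(dedup.tryScheduleHa(vmId)); // true (a new job is allowed again)
    }
}
```

This guard only works inside one process. CloudStack deployments can run several management servers, so an actual fix would likely need the same check against shared state, such as the database, rather than an in-memory set.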

versions

4.19.2.0-SNAPSHOT, other 4.19 and above versions as well

The steps to reproduce the bug

  1. Create a cluster with 2 hypervisor hosts
  2. Create an HA-enabled VM
  3. Make the host running the VM go into the Down state (used https://linuxconfig.org/how-to-crash-your-linux-system-with-fork-bomb)
  4. After some time, when the VM has been started on a different host, observe that multiple HA alerts are generated for the same VM


What to do about it?

No response

@DaanHoogland DaanHoogland added this to the 4.19.3 milestone Feb 11, 2025
@btzq

btzq commented Feb 12, 2025

We actually face this a lot too! I've noticed this but wasn't sure if it was normal.

In your test, does CS creating multiple jobs also reduce the likelihood of a successful HA?

Asking because we often encounter situations where:

  • On a large hypervisor, if it dies and HA is started, only a portion of the VMs start up normally. The rest fail.
  • When CS is busy, I encounter situations where a VM starts normally, but CS thinks it has timed out and stops the VM. This causes the VM to flap on and off for 10-15 minutes until it stops.

@btzq

btzq commented Feb 13, 2025

I am also now wondering. In CS I remember there's a global setting for max worker threads or HA threads.

Will fixing the duplicate HA jobs also improve the efficiency of these threads? 🤔

@shwstppr
Contributor Author

@btzq In my testing with 2x KVM hosts and limited VMs, it didn't fail HA for those VMs.

In my opinion, fixing this should help HA performance, as it would spend less time working on duplicate jobs.

@DaanHoogland DaanHoogland self-assigned this Feb 26, 2025
@DaanHoogland
Contributor

DaanHoogland commented Feb 26, 2025

@shwstppr, I tried to reproduce this both with your fork-bomb method z(){ z|z& }; z& and by boldly powering off the host. I did get HA to kick in, but only one job and one alert.
One extra thing I noticed is that the host going to disconnected state does not yield an alert, though I think that is worth one as well. We could include that in this work or create a new issue for it.

@DaanHoogland
Contributor

One extra thing I noticed is that the host going to disconnected state does not yield an alert, though I think that is worth one as well. We could include that in this work or create a new issue for it.

I spoke too early: it actually happens multiple times. I did not get that for the VM HA job (yet).

Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants