Multiple alerts for HA on VM and host #10367

shwstppr · 2025-02-11T07:35:13Z

problem

HA jobs for a VM are not synchronized when its host is disconnected or goes down. CloudStack creates multiple HA jobs for the same VM when the host goes into the Down state.
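
A minimal sketch of the kind of synchronization that appears to be missing, written in Python for brevity and not taken from CloudStack's code: several concurrent "host down" notifications request an HA restart of the same VM, and a lock-guarded check-and-insert lets only the first one create a job. The names HaScheduler, schedule_restart, complete and simulate are hypothetical.

```python
import threading


class HaScheduler:
    """Hypothetical scheduler that keeps at most one pending HA job per VM."""

    def __init__(self):
        self._lock = threading.Lock()
        self._pending = {}  # vm_id -> job_id of the pending HA job
        self._next_job_id = 1

    def schedule_restart(self, vm_id):
        """Return (job_id, created). created is False if a job already exists."""
        # The check and the insert happen under one lock, so two callers
        # cannot both conclude that no job exists yet for this VM.
        with self._lock:
            if vm_id in self._pending:
                return self._pending[vm_id], False
            job_id = self._next_job_id
            self._next_job_id += 1
            self._pending[vm_id] = job_id
            return job_id, True

    def complete(self, vm_id):
        """Forget the pending job so a later failure can schedule a new one."""
        with self._lock:
            self._pending.pop(vm_id, None)


def simulate(events=5, vm_id="vm-1"):
    """Fire several concurrent 'host down' notifications for the same VM."""
    scheduler = HaScheduler()
    results = []

    def on_host_down():
        results.append(scheduler.schedule_restart(vm_id))

    threads = [threading.Thread(target=on_host_down) for _ in range(events)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

With this guard, every notification receives the same job id and only one of them reports that it created the job. Without the lock, two threads could both pass the membership check and each create a job, which matches the duplicate-job symptom described here.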

versions

4.19.2.0-SNAPSHOT, as well as other 4.19 and later versions

The steps to reproduce the bug

Create a cluster with 2 hypervisor hosts.
Create an HA-enabled VM.
Make the host running the VM go into the Down state (I used https://linuxconfig.org/how-to-crash-your-linux-system-with-fork-bomb).
After some time, once the VM has started on a different host, observe that multiple HA alerts are generated for the same VM.

What to do about it?

No response

btzq · 2025-02-12T15:18:20Z

We actually face this a lot too! I've noticed this but wasn't sure if it was normal.

In your test, does CS creating multiple jobs also reduce the likelihood of a successful HA?

Asking because we often encounter situations where:

On a large hypervisor, if it dies and HA is started, only a portion of the VMs start up normally. The rest fail.
When CS is busy, I encounter situations where a VM starts normally, but CS thinks it has timed out and stops the VM. This causes the VM to flap on and off for 10-15 minutes until it stops.

btzq · 2025-02-13T00:55:45Z

I am also wondering now. In CS, I remember there's a global setting for max worker threads or HA threads.

Will fixing the duplicate HA jobs also improve the efficiency of these threads? 🤔

shwstppr · 2025-02-13T11:20:04Z

@btzq In my testing with 2x KVM hosts and a limited number of VMs, HA didn't fail for those VMs.

Fixing this should help HA performance, in my opinion, as it would spend less time working on duplicate jobs.
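
To illustrate the point about duplicate work, here is a rough Python sketch (not CloudStack code) that collapses a queue of HA work items so each VM appears once before workers pick items up. The function name collapse_duplicate_jobs and the (vm_id, reason) tuple shape are assumptions made for this example.

```python
def collapse_duplicate_jobs(queue):
    """Keep the first HA work item per VM, preserving queue order.

    `queue` is an iterable of (vm_id, reason) tuples; this shape is a
    hypothetical stand-in for whatever the real work items contain.
    """
    seen = set()
    unique = []
    for vm_id, reason in queue:
        if vm_id in seen:
            # A job for this VM is already queued; a worker thread
            # would only repeat the same restart attempt.
            continue
        seen.add(vm_id)
        unique.append((vm_id, reason))
    return unique
```

Every item dropped this way is one less job occupying a worker thread, which is the efficiency gain suggested above.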

DaanHoogland · 2025-02-26T12:18:13Z

@shwstppr, I tried to reproduce this both with your fork-bomb method z(){ z|z& }; z& and by just powering off the host. HA did kick in, but with only one job and one alert.
One extra thing I noticed is that the host going into the Disconnected state does not yield an alert, though I think that is worth one as well. We could include that in this work or create a new issue for it.

DaanHoogland · 2025-02-26T12:29:54Z

> One extra thing I noticed is that the host going to disconnected state does not yield an alert, though I think that is worth one as well. We could include that in this work or create a new issue for it.

I spoke too soon:
the alert for the host actually fires multiple times. I have not seen that for the VM HA job (yet).
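
One possible way to tame repeated alerts, shown as a Python sketch under stated assumptions rather than as CloudStack's mechanism: remember when an alert with a given key was last sent and suppress repeats inside a time window. The class AlertThrottle, its should_send method, and the (host, state) key are hypothetical.

```python
class AlertThrottle:
    """Hypothetical helper: send at most one alert per key per time window."""

    def __init__(self, window_seconds):
        self.window_seconds = window_seconds
        self._last_sent = {}  # alert key -> timestamp of the last alert sent

    def should_send(self, key, now):
        """Return True if an alert for `key` may be sent at time `now` (seconds)."""
        last = self._last_sent.get(key)
        if last is not None and now - last < self.window_seconds:
            # Same alert was sent recently; treat this one as a duplicate.
            return False
        self._last_sent[key] = now
        return True
```

A caller would pass something like ("host-1", "Disconnected") as the key and a monotonic timestamp as now. Throttling of this kind only hides duplicates at the alerting layer; it does not remove duplicate HA jobs themselves.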

DaanHoogland added this to the 4.19.3 milestone Feb 11, 2025

DaanHoogland self-assigned this Feb 26, 2025
