Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CP-47656 VM anti-affinity generate breach alert #5605

Conversation

LunfanZhang
Copy link
Contributor

This PR introduces the generation of breach alerts during operations on VMs or groups, triggered when all VMs within a group are breached (i.e., placed on the same host).

Test scenarios covered in this PR include:

  • Starting and resuming a VM triggers a breach alert if it is breached.
  • Stopping and suspending a VM dismisses the breach alert if it is no longer breached.
  • Migrating a VM, whether causing or resolving a breach, generates or dismisses the alert respectively.
  • Moving a VM from one group to another triggers a breach alert if either group is breached.
  • Destroying a VM group dismisses all related breach alerts.
  • Changing the license from Premium to Standard dismisses all breach alerts, while changing from Standard to Premium generates them.

Additionally, the pool_feature has been split into two parts, pool_feature and pool_feature_helpers, to address dependency cycles.
Error: dependency cycle between modules in _build/default/ocaml/xapi: Xha_statefile -> Sm -> Sm_exec -> Xapi_session -> Pool_features -> Xapi_vm_group_helpers -> Xapi_alert -> Xapi_http -> Xapi_session

@gangj
Copy link
Contributor

gangj commented Apr 30, 2024

The first commit is to resolve dependency cycling issue if we call Xapi_alerts inside pool_features:

Error: dependency cycle between modules in _build/default/ocaml/xapi:

   Xha_statefile
-> Sm
-> Sm_exec
-> Xapi_session
-> Pool_features
-> Xapi_vm_group_helpers
-> Xapi_alert
-> Xapi_http
-> Xapi_session

@LunfanZhang , would you please include such commit message in the first commit? Thanks~

And the title in the commit message is too long now.

@LunfanZhang LunfanZhang force-pushed the private/luzhan_official/CP-47656 branch from cedd82a to bcd8481 Compare April 30, 2024 10:34
@gangj
Copy link
Contributor

gangj commented Apr 30, 2024

@LunfanZhang For this code change, I think we should also make sure no alert is generated/dismissed when no change in the state of VM anti-affinity, like below scenarios should also be tested:

  1. toolstack restart
  2. vm state change not including the running state, like: vm paused -> shutdown

@LunfanZhang
Copy link
Contributor Author

LunfanZhang commented Apr 30, 2024

@LunfanZhang For this code change, I think we should also make sure no alert is generated/dismissed when no change in the state of VM anti-affinity, like below scenarios should also be tested:

  1. toolstack restart
  2. vm state change not including the running state, like: vm paused -> shutdown

toolstack restart and VM paused -> shutdown has been verified. there will no alerts generate

ocaml/xapi/pool_features.ml Outdated Show resolved Hide resolved
@gangj
Copy link
Contributor

gangj commented May 6, 2024

  • Stopping and suspending a VM dismisses the breach alert if it is no longer breached.

Stopping or suspending a VM might also trigger an alert.

@gangj
Copy link
Contributor

gangj commented May 6, 2024

  • Starting and resuming a VM triggers a breach alert if it is breached.

Starting or resuming a VM might also dismiss an alert.

@gangj
Copy link
Contributor

gangj commented May 6, 2024

  • Moving a VM from one group to another triggers a breach alert if either group is breached.

Moving a VM from one anti-affinity group to another might also dismiss an alert.

@LunfanZhang LunfanZhang closed this May 6, 2024
@LunfanZhang LunfanZhang reopened this May 6, 2024
ocaml/xapi/xapi_vm_group_helpers.mli Outdated Show resolved Hide resolved
ocaml/xapi/xapi_vm_group_helpers.ml Outdated Show resolved Hide resolved
ocaml/xapi/pool_features_helpers.mli Outdated Show resolved Hide resolved
ocaml/xapi/pool_features_helpers.ml Outdated Show resolved Hide resolved
ocaml/xapi/pool_features_helpers.ml Show resolved Hide resolved
ocaml/xapi/pool_features_helpers.ml Show resolved Hide resolved
ocaml/xapi/xapi_vm_group_helpers.ml Outdated Show resolved Hide resolved
@LunfanZhang LunfanZhang force-pushed the private/luzhan_official/CP-47656 branch from 0e7e391 to ce8d1a9 Compare May 8, 2024 05:48

val maybe_update_alerts_on_feature_change :
__context:Context.t
-> old_restrictions:(string * string) list
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What are these assoc lists representing and what strings are expected here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The assoc list here is represent the (Feature, restriction of this feature) which get from Db.Pool.get_restrictions, we want to get the restriction change of the anti-affinity feature, especially from restriction true -> false or false -> true, the breach alert should generate or remove.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs to be documented in the MLI file; might want to use an example to make the syntax for strings obvious.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have updated the MLI file with clearer parameter explanations

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you push the change?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I pushed the change

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just updated it again to include a simple example.

ocaml/xapi/xapi_vm_group_helpers.ml Show resolved Hide resolved
|> function
| [] | [_] ->
None
| h :: remaining ->
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the purpose of this code? Please add a comment. It also looks not efficient.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code snippet checks the breach state of a group and returns the host that is in breach. When there are no VMs or only one VM in the group, it is not a breach. and, when there are two or more VMs and all of them are on the same host, it is a breach, and the specific host is returned.
In this context, h represents the host of the first VM, and remaining represents the other hosts. We compare h with the remaining hosts to determine whether the group is breached or not.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be easier to express this using sets?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think both approaches are fine, and the current method is more easily extendable if the breach rule were to change in the future.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @lindig, I have resolved all the comments above, if there are no more comments, could help me to approve this PR? :-)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I asked to add a code comment or some other documentation that explains this code but I don't see it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, sorry, I misunderstood the comment you mentioned. I assumed you were asking to add a comment in this conversion. I have added the comments for this function now. I think it's fine.

@LunfanZhang LunfanZhang force-pushed the private/luzhan_official/CP-47656 branch 2 times, most recently from a04d146 to 68b33ab Compare May 11, 2024 06:00
@LunfanZhang LunfanZhang requested a review from lindig May 11, 2024 08:44
@gangj
Copy link
Contributor

gangj commented May 13, 2024

@LunfanZhang Would you please update the commit message of the first commit?

  1. Shorten the current title
  2. Add some detailed description about the resolved dependency cycling.

@LunfanZhang LunfanZhang force-pushed the private/luzhan_official/CP-47656 branch from 68b33ab to 770f560 Compare May 13, 2024 06:21
@LunfanZhang
Copy link
Contributor Author

@LunfanZhang Would you please update the commit message of the first commit?

  1. Shorten the current title
  2. Add some detailed description about the resolved dependency cycling.

Updated

@LunfanZhang LunfanZhang force-pushed the private/luzhan_official/CP-47656 branch 2 times, most recently from 4431267 to d6d9454 Compare May 13, 2024 10:34
Each entry in the list contains a feature identifier and its corresponding restriction status.
Example:
[
("restrict_vlan", "true");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the value in the key/value always true/false? If so, why is it a string and not a bool?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. It's always either "true" string or "false" string.
They are read from the XAPI DB. Converting them to true or false boolean would be more efficient?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The string-based API invites errors if the only possible values are true/false but strings are accepted instead of just boolean values. It does not matter where the client obtains the values.

@gangj
Copy link
Contributor

gangj commented May 13, 2024

@LunfanZhang Would you please update the commit message of the first commit?

  1. Shorten the current title
  2. Add some detailed description about the resolved dependency cycling.

Updated

Would you please add more description to the commit message to make it better:
add something like "resolve dependency cycling issue if we call Xapi_alerts inside pool_features:"

Copy link
Contributor

@gangj gangj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@LunfanZhang
I have another thought here: would you please also update the commit message of the 2nd commit, to add some description of how the alert will be triggered in what scenarios. This will make the commit history easy to understand without having to read the code, thank you!

Resolve dependency cycling issue if we call Xapi_alerts inside pool_features.
Error: dependency cycle between modules in _build/default/ocaml/xapi:
   Xha_statefile
-> Sm
-> Sm_exec
-> Xapi_session
-> Pool_features
-> Xapi_vm_group_helpers
-> Xapi_alert
-> Xapi_http
-> Xapi_session

Signed-off-by: Lunfan Zhang <Lunfan.Zhang@cloud.com>
@LunfanZhang LunfanZhang force-pushed the private/luzhan_official/CP-47656 branch from d6d9454 to 8281c25 Compare May 13, 2024 12:08
The generation of breach alerts during operations on VMs or groups, triggered when all running VMs within a group are breached (i.e., placed on the same host).

The following may generate or dismiss alerts:

Starting and resuming a VM.
Stopping and suspending a VM.
Pausing and unpausing a VM.
Migrating a VM.
Moving a VM from one group to another.
Destroying a VM group.
Changing the license from Premium to Standard dismisses all breach alerts, while changing from Standard to Premium generates them.

In other scenarios, such as when a VM changes state from paused to shutdown, rebooting the VM will not trigger or dismiss alerts.

Signed-off-by: Lunfan Zhang <Lunfan.Zhang@cloud.com>
@LunfanZhang LunfanZhang force-pushed the private/luzhan_official/CP-47656 branch 2 times, most recently from 86d8166 to 22259d0 Compare May 13, 2024 12:14
@gangj gangj merged commit da8ad7f into xapi-project:feature/vm-anti-affinity May 13, 2024
14 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants