kvm-storage: provide isVMMigrate information to storage plugins #10093

rp-
Contributor

@rp- rp- commented Dec 12, 2024

Description

This PR tells the storage adapters whether connectPhysicalDisk is being called as part of a
live VM migration. For Linstor at least, this information is very useful: it lets the plugin allow
two-primaries only in that situation.
Other storage adapters currently just ignore this information.

We found out through a recent incident that this is rather critical for Linstor when,
for some reason, the CloudStack management server thinks that a host is down or needs to do
VM HA, even though the host is still working as expected.
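
The behaviour described above can be illustrated with a minimal sketch. It is not the code from this PR: every type and method name below is hypothetical, the interface is a simplified stand-in for a KVM storage adaptor, and the in-memory map only stands in for whatever the storage backend tracks (for Linstor, the two-primaries setting). It assumes the flag is passed to connectPhysicalDisk as a boolean.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch; none of these types are the real CloudStack classes.
class MigrateFlagSketch {

    /** Simplified stand-in for a storage adaptor that receives the migration flag. */
    interface StorageAdaptorSketch {
        boolean connectPhysicalDisk(String volumePath, String host, boolean isVMMigrate);

        void disconnectPhysicalDisk(String volumePath, String host);
    }

    /** Mirrors "other storage adapters will currently just ignore this information". */
    static class IgnoringAdaptor implements StorageAdaptorSketch {
        @Override
        public boolean connectPhysicalDisk(String volumePath, String host, boolean isVMMigrate) {
            return true; // the flag is accepted but not used
        }

        @Override
        public void disconnectPhysicalDisk(String volumePath, String host) {
            // nothing to track in this sketch
        }
    }

    /** Allows a second host to open a volume only while a live migration is running. */
    static class DualAccessGuardAdaptor implements StorageAdaptorSketch {
        // Assumption: an in-memory map replaces the real backend state.
        private final Map<String, Set<String>> openOn = new HashMap<>();

        @Override
        public boolean connectPhysicalDisk(String volumePath, String host, boolean isVMMigrate) {
            Set<String> hosts = openOn.computeIfAbsent(volumePath, k -> new HashSet<>());
            if (hosts.contains(host)) {
                return true; // already open on this host
            }
            if (!hosts.isEmpty() && !isVMMigrate) {
                return false; // open on another host and not a live migration: refuse
            }
            hosts.add(host);
            return true;
        }

        @Override
        public void disconnectPhysicalDisk(String volumePath, String host) {
            Set<String> hosts = openOn.get(volumePath);
            if (hosts != null) {
                hosts.remove(host);
            }
        }
    }

    public static void main(String[] args) {
        DualAccessGuardAdaptor guard = new DualAccessGuardAdaptor();
        System.out.println(guard.connectPhysicalDisk("vol-1", "host-a", false)); // true
        System.out.println(guard.connectPhysicalDisk("vol-1", "host-b", false)); // false
        System.out.println(guard.connectPhysicalDisk("vol-1", "host-b", true));  // true
    }
}
```

The second call models the incident scenario: the management server starts the VM on host-b for HA while host-a still holds the volume, and the connect is refused because it is not a live migration.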

Types of changes

  • Breaking change (fix or feature that would cause existing functionality to change)
  • New feature (non-breaking change which adds functionality)
  • Bug fix (non-breaking change which fixes an issue)
  • Enhancement (improves an existing feature and functionality)
  • Cleanup (Code refactoring and cleanup, that may add test cases)
  • build/CI
  • test (unit or integration test code)

Feature/Enhancement Scale or Bug Severity

Feature/Enhancement Scale

  • Major
  • Minor

Bug Severity

  • BLOCKER
  • Critical
  • Major
  • Minor
  • Trivial

Screenshots (if appropriate):

How Has This Been Tested?

Linstor cluster, basic VM operations

How did you try to break this feature and the system with this change?

codecov bot commented Dec 12, 2024

Codecov Report

Attention: Patch coverage is 13.04348% with 20 lines in your changes missing coverage. Please review.

Project coverage is 15.12%. Comparing base (a2f2e87) to head (00379ac).
Report is 2 commits behind head on 4.19.

Files with missing lines Patch % Lines
.../hypervisor/kvm/storage/LinstorStorageAdaptor.java 0.00% 6 Missing ⚠️
.../hypervisor/kvm/storage/KVMStoragePoolManager.java 0.00% 3 Missing ⚠️
...hypervisor/kvm/storage/IscsiAdmStorageAdaptor.java 0.00% 1 Missing ⚠️
...ud/hypervisor/kvm/storage/IscsiAdmStoragePool.java 0.00% 1 Missing ⚠️
.../hypervisor/kvm/storage/LibvirtStorageAdaptor.java 0.00% 1 Missing ⚠️
...pervisor/kvm/storage/ManagedNfsStorageAdaptor.java 0.00% 1 Missing ⚠️
...pervisor/kvm/storage/MultipathSCSIAdapterBase.java 0.00% 1 Missing ⚠️
...loud/hypervisor/kvm/storage/MultipathSCSIPool.java 0.00% 1 Missing ⚠️
.../hypervisor/kvm/storage/ScaleIOStorageAdaptor.java 50.00% 1 Missing ⚠️
...oud/hypervisor/kvm/storage/ScaleIOStoragePool.java 0.00% 1 Missing ⚠️
... and 3 more
Additional details and impacted files
@@             Coverage Diff              @@
##               4.19   #10093      +/-   ##
============================================
- Coverage     15.13%   15.12%   -0.01%     
+ Complexity    11268    11263       -5     
============================================
  Files          5408     5408              
  Lines        473867   473869       +2     
  Branches      57778    57779       +1     
============================================
- Hits          71700    71683      -17     
- Misses       394165   394187      +22     
+ Partials       8002     7999       -3     
Flag Coverage Δ
uitests 4.30% <ø> (ø)
unittests 15.84% <13.04%> (-0.01%) ⬇️

Contributor

@DaanHoogland DaanHoogland left a comment

clgtm

@DaanHoogland
Contributor

@blueorangutan package

@blueorangutan

@DaanHoogland a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.

@blueorangutan

Packaging result [SF]: ✖️ el8 ✖️ el9 ✔️ debian ✖️ suse15. SL-JID 11788

@rp-
Contributor Author

rp- commented Dec 13, 2024

@blueorangutan package

@blueorangutan

@rp- a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.

Contributor

@slavkap slavkap left a comment

code LGTM
I don't see how the changes could affect the other storage plugins, but I still don't understand what the change does for Linstor.

@rp-
Contributor Author

rp- commented Dec 13, 2024

code LGTM I don't see that the changes could affect the other storage plugins but I still cannot understand what the change does for Linstor

In short, it prevents opening a volume on 2 hosts at the same time unless it is a live migration (where it is required).

@slavkap
Contributor

slavkap commented Dec 13, 2024

Thanks @rp- for the explanation! That makes sense. Btw, for Host HA, I've added the option for the storage to disconnect the volume from all hosts before a VM is started on another host. It detaches the VM's volumes from the host just in case the host that should be down isn't. It should prevent the same issue as your change:

primaryStoreDriver.detachVolumeFromAllStorageNodes(volumeVO);
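
As a rough sketch of the flow described in this comment: only the method name detachVolumeFromAllStorageNodes comes from the line above. The driver interface, the in-memory implementation, the use of plain string volume IDs instead of a volume object, and the restartOnOtherHost helper are all hypothetical and only illustrate the ordering (detach everywhere first, then attach on the target host).

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch; these types are not the real CloudStack driver classes.
class HaDetachSketch {

    /** Simplified stand-in for a primary storage driver. */
    interface PrimaryStoreDriverSketch {
        void detachVolumeFromAllStorageNodes(String volumeId);
    }

    /** In-memory driver that records which hosts a volume is attached to. */
    static class InMemoryDriver implements PrimaryStoreDriverSketch {
        private final Map<String, Set<String>> attached = new HashMap<>();

        void attach(String volumeId, String host) {
            attached.computeIfAbsent(volumeId, k -> new HashSet<>()).add(host);
        }

        @Override
        public void detachVolumeFromAllStorageNodes(String volumeId) {
            attached.remove(volumeId); // drop every host, including one wrongly assumed down
        }

        Set<String> hostsFor(String volumeId) {
            return attached.getOrDefault(volumeId, Collections.emptySet());
        }
    }

    /** Detaches each volume from all hosts before attaching it on the HA target host. */
    static void restartOnOtherHost(InMemoryDriver driver, List<String> volumeIds, String targetHost) {
        for (String volumeId : volumeIds) {
            driver.detachVolumeFromAllStorageNodes(volumeId);
            driver.attach(volumeId, targetHost);
        }
    }

    public static void main(String[] args) {
        InMemoryDriver driver = new InMemoryDriver();
        driver.attach("vol-1", "host-a"); // host-a is believed down but still holds the volume
        restartOnOtherHost(driver, Collections.singletonList("vol-1"), "host-b");
        System.out.println(driver.hostsFor("vol-1")); // [host-b]
    }
}
```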

@blueorangutan

Packaging result [SF]: ✔️ el8 ✔️ el9 ✔️ debian ✔️ suse15. SL-JID 11801

@DaanHoogland
Contributor

@blueorangutan test

@blueorangutan

@DaanHoogland a [SL] Trillian-Jenkins test job (ol8 mgmt + kvm-ol8) has been kicked to run smoke tests

@blueorangutan

[SF] Trillian test result (tid-11898)
Environment: kvm-ol8 (x2), Advanced Networking with Mgmt server ol8
Total time taken: 51068 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr10093-t11898-kvm-ol8.zip
Smoke tests completed. 131 look OK, 2 have errors, 0 did not run
Only failed and skipped tests results shown below:

Test Result Time (s) Test File
test_01_secure_vm_migration Error 389.58 test_vm_life_cycle.py
test_02_redundant_VPC_default_routes Failure 420.48 test_vpc_redundant.py

@DaanHoogland
Contributor

@blueorangutan package

@blueorangutan

@DaanHoogland a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.

@blueorangutan

Packaging result [SF]: ✔️ el8 ✔️ el9 ✔️ debian ✔️ suse15. SL-JID 11812

@DaanHoogland
Contributor

@blueorangutan test

@blueorangutan

@DaanHoogland a [SL] Trillian-Jenkins test job (ol8 mgmt + kvm-ol8) has been kicked to run smoke tests

@blueorangutan

[SF] Trillian test result (tid-11902)
Environment: kvm-ol8 (x2), Advanced Networking with Mgmt server ol8
Total time taken: 43379 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr10093-t11902-kvm-ol8.zip
Smoke tests completed. 133 look OK, 0 have errors, 0 did not run
Only failed and skipped tests results shown below:

Test Result Time (s) Test File

@DaanHoogland
Contributor

@JoaoJandre are your concerns met, here?

This pull request has merge conflicts. Dear author, please fix the conflicts and sync your branch with the base branch.

Linstor in particular can use this information to allow
dual volume access only during live migration instead of enabling it in general,
which can and will lead to data corruption if for some reason
2 VMs get started on 2 different hosts.
@rp- rp- force-pushed the linstor-4.19-allow-two-primaries-only-on-live-migration branch from bbc25a1 to 00379ac Compare December 16, 2024 09:36
@rp-
Contributor Author

rp- commented Dec 16, 2024

fixed conflict and squashed the changelog.md into 1 commit

@DaanHoogland
Contributor

@blueorangutan package

@blueorangutan

@DaanHoogland a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.

@blueorangutan

Packaging result [SF]: ✖️ el8 ✖️ el9 ✖️ debian ✖️ suse15. SL-JID 11819

@blueorangutan

Packaging result [SF]: ✔️ el8 ✔️ el9 ✔️ debian ✔️ suse15. SL-JID 11821

@DaanHoogland
Contributor

@blueorangutan test

@blueorangutan

@DaanHoogland a [SL] Trillian-Jenkins test job (ol8 mgmt + kvm-ol8) has been kicked to run smoke tests

@blueorangutan

[SF] Trillian test result (tid-11922)
Environment: kvm-ol8 (x2), Advanced Networking with Mgmt server ol8
Total time taken: 48466 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr10093-t11922-kvm-ol8.zip
Smoke tests completed. 132 look OK, 1 have errors, 0 did not run
Only failed and skipped tests results shown below:

Test Result Time (s) Test File
test_01_secure_vm_migration Error 134.44 test_vm_life_cycle.py
test_01_secure_vm_migration Error 134.45 test_vm_life_cycle.py

@DaanHoogland DaanHoogland merged commit a9587bf into apache:4.19 Dec 18, 2024
25 of 26 checks passed
DaanHoogland added a commit that referenced this pull request Dec 20, 2024
* 4.20:
  VR: apply iptables rules when add/remove static routes (#10064)
  Certificate and VM hostname validation improvements (#10051)
  set ulimit for server according to redhat spec (#10040)
  kvm-storage: provide isVMMigrate information to storage plugins (#10093)
  Allow config drive deletion of migrated VM, on host maintenance (#10045)
  linstor: improve heartbeat check with also asking linstor (#10105)
  server: simplify role change validation (#9173)
  UI: create VPC network offering with conserve mode (#10082)
  server: fix typo removeaccessvpn in VirtualRouterElement (#10086)
  UI: remove duplicated Instance Name in Public IP details page (#10087)
  UI: Fixes in the Usage UI (#10000)
  SAML2: add cookie with HttpOnly too #10013 (#10047)
  ui: Allow font-awesome icon usage and optimise icon size inconsistency (#9744)
dhslove pushed a commit to ablecloud-team/ablestack-cloud that referenced this pull request Dec 26, 2024
…he#10093)

Linstor in particular can use this information to allow
dual volume access only during live migration instead of enabling it in general,
which can and will lead to data corruption if for some reason
2 VMs get started on 2 different hosts.