CKS Enhancements #9102

Open
wants to merge 73 commits into
base: main

nvazquez
Contributor

@nvazquez nvazquez commented May 21, 2024

Description

Design Document: https://cwiki.apache.org/confluence/display/CLOUDSTACK/CKS+Enhancements

This PR extends the CloudStack Kubernetes Service functionalities, matching these requirements:

  • Ability to specify different compute or service offerings for different types of CKS cluster nodes – worker, master or etcd: The createKubernetesCluster API and the corresponding UI must provide an option to specify a different offering for each type of node. CKS compute offerings will be marked as CKS compatible.
  • Ability to use CKS ready custom templates for CKS cluster nodes: CKS will allow users to specify their own templates for different CKS node types (control and worker) at the point of cluster creation. Those templates will be marked as CKS compatible.
  • Ability to use generic (non CKS ready) custom templates for CKS cluster nodes: CKS will allow users to specify their own templates for different CKS node types (control and worker) at the point of cluster creation. Those templates will be marked as CKS compatible. The user will be responsible for installing all necessary packages in the template.
  • Ability to add and remove a pre-created instance as a worker node to an existing CKS cluster: An instance (either virtual or physical) which has been built and prepared for CKS can be added to the desired CKS cluster. The instance must have all the CKS worker node packages installed.
  • Ability to separate etcd from master nodes of the CKS cluster: End users should be provided with an option to separate the etcd cluster at the time of CKS cluster creation. The user can enable this option in the UI or in the createKubernetesCluster API and specify the size of the etcd cluster. Based on the user inputs, CloudStack should be able to provision such etcd nodes for the CKS cluster.
  • Ability to mark CKS cluster nodes for manual only upgrade: An end user should be able to mark the desired compute offering (or the CKS template) for manual upgrades only. CKS cluster nodes marked for manual upgrade should be untouched during the Kubernetes version upgrade when executed using upgradeKubernetesCluster API.
  • Ability to dedicate specific hosts/clusters to a specific domain for CKS cluster deployment: The dedicateHost/dedicateCluster APIs can be used to provide this functionality to dedicate hosts/clusters for CKS cluster deployments. During the deployment of CKS cluster node VMs they will by default be deployed in the dedicated cluster.
  • Methodology for AS number management: Operators should be able to assign a range of AS numbers to an ACS Zone. ACS must have a method to assign an AS number to each Isolated network (or VPC tier), which can be retrieved via the UI and API. (Introduced in PR #9470, "New feature: Dynamic and Static Routing".)
  • Methodology to use diverse CNI plugins (Calico, Cilium, etc.): End users should be able to deploy CKS clusters with Calico CNI. An option to specify which CNI plugin is to be used for a CKS cluster must be provided in the createKubernetesClusterCmd API. The CNI configuration and setup can be registered as managed userdata, and any configurable parameters (here, the AS number, BGP peer AS number and IP address) can be defined as variables in the userdata and set during the creation of the CKS cluster. This provides a flexible way for users to use the CNI plugin of their choice.
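
To make the per-node-type offering and separate etcd requirements above more concrete, here is a minimal Python sketch that only builds the query parameters for a createKubernetesCluster call. It is an illustration under assumptions, not the PR's implementation: the parameter names `etcdnodes` and `nodeofferings`, the indexed map encoding, and the node type labels are hypothetical here and should be checked against the design document. The helper name `build_create_cluster_params` is also invented for this example.

```python
# Node type labels are an assumption; the description mentions worker,
# master/control and etcd nodes.
ALLOWED_NODE_TYPES = {"worker", "control", "etcd"}


def build_create_cluster_params(name, zone_id, version_id, default_offering_id,
                                size, control_nodes, etcd_nodes=0,
                                node_offerings=None):
    """Return a flat dict of query parameters for createKubernetesCluster.

    node_offerings maps a node type to a compute offering ID.
    """
    params = {
        "command": "createKubernetesCluster",
        "name": name,
        "zoneid": zone_id,
        "kubernetesversionid": version_id,
        "serviceofferingid": default_offering_id,
        "size": size,
        "controlnodes": control_nodes,
    }
    if etcd_nodes > 0:
        # Hypothetical parameter name for the size of a separate etcd cluster.
        params["etcdnodes"] = etcd_nodes
    for index, (node_type, offering_id) in enumerate(sorted((node_offerings or {}).items())):
        if node_type not in ALLOWED_NODE_TYPES:
            raise ValueError(f"unknown node type: {node_type}")
        # Hypothetical map parameter, flattened as key[index].field=value.
        params[f"nodeofferings[{index}].node"] = node_type
        params[f"nodeofferings[{index}].offering"] = offering_id
    return params


params = build_create_cluster_params(
    name="demo", zone_id="zone-uuid", version_id="version-uuid",
    default_offering_id="default-offering-uuid", size=3, control_nodes=1,
    etcd_nodes=3,
    node_offerings={"worker": "large-offering-uuid", "etcd": "small-offering-uuid"},
)
for key in sorted(params):
    print(f"{key}={params[key]}")
```

Node types without an explicit entry would fall back to `serviceofferingid` in this sketch; whether the real API behaves that way is not stated in the description.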

Types of changes

  • Breaking change (fix or feature that would cause existing functionality to change)
  • New feature (non-breaking change which adds functionality)
  • Bug fix (non-breaking change which fixes an issue)
  • Enhancement (improves an existing feature and functionality)
  • Cleanup (Code refactoring and cleanup, that may add test cases)
  • build/CI

Feature/Enhancement Scale or Bug Severity

Feature/Enhancement Scale

  • Major
  • Minor

Bug Severity

  • BLOCKER
  • Critical
  • Major
  • Minor
  • Trivial

Screenshots (if appropriate):

How Has This Been Tested?

Testing is in progress on a vCenter 7.0 environment with NSX SDN.

How did you try to break this feature and the system with this change?

nvazquez and others added 8 commits May 21, 2024 12:40
* Ability to specify different compute or service offerings for different types of CKS cluster nodes – worker, master or etcd

* Ability to use CKS ready custom templates for CKS cluster nodes

---------

Co-authored-by: Pearl Dsilva <pearl1594@gmail.com>
… a kubernetes cluster

---------

Co-authored-by: nvazquez <nicovazquez90@gmail.com>
* CKS: Fix ISO attach logic

* address comment
@nvazquez
Contributor Author

@blueorangutan package

@blueorangutan

@nvazquez a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.

@blueorangutan

Packaging result [SF]: ✔️ el7 ✔️ el8 ✔️ el9 ✖️ debian ✔️ suse15. SL-JID 9649

codecov bot commented May 22, 2024

Codecov Report

Attention: Patch coverage is 9.90587% with 2010 lines in your changes missing coverage. Please review.

Project coverage is 16.06%. Comparing base (fadb39e) to head (e8a96d4).
Report is 7 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...bernetes/cluster/KubernetesClusterManagerImpl.java | 12.35% | 308 Missing and 4 partials ⚠️ |
| ...r/actionworkers/KubernetesClusterActionWorker.java | 1.18% | 250 Missing ⚠️ |
| ...er/actionworkers/KubernetesClusterStartWorker.java | 0.00% | 245 Missing ⚠️ |
| ...ster/actionworkers/KubernetesClusterAddWorker.java | 0.00% | 211 Missing ⚠️ |
| ...KubernetesClusterResourceModifierActionWorker.java | 0.00% | 117 Missing ⚠️ |
| ...r/actionworkers/KubernetesClusterRemoveWorker.java | 0.00% | 106 Missing ⚠️ |
| ...er/actionworkers/KubernetesClusterScaleWorker.java | 30.88% | 83 Missing and 11 partials ⚠️ |
| ...ava/com/cloud/upgrade/dao/Upgrade42010to42100.java | 0.00% | 80 Missing ⚠️ |
| .../cloud/kubernetes/cluster/KubernetesClusterVO.java | 0.00% | 56 Missing ⚠️ |
| ...dstack/api/response/KubernetesClusterResponse.java | 0.00% | 53 Missing ⚠️ |
| ... and 46 more | | |

Additional details and impacted files
@@             Coverage Diff              @@
##               main    #9102      +/-   ##
============================================
- Coverage     16.09%   16.06%   -0.03%     
- Complexity    12934    12972      +38     
============================================
  Files          5644     5657      +13     
  Lines        494582   496742    +2160     
  Branches      59963    60239     +276     
============================================
+ Hits          79622    79822     +200     
- Misses       406124   408057    +1933     
- Partials       8836     8863      +27     

| Flag | Coverage Δ |
|---|---|
| uitests | 3.97% <ø> (-0.04%) ⬇️ |
| unittests | 16.92% <9.90%> (-0.03%) ⬇️ |
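
The project coverage figures in the diff above can be checked by hand, since coverage is hits divided by total lines. A minimal Python sketch using the Hits and Lines values from this report follows. It assumes the displayed percentage is truncated to two decimals rather than rounded; that is an inference from these particular numbers, not documented Codecov behaviour, and the function name `coverage_percent` is made up for this example.

```python
import math


def coverage_percent(hits: int, lines: int) -> float:
    """Return hits/lines as a percentage, truncated to two decimals."""
    return math.floor(hits / lines * 10_000) / 100


# Hits and Lines for base (main) and head (#9102), taken from the diff above.
base = coverage_percent(79_622, 494_582)
head = coverage_percent(79_822, 496_742)
print(f"base={base}% head={head}%")
```

With rounding instead of truncation, both values would come out 0.01 higher than the figures Codecov shows, which is why truncation is assumed here.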

@nvazquez nvazquez force-pushed the cks-enhancements-upstream branch from 5710f92 to 469c08d Compare May 22, 2024 00:59
@nvazquez
Contributor Author

@blueorangutan package

@blueorangutan

@nvazquez a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.

@blueorangutan

Packaging result [SF]: ✔️ el7 ✔️ el8 ✔️ el9 ✖️ debian ✔️ suse15. SL-JID 9650

@nvazquez
Contributor Author

nvazquez commented Jan 9, 2025

@blueorangutan package

@blueorangutan

@nvazquez a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.

@blueorangutan

Packaging result [SF]: ✔️ el8 ✔️ el9 ✖️ debian ✔️ suse15. SL-JID 12020

@nvazquez
Contributor Author

nvazquez commented Jan 9, 2025

@blueorangutan test

@blueorangutan

@nvazquez a [SL] Trillian-Jenkins test job (ol8 mgmt + kvm-ol8) has been kicked to run smoke tests

@blueorangutan

[SF] Trillian test result (tid-12066)
Environment: kvm-ol8 (x2), Advanced Networking with Mgmt server ol8
Total time taken: 65306 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr9102-t12066-kvm-ol8.zip
Smoke tests completed. 137 look OK, 4 have errors, 0 did not run
Only failed and skipped tests results shown below:

| Test | Result | Time (s) | Test File |
|---|---|---|---|
| test_provision_certificate | Error | 80.92 | test_certauthority_root.py |
| test_11_isolated_network_with_dynamic_routed_mode | Error | 2.28 | test_ipv4_routing.py |
| test_12_vpc_and_tier_with_dynamic_routed_mode | Error | 2.36 | test_ipv4_routing.py |
| test_12_vpc_and_tier_with_dynamic_routed_mode | Error | 2.36 | test_ipv4_routing.py |
| ContextSuite context=TestKubernetesCluster>:teardown | Error | 623.00 | test_kubernetes_clusters.py |
| test_05_vmschedule_test_e2e | Failure | 362.20 | test_vm_schedule.py |
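
The summary line in this report appears to count failing test files rather than individual rows: the six rows above name four distinct files, which matches "4 have errors". A small Python sketch of that tally, with the rows copied from this report, is shown here. The parsing approach and the function name `failing_files` are illustrative only and are not part of Trillian or Marvin.

```python
REPORT_ROWS = """\
test_provision_certificate Error 80.92 test_certauthority_root.py
test_11_isolated_network_with_dynamic_routed_mode Error 2.28 test_ipv4_routing.py
test_12_vpc_and_tier_with_dynamic_routed_mode Error 2.36 test_ipv4_routing.py
test_12_vpc_and_tier_with_dynamic_routed_mode Error 2.36 test_ipv4_routing.py
ContextSuite context=TestKubernetesCluster>:teardown Error 623.00 test_kubernetes_clusters.py
test_05_vmschedule_test_e2e Failure 362.20 test_vm_schedule.py
"""


def failing_files(report: str) -> set:
    """Collect the distinct test files named in the result rows."""
    files = set()
    for line in report.strip().splitlines():
        # Split from the right: the test name itself may contain spaces.
        _name, _result, _seconds, test_file = line.rsplit(None, 3)
        files.add(test_file)
    return files


files = failing_files(REPORT_ROWS)
print(len(files), sorted(files))
```

The same per-file count is consistent with the other Trillian results in this conversation, for example five distinct files in the vmware-70u3 run that reports "5 have errors".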

@bernardodemarco
Collaborator

> Thanks for your review @bernardodemarco. I've addressed all the comments except the scale cluster one, which I'm currently working on. I have also included the rest of the missing functionalities and started a design document with a more detailed explanation, which I'll complete in the following days: https://cwiki.apache.org/confluence/display/CLOUDSTACK/CKS+Enhancements

Nice, I'll review it again in the next few days.

@nvazquez
Contributor Author

@blueorangutan package

@blueorangutan

@nvazquez a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.

@blueorangutan

Packaging result [SF]: ✔️ el8 ✔️ el9 ✔️ debian ✔️ suse15. SL-JID 12065

@nvazquez
Contributor Author

@blueorangutan test matrix

@blueorangutan

@nvazquez a [SL] Trillian-Jenkins matrix job (EL8 mgmt + EL8 KVM, Ubuntu22 mgmt + Ubuntu22 KVM, EL8 mgmt + VMware 7.0u3, EL9 mgmt + XCP-ng 8.2 ) has been kicked to run smoke tests

@blueorangutan

[SF] Trillian Build Failed (tid-12094)

@blueorangutan

[SF] Trillian test result (tid-12092)
Environment: kvm-ubuntu22 (x2), Advanced Networking with Mgmt server u22
Total time taken: 58703 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr9102-t12092-kvm-ubuntu22.zip
Smoke tests completed. 139 look OK, 2 have errors, 0 did not run
Only failed and skipped tests results shown below:

| Test | Result | Time (s) | Test File |
|---|---|---|---|
| test_11_isolated_network_with_dynamic_routed_mode | Error | 1.27 | test_ipv4_routing.py |
| test_12_vpc_and_tier_with_dynamic_routed_mode | Error | 3.40 | test_ipv4_routing.py |
| test_12_vpc_and_tier_with_dynamic_routed_mode | Error | 3.40 | test_ipv4_routing.py |
| test_oobm_multiple_mgmt_server_ownership | Failure | 31.79 | test_outofbandmanagement.py |

@blueorangutan

[SF] Trillian test result (tid-12091)
Environment: kvm-ol8 (x2), Advanced Networking with Mgmt server ol8
Total time taken: 66849 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr9102-t12091-kvm-ol8.zip
Smoke tests completed. 140 look OK, 1 have errors, 0 did not run
Only failed and skipped tests results shown below:

| Test | Result | Time (s) | Test File |
|---|---|---|---|
| test_11_isolated_network_with_dynamic_routed_mode | Error | 2.31 | test_ipv4_routing.py |
| test_12_vpc_and_tier_with_dynamic_routed_mode | Error | 2.36 | test_ipv4_routing.py |
| test_12_vpc_and_tier_with_dynamic_routed_mode | Error | 2.36 | test_ipv4_routing.py |

@blueorangutan

[SF] Trillian test result (tid-12093)
Environment: vmware-70u3 (x2), Advanced Networking with Mgmt server ol8
Total time taken: 77570 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr9102-t12093-vmware-70u3.zip
Smoke tests completed. 136 look OK, 5 have errors, 0 did not run
Only failed and skipped tests results shown below:

| Test | Result | Time (s) | Test File |
|---|---|---|---|
| test_11_isolated_network_with_dynamic_routed_mode | Error | 2.28 | test_ipv4_routing.py |
| test_12_vpc_and_tier_with_dynamic_routed_mode | Error | 3.38 | test_ipv4_routing.py |
| test_12_vpc_and_tier_with_dynamic_routed_mode | Error | 3.38 | test_ipv4_routing.py |
| ContextSuite context=TestKubernetesCluster>:setup | Error | 0.00 | test_kubernetes_clusters.py |
| test_list_vms_metrics_admin | Error | 3625.04 | test_metrics_api.py |
| test_list_vms_metrics_history | Error | 3618.78 | test_metrics_api.py |
| test_list_volumes_metrics_history | Error | 3621.72 | test_metrics_api.py |
| test_01_deployVMInSharedNetwork | Failure | 3603.93 | test_network.py |
| ContextSuite context=TestSharedNetworkWithConfigDrive>:teardown | Error | 3605.18 | test_network.py |
| test_03_restore_vm_with_disk_offering_custom_size | Error | 58.17 | test_restore_vm.py |

@nvazquez
Contributor Author

@blueorangutan package

@blueorangutan

@nvazquez a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.

@blueorangutan

Packaging result [SF]: ✖️ el8 ✖️ el9 ✔️ debian ✖️ suse15. SL-JID 12075

@nvazquez
Contributor Author

@blueorangutan package

@blueorangutan

@nvazquez a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.

@blueorangutan

Packaging result [SF]: ✔️ el8 ✔️ el9 ✔️ debian ✔️ suse15. SL-JID 12078

@nvazquez
Contributor Author

@blueorangutan test matrix

@blueorangutan

@nvazquez a [SL] Trillian-Jenkins matrix job (EL8 mgmt + EL8 KVM, Ubuntu22 mgmt + Ubuntu22 KVM, EL8 mgmt + VMware 7.0u3, EL9 mgmt + XCP-ng 8.2 ) has been kicked to run smoke tests

@blueorangutan

[SF] Trillian test result (tid-12102)
Environment: kvm-ol8 (x2), Advanced Networking with Mgmt server ol8
Total time taken: 53466 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr9102-t12102-kvm-ol8.zip
Smoke tests completed. 140 look OK, 1 have errors, 0 did not run
Only failed and skipped tests results shown below:

| Test | Result | Time (s) | Test File |
|---|---|---|---|
| test_11_isolated_network_with_dynamic_routed_mode | Error | 2.33 | test_ipv4_routing.py |
| test_12_vpc_and_tier_with_dynamic_routed_mode | Error | 4.47 | test_ipv4_routing.py |
| test_12_vpc_and_tier_with_dynamic_routed_mode | Error | 4.47 | test_ipv4_routing.py |

@blueorangutan

[SF] Trillian test result (tid-12103)
Environment: kvm-ubuntu22 (x2), Advanced Networking with Mgmt server u22
Total time taken: 56042 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr9102-t12103-kvm-ubuntu22.zip
Smoke tests completed. 139 look OK, 2 have errors, 0 did not run
Only failed and skipped tests results shown below:

| Test | Result | Time (s) | Test File |
|---|---|---|---|
| test_11_isolated_network_with_dynamic_routed_mode | Error | 2.31 | test_ipv4_routing.py |
| test_12_vpc_and_tier_with_dynamic_routed_mode | Error | 2.41 | test_ipv4_routing.py |
| test_12_vpc_and_tier_with_dynamic_routed_mode | Error | 2.41 | test_ipv4_routing.py |
| test_oobm_multiple_mgmt_server_ownership | Failure | 31.77 | test_outofbandmanagement.py |

8 participants