From 6990200b29282be6dddb7e2ac95384295e7b9720 Mon Sep 17 00:00:00 2001 From: Max Dribinsky Date: Mon, 13 Jun 2022 16:17:45 -0400 Subject: [PATCH 01/18] Initial draft of multi-endpoint design for cloudstack --- designs/cloudstack-multiple-endpoints.md | 161 +++++++++++++++++++++++ 1 file changed, 161 insertions(+) create mode 100644 designs/cloudstack-multiple-endpoints.md diff --git a/designs/cloudstack-multiple-endpoints.md b/designs/cloudstack-multiple-endpoints.md new file mode 100644 index 000000000000..e38a2c1c0b02 --- /dev/null +++ b/designs/cloudstack-multiple-endpoints.md @@ -0,0 +1,161 @@ +# Supporting Cloudstack clusters across endpoints + +## Introduction + +**Problem:** + +Our customer needs to support running Cloudstack EKS-A clusters across multiple Cloudstack API endpoints. In CAPC, we are considering +addressing the problem by relying on Failure Domains and distributing a cluster across the given ones. In order to support this functionality in EKS-A, +we need to have a similar breakdown where an EKS-A cluster can span across multiple endpoints and Failure Domains. + +### Tenets + +* ****Simple:**** simple to use, simple to understand, simple to maintain +* ****Declarative:**** intent based system, as close to a Kubernetes native experience as possible + +### Goals and Objectives + +As a Kubernetes administrator I want to: + +* Perform preflight checks when creating/upgrading clusters which span across multiple failure domains +* Create EKS Anywhere clusters which span across multiple failure domains +* Upgrade EKS Anywhere clusters which span across multiple failure domains +* Delete EKS Anywhere clusters which span across multiple failure domains + +### Statement of Scope + +**In scope** + +* Add support for create/upgrade/delete of EKS-A clusters across multiple Cloudstack API endpoints +* Add test environment for CI/CD e2e tests which can be used as a second Cloudstack API endpoint + +**Not in scope** + +* + +**Future scope** + +* Multiple network support to handle IP address exhaustion within a zone + +## Overview of Solution + +We propose to take the least invasive solution of repurposing the CloudstackDataCenterConfig to point to multiple failure domains, each of which contains the necessary +information for interacting with a Cloudstack failure domain. The assumption is that the necessary Cloudstack resources (i.e. image, computeOffering, ISOAttachment, etc.) +will be available on *all* the Cloudstack API endpoints. + +## Solution Details + +### Interface changes +Currently, the CloudstackDataCenterConfig spec contains: +``` +// Domain contains a grouping of accounts. Domains usually contain multiple accounts that have some logical relationship to each other and a set of delegated administrators with some authority over the domain and its subdomains +// +Domain string `json:"domain"` +// Zones is a list of one or more zones that are managed by a single CloudStack management endpoint. +Zones []CloudStackZone `json:"zones"` +// Account typically represents a customer of the service provider or a department in a large organization. Multiple users can exist in an account, and all CloudStack resources belong to an account. Accounts have users and users have credentials to operate on resources within that account. If an account name is provided, a domain must also be provided. +Account string `json:"account,omitempty"` +// CloudStack Management API endpoint's IP. 
It is added to VM's noproxy list +ManagementApiEndpoint string `json:"managementApiEndpoint"` +``` + +We would instead propose to remove all the existing attributes and instead, simply include a list of FailureDomain objects, where each FailureDomain object looks like +``` +// Domain contains a grouping of accounts. Domains usually contain multiple accounts that have some logical relationship to each other and a set of delegated administrators with some authority over the domain and its subdomains +// This field is considered as a fully qualified domain name which is the same as the domain path without "ROOT/" prefix. For example, if "foo" is specified then a domain with "ROOT/foo" domain path is picked. +// The value "ROOT" is a special case that points to "the" ROOT domain of the CloudStack. That is, a domain with a path "ROOT/ROOT" is not allowed. +// +Domain string `json:"domain"` +// Zones is a list of one or more zones that are managed by a single CloudStack management endpoint. +Zone CloudStackZone `json:"zone"` +// Account typically represents a customer of the service provider or a department in a large organization. Multiple users can exist in an account, and all CloudStack resources belong to an account. Accounts have users and users have credentials to operate on resources within that account. If an account name is provided, a domain must also be provided. +Account string `json:"account,omitempty"` +// CloudStack Management API endpoint's IP. It is added to VM's noproxy list +ManagementApiEndpoint string `json:"managementApiEndpoint"` +``` + +and we would parse these resources and pass them into CAPC by modifying the templates we have currently implemented. We can then use this new model to read in credentials, perform pre-flight checks, plumb data to CAPC, and support upgrades in the controller. The goal would be to make these new resources backwards compatible via code + +### Failure Domain + +A failure domain is a CAPI concept which serves to improve HA and availability by destributing machines across "failure domains", as discussed [here](https://cluster-api.sigs.k8s.io/developer/providers/v1alpha2-to-v1alpha3.html?highlight=domain#optional-support-failure-domains). +CAPC currently utilizes them to distribute machines across CloudStack Zones. However, we now want to go a step further to support our customer and consider the following unique combination to be a failure domain: + +1. Cloudstack endpoint +2. Cloudstack domain +3. Cloudstack zone +4. Cloudstack account + +You can find more information about these Cloudstack resources [here](http://docs.cloudstack.apache.org/en/latest/conceptsandterminology/concepts.html#cloudstack-terminology) + +### `CloudstackDatacenterConfig` Validation + +With the multi-endpoint system for the Cloudstack provider, customers reference a CloudstackMachineConfig and it's created across multiple failure domains. The implication +is that all the Cloudstack resources such as image, ComputeOffering, ISOAttachment, etc. must be available in *all* the failure domains, or all the Cloudstack endpoints, +and these resources must be referenced by name, not unique ID. This would mean that for each CloudstackMachineConfig, we have to make sure that all the prerequisite +Cloudstack resources are available in all the Cloudstack API endpoints. 
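
For illustration only, a `CloudstackDataCenterConfig` using the proposed list-of-failure-domain shape might look like the sketch below. The field names mirror the struct above, but the exact casing, the `failureDomains` key, and all of the values here are assumptions rather than a finalized API:

```yaml
apiVersion: anywhere.eks.amazonaws.com/v1alpha1
kind: CloudStackDatacenterConfig
metadata:
  name: my-cluster-datacenter
spec:
  failureDomains:
  - managementApiEndpoint: http://172.16.0.1:8080/client/api   # endpoint 1
    domain: eksa-domain
    account: eksa-admin
    zone:
      name: zone1
      network:
        name: shared-net-1
  - managementApiEndpoint: http://172.16.0.2:8080/client/api   # endpoint 2
    domain: eksa-domain
    account: eksa-admin
    zone:
      name: zone1
      network:
        name: shared-net-1
```

Every entry in this list would have to pass the same resource checks, since a referenced image or offering must resolve by name on each endpoint.
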
+ +In practice, the pseudocode would look like: + +for failureDomain in failureDomains: + for machineConfig in machineConfigs: + validate resource presence with the failureDomain's instance of the CloudMonkey executable + +### Cloudstack credentials + + +In a multi-endpoint Cloudstack cluster, each endpoint may have its own credentials. We propose that Cloudstack credentials will be passed in via environment variable in the same way as they are currently, +and mapped to a given failure domain by the API endpoint path. Currently, these credentials are passed in via environment variable, which contains a base64 encoded .ini file that looks like + +``` +[Global] +api-key = redacted +secret-key = redacted +api-url = http://172.16.0.1:8080/client/api +``` + +We would propose an extension of the above input mechanism so the user could provide credentials across multiple Cloudstack API endpoints like + +``` +[FailureDomain1] +api-key = redacted +secret-key = redacted +api-url = http://172.16.0.1:8080/client/api + +[FailureDomain2] +api-key = redacted +secret-key = redacted +api-url = http://172.16.0.2:8080/client/api + +[FailureDomain3] +api-key = redacted +secret-key = redacted +api-url = http://172.16.0.3:8080/client/api + +... +``` + +We are also exploring converting the ini file to a yaml input file which contains a list of credentials and their associated endpoints. Either way, this environment variable would +be passed along to CAPC and used by the CAPC controller just like it is currently. + +### Backwards Compatibility + +## User Experience + + +## Security + +The main change regarding security is the additional credential management. Otherwise, we are doing exactly the same operations - preflight check with cloudmonkey, +create yaml templates and apply them, and then read/write eks-a resources in the eks-a controller. The corresponding change is an extension of an existing mechanism +and there should not be any new surface area for risks than there was previously. + +## Testing + +The new code will be covered by unit and e2e tests, and the e2e framework will be extended to support cluster creation across multiple Cloudstack API endpoints. + +The following e2e test will be added: + +simple flow cluster creation/deletion across multiple Cloudstack API endpoints: + +* create a management+workload cluster spanning multiple Cloudstack API endpoints +* delete cluster From fc70d054638d8fb77d01e63837163664b319a46c Mon Sep 17 00:00:00 2001 From: Max Dribinsky Date: Mon, 13 Jun 2022 16:44:18 -0400 Subject: [PATCH 02/18] Adding additional details --- designs/cloudstack-multiple-endpoints.md | 12 ++++++++++++ 1 file changed, 12 insertions(+) diff --git a/designs/cloudstack-multiple-endpoints.md b/designs/cloudstack-multiple-endpoints.md index e38a2c1c0b02..d26a6ee6a5d4 100644 --- a/designs/cloudstack-multiple-endpoints.md +++ b/designs/cloudstack-multiple-endpoints.md @@ -140,6 +140,12 @@ be passed along to CAPC and used by the CAPC controller just like it is currentl ### Backwards Compatibility +Our customer currently has clusters running with the old resource definition. In order to support backwards compatibility in the CloudstackDatacenterConfig resource, we can +1. Make all the fields optional and see if customers have the old fields set or the new ones +2. Introduce an eks-a version bump with conversion webhooks + +Between these two approaches, I would take the first and then deprecate the legacy fields in a subsequent release to simplify the code paths. 
+ ## User Experience @@ -159,3 +165,9 @@ simple flow cluster creation/deletion across multiple Cloudstack API endpoints: * create a management+workload cluster spanning multiple Cloudstack API endpoints * delete cluster + +## Other approaches explored + +Another direction we can go to support this feature is to refactor the entire EKS-A codebase so that instead of all the failure domains existing inside the CloudstackDatacenterConfig +object, each CloudstackDatacenterConfig itself corresponds with a single failure domain. Then, the top level EKS-A Cluster object could be refactored to have a list of DatacenterRefs instead +of a single one. However, this approach feels extremely invasive to the product and does not provide tangible value to the other providers. \ No newline at end of file From 938b80aeb5b4e913cae31e9a3ed59e3f746de15e Mon Sep 17 00:00:00 2001 From: Max Dribinsky Date: Tue, 14 Jun 2022 10:26:16 -0400 Subject: [PATCH 03/18] Update cloudstack-multiple-endpoints.md --- designs/cloudstack-multiple-endpoints.md | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/designs/cloudstack-multiple-endpoints.md b/designs/cloudstack-multiple-endpoints.md index d26a6ee6a5d4..5f051575f441 100644 --- a/designs/cloudstack-multiple-endpoints.md +++ b/designs/cloudstack-multiple-endpoints.md @@ -105,7 +105,7 @@ for failureDomain in failureDomains: In a multi-endpoint Cloudstack cluster, each endpoint may have its own credentials. We propose that Cloudstack credentials will be passed in via environment variable in the same way as they are currently, -and mapped to a given failure domain by the API endpoint path. Currently, these credentials are passed in via environment variable, which contains a base64 encoded .ini file that looks like +only as a list corresponding to failure domains. Currently, these credentials are passed in via environment variable, which contains a base64 encoded .ini file that looks like ``` [Global] @@ -146,6 +146,8 @@ Our customer currently has clusters running with the old resource definition. In Between these two approaches, I would take the first and then deprecate the legacy fields in a subsequent release to simplify the code paths. +However, given that the Cloudstack credentials are persisted in a write-once secret on the cluster, upgrading existing clusters may not be feasible unless CAPC supports overwriting that secret. + ## User Experience @@ -170,4 +172,4 @@ simple flow cluster creation/deletion across multiple Cloudstack API endpoints: Another direction we can go to support this feature is to refactor the entire EKS-A codebase so that instead of all the failure domains existing inside the CloudstackDatacenterConfig object, each CloudstackDatacenterConfig itself corresponds with a single failure domain. Then, the top level EKS-A Cluster object could be refactored to have a list of DatacenterRefs instead -of a single one. However, this approach feels extremely invasive to the product and does not provide tangible value to the other providers. \ No newline at end of file +of a single one. However, this approach feels extremely invasive to the product and does not provide tangible value to the other providers. 
From e2b28894398c768c2b6ca6ba3bcd906766480e46 Mon Sep 17 00:00:00 2001 From: Max Dribinsky Date: Tue, 14 Jun 2022 11:29:24 -0400 Subject: [PATCH 04/18] Update cloudstack-multiple-endpoints.md --- designs/cloudstack-multiple-endpoints.md | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-) diff --git a/designs/cloudstack-multiple-endpoints.md b/designs/cloudstack-multiple-endpoints.md index 5f051575f441..bd4a9b92fdd1 100644 --- a/designs/cloudstack-multiple-endpoints.md +++ b/designs/cloudstack-multiple-endpoints.md @@ -4,8 +4,11 @@ **Problem:** -Our customer needs to support running Cloudstack EKS-A clusters across multiple Cloudstack API endpoints. In CAPC, we are considering -addressing the problem by relying on Failure Domains and distributing a cluster across the given ones. In order to support this functionality in EKS-A, +For High Availability, multi Cloudstack endpoints is almost a given since the Cloudstack endpoints themselves are potential points of failure. If one endpoint goes down, then control of all of everything goes down. So, we want to spread our services across many ACS hosts to protect against that. + +For Scalability, multi ACS endpoints will likely be required for storage and API endpoint throughput. Just one cluster creation generates probably a thousand API calls to ACS. There are many ways to mitigate this, but adding more ACS hosts is a fairly foolproof way to do so. Then, there’s the size and performance of the underlying database that each Cloudstack instance runs on. + +In CAPC, we are considering addressing the problem by relying on Failure Domains and distributing a cluster across the given ones. In order to support this functionality in EKS-A, we need to have a similar breakdown where an EKS-A cluster can span across multiple endpoints and Failure Domains. ### Tenets From 89d0b7e57238890baa0cfaed8c809670c28692e0 Mon Sep 17 00:00:00 2001 From: Max Dribinsky Date: Tue, 14 Jun 2022 12:14:13 -0400 Subject: [PATCH 05/18] Update cloudstack-multiple-endpoints.md --- designs/cloudstack-multiple-endpoints.md | 7 +++---- 1 file changed, 3 insertions(+), 4 deletions(-) diff --git a/designs/cloudstack-multiple-endpoints.md b/designs/cloudstack-multiple-endpoints.md index bd4a9b92fdd1..8104c830f154 100644 --- a/designs/cloudstack-multiple-endpoints.md +++ b/designs/cloudstack-multiple-endpoints.md @@ -4,12 +4,11 @@ **Problem:** -For High Availability, multi Cloudstack endpoints is almost a given since the Cloudstack endpoints themselves are potential points of failure. If one endpoint goes down, then control of all of everything goes down. So, we want to spread our services across many ACS hosts to protect against that. +The mangement API endpoint for Cloudstack is a potential points of failure. If one endpoint goes down, then control of all of everything goes down. So, we want to spread our services across many Cloudstack endpoints and hosts to protect against that. -For Scalability, multi ACS endpoints will likely be required for storage and API endpoint throughput. Just one cluster creation generates probably a thousand API calls to ACS. There are many ways to mitigate this, but adding more ACS hosts is a fairly foolproof way to do so. Then, there’s the size and performance of the underlying database that each Cloudstack instance runs on. +For scalability, multiple Cloudstack endpoints will likely be required for storage and API endpoint throughput for our customer. Just one cluster creation generates probably a thousand API calls to ACS. 
There are many ways to mitigate this, but adding more Cloudstack hosts and endpoints is a fairly foolproof way to do so. Then, there’s the size and performance of the underlying database that each Cloudstack instance runs on. -In CAPC, we are considering addressing the problem by relying on Failure Domains and distributing a cluster across the given ones. In order to support this functionality in EKS-A, -we need to have a similar breakdown where an EKS-A cluster can span across multiple endpoints and Failure Domains. +In CAPC, we are considering addressing the problem by relying on Failure Domains and distributing a cluster across the given ones. In order to support this functionality in EKS-A, we need to have a similar breakdown where an EKS-A cluster can span across multiple endpoints and Failure Domains. ### Tenets From cdb7bd5a757d1636cbf9d835acdf0f51ae1783b1 Mon Sep 17 00:00:00 2001 From: Max Dribinsky Date: Tue, 14 Jun 2022 16:11:48 -0400 Subject: [PATCH 06/18] Update cloudstack-multiple-endpoints.md --- designs/cloudstack-multiple-endpoints.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/designs/cloudstack-multiple-endpoints.md b/designs/cloudstack-multiple-endpoints.md index 8104c830f154..afaf54e0f95f 100644 --- a/designs/cloudstack-multiple-endpoints.md +++ b/designs/cloudstack-multiple-endpoints.md @@ -8,7 +8,7 @@ The mangement API endpoint for Cloudstack is a potential points of failure. If o For scalability, multiple Cloudstack endpoints will likely be required for storage and API endpoint throughput for our customer. Just one cluster creation generates probably a thousand API calls to ACS. There are many ways to mitigate this, but adding more Cloudstack hosts and endpoints is a fairly foolproof way to do so. Then, there’s the size and performance of the underlying database that each Cloudstack instance runs on. -In CAPC, we are considering addressing the problem by relying on Failure Domains and distributing a cluster across the given ones. In order to support this functionality in EKS-A, we need to have a similar breakdown where an EKS-A cluster can span across multiple endpoints and Failure Domains. +In CAPC, we are considering addressing the problem by extending our use of the concept of [Failure Domains](https://cluster-api.sigs.k8s.io/developer/providers/v1alpha2-to-v1alpha3.html?highlight=failure%20domain#optional-support-failure-domains) and distributing a cluster across the given ones. However, instead of a failure domain consisting of a zone on a single Cloudstack endpoint, we will redefine it to consist of the unique combination of a Cloudstack zone, api endpoint, account, and domain. In order to support this functionality in EKS-A, we need to have a similar breakdown where an EKS-A cluster can span across multiple endpoints and Failure Domains. ### Tenets From 5874c758beaf00b929df3a19c06111c53c89dea5 Mon Sep 17 00:00:00 2001 From: Max Dribinsky Date: Tue, 14 Jun 2022 16:14:45 -0400 Subject: [PATCH 07/18] Update cloudstack-multiple-endpoints.md --- designs/cloudstack-multiple-endpoints.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/designs/cloudstack-multiple-endpoints.md b/designs/cloudstack-multiple-endpoints.md index afaf54e0f95f..bd11886ac773 100644 --- a/designs/cloudstack-multiple-endpoints.md +++ b/designs/cloudstack-multiple-endpoints.md @@ -6,7 +6,7 @@ The mangement API endpoint for Cloudstack is a potential points of failure. If one endpoint goes down, then control of all of everything goes down. 
So, we want to spread our services across many Cloudstack endpoints and hosts to protect against that. -For scalability, multiple Cloudstack endpoints will likely be required for storage and API endpoint throughput for our customer. Just one cluster creation generates probably a thousand API calls to ACS. There are many ways to mitigate this, but adding more Cloudstack hosts and endpoints is a fairly foolproof way to do so. Then, there’s the size and performance of the underlying database that each Cloudstack instance runs on. +For scalability, multiple Cloudstack endpoints will likely be required for storage and API endpoint throughput for our customer. Just one cluster creation as many as a thousand API calls to ACS (estimated). There are many ways to support this scale, but adding more Cloudstack hosts and endpoints is a fairly foolproof way to do so. Then, there’s the size and performance of the underlying database that each Cloudstack instance runs on. In CAPC, we are considering addressing the problem by extending our use of the concept of [Failure Domains](https://cluster-api.sigs.k8s.io/developer/providers/v1alpha2-to-v1alpha3.html?highlight=failure%20domain#optional-support-failure-domains) and distributing a cluster across the given ones. However, instead of a failure domain consisting of a zone on a single Cloudstack endpoint, we will redefine it to consist of the unique combination of a Cloudstack zone, api endpoint, account, and domain. In order to support this functionality in EKS-A, we need to have a similar breakdown where an EKS-A cluster can span across multiple endpoints and Failure Domains. From a03aaf75de81477ba7625afdb84a4622ddd2659f Mon Sep 17 00:00:00 2001 From: Max Dribinsky Date: Tue, 14 Jun 2022 16:24:04 -0400 Subject: [PATCH 08/18] Update cloudstack-multiple-endpoints.md --- designs/cloudstack-multiple-endpoints.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/designs/cloudstack-multiple-endpoints.md b/designs/cloudstack-multiple-endpoints.md index bd11886ac773..446ca9f1c4ec 100644 --- a/designs/cloudstack-multiple-endpoints.md +++ b/designs/cloudstack-multiple-endpoints.md @@ -4,7 +4,7 @@ **Problem:** -The mangement API endpoint for Cloudstack is a potential points of failure. If one endpoint goes down, then control of all of everything goes down. So, we want to spread our services across many Cloudstack endpoints and hosts to protect against that. +The mangement API endpoint for Apache Cloudstack (ACS) is a singe point of failure. If the endpoint goes down, then control of all of all VM's, networks, zones, accounts, domains, and everything else on Cloudstack goes down. So, we want to spread our clusters across many Cloudstack endpoints and hosts to protect against that. For scalability, multiple Cloudstack endpoints will likely be required for storage and API endpoint throughput for our customer. Just one cluster creation as many as a thousand API calls to ACS (estimated). There are many ways to support this scale, but adding more Cloudstack hosts and endpoints is a fairly foolproof way to do so. Then, there’s the size and performance of the underlying database that each Cloudstack instance runs on. 
From 32554787866467fcc85ba7dfb649144575131eee Mon Sep 17 00:00:00 2001 From: Max Dribinsky Date: Tue, 14 Jun 2022 17:00:30 -0400 Subject: [PATCH 09/18] Apply suggestions from code review Co-authored-by: Guillermo Gaston --- designs/cloudstack-multiple-endpoints.md | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/designs/cloudstack-multiple-endpoints.md b/designs/cloudstack-multiple-endpoints.md index 446ca9f1c4ec..7150cc187c27 100644 --- a/designs/cloudstack-multiple-endpoints.md +++ b/designs/cloudstack-multiple-endpoints.md @@ -6,7 +6,7 @@ The mangement API endpoint for Apache Cloudstack (ACS) is a singe point of failure. If the endpoint goes down, then control of all of all VM's, networks, zones, accounts, domains, and everything else on Cloudstack goes down. So, we want to spread our clusters across many Cloudstack endpoints and hosts to protect against that. -For scalability, multiple Cloudstack endpoints will likely be required for storage and API endpoint throughput for our customer. Just one cluster creation as many as a thousand API calls to ACS (estimated). There are many ways to support this scale, but adding more Cloudstack hosts and endpoints is a fairly foolproof way to do so. Then, there’s the size and performance of the underlying database that each Cloudstack instance runs on. +For scalability, multiple Cloudstack endpoints will likely be required for storage and API endpoint throughput. Just one cluster creation triggers as many as a thousand API calls to ACS (estimated). There are many ways to support this scale, but adding more Cloudstack hosts and endpoints is a fairly foolproof way to do so. Then, there’s the size and performance of the underlying database that each Cloudstack instance runs on. In CAPC, we are considering addressing the problem by extending our use of the concept of [Failure Domains](https://cluster-api.sigs.k8s.io/developer/providers/v1alpha2-to-v1alpha3.html?highlight=failure%20domain#optional-support-failure-domains) and distributing a cluster across the given ones. However, instead of a failure domain consisting of a zone on a single Cloudstack endpoint, we will redefine it to consist of the unique combination of a Cloudstack zone, api endpoint, account, and domain. In order to support this functionality in EKS-A, we need to have a similar breakdown where an EKS-A cluster can span across multiple endpoints and Failure Domains. @@ -81,7 +81,7 @@ and we would parse these resources and pass them into CAPC by modifying the temp ### Failure Domain A failure domain is a CAPI concept which serves to improve HA and availability by destributing machines across "failure domains", as discussed [here](https://cluster-api.sigs.k8s.io/developer/providers/v1alpha2-to-v1alpha3.html?highlight=domain#optional-support-failure-domains). -CAPC currently utilizes them to distribute machines across CloudStack Zones. However, we now want to go a step further to support our customer and consider the following unique combination to be a failure domain: +CAPC currently utilizes them to distribute machines across CloudStack Zones. However, we now want to go a step further and consider the following unique combination to be a failure domain: 1. Cloudstack endpoint 2. 
Cloudstack domain @@ -92,7 +92,7 @@ You can find more information about these Cloudstack resources [here](http://doc ### `CloudstackDatacenterConfig` Validation -With the multi-endpoint system for the Cloudstack provider, customers reference a CloudstackMachineConfig and it's created across multiple failure domains. The implication +With the multi-endpoint system for the Cloudstack provider, users reference a CloudstackMachineConfig and it's created across multiple failure domains. The implication is that all the Cloudstack resources such as image, ComputeOffering, ISOAttachment, etc. must be available in *all* the failure domains, or all the Cloudstack endpoints, and these resources must be referenced by name, not unique ID. This would mean that for each CloudstackMachineConfig, we have to make sure that all the prerequisite Cloudstack resources are available in all the Cloudstack API endpoints. @@ -142,8 +142,8 @@ be passed along to CAPC and used by the CAPC controller just like it is currentl ### Backwards Compatibility -Our customer currently has clusters running with the old resource definition. In order to support backwards compatibility in the CloudstackDatacenterConfig resource, we can -1. Make all the fields optional and see if customers have the old fields set or the new ones +In order to support backwards compatibility in the CloudstackDatacenterConfig resource for users with existing clusters, we can +1. Make all the fields optional and see if the user has the old fields set or the new ones 2. Introduce an eks-a version bump with conversion webhooks Between these two approaches, I would take the first and then deprecate the legacy fields in a subsequent release to simplify the code paths. From 3eb2cb3054c6b63b662c5c86bff10924fd781673 Mon Sep 17 00:00:00 2001 From: Max Dribinsky Date: Tue, 14 Jun 2022 17:12:42 -0400 Subject: [PATCH 10/18] Update cloudstack-multiple-endpoints.md --- designs/cloudstack-multiple-endpoints.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/designs/cloudstack-multiple-endpoints.md b/designs/cloudstack-multiple-endpoints.md index 7150cc187c27..bf0d44933410 100644 --- a/designs/cloudstack-multiple-endpoints.md +++ b/designs/cloudstack-multiple-endpoints.md @@ -143,7 +143,7 @@ be passed along to CAPC and used by the CAPC controller just like it is currentl ### Backwards Compatibility In order to support backwards compatibility in the CloudstackDatacenterConfig resource for users with existing clusters, we can -1. Make all the fields optional and see if the user has the old fields set or the new ones +1. Make all the fields optional and see if the user has the old fields set or the new ones, then write a transformer to set the new fields and clean up the old ones 2. Introduce an eks-a version bump with conversion webhooks Between these two approaches, I would take the first and then deprecate the legacy fields in a subsequent release to simplify the code paths. 
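
To make the first option concrete, the conversion could be a small defaulting helper that runs before validation. This is a rough sketch only, assuming a `FailureDomains` list field and a `CloudStackFailureDomain` type shaped like the object proposed earlier; all names are placeholders, not the final API:

```go
// Sketch of the "transformer" for option 1: if a spec still uses the legacy single-endpoint
// fields, synthesize a one-entry-per-zone failure domain list from them and clear the old
// fields so only one representation is persisted. All type and field names are illustrative.
func setFailureDomainsFromLegacyFields(spec *CloudStackDatacenterConfigSpec) {
	if len(spec.FailureDomains) > 0 {
		// Already using the new shape; nothing to convert.
		return
	}
	for _, zone := range spec.Zones {
		spec.FailureDomains = append(spec.FailureDomains, CloudStackFailureDomain{
			Domain:                spec.Domain,
			Account:               spec.Account,
			Zone:                  zone,
			ManagementApiEndpoint: spec.ManagementApiEndpoint,
		})
	}
	// Deprecated fields are cleared so later code paths only ever see failure domains.
	spec.Domain = ""
	spec.Zones = nil
	spec.Account = ""
	spec.ManagementApiEndpoint = ""
}
```

Running this in one place, for example when the spec is read or defaulted, would let the rest of the create/upgrade code paths assume the new shape.
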
From 0d50f7c7daa3fd0e1c513c93c50f9e6ca6cbf6ef Mon Sep 17 00:00:00 2001 From: Max Dribinsky Date: Wed, 15 Jun 2022 15:46:00 -0400 Subject: [PATCH 11/18] Update cloudstack-multiple-endpoints.md --- designs/cloudstack-multiple-endpoints.md | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) diff --git a/designs/cloudstack-multiple-endpoints.md b/designs/cloudstack-multiple-endpoints.md index bf0d44933410..571bdb1f8876 100644 --- a/designs/cloudstack-multiple-endpoints.md +++ b/designs/cloudstack-multiple-endpoints.md @@ -172,6 +172,5 @@ simple flow cluster creation/deletion across multiple Cloudstack API endpoints: ## Other approaches explored -Another direction we can go to support this feature is to refactor the entire EKS-A codebase so that instead of all the failure domains existing inside the CloudstackDatacenterConfig -object, each CloudstackDatacenterConfig itself corresponds with a single failure domain. Then, the top level EKS-A Cluster object could be refactored to have a list of DatacenterRefs instead -of a single one. However, this approach feels extremely invasive to the product and does not provide tangible value to the other providers. +1. Another direction we can go to support this feature is to refactor the entire EKS-A codebase so that instead of all the failure domains existing inside the CloudstackDatacenterConfig object, each CloudstackDatacenterConfig itself corresponds with a single failure domain. Then, the top level EKS-A Cluster object could be refactored to have a list of DatacenterRefs instead of a single one. However, this approach feels extremely invasive to the product and does not provide tangible value to the other providers. +2. Lastly, we can consider the option of introducing a new DatacenterConfig object which represents not one Cloudstack "Availability Zone", but multiple. However, the issue here is that the CloudstackDatacenterConfig already has a list of CloudstackZone objects, so we're essentially already supporting multiple of what could be interpreted as Availability Zones. Adding additional attributes to that concept is a more natural extension of the API, instead of defining a new type From ad2f0b25f81507a290ea44ce48170cd665d1f852 Mon Sep 17 00:00:00 2001 From: Max Dribinsky Date: Wed, 15 Jun 2022 15:49:05 -0400 Subject: [PATCH 12/18] Update cloudstack-multiple-endpoints.md --- designs/cloudstack-multiple-endpoints.md | 9 ++++----- 1 file changed, 4 insertions(+), 5 deletions(-) diff --git a/designs/cloudstack-multiple-endpoints.md b/designs/cloudstack-multiple-endpoints.md index 571bdb1f8876..f4f414a289cb 100644 --- a/designs/cloudstack-multiple-endpoints.md +++ b/designs/cloudstack-multiple-endpoints.md @@ -51,7 +51,6 @@ will be available on *all* the Cloudstack API endpoints. Currently, the CloudstackDataCenterConfig spec contains: ``` // Domain contains a grouping of accounts. Domains usually contain multiple accounts that have some logical relationship to each other and a set of delegated administrators with some authority over the domain and its subdomains -// Domain string `json:"domain"` // Zones is a list of one or more zones that are managed by a single CloudStack management endpoint. Zones []CloudStackZone `json:"zones"` @@ -66,7 +65,6 @@ We would instead propose to remove all the existing attributes and instead, simp // Domain contains a grouping of accounts. 
Domains usually contain multiple accounts that have some logical relationship to each other and a set of delegated administrators with some authority over the domain and its subdomains // This field is considered as a fully qualified domain name which is the same as the domain path without "ROOT/" prefix. For example, if "foo" is specified then a domain with "ROOT/foo" domain path is picked. // The value "ROOT" is a special case that points to "the" ROOT domain of the CloudStack. That is, a domain with a path "ROOT/ROOT" is not allowed. -// Domain string `json:"domain"` // Zones is a list of one or more zones that are managed by a single CloudStack management endpoint. Zone CloudStackZone `json:"zone"` @@ -101,7 +99,7 @@ In practice, the pseudocode would look like: for failureDomain in failureDomains: for machineConfig in machineConfigs: - validate resource presence with the failureDomain's instance of the CloudMonkey executable + validate resource presence with the failureDomain's configuration of the CloudMonkey executable ### Cloudstack credentials @@ -119,7 +117,7 @@ api-url = http://172.16.0.1:8080/client/api We would propose an extension of the above input mechanism so the user could provide credentials across multiple Cloudstack API endpoints like ``` -[FailureDomain1] +[Global] api-key = redacted secret-key = redacted api-url = http://172.16.0.1:8080/client/api @@ -165,9 +163,10 @@ The new code will be covered by unit and e2e tests, and the e2e framework will b The following e2e test will be added: -simple flow cluster creation/deletion across multiple Cloudstack API endpoints: +simple flow cluster creation/scaleu/deletion across multiple Cloudstack API endpoints: * create a management+workload cluster spanning multiple Cloudstack API endpoints +* scale the size of the management+workload cluster so that we touch multiple Cloudstack API endpoints * delete cluster ## Other approaches explored From 242485f65a7c8c61f63af85dc26fe47a1d4f15ee Mon Sep 17 00:00:00 2001 From: Max Dribinsky Date: Wed, 15 Jun 2022 16:32:05 -0400 Subject: [PATCH 13/18] Update cloudstack-multiple-endpoints.md --- designs/cloudstack-multiple-endpoints.md | 12 +++++++++--- 1 file changed, 9 insertions(+), 3 deletions(-) diff --git a/designs/cloudstack-multiple-endpoints.md b/designs/cloudstack-multiple-endpoints.md index f4f414a289cb..84cce5de970b 100644 --- a/designs/cloudstack-multiple-endpoints.md +++ b/designs/cloudstack-multiple-endpoints.md @@ -92,14 +92,19 @@ You can find more information about these Cloudstack resources [here](http://doc With the multi-endpoint system for the Cloudstack provider, users reference a CloudstackMachineConfig and it's created across multiple failure domains. The implication is that all the Cloudstack resources such as image, ComputeOffering, ISOAttachment, etc. must be available in *all* the failure domains, or all the Cloudstack endpoints, -and these resources must be referenced by name, not unique ID. This would mean that for each CloudstackMachineConfig, we have to make sure that all the prerequisite -Cloudstack resources are available in all the Cloudstack API endpoints. +and these resources must be referenced by name, not unique ID. This would mean that we need to check if there are multiple Cloudstack endpoints, and if so check the zones, networks, domains, accounts, and users. 
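
As a sketch of what those per-endpoint datacenter checks could look like (the cmk client interface and method names below are illustrative, not existing EKS-A signatures):

```go
// CmkClient is an illustrative wrapper around the CloudMonkey (cmk) executable,
// configured with the credentials and API URL of a single Cloudstack endpoint.
type CmkClient interface {
	ValidateDomainAndAccountPresent(domain, account string) error
	ValidateZoneAndNetworkPresent(zone CloudStackZone) error
}

// validateDatacenterAcrossEndpoints runs the datacenter-level checks against every
// failure domain, each with its own endpoint-specific cmk client.
func validateDatacenterAcrossEndpoints(spec *CloudStackDatacenterConfigSpec, cmkFor func(managementApiEndpoint string) CmkClient) error {
	for _, fd := range spec.FailureDomains {
		cmk := cmkFor(fd.ManagementApiEndpoint)
		if err := cmk.ValidateDomainAndAccountPresent(fd.Domain, fd.Account); err != nil {
			return err
		}
		if err := cmk.ValidateZoneAndNetworkPresent(fd.Zone); err != nil {
			return err
		}
	}
	return nil
}
```
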
+ +### `CloudstackMachineConfig` Validation + +For each CloudstackMachineConfig, we have to make sure that all the prerequisite +Cloudstack resources are available in all the Cloudstack API endpoints (DiskOffering, ComputeOffering, template, affinitygroupids). In practice, the pseudocode would look like: for failureDomain in failureDomains: for machineConfig in machineConfigs: validate resource presence with the failureDomain's configuration of the CloudMonkey executable + ### Cloudstack credentials @@ -166,7 +171,8 @@ The following e2e test will be added: simple flow cluster creation/scaleu/deletion across multiple Cloudstack API endpoints: * create a management+workload cluster spanning multiple Cloudstack API endpoints -* scale the size of the management+workload cluster so that we touch multiple Cloudstack API endpoints +* scale up the size of the management+workload cluster so that we touch multiple Cloudstack API endpoints +* scale down the size of the management+workload cluster so that we touch multiple Cloudstack API endpoints * delete cluster ## Other approaches explored From 5645c8553d4744ffc712ceb7357cb5e6499eecac Mon Sep 17 00:00:00 2001 From: Max Dribinsky Date: Wed, 15 Jun 2022 16:41:02 -0400 Subject: [PATCH 14/18] Update cloudstack-multiple-endpoints.md --- designs/cloudstack-multiple-endpoints.md | 34 ++++++++++++------------ 1 file changed, 17 insertions(+), 17 deletions(-) diff --git a/designs/cloudstack-multiple-endpoints.md b/designs/cloudstack-multiple-endpoints.md index 84cce5de970b..6974ec5afe1c 100644 --- a/designs/cloudstack-multiple-endpoints.md +++ b/designs/cloudstack-multiple-endpoints.md @@ -8,7 +8,7 @@ The mangement API endpoint for Apache Cloudstack (ACS) is a singe point of failu For scalability, multiple Cloudstack endpoints will likely be required for storage and API endpoint throughput. Just one cluster creation triggers as many as a thousand API calls to ACS (estimated). There are many ways to support this scale, but adding more Cloudstack hosts and endpoints is a fairly foolproof way to do so. Then, there’s the size and performance of the underlying database that each Cloudstack instance runs on. -In CAPC, we are considering addressing the problem by extending our use of the concept of [Failure Domains](https://cluster-api.sigs.k8s.io/developer/providers/v1alpha2-to-v1alpha3.html?highlight=failure%20domain#optional-support-failure-domains) and distributing a cluster across the given ones. However, instead of a failure domain consisting of a zone on a single Cloudstack endpoint, we will redefine it to consist of the unique combination of a Cloudstack zone, api endpoint, account, and domain. In order to support this functionality in EKS-A, we need to have a similar breakdown where an EKS-A cluster can span across multiple endpoints and Failure Domains. +In CAPC, we are considering addressing the problem by extending our use of the concept of [Failure Domains](https://cluster-api.sigs.k8s.io/developer/providers/v1alpha2-to-v1alpha3.html?highlight=failure%20domain#optional-support-failure-domains) and distributing a cluster across the given ones. However, instead of a failure domain consisting of a zone on a single Cloudstack endpoint, we will redefine it to consist of the unique combination of a Cloudstack zone, api endpoint, account, and domain. In order to support this functionality in EKS-A, we need to have a similar breakdown where an EKS-A cluster can span across multiple endpoints, zones, accounts, and domains. 
### Tenets @@ -41,8 +41,8 @@ As a Kubernetes administrator I want to: ## Overview of Solution -We propose to take the least invasive solution of repurposing the CloudstackDataCenterConfig to point to multiple failure domains, each of which contains the necessary -information for interacting with a Cloudstack failure domain. The assumption is that the necessary Cloudstack resources (i.e. image, computeOffering, ISOAttachment, etc.) +We propose to take the least invasive solution of repurposing the CloudstackDataCenterConfig to point to multiple Availability Zones, each of which contains the necessary +information for interacting with a Cloudstack failure domain. The assumption is that the necessary Cloudstack resources (i.e. image, computeOffering, diskOffering, network, etc.) will be available on *all* the Cloudstack API endpoints. ## Solution Details @@ -60,7 +60,7 @@ Account string `json:"account,omitempty"` ManagementApiEndpoint string `json:"managementApiEndpoint"` ``` -We would instead propose to remove all the existing attributes and instead, simply include a list of FailureDomain objects, where each FailureDomain object looks like +We would instead propose to remove all the existing attributes and instead, simply include a list of AvailabilityZone objects, where each AvailabilityZone object looks like ``` // Domain contains a grouping of accounts. Domains usually contain multiple accounts that have some logical relationship to each other and a set of delegated administrators with some authority over the domain and its subdomains // This field is considered as a fully qualified domain name which is the same as the domain path without "ROOT/" prefix. For example, if "foo" is specified then a domain with "ROOT/foo" domain path is picked. @@ -76,10 +76,10 @@ ManagementApiEndpoint string `json:"managementApiEndpoint"` and we would parse these resources and pass them into CAPC by modifying the templates we have currently implemented. We can then use this new model to read in credentials, perform pre-flight checks, plumb data to CAPC, and support upgrades in the controller. The goal would be to make these new resources backwards compatible via code -### Failure Domain +### AvailabilityZone A failure domain is a CAPI concept which serves to improve HA and availability by destributing machines across "failure domains", as discussed [here](https://cluster-api.sigs.k8s.io/developer/providers/v1alpha2-to-v1alpha3.html?highlight=domain#optional-support-failure-domains). -CAPC currently utilizes them to distribute machines across CloudStack Zones. However, we now want to go a step further and consider the following unique combination to be a failure domain: +CAPC currently utilizes them to distribute machines across CloudStack Zones. However, we now want to go a step further and consider the following unique combination to be an AvailabilityZone: 1. Cloudstack endpoint 2. Cloudstack domain @@ -90,8 +90,8 @@ You can find more information about these Cloudstack resources [here](http://doc ### `CloudstackDatacenterConfig` Validation -With the multi-endpoint system for the Cloudstack provider, users reference a CloudstackMachineConfig and it's created across multiple failure domains. The implication -is that all the Cloudstack resources such as image, ComputeOffering, ISOAttachment, etc. 
must be available in *all* the failure domains, or all the Cloudstack endpoints, +With the multi-endpoint system for the Cloudstack provider, users reference a CloudstackMachineConfig and it's created across multiple AvailabilityZones. The implication +is that all the Cloudstack resources such as image, ComputeOffering, ISOAttachment, etc. must be available in *all* the AvailabilityZones, or all the Cloudstack endpoints, and these resources must be referenced by name, not unique ID. This would mean that we need to check if there are multiple Cloudstack endpoints, and if so check the zones, networks, domains, accounts, and users. ### `CloudstackMachineConfig` Validation @@ -101,16 +101,16 @@ Cloudstack resources are available in all the Cloudstack API endpoints (DiskOffe In practice, the pseudocode would look like: -for failureDomain in failureDomains: +for availabilityZone in availabilityZones: for machineConfig in machineConfigs: - validate resource presence with the failureDomain's configuration of the CloudMonkey executable + validate resource presence with the availabilityZone's configuration of the CloudMonkey executable ### Cloudstack credentials In a multi-endpoint Cloudstack cluster, each endpoint may have its own credentials. We propose that Cloudstack credentials will be passed in via environment variable in the same way as they are currently, -only as a list corresponding to failure domains. Currently, these credentials are passed in via environment variable, which contains a base64 encoded .ini file that looks like +only as a list corresponding to AvailabilityZones. Currently, these credentials are passed in via environment variable, which contains a base64 encoded .ini file that looks like ``` [Global] @@ -127,12 +127,12 @@ api-key = redacted secret-key = redacted api-url = http://172.16.0.1:8080/client/api -[FailureDomain2] +[AvailabilityZone2] api-key = redacted secret-key = redacted api-url = http://172.16.0.2:8080/client/api -[FailureDomain3] +[AvailabilityZone3] api-key = redacted secret-key = redacted api-url = http://172.16.0.3:8080/client/api @@ -149,7 +149,7 @@ In order to support backwards compatibility in the CloudstackDatacenterConfig re 1. Make all the fields optional and see if the user has the old fields set or the new ones, then write a transformer to set the new fields and clean up the old ones 2. Introduce an eks-a version bump with conversion webhooks -Between these two approaches, I would take the first and then deprecate the legacy fields in a subsequent release to simplify the code paths. +Between these two approaches, we propose to take the first and then deprecate the legacy fields in a subsequent release to simplify the code paths. However, given that the Cloudstack credentials are persisted in a write-once secret on the cluster, upgrading existing clusters may not be feasible unless CAPC supports overwriting that secret. @@ -168,7 +168,7 @@ The new code will be covered by unit and e2e tests, and the e2e framework will b The following e2e test will be added: -simple flow cluster creation/scaleu/deletion across multiple Cloudstack API endpoints: +simple flow cluster creation/scaling/deletion across multiple Cloudstack API endpoints: * create a management+workload cluster spanning multiple Cloudstack API endpoints * scale up the size of the management+workload cluster so that we touch multiple Cloudstack API endpoints @@ -177,5 +177,5 @@ simple flow cluster creation/scaleu/deletion across multiple Cloudstack API endp ## Other approaches explored -1. 
Another direction we can go to support this feature is to refactor the entire EKS-A codebase so that instead of all the failure domains existing inside the CloudstackDatacenterConfig object, each CloudstackDatacenterConfig itself corresponds with a single failure domain. Then, the top level EKS-A Cluster object could be refactored to have a list of DatacenterRefs instead of a single one. However, this approach feels extremely invasive to the product and does not provide tangible value to the other providers. -2. Lastly, we can consider the option of introducing a new DatacenterConfig object which represents not one Cloudstack "Availability Zone", but multiple. However, the issue here is that the CloudstackDatacenterConfig already has a list of CloudstackZone objects, so we're essentially already supporting multiple of what could be interpreted as Availability Zones. Adding additional attributes to that concept is a more natural extension of the API, instead of defining a new type +1. Another direction we can go to support this feature is to refactor the entire EKS-A codebase so that instead of all the AvailabilityZones existing inside the CloudstackDatacenterConfig object, each CloudstackDatacenterConfig itself corresponds with a single AvailabilityZone. Then, the top level EKS-A Cluster object could be refactored to have a list of DatacenterRefs instead of a single one. However, this approach feels extremely invasive to the product and does not provide tangible value to the other providers. +2. Additionally, we can consider the option of introducing a new DatacenterConfig object which represents not one Cloudstack Availability Zone, but multiple. However, the issue here is that the CloudstackDatacenterConfig already has a list of CloudstackZone objects, so we're essentially already supporting multiple of what could be interpreted as Availability Zones. Adding additional attributes to that concept is a more natural extension of the API, instead of defining a new type From 0e2b6107637afdba90d965eb1632738abdc89830 Mon Sep 17 00:00:00 2001 From: Max Dribinsky Date: Thu, 16 Jun 2022 10:07:46 -0400 Subject: [PATCH 15/18] Update cloudstack-multiple-endpoints.md --- designs/cloudstack-multiple-endpoints.md | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/designs/cloudstack-multiple-endpoints.md b/designs/cloudstack-multiple-endpoints.md index 6974ec5afe1c..45e00efab81f 100644 --- a/designs/cloudstack-multiple-endpoints.md +++ b/designs/cloudstack-multiple-endpoints.md @@ -62,6 +62,8 @@ ManagementApiEndpoint string `json:"managementApiEndpoint"` We would instead propose to remove all the existing attributes and instead, simply include a list of AvailabilityZone objects, where each AvailabilityZone object looks like ``` +// Name would be used to match the availability zone defined in the datacenter config to the credentials passed in from the cloud-config ini file +Name string `json:"name"` // Domain contains a grouping of accounts. Domains usually contain multiple accounts that have some logical relationship to each other and a set of delegated administrators with some authority over the domain and its subdomains // This field is considered as a fully qualified domain name which is the same as the domain path without "ROOT/" prefix. For example, if "foo" is specified then a domain with "ROOT/foo" domain path is picked. // The value "ROOT" is a special case that points to "the" ROOT domain of the CloudStack. That is, a domain with a path "ROOT/ROOT" is not allowed. 
@@ -140,6 +142,8 @@ api-url = http://172.16.0.3:8080/client/api ... ``` +Where the Section names (i.e. Global, AvailabilityZone1, etc.) correspond to the Availability Zone names + We are also exploring converting the ini file to a yaml input file which contains a list of credentials and their associated endpoints. Either way, this environment variable would be passed along to CAPC and used by the CAPC controller just like it is currently. From 8fd1cb404025152cf5535e2d02f214fd445fce4d Mon Sep 17 00:00:00 2001 From: Max Dribinsky Date: Thu, 16 Jun 2022 14:37:09 -0400 Subject: [PATCH 16/18] Update cloudstack-multiple-endpoints.md --- designs/cloudstack-multiple-endpoints.md | 20 +++++++++++++++++++- 1 file changed, 19 insertions(+), 1 deletion(-) diff --git a/designs/cloudstack-multiple-endpoints.md b/designs/cloudstack-multiple-endpoints.md index 45e00efab81f..1c172184ec9c 100644 --- a/designs/cloudstack-multiple-endpoints.md +++ b/designs/cloudstack-multiple-endpoints.md @@ -60,7 +60,25 @@ Account string `json:"account,omitempty"` ManagementApiEndpoint string `json:"managementApiEndpoint"` ``` -We would instead propose to remove all the existing attributes and instead, simply include a list of AvailabilityZone objects, where each AvailabilityZone object looks like +We would instead propose to gradually deprecate all the existing attributes and instead, simply include a list of AvailabilityZone objects like so + +``` +type CloudStackDatacenterConfigSpec struct { + // Deprecated + Domain string `json:"domain,omitempty"` + // Deprecated + Zones []CloudStackZone `json:"zones,omitempty"` + // Deprecated + Account string `json:"account,omitempty"` + // Deprecated + ManagementApiEndpoint string `json:"managementApiEndpoint,omitempty"` + // List of AvailabilityZones to distribute VMs across - corresponds to a list of CAPI failure domains + AvailabilityZones []CloudStackAvailabilityZone `json:"availabilityZones,omitempty"` +} +``` + +where each AvailabilityZone object looks like + ``` // Name would be used to match the availability zone defined in the datacenter config to the credentials passed in from the cloud-config ini file Name string `json:"name"` From 510db1c07e5706977e35f11aa25ac33dbcbaf3ed Mon Sep 17 00:00:00 2001 From: Max Dribinsky Date: Thu, 16 Jun 2022 14:39:49 -0400 Subject: [PATCH 17/18] Update cloudstack-multiple-endpoints.md --- designs/cloudstack-multiple-endpoints.md | 45 +++++++++++++----------- 1 file changed, 25 insertions(+), 20 deletions(-) diff --git a/designs/cloudstack-multiple-endpoints.md b/designs/cloudstack-multiple-endpoints.md index 1c172184ec9c..644914aaeed7 100644 --- a/designs/cloudstack-multiple-endpoints.md +++ b/designs/cloudstack-multiple-endpoints.md @@ -50,14 +50,16 @@ will be available on *all* the Cloudstack API endpoints. ### Interface changes Currently, the CloudstackDataCenterConfig spec contains: ``` -// Domain contains a grouping of accounts. Domains usually contain multiple accounts that have some logical relationship to each other and a set of delegated administrators with some authority over the domain and its subdomains -Domain string `json:"domain"` -// Zones is a list of one or more zones that are managed by a single CloudStack management endpoint. -Zones []CloudStackZone `json:"zones"` -// Account typically represents a customer of the service provider or a department in a large organization. Multiple users can exist in an account, and all CloudStack resources belong to an account. 
Accounts have users and users have credentials to operate on resources within that account. If an account name is provided, a domain must also be provided. -Account string `json:"account,omitempty"` -// CloudStack Management API endpoint's IP. It is added to VM's noproxy list -ManagementApiEndpoint string `json:"managementApiEndpoint"` +type CloudStackDatacenterConfigSpec struct { + // Domain contains a grouping of accounts. Domains usually contain multiple accounts that have some logical relationship to each other and a set of delegated administrators with some authority over the domain and its subdomains + Domain string `json:"domain"` + // Zones is a list of one or more zones that are managed by a single CloudStack management endpoint. + Zones []CloudStackZone `json:"zones"` + // Account typically represents a customer of the service provider or a department in a large organization. Multiple users can exist in an account, and all CloudStack resources belong to an account. Accounts have users and users have credentials to operate on resources within that account. If an account name is provided, a domain must also be provided. + Account string `json:"account,omitempty"` + // CloudStack Management API endpoint's IP. It is added to VM's noproxy list + ManagementApiEndpoint string `json:"managementApiEndpoint"` +} ``` We would instead propose to gradually deprecate all the existing attributes and instead, simply include a list of AvailabilityZone objects like so @@ -80,18 +82,20 @@ type CloudStackDatacenterConfigSpec struct { where each AvailabilityZone object looks like ``` -// Name would be used to match the availability zone defined in the datacenter config to the credentials passed in from the cloud-config ini file -Name string `json:"name"` -// Domain contains a grouping of accounts. Domains usually contain multiple accounts that have some logical relationship to each other and a set of delegated administrators with some authority over the domain and its subdomains -// This field is considered as a fully qualified domain name which is the same as the domain path without "ROOT/" prefix. For example, if "foo" is specified then a domain with "ROOT/foo" domain path is picked. -// The value "ROOT" is a special case that points to "the" ROOT domain of the CloudStack. That is, a domain with a path "ROOT/ROOT" is not allowed. -Domain string `json:"domain"` -// Zones is a list of one or more zones that are managed by a single CloudStack management endpoint. -Zone CloudStackZone `json:"zone"` -// Account typically represents a customer of the service provider or a department in a large organization. Multiple users can exist in an account, and all CloudStack resources belong to an account. Accounts have users and users have credentials to operate on resources within that account. If an account name is provided, a domain must also be provided. -Account string `json:"account,omitempty"` -// CloudStack Management API endpoint's IP. It is added to VM's noproxy list -ManagementApiEndpoint string `json:"managementApiEndpoint"` +type CloudStackAvailabilityZone struct { + // Name would be used to match the availability zone defined in the datacenter config to the credentials passed in from the cloud-config ini file + Name string `json:"name"` + // Domain contains a grouping of accounts. 
Domains usually contain multiple accounts that have some logical relationship to each other and a set of delegated administrators with some authority over the domain and its subdomains + // This field is considered as a fully qualified domain name which is the same as the domain path without "ROOT/" prefix. For example, if "foo" is specified then a domain with "ROOT/foo" domain path is picked. + // The value "ROOT" is a special case that points to "the" ROOT domain of the CloudStack. That is, a domain with a path "ROOT/ROOT" is not allowed. + Domain string `json:"domain"` + // Zones is a list of one or more zones that are managed by a single CloudStack management endpoint. + Zone CloudStackZone `json:"zone"` + // Account typically represents a customer of the service provider or a department in a large organization. Multiple users can exist in an account, and all CloudStack resources belong to an account. Accounts have users and users have credentials to operate on resources within that account. If an account name is provided, a domain must also be provided. + Account string `json:"account,omitempty"` + // CloudStack Management API endpoint's IP. It is added to VM's noproxy list + ManagementApiEndpoint string `json:"managementApiEndpoint"` +} ``` and we would parse these resources and pass them into CAPC by modifying the templates we have currently implemented. We can then use this new model to read in credentials, perform pre-flight checks, plumb data to CAPC, and support upgrades in the controller. The goal would be to make these new resources backwards compatible via code @@ -105,6 +109,7 @@ CAPC currently utilizes them to distribute machines across CloudStack Zones. How 2. Cloudstack domain 3. Cloudstack zone 4. Cloudstack account +5. A unique name You can find more information about these Cloudstack resources [here](http://docs.cloudstack.apache.org/en/latest/conceptsandterminology/concepts.html#cloudstack-terminology) From 8445b50b0974c76f06f245d6ef1f64522029a1d8 Mon Sep 17 00:00:00 2001 From: Max Dribinsky Date: Fri, 17 Jun 2022 09:36:20 -0400 Subject: [PATCH 18/18] Update cloudstack-multiple-endpoints.md --- designs/cloudstack-multiple-endpoints.md | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/designs/cloudstack-multiple-endpoints.md b/designs/cloudstack-multiple-endpoints.md index 644914aaeed7..f24a30d02deb 100644 --- a/designs/cloudstack-multiple-endpoints.md +++ b/designs/cloudstack-multiple-endpoints.md @@ -19,7 +19,7 @@ In CAPC, we are considering addressing the problem by extending our use of the c As a Kubernetes administrator I want to: -* Perform preflight checks when creating/upgrading clusters which span across multiple failure domains +* support validation of my cluster and environment across multiple failure domains before creating/upgrading/deleting my cluster * Create EKS Anywhere clusters which span across multiple failure domains * Upgrade EKS Anywhere clusters which span across multiple failure domains * Delete EKS Anywhere clusters which span across multiple failure domains @@ -29,7 +29,6 @@ As a Kubernetes administrator I want to: **In scope** * Add support for create/upgrade/delete of EKS-A clusters across multiple Cloudstack API endpoints -* Add test environment for CI/CD e2e tests which can be used as a second Cloudstack API endpoint **Not in scope** @@ -41,9 +40,8 @@ As a Kubernetes administrator I want to: ## Overview of Solution -We propose to take the least invasive solution of repurposing the CloudstackDataCenterConfig to 
point to multiple Availability Zones, each of which contains the necessary -information for interacting with a Cloudstack failure domain. The assumption is that the necessary Cloudstack resources (i.e. image, computeOffering, diskOffering, network, etc.) -will be available on *all* the Cloudstack API endpoints. +We propose to take the least invasive solution of repurposing the CloudstackDataCenterConfig to point to multiple Availability Zones, each of which contains the necessary Cloudstack resources (i.e. image, computeOffering, diskOffering, network, etc.). In order for this to work, all the necessary Cloudstack resources (i.e. image, computeOffering, diskOffering, network, etc.) +will need to be available on *all* the Cloudstack API endpoints. We will validate this prior to create/upgrade. ## Solution Details @@ -202,6 +200,8 @@ simple flow cluster creation/scaling/deletion across multiple Cloudstack API end * scale down the size of the management+workload cluster so that we touch multiple Cloudstack API endpoints * delete cluster +In order to achieve this e2e test, we'll need to introduce a new test environment for CI/CD e2e tests which can be used as a second Cloudstack API endpoint + ## Other approaches explored 1. Another direction we can go to support this feature is to refactor the entire EKS-A codebase so that instead of all the AvailabilityZones existing inside the CloudstackDatacenterConfig object, each CloudstackDatacenterConfig itself corresponds with a single AvailabilityZone. Then, the top level EKS-A Cluster object could be refactored to have a list of DatacenterRefs instead of a single one. However, this approach feels extremely invasive to the product and does not provide tangible value to the other providers.
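
As a supplementary illustration of the credential mapping described in the "Cloudstack credentials" section, the multi-section cloud-config could be decoded into per-availability-zone credentials along these lines. This is a rough sketch using `gopkg.in/ini.v1` with an illustrative `CloudStackCredentials` type; it is not the existing EKS-A parsing code:

```go
package config

import (
	"encoding/base64"

	"gopkg.in/ini.v1"
)

// CloudStackCredentials holds the per-endpoint fields from one ini section.
type CloudStackCredentials struct {
	ApiKey    string `ini:"api-key"`
	SecretKey string `ini:"secret-key"`
	ApiUrl    string `ini:"api-url"`
}

// ParseCloudStackCredentials maps each ini section (Global, AvailabilityZone1, ...) to its
// credentials so a failure domain can look up its endpoint by the availability zone name.
func ParseCloudStackCredentials(b64CloudConfig string) (map[string]CloudStackCredentials, error) {
	raw, err := base64.StdEncoding.DecodeString(b64CloudConfig)
	if err != nil {
		return nil, err
	}
	file, err := ini.Load(raw)
	if err != nil {
		return nil, err
	}
	creds := map[string]CloudStackCredentials{}
	for _, section := range file.Sections() {
		if section.Name() == ini.DefaultSection {
			continue
		}
		var c CloudStackCredentials
		if err := section.MapTo(&c); err != nil {
			return nil, err
		}
		creds[section.Name()] = c
	}
	return creds, nil
}
```

Keying the map by section name is what would let each availability zone's `Name` field select the matching endpoint and credentials.
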