
Multi-endpoint design for cloudstack provider #2399

Merged: 18 commits into aws:main on Jun 17, 2022

Conversation

@maxdrib (Contributor) commented Jun 13, 2022

Issue #, if available:
#2406

Description of changes:
This design proposes an approach for refactoring the CloudStack provider to support creating eks-a clusters across multiple CloudStack endpoints with different sets of credentials.

Testing (if applicable):

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

@eks-distro-bot added the size/L label (denotes a PR that changes 100-499 lines, ignoring generated files) on Jun 13, 2022
@maxdrib requested a review from wanyufe on Jun 13, 2022 at 20:19
@codecov (bot) commented Jun 13, 2022

Codecov Report

Merging #2399 (8445b50) into main (dc30f16) will increase coverage by 0.71%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##             main    #2399      +/-   ##
==========================================
+ Coverage   56.07%   56.78%   +0.71%     
==========================================
  Files         303      306       +3     
  Lines       24645    24937     +292     
==========================================
+ Hits        13819    14160     +341     
+ Misses       9533     9462      -71     
- Partials     1293     1315      +22     
| Impacted Files | Coverage Δ |
|---|---|
| pkg/providers/tinkerbell/assert.go | 77.77% <0.00%> (-17.82%) ⬇️ |
| pkg/providers/tinkerbell/validate.go | 78.51% <0.00%> (-8.99%) ⬇️ |
| pkg/providers/tinkerbell/template.go | 28.36% <0.00%> (-4.10%) ⬇️ |
| pkg/providers/tinkerbell/tinkerbell.go | 34.93% <0.00%> (-3.43%) ⬇️ |
| .../api/v1alpha1/tinkerbelltemplateconfig_defaults.go | 99.00% <0.00%> (-1.00%) ⬇️ |
| pkg/api/v1alpha1/tinkerbellmachineconfig_types.go | 3.07% <0.00%> (-0.32%) ⬇️ |
| pkg/providers/tinkerbell/stack/stack.go | 82.03% <0.00%> (-0.09%) ⬇️ |
| pkg/task/task.go | 87.50% <0.00%> (ø) |
| pkg/providers/snow/fetch.go | 100.00% <0.00%> (ø) |
| pkg/api/v1alpha1/cluster_types.go | 74.42% <0.00%> (ø) |
| ... and 24 more | |

Legend:
Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@maxdrib changed the title from "Initial draft of multi-endpoint design for cloudstack" to "Multi-endpoint design for cloudstack provider" on Jun 13, 2022
//
Domain string `json:"domain"`
// Zones is a list of one or more zones that are managed by a single CloudStack management endpoint.
Zone CloudStackZone `json:"zone"`
Member

shouldn't this be `Zones []CloudstackZone`? Each domain can have multiple zones, right?

Contributor Author

That's a good question. I believe not. Each failure domain can only have a single zone. If a customer wants a cluster spread across multiple zones, those would be two separate Failure Domains

Contributor

Yes, exactly. The customer ask is to enhance the CAPC Failure Domain, which is currently equivalent to an ACS Zone, to be Endpoint+Zone+Domain+Account.
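For concreteness, here is a minimal Go sketch of what such an Endpoint+Zone+Domain+Account failure domain could look like. The type and field names are illustrative assumptions, not the final API; `CloudStackZone` is the zone type already defined in the provider.

```go
// FailureDomain pairs a single ACS zone with the endpoint, domain, and
// account used to reach it. Illustrative sketch only; names are assumptions,
// not the final API.
type FailureDomain struct {
	// Name identifies this failure domain within the datacenter config.
	Name string `json:"name"`
	// Zone is the single ACS zone this failure domain maps to.
	Zone CloudStackZone `json:"zone"`
	// Domain and Account scope the resources created in the zone.
	Domain  string `json:"domain"`
	Account string `json:"account,omitempty"`
	// ManagementApiEndpoint is the URL of the ACS management server that owns this zone.
	ManagementApiEndpoint string `json:"managementApiEndpoint"`
}
```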

```
ManagementApiEndpoint string `json:"managementApiEndpoint"`
```

We would instead propose to remove all the existing attributes and simply include a list of FailureDomain objects, where each FailureDomain object looks like
Member

Would it make sense to treat each FailureDomain as a datacenter? If so, is the reason we don't have multiple CloudstackDatacenterConfig objects purely that the engineering effort is too high?

Contributor Author

This topic is discussed below in "Other approaches considered". There are many instances in the EKS-A codebase where only a single datacenterconfig object is expected to be present, and decisions are based on the Kind of that one datacenterconfig object. To refactor the entire code base only for the benefit of a single provider feels wasteful of engineering resources and more likely to introduce bugs.

With the current approach, we already effectively have multiple failure domains under a CloudstackDatacenterConfig object, only they're limited to the "zone" dimension. We want to expand that to also cover the account, domain, and endpoint dimensions.

Contributor

By way of comparison, CAPV supports CAPI Failure Domains via node labeling (see Node Zones/Regions Topology). Nonetheless, EKS-A expects those zones/regions to be managed through a single Datacenter.

```
ManagementApiEndpoint string `json:"managementApiEndpoint"`
```

and we would parse these resources and pass them into CAPC by modifying the templates we have currently implemented. We can then use this new model to read in credentials, perform pre-flight checks, plumb data to CAPC, and support upgrades in the controller. The goal would be to make these new resources backwards compatible via code
Member

curious how to map this failureDomain list to CAPC? Can we maybe have some examples here?

Contributor Author

Currently, we pass data to CAPC for zones in https://github.com/aws/eks-anywhere/blob/main/pkg/providers/cloudstack/config/template-cp.yaml#L38. We will do the same thing for Failure Domains. A failure domain should be the same as a zone is now, but also include an account/domain and a URL for the management host.

### Cloudstack credentials


In a multi-endpoint Cloudstack cluster, each endpoint may have its own credentials. We propose that Cloudstack credentials will be passed in via environment variable in the same way as they are currently,
Member

What about having the user specify a file path to the .ini and pass it as an ENV var?

Contributor Author

I think we could do that as well if you prefer. Although it might require some changes in the e2e test framework. I think there we're using the encoded value instead of any specific file. I'm not sure how/if we could download the file locally and then use it in the e2e tests but it does seem easier for the customer.

@vivek-koppuru any thoughts on this? I know we've discussed how to have customers pass data in at length.

Member

Both options could work right? What are we doing for snow again?

Member

For snow, we pass in the file path as an ENV var.

Contributor Author

From some internal discussions, it feels like storing credentials in a transient environment variable is more secure than writing them to a file on disk. That is why we prefer the env var route, although it is slightly more effort for the customer.
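As a rough illustration of the env var route, here is a sketch that assumes the credentials keep arriving as a base64-encoded ini blob, with one ini section per management endpoint. The environment variable name, section layout, and key names are assumptions for illustration, not the finalized contract.

```go
package main

import (
	"encoding/base64"
	"fmt"
	"os"

	ini "gopkg.in/ini.v1"
)

// CloudStackCredentials holds the keys needed to talk to one management endpoint.
type CloudStackCredentials struct {
	APIKey    string
	SecretKey string
	APIURL    string
}

// parseCredentials decodes a base64-encoded ini blob where each section names a
// credential profile (one per management endpoint). The ini layout is an
// assumption for illustration.
func parseCredentials(b64 string) (map[string]CloudStackCredentials, error) {
	raw, err := base64.StdEncoding.DecodeString(b64)
	if err != nil {
		return nil, fmt.Errorf("decoding credentials: %v", err)
	}
	cfg, err := ini.Load(raw)
	if err != nil {
		return nil, fmt.Errorf("parsing ini: %v", err)
	}
	creds := map[string]CloudStackCredentials{}
	for _, section := range cfg.Sections() {
		if section.Name() == ini.DefaultSection {
			continue // skip the implicit DEFAULT section
		}
		creds[section.Name()] = CloudStackCredentials{
			APIKey:    section.Key("api-key").String(),
			SecretKey: section.Key("secret-key").String(),
			APIURL:    section.Key("api-url").String(),
		}
	}
	return creds, nil
}

func main() {
	// Hypothetical variable name; the CLI would read whatever name is agreed on.
	creds, err := parseCredentials(os.Getenv("EKSA_CLOUDSTACK_B64ENCODED_SECRET"))
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	for profile, c := range creds {
		fmt.Printf("profile %s -> endpoint %s\n", profile, c.APIURL)
	}
}
```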

```
ManagementApiEndpoint string `json:"managementApiEndpoint"`
```

We would instead propose to remove all the existing attributes and simply include a list of FailureDomain objects, where each FailureDomain object looks like
Member

These types of changes are breaking changes. Especially if we don't have a feature flag for Cloudstack, what about introducing a new object for this use case?

Member

What would this list look like here?

Contributor Author

> These types of changes are breaking changes. Especially if we don't have a feature flag for Cloudstack, what about introducing a new object for this use case?

We do have a feature flag for Cloudstack currently. What sort of new object would you introduce?

> What would this list look like here?

Currently the CloudstackDatacenterConfig object contains account, domain, zone, managementApiEndpoint, and zones. Zones are a proxy for the old concept of failure domains. Instead, we propose that the CloudstackDatacenterConfig contain just a single entry: a list of FailureDomain objects. Each FailureDomain is then independent and contains an account, domain, zone, and managementApiEndpoint.
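Sketched in Go, the trimmed-down spec could look roughly like this; names are illustrative assumptions, not the actual definitions.

```go
// Illustrative sketch: the datacenter spec shrinks to a list of failure
// domains, and the per-endpoint details move into each entry. Names are
// assumptions, not the final API.
type CloudStackDatacenterConfigSpec struct {
	// FailureDomains replaces the old account, domain, zones, and
	// managementApiEndpoint fields; each entry is independent.
	FailureDomains []FailureDomain `json:"failureDomains"`
}
```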

Contributor Author

Check out https://github.com/aws/eks-anywhere/pull/2412/files for the new type definitions. We'll also have a transformer to convert the old attributes to the new ones, under AvailabilityZone.

1. Make all the fields optional and see if customers have the old fields set or the new ones
2. Introduce an eks-a version bump with conversion webhooks

Between these two approaches, I would take the first and then deprecate the legacy fields in a subsequent release to simplify the code paths.
Member

I think we can still throw a warning if new clusters are created with the old fields.

Contributor Author

That makes sense to me. In summary, we will continue to support the old fields until we can deprecate them.
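To make the first option concrete, here is a minimal sketch of how the legacy flat fields could be folded into the new list with a deprecation warning. The types and field names follow the illustrative sketches above and are assumptions, not the implementation in the linked PR.

```go
// LegacyFields stands in for the old flat attributes on the datacenter spec.
type LegacyFields struct {
	Account               string
	Domain                string
	ManagementApiEndpoint string
	Zones                 []CloudStackZone
}

// convertLegacySpec folds the deprecated flat fields into failure domain
// entries when the new list is empty, and surfaces a warning so users know
// to migrate. Sketch only; names follow the illustrative types above.
func convertLegacySpec(spec *CloudStackDatacenterConfigSpec, legacy LegacyFields, warn func(string)) {
	if len(spec.FailureDomains) > 0 {
		return // already using the new shape; nothing to convert
	}
	if legacy.ManagementApiEndpoint == "" && len(legacy.Zones) == 0 {
		return // no legacy configuration present either
	}
	warn("account/domain/zones/managementApiEndpoint are deprecated; use failure domains instead")
	for _, z := range legacy.Zones {
		spec.FailureDomains = append(spec.FailureDomains, FailureDomain{
			Name:                  z.Name, // assumes the existing zone type exposes a Name
			Zone:                  z,
			Domain:                legacy.Domain,
			Account:               legacy.Account,
			ManagementApiEndpoint: legacy.ManagementApiEndpoint,
		})
	}
}
```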

## Other approaches explored

Another direction we can go to support this feature is to refactor the entire EKS-A codebase so that, instead of all the failure domains existing inside the CloudstackDatacenterConfig
object, each CloudstackDatacenterConfig itself corresponds to a single failure domain. Then, the top-level EKS-A Cluster object could be refactored to have a list of DatacenterRefs instead
Member

It may actually provide value as we talked about wanting to support multiple environments in the future. However, I can see why it's a bigger step with less data for a specific provider.

Contributor Author

Thanks for the comments. Let's discuss in more detail tomorrow.

Member

I like that you listed the alternative solution here.

However, this seems like it would be bleeding a provider-specific topology into the main cluster object.
If we want to make a more holistic API change that includes other providers, we would need to consider what pattern makes sense for them and find the common denominator among all of them. And I don't think we have done that.


The management API endpoint for CloudStack is a potential single point of failure. If one endpoint goes down, control of everything it manages goes down with it. So we want to spread our services across many CloudStack endpoints and hosts to protect against that.

For scalability, multiple CloudStack endpoints will likely be required to provide enough storage and API throughput for our customer. A single cluster creation generates roughly a thousand API calls to ACS. There are many ways to mitigate this, but adding more CloudStack hosts and endpoints is a fairly foolproof one. There is also the size and performance of the underlying database that each CloudStack instance runs on to consider.

Is it worth instrumenting API calls to ACS during cluster lifecycle events to have precise volumetric data?

maxdrib and others added 2 commits June 14, 2022 17:00
Co-authored-by: Guillermo Gaston <guillermogastonlorente@gmail.com>
@g-gaston (Member)

/approve

@eks-distro-bot (Collaborator)

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: g-gaston

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@eks-distro-bot merged commit 5974edc into aws:main on Jun 17, 2022
wongni pushed a commit to wongni/eks-anywhere that referenced this pull request Jun 21, 2022
* Initial draft of multi-endpoint design for cloudstack

* Adding additional details

* Update cloudstack-multiple-endpoints.md

* Update cloudstack-multiple-endpoints.md

* Update cloudstack-multiple-endpoints.md

* Update cloudstack-multiple-endpoints.md

* Update cloudstack-multiple-endpoints.md

* Update cloudstack-multiple-endpoints.md

* Apply suggestions from code review

Co-authored-by: Guillermo Gaston <guillermogastonlorente@gmail.com>

* Update cloudstack-multiple-endpoints.md

* Update cloudstack-multiple-endpoints.md

* Update cloudstack-multiple-endpoints.md

* Update cloudstack-multiple-endpoints.md

* Update cloudstack-multiple-endpoints.md

* Update cloudstack-multiple-endpoints.md

* Update cloudstack-multiple-endpoints.md

* Update cloudstack-multiple-endpoints.md

* Update cloudstack-multiple-endpoints.md

Co-authored-by: Guillermo Gaston <guillermogastonlorente@gmail.com>