Multi-endpoint design for cloudstack provider #2399
Would it make sense to treat each `FailureDomain` as a datacenter? If so, is the reason we don't have multiple CloudstackDatacenterConfig objects purely that the engineering effort would be too high?
This topic is discussed below in "Other approaches considered". There are many instances in the EKS-A codebase where only a single datacenterconfig object is expected to be present, and decisions are based on the Kind of that one datacenterconfig object. Refactoring the entire codebase solely for the benefit of a single provider feels wasteful of engineering resources and is more likely to introduce bugs.
With the current approach, we already effectively have multiple failure domains under a CloudstackDatacenterConfig object, only they're limited to the "zone" dimension. We want to expand that to also cover the account, domain, and endpoint dimensions.
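To make that concrete, here is a rough sketch of today's shape (field names paraphrased, not copied from the real type definition): everything except the zone is fixed once per cluster.

```go
// Approximate shape of today's CloudstackDatacenterConfig spec (illustrative
// only): account, domain, and endpoint are set once, and only the zone
// varies per entry.
package v1alpha1

type CloudStackDatacenterConfigSpec struct {
	Account               string           // one account for the whole cluster
	Domain                string           // one domain for the whole cluster
	ManagementApiEndpoint string           // one ACS endpoint for the whole cluster
	Zones                 []CloudStackZone // today's only per-"failure domain" dimension
}

type CloudStackZone struct {
	Name string
	// network reference and other sub-fields elided
}
```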
By way of comparison, CAPV supports CAPI Failure Domains via node labeling (see Node Zones/Regions Topology). Nonetheless, EKS-A expects those zones/regions to be managed through a single Datacenter.
These types of changes are breaking changes. Especially if we don't have a feature flag for Cloudstack, what about introducing a new object for this use case?
What would this list look like here?
We do have a feature flag for Cloudstack currently. What sort of new object would you introduce?
Currently the CloudstackDatacenterConfig object contains account, domain, zone, managementApiEndpoint, and zones, where zones are a proxy for the old concept of failure domains. Instead, we propose that the CloudstackDatacenterConfig contain a single entry: a list of FailureDomain objects. Each FailureDomain is then independent and contains its own account, domain, zone, and managementApiEndpoint.
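To illustrate (the real definitions live in the type-definition PR referenced in the next comment; the names below are only a sketch):

```go
package v1alpha1

// Illustrative sketch only: each failure domain carries every dimension that
// can differ between CloudStack environments.
type FailureDomain struct {
	Name                  string
	Zone                  string // zone name; the real field is likely a richer zone object
	Account               string
	Domain                string
	ManagementApiEndpoint string
}

// The datacenter config then reduces to a single list of independent
// failure domains.
type CloudStackDatacenterConfigSpec struct {
	FailureDomains []FailureDomain
}
```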
Check out https://github.com/aws/eks-anywhere/pull/2412/files for the new type definitions. We'll also have a transformer to convert the old attributes into the new ones, under AvailabilityZone.
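Roughly, the conversion would fan the old top-level account/domain/endpoint out across the legacy zone entries; a sketch under that assumption (the AvailabilityZone fields and the default naming here are guesses, not the final API):

```go
package v1alpha1

// Old single-endpoint fields (illustrative subset).
type deprecatedFields struct {
	Account               string
	Domain                string
	ManagementApiEndpoint string
	Zones                 []string
}

// New per-failure-domain shape, following the AvailabilityZone naming
// mentioned above (fields are a guess, not the final definition).
type AvailabilityZone struct {
	Name                  string
	Zone                  string
	Account               string
	Domain                string
	ManagementApiEndpoint string
}

// toAvailabilityZones copies the shared account/domain/endpoint into one
// AvailabilityZone per legacy zone entry.
func toAvailabilityZones(old deprecatedFields) []AvailabilityZone {
	azs := make([]AvailabilityZone, 0, len(old.Zones))
	for _, zone := range old.Zones {
		azs = append(azs, AvailabilityZone{
			Name:                  "default-az-" + zone, // hypothetical naming scheme
			Zone:                  zone,
			Account:               old.Account,
			Domain:                old.Domain,
			ManagementApiEndpoint: old.ManagementApiEndpoint,
		})
	}
	return azs
}
```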
Shouldn't this be `Zones []CloudstackZone`? Each domain can have multiple zones, right?
That's a good question. I believe not: each failure domain can only have a single zone. If a customer wants a cluster spread across multiple zones, those would be separate Failure Domains.
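For example, a two-zone cluster against a single endpoint would simply be two entries that differ only in the zone (reusing the illustrative FailureDomain struct from the sketch above; values are made up):

```go
// Hypothetical example: same account, domain, and endpoint, one failure
// domain per zone.
var exampleFailureDomains = []FailureDomain{
	{Name: "fd-zone-1", Zone: "zone-1", Account: "admin", Domain: "ROOT", ManagementApiEndpoint: "https://acs.example.com/client/api"},
	{Name: "fd-zone-2", Zone: "zone-2", Account: "admin", Domain: "ROOT", ManagementApiEndpoint: "https://acs.example.com/client/api"},
}
```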
Yes, exactly. The customer ask is to enhance the CAPC Failure Domain, which is currently equivalent to an ACS Zone, so that it becomes Endpoint + Zone + Domain + Account.
Curious how this failureDomain list maps to CAPC. Can we maybe add some examples here?
Currently, we pass zone data to CAPC in https://github.com/aws/eks-anywhere/blob/main/pkg/providers/cloudstack/config/template-cp.yaml#L38. We will do the same thing for Failure Domains. A failure domain should look the same as a zone does now, but also include the account/domain and a URL for the management host.
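As a sketch of what that could look like on the templating side (the field names under failureDomains are placeholders, not the actual CAPC schema):

```go
package main

import (
	"os"
	"text/template"
)

// FailureDomain mirrors the shape proposed earlier; names are illustrative.
type FailureDomain struct {
	Name                  string
	Zone                  string
	Account               string
	Domain                string
	ManagementApiEndpoint string
}

// A hypothetical fragment of the control-plane template, analogous to how
// template-cp.yaml ranges over zones today.
const cpFragment = `failureDomains:
{{- range .FailureDomains }}
- name: {{ .Name }}
  zone: {{ .Zone }}
  account: {{ .Account }}
  domain: {{ .Domain }}
  acsEndpoint: {{ .ManagementApiEndpoint }}
{{- end }}
`

func main() {
	tmpl := template.Must(template.New("cp").Parse(cpFragment))
	data := struct{ FailureDomains []FailureDomain }{
		FailureDomains: []FailureDomain{
			{Name: "fd-1", Zone: "zone-1", Account: "admin", Domain: "ROOT", ManagementApiEndpoint: "https://acs-1.example.com/client/api"},
			{Name: "fd-2", Zone: "zone-1", Account: "admin", Domain: "ROOT", ManagementApiEndpoint: "https://acs-2.example.com/client/api"},
		},
	}
	if err := tmpl.Execute(os.Stdout, data); err != nil {
		panic(err)
	}
}
```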
I believe we also support referencing resources by ID now. If we are removing support for ID, we should probably include that plan in this doc.
I think we could include a preflight check which fails on the following conditions
That would work, for sure.
However, is the benefit worth all the effort? I'm not necessarily concerned about writing that validation, but about the burden of maintaining only partial support for names in the codebase. That seems like extra, unnecessary work.
Just for my understanding, why wouldn't specifying the name work for multiple availability zones? Shouldn't it work as long as the resource exists with the same name in all zones?
On second thought, we can also keep the existing preflight check logic, which supports both name and ID. It'll simply fail when it tries to look up a resource by ID in a Cloudstack endpoint where that resource ID does not exist. That should be sufficient, right? That way we can maintain support for the ID feature while still adding support for multi-AZ.
Yes, you're right. Specifying a name is the only way users can specify resources across multiple availability zones. The issue is when users try specifying a resource ID, since that may vary between availability zones.
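In other words, the per-failure-domain check stays as it is today and just runs once per endpoint. A sketch under those assumptions (the ResourceLookup interface and the names here are hypothetical stand-ins for the existing cmk-based validator):

```go
package validations

import (
	"context"
	"fmt"
)

// ResourceRef mirrors today's name-or-ID style references in the CloudStack spec.
type ResourceRef struct {
	Name string
	ID   string
}

// ResourceLookup is a hypothetical stand-in for the per-endpoint CloudStack client.
type ResourceLookup interface {
	TemplateExists(ctx context.Context, ref ResourceRef) (bool, error)
}

// FailureDomainClient pairs a failure domain with a client bound to that
// domain's management API endpoint.
type FailureDomainClient struct {
	FailureDomain string
	Client        ResourceLookup
}

// validateTemplateInAllFailureDomains keeps the existing name-or-ID
// semantics: it fails as soon as the referenced template cannot be found in
// one of the failure domains, which is exactly what happens when an ID is
// only valid on a single endpoint.
func validateTemplateInAllFailureDomains(ctx context.Context, ref ResourceRef, fds []FailureDomainClient) error {
	for _, fd := range fds {
		found, err := fd.Client.TemplateExists(ctx, ref)
		if err != nil {
			return fmt.Errorf("checking template in failure domain %s: %w", fd.FailureDomain, err)
		}
		if !found {
			return fmt.Errorf("template %q not found in failure domain %s", refString(ref), fd.FailureDomain)
		}
	}
	return nil
}

func refString(ref ResourceRef) string {
	if ref.ID != "" {
		return ref.ID
	}
	return ref.Name
}
```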
Oh, I got it in reverse :(
Yeah, leaving the validation as is works for me, great idea. It still prevents users from submitting an invalid configuration, and it doesn't add extra work either now or for maintenance.
Whenever we move to `v1alpha2`, we should probably consider just removing support for ID.
What about having the user specify a file path to the .ini and pass it as an env var?
I think we could do that as well if you prefer, although it might require some changes in the e2e test framework. I think we're using the encoded value there rather than any specific file. I'm not sure how/if we could download the file locally and then use it in the e2e tests, but it does seem easier for the customer.
@vivek-koppuru any thoughts on this? I know we've discussed at length how to have customers pass data in.
Both options could work, right? What are we doing for Snow again?
For Snow we pass in the file path as env vars.
From some internal discussions, it feels like storing credentials in a transient environment variable is more secure than in a file written to disk. That is why we prefer to go the env var route, although it is slightly more effort for the customer.
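As a sketch of the env var route (the EKSA_CLOUDSTACK_B64ENCODED_SECRET name follows the existing EKSA_*_B64ENCODED_SECRET convention, but treat it as an assumption here, and the ini parsing is deliberately simplified):

```go
package main

import (
	"encoding/base64"
	"fmt"
	"os"
	"strings"
)

// loadCloudStackProfiles reads a base64-encoded .ini blob from an
// environment variable instead of a file on disk, and returns one
// key/value map per [section] (i.e., per endpoint profile).
func loadCloudStackProfiles() (map[string]map[string]string, error) {
	encoded, ok := os.LookupEnv("EKSA_CLOUDSTACK_B64ENCODED_SECRET")
	if !ok {
		return nil, fmt.Errorf("EKSA_CLOUDSTACK_B64ENCODED_SECRET is not set")
	}
	raw, err := base64.StdEncoding.DecodeString(encoded)
	if err != nil {
		return nil, fmt.Errorf("decoding credentials: %w", err)
	}

	profiles := map[string]map[string]string{}
	var current string
	for _, line := range strings.Split(string(raw), "\n") {
		line = strings.TrimSpace(line)
		switch {
		case line == "" || strings.HasPrefix(line, "#"):
			continue
		case strings.HasPrefix(line, "[") && strings.HasSuffix(line, "]"):
			current = strings.Trim(line, "[]")
			profiles[current] = map[string]string{}
		default:
			if k, v, found := strings.Cut(line, "="); found && current != "" {
				profiles[current][strings.TrimSpace(k)] = strings.TrimSpace(v)
			}
		}
	}
	return profiles, nil
}

func main() {
	profiles, err := loadCloudStackProfiles()
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	for name := range profiles {
		fmt.Println("found credentials profile:", name)
	}
}
```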
I think we can still throw a warning if new clusters are created with the old fields.
That makes sense to me. In summary, we will continue to support the old fields until we can deprecate them.
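A tiny sketch of what that warning could look like (the field names are illustrative, and the real implementation would use the provider's logger rather than fmt):

```go
package main

import "fmt"

// oldFields is an illustrative subset of the deprecated top-level fields.
type oldFields struct {
	Account               string
	Domain                string
	ManagementApiEndpoint string
	Zones                 []string
}

// warnOnDeprecatedFields logs, but does not fail, when a new cluster spec
// still uses the old fields instead of the new availability-zone list.
func warnOnDeprecatedFields(f oldFields) {
	if f.Account != "" || f.Domain != "" || f.ManagementApiEndpoint != "" || len(f.Zones) > 0 {
		fmt.Println("warning: account, domain, managementApiEndpoint, and zones are deprecated; use availabilityZones instead")
	}
}

func main() {
	warnOnDeprecatedFields(oldFields{Account: "admin", Zones: []string{"zone-1"}})
}
```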
It may actually provide value as we talked about wanting to support multiple environments in the future. However, I can see why it's a bigger step with less data for a specific provider.
Thanks for the comments. Let's discuss in more detail tomorrow.
I like that you listed the alternative solution here.
However, this seems like it would bleed provider-specific topology into the main cluster object.
If we want to make a more holistic API change that includes other providers, we would need to consider what pattern makes sense for them and find the common denominator among all of them. I don't think we have done that yet.