Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

implement multi region support for cpa #289

Merged
merged 1 commit into from
Dec 1, 2023
Merged

Conversation

shenmo3
Copy link
Contributor

@shenmo3 shenmo3 commented Aug 3, 2023

Description

This PR adds support for multi region configuration in CloudProviderAccount CRD.

Changes

  1. Change account config to have multiple service configs based on regions.
  2. Modify webhook to validate all regions in the account config. For AWS it checks if region is supported or not, for Azure it only checks if its empty or not. Also added region validation in service config creation, where both AWS and Azure are checked if region is supported.
  3. Adding handling to create AWS service config for each region and single Azure service config with multi-region filter.
  4. Change account operations to support multiple service configs and run in parallel, such as inventory poll and cloud sync.
  5. Add multi error in multi region operations, so success regions will be processed and error regions will be reported separately.
  6. Change security group operations to retrieve regions using vpc/vnet IDs instead of from cloud account.
  7. Change Azure cloud sync process to use ASG ID internally when syncing to avoid ASGs with the same name in different region being grouped into one.
  8. Update unit-tests to work with multi region.

@shenmo3 shenmo3 self-assigned this Aug 3, 2023
@shenmo3 shenmo3 force-pushed the multi-region branch 2 times, most recently from 531e968 to 3672cdc Compare August 7, 2023 07:35
if strings.Compare(existingConfig.endpoint, newConfig.endpoint) != 0 {
credsChanged = true
awsPluginLogger().Info("Endpoint url updated", "account", accountName)
}
sort.Strings(existingConfig.regions)
sort.Strings(newConfig.regions)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Check if we need to convert newConfig.regions to lowercase or not?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I actually added to lower in aws_account_impl.go and azure_account_impl.go. This converts regions to lower before any processing so we don't need to worry about to lower in actual logics. However regions from the cloud, either in VM, VNET or region list still needs to be converted to lower.

@@ -200,11 +209,7 @@ func (ec2Cfg *ec2ServiceConfig) getVpcPeers(vpcId string) []string {
// getInstances gets instances from cloud matching the given selector configuration.
func (ec2Cfg *ec2ServiceConfig) getInstances(namespacedName *types.NamespacedName) ([]*ec2.Instance, error) {
var instances []*ec2.Instance
filters, found := ec2Cfg.instanceFilters[*namespacedName]
if found && len(filters) != 0 {
awsPluginLogger().V(1).Info("Fetching vm resources from cloud", "account", ec2Cfg.accountNamespacedName,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

any specific reason as to why you removed this log message?

for key := range regions {
supportedRegions = append(supportedRegions, key)
awsRegionMap := endpoints.AwsPartition().Regions()
awsRegionMap["all"] = endpoints.Region{}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are not supporting the keyword "all" for now. Please remove.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@@ -37,7 +37,11 @@ func (c *awsCloud) CreateSecurityGroup(securityGroupIdentifier *cloudresource.Cl
defer accCfg.UnlockMutex()

cloudSgName := securityGroupIdentifier.GetCloudName(membershipOnly)
ec2Service := accCfg.GetServiceConfig().(*ec2ServiceConfig)
region := accCfg.FindRegion(vpcID)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

May be return an error, rather than relying on empty value.

@shenmo3 shenmo3 force-pushed the multi-region branch 2 times, most recently from d30566b to 754cf3e Compare August 9, 2023 07:14
@@ -223,7 +223,9 @@ func (a *AccountManager) RemoveResourceFiltersFromAccount(accNamespacedName *typ
return fmt.Errorf(fmt.Sprintf("failed to delete selector %v, account %v: %v",
selectorNamespacedName, accNamespacedName, errorMsgAccountPollerNotFound))
}
accPoller.mutex.Lock()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we really need to lock here?
Since we are not modifying the accPoller structure, we are accessing vmStore

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As we discussed offline, there is a race condition that needs this lock to be avoided. Will move the lock to the top of the func to make sure we are in lock when modifying anything.

@@ -285,7 +287,9 @@ func (a *AccountManager) removeAccountPoller(namespacedName *types.NamespacedNam
delete(a.accPollers, *namespacedName)

accPoller.stopPoller()
accPoller.mutex.Lock()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same as above

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This part actually doesn't need a lock since the poller is already stopped, will remove it.

@@ -145,7 +145,7 @@ func (p *accountPoller) doAccountPolling() {
p.updateAccountStatus(p.cloudInterface)
}()

cloudInventory, err := p.cloudInterface.GetCloudInventory(p.accountNamespacedName)
cloudInventory, err := p.cloudInterface.GetAccountCloudInventory(p.accountNamespacedName)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any specific reason for changing the function name?
Seems a bit off while reading it.
If you want to have account, then may be try GetCloudInventoryByAccout()

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed it because service config also has func GetCloudInventory, but thats just for a single service config. Since account config can now have multiple service configs and this func combines all service configs' inventory, I want to change this func name to differentiate the two.

Don't quite like ByAccount also, since we are already got a specific account config, how about GetCloudInventoryForAccount?

@@ -23,6 +25,9 @@ import (

// AddProviderAccount adds and initializes given account of a cloud provider.
func (c *awsCloud) AddProviderAccount(client client.Client, account *crdv1alpha1.CloudProviderAccount) error {
for idx := range account.Spec.AWSConfig.Region {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Modifying the retrieved CR here, seems inappropriate.
Going forward, we may consume the region field before calling this function.


// getRegion returns the region of the ec2 client.
func (ec2Wrapper *awsEC2WrapperImpl) getRegion() string {
return *ec2Wrapper.ec2.Config.Region
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since you are dereferencing the pointer, check if there id any scenario when region could be empty.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll add a check, but for AWS clients, the region is a required field.

return setAccountConfig
}

func (h *awsCloudCommonHelperImpl) GetCloudConfigComparatorFunc() internal.CloudConfigComparatorFunc {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit:

Suggested change
func (h *awsCloudCommonHelperImpl) GetCloudConfigComparatorFunc() internal.CloudConfigComparatorFunc {
func (h *awsCloudCommonHelperImpl) GetAccountConfigComparatorFunc() internal.CloudConfigComparatorFunc {

// create ec2 sdk api client.
apiClient, err := service.compute()
if err != nil {
err = fmt.Errorf("error creating ec2 sdk api client for account: %s, region: %s, err: %v", accountNamespacedName.String(),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you try/simulae a scenario where this function returns a multi error (i.e. multiple errors) and check how it would look on the CPA status field?


awsServiceClientCreator, err := awsServicesHelper.newServiceSdkConfigProvider(awsAccountCredentials)
// check regions valid and process regions.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// check regions valid and process regions.
// check valid regions.

for listResultIterator.More() {
nextResult, err := listResultIterator.NextPage(ctx)
if err != nil {
return nil, fmt.Errorf("failed to iterate list of asgs, reason %v", err)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: instead of asgs, location

@shenmo3 shenmo3 marked this pull request as ready for review August 9, 2023 16:28
Copy link

github-actions bot commented Nov 8, 2023

This PR is stale because it has been open 90 days with no activity. Remove stale label or comment, or this will be closed in 90 days

Signed-off-by: Alexander Liu <alliu@vmware.com>
Signed-off-by: alliu <alliu@vmware.com>
@shenmo3
Copy link
Contributor Author

shenmo3 commented Nov 16, 2023

/nephe-test-e2e-all

Copy link
Contributor

@reachjainrahul reachjainrahul left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/LGTM

@reachjainrahul reachjainrahul merged commit 9faeba2 into main Dec 1, 2023
@reachjainrahul reachjainrahul deleted the multi-region branch December 1, 2023 18:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants