fix(eks): fix helm deploy for public-ecr repositories #23176

Closed

@HannesBBR HannesBBR commented Nov 30, 2022

Fixes #23052

Using the default region causes aws ecr-public get-login-password --region {region} to fail, which then breaks the further steps of the helm deploy process. See the issue #23052 for more details.

According to the ECR docs the region should be either us-east-1 or us-west-2 for public-ecr.
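The intent of the change can be sketched roughly as follows. This is a minimal illustration and not the actual handler code: the function name `build_public_ecr_pull_command` and the constant names are hypothetical, and the command string simply mirrors the one used in the existing handler, with the region pinned instead of read from `AWS_REGION`.

```python
# Hypothetical names, for illustration only.
PUBLIC_ECR = "public.ecr.aws"
# Assumption: the login region is pinned rather than taken from AWS_REGION.
PUBLIC_ECR_AUTH_REGION = "us-east-1"


def build_public_ecr_pull_command(repository: str, version: str) -> list:
    """Build a shell command that logs in to public ECR, then pulls the chart."""
    return [
        f"aws ecr-public get-login-password --region {PUBLIC_ECR_AUTH_REGION} | "
        f"helm registry login --username AWS --password-stdin {PUBLIC_ECR}; "
        f"helm pull {repository} --version {version} --untar"
    ]


if __name__ == "__main__":
    # The chart reference below is an example value, not taken from the handler.
    print(build_public_ecr_pull_command(
        "oci://public.ecr.aws/kubecost/cost-analyzer", "1.97.0")[0])
```

With this shape, the region the Lambda happens to run in no longer influences the `ecr-public` login step.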

Note that the command also requires additional permissions to succeed, for which I added the Managed Policy AmazonElasticContainerRegistryPublicReadOnly to the handler lambda role.

I've tested the solution in our environment by updating the lambda-handler code and permissions accordingly with the proposed changes here, and now the helm deploy step succeeds, e.g.:

[INFO]	2022-11-30T10:44:02.061Z	6a4afdfa-0505-4d5f-b670-bc5734eee730	b'Login Succeeded\nPulled: public.ecr.aws/kubecost/cost-analyzer:1.97.0\nDigest: sha256:40886873c16ebba885bbb179e38aaea006ed42cd965e6995701dc127e1c80c9e\n'

I also updated the integration tests to check for the addition of the managed policy, but have not run the actual integration tests yet. I tried running them from Gitpod, but since the accounts in our environment have SCPs that only allow deploying to a single region, I wasn't able to complete all integration tests before Gitpod cut me off. If someone could help out by running the integration tests, that would be great 🙏


All Submissions:

Adding new Construct Runtime Dependencies:

  • This PR adds new construct runtime dependencies following the process described here

New Features

  • Have you added the new feature to an integration test?
    • Did you use yarn integ to deploy the infrastructure and generate the snapshot (i.e. yarn integ without --dry-run)?

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license

@github-actions github-actions bot added bug This issue is a bug. effort/small Small work item – less than a day of effort p2 beginning-contributor [Pilot] contributed between 0-2 PRs to the CDK labels Nov 30, 2022
@aws-cdk-automation aws-cdk-automation requested a review from a team November 30, 2022 14:19
Collaborator

@aws-cdk-automation aws-cdk-automation left a comment

The pull request linter has failed. See the aws-cdk-automation comment below for failure reasons. If you believe this pull request should receive an exemption, please comment and provide a justification.

@HannesBBR HannesBBR marked this pull request as ready for review November 30, 2022 14:26
@HannesBBR
Author

As mentioned above, I wasn't able to finish running the integration tests in our own environment using Gitpod in time. If somebody could run them for me, it would be greatly appreciated 🙏

@comcalvi
Contributor

Could you add a new integration test, ideally for the reproduction code in the first post of the linked issue? I can run the integ test.

Comment on lines -112 to +113
- logger.info("Found AWS public repository, will use default region as deployment")
- region = os.environ.get('AWS_REGION', 'us-east-1')
+ logger.info("Found AWS public repository, will use region 'us-east-1' as deployment")
+ region = 'us-east-1'
Contributor

I don't see a change to the unit tests for this; could you add a unit test for this change?

Author

I went through the previous commits for this lambda handler but didn't see any unit tests (unless I missed those) for these functions, so not sure what to add exactly here. What kind of unit test are you looking for?

Note that the integration test https://github.com/aws/aws-cdk/blob/main/packages/%40aws-cdk/aws-eks/test/integ.eks-helm-asset.ts#L54 should already cover the behaviour change here as well, which I think makes more sense than a unit test in this case?

Contributor

Woops, this is a custom resource lambda. We have no unit tests for these, so disregard this.

@HannesBBR
Author

> Could you add a new integration test, ideally for the reproduction code in the first post of the linked issue? I can run the integ test.

I think the integration test defined here should already cover this change: https://github.com/aws/aws-cdk/blob/main/packages/%40aws-cdk/aws-eks/test/integ.eks-helm-asset.ts#L54

Note that the original issue would normally only occur when running this test in a region other than us-east-1 (as the authentication to the public ECR would then fail with the original lambda code). Is there a way to force an integration test to run in a specific region?

Contributor

@comcalvi comcalvi left a comment

You're correct that the integ test covers this, but it must have been run in us-east-1 by chance for it to work. Please update it to force it to run twice (once in us-east-1 and once in us-west-2) by either making two Apps or two IntegTests.

Comment on lines -112 to +113
- logger.info("Found AWS public repository, will use default region as deployment")
- region = os.environ.get('AWS_REGION', 'us-east-1')
+ logger.info("Found AWS public repository, will use region 'us-east-1' as deployment")
+ region = 'us-east-1'
Contributor

There are two valid regions, us-east-1 and us-west-2. If someone today is using us-west-2, then this change will force them to redeploy to update to us-east-1. We try to avoid forcing new deployments in new releases, so can you update this to check if AWS_REGION is us-west-2, and if it is, continue to use us-west-2?
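A minimal sketch of the check requested in this comment, assuming the region is resolved in a small helper. The name `resolve_public_ecr_region` and the `env` parameter are hypothetical and not part of the handler; this only illustrates the suggestion and is not the merged behaviour.

```python
import os
from typing import Mapping, Optional

# Hypothetical constants, for illustration only.
DEFAULT_PUBLIC_ECR_REGION = "us-east-1"
PRESERVED_REGION = "us-west-2"


def resolve_public_ecr_region(env: Optional[Mapping[str, str]] = None) -> str:
    """Keep us-west-2 if the function already runs there, else use us-east-1."""
    env = os.environ if env is None else env
    if env.get("AWS_REGION") == PRESERVED_REGION:
        return PRESERVED_REGION
    return DEFAULT_PUBLIC_ECR_REGION
```

Passing the environment as a mapping keeps the helper easy to exercise without touching the process environment, e.g. `resolve_public_ecr_region({"AWS_REGION": "eu-central-1"})`.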

Copy link


Looks like region is only used for grabbing the temporary ECR password and passing it to helm, not for subsequent commands or any SDK calls, so it shouldn't impact rendered resources?

Contributor

@comcalvi

According to the Amazon ECR public registries documentation the region should always be us-east-1 for the ecr-public get-login-password operation.

https://docs.aws.amazon.com/AmazonECR/latest/public/public-registries.html#public-registry-auth
When authenticating to a public registry, always authenticate to the us-east-1 Region when using the AWS CLI.

So I think we should always use us-east-1 here, even when deploying in us-east-2.

elif registry.startswith(public_ecr):
    logger.info("Found AWS public repository, will use default region as deployment")
    region = os.environ.get('AWS_REGION', 'us-east-1')
    cmnd = [
        f"aws ecr-public get-login-password --region {region} | " \
        f"helm registry login --username AWS --password-stdin {public_ecr}; helm pull {repository} --version {version} --untar"
    ]
else:
    logger.error("OCI repository format not recognized, falling back to helm pull")
    cmnd = ['helm', 'pull', repository, '--version', version, '--untar']

@comcalvi comcalvi self-assigned this Dec 7, 2022
@aws-cdk-automation
Collaborator

This PR has been in the BUILD FAILING state for 3 weeks, and looks abandoned. To keep this PR from being closed, please continue work on it. If not, it will automatically be closed in a week.

@aws-cdk-automation
Collaborator

This PR has been deemed to be abandoned, and will be automatically closed. Please create a new PR for these changes if you think this decision has been made in error.

@aws-cdk-automation aws-cdk-automation added the closed-for-staleness This issue was automatically closed because it hadn't received any attention in a while. label Dec 30, 2022
@pahud pahud reopened this Feb 3, 2023
@aws-cdk-automation
Collaborator

This PR has been deemed to be abandoned, and will be automatically closed. Please create a new PR for these changes if you think this decision has been made in error.

@pahud pahud reopened this Feb 6, 2023
@pahud pahud removed the closed-for-staleness This issue was automatically closed because it hadn't received any attention in a while. label Feb 6, 2023
@pahud pahud added the pr-linter/do-not-close The PR linter will not close this PR while this label is present label Feb 6, 2023
@pahud
Contributor

pahud commented Feb 6, 2023

reopening this PR with branch update.

@aws-cdk-automation
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: AutoBuildv2Project1C6BFA3F-wQm2hXv2jqQv
  • Commit ID: ab79ebb
  • Result: FAILED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@pahud
Contributor

pahud commented Feb 6, 2023

Hi @HannesBBR

Do you have the capacity to update the failed tests with integ-runner --update-on-failed? We need the build to pass before we can advance this PR. Thanks.

@aws-cdk-automation
Collaborator

This PR has been deemed to be abandoned, and will be automatically closed. Please create a new PR for these changes if you think this decision has been made in error.

@aws-cdk-automation aws-cdk-automation added the closed-for-staleness This issue was automatically closed because it hadn't received any attention in a while. label Feb 7, 2023
@aws-cdk-automation
Collaborator

The pull request linter fails with the following errors:

❌ Fixes must contain a change to an integration test file and the resulting snapshot.

PRs must pass status checks before we can provide a meaningful review.

Labels
beginning-contributor [Pilot] contributed between 0-2 PRs to the CDK bug This issue is a bug. closed-for-staleness This issue was automatically closed because it hadn't received any attention in a while. effort/small Small work item – less than a day of effort p2 pr-linter/do-not-close The PR linter will not close this PR while this label is present
Development

Successfully merging this pull request may close these issues.

aws-eks: addHelmChart() fails with public.ecr.aws
5 participants