(core): docker login to deployment account ECR occurs before asset is built #25894

Open
blimmer opened this issue Jun 7, 2023 · 9 comments
Labels
@aws-cdk/core Related to core CDK functionality bug This issue is a bug. documentation This is a problem with documentation. effort/small Small work item – less than a day of effort p2

Comments

@blimmer
Contributor

blimmer commented Jun 7, 2023

Describe the bug

Given a simple Dockerfile that pulls from a private ECR repository in the same account you're deploying to:

ARG AWS_ACCOUNT_NUMBER
ARG AWS_REGION
ARG REPO
ARG TAG

FROM ${AWS_ACCOUNT_NUMBER}.dkr.ecr.${AWS_REGION}.amazonaws.com/${REPO}:${TAG}

With a DockerImageAsset:

import * as cdk from 'aws-cdk-lib';
import { DockerImageAsset } from 'aws-cdk-lib/aws-ecr-assets';
import { Construct } from 'constructs';
import { join } from 'path';

export class CdkBugReportsStack extends cdk.Stack {
  constructor(scope: Construct, id: string, props?: cdk.StackProps) {
    super(scope, id, props);

    new DockerImageAsset(this, 'DockerImageAsset', {
      directory: join(__dirname, '..', 'assets', 'docker'),
      buildArgs: {
        "AWS_ACCOUNT_NUMBER": "123456789012",
        "AWS_REGION": "us-west-2",
        "REPO": "my-repo",
        "TAG": "latest"
      }
    })
  }
}

The cdk deploy will fail with a message that looks like this:

#3 [internal] load metadata for <ACCOUNT>.dkr.ecr.<REGION>.amazonaws.com/<REPO>:<TAG>
#3 ERROR: pulling from host <ACCOUNT>.dkr.ecr.<REGION>.amazonaws.com failed with status code [manifests <TAG>]: 403 Forbidden

In other words, the FROM in the Dockerfile cannot be resolved. This happens because the image publishing role (arn:aws:iam::<ACCOUNT>:role/cdk-hnb659fds-image-publishing-role-<ACCOUNT>-<REGION>) is used to log in to Docker before the image is built.

Therefore, it overrides the existing docker login you might have already done via:

> aws ecr get-login-password --region us-west-2 | docker login --username AWS --password-stdin <ACCOUNT>.dkr.ecr.<REGION>.amazonaws.com

And then it can't pull using existing credentials you've already set up.

Expected Behavior

I expected the system-level docker login to be respected during image build time so it could resolve the image from the private ECR repo.

I understand that the image-publishing-role needs to be assumed to push to the CDK assets ECR repository, but it feels like those credentials should only be used before calling docker push.

In other words, the flow looks like:

  1. Standard docker login happens as a setup step in my CI platform
  2. DockerImageAsset is built using the credentials from step 1.
  3. Existing docker login is backed up
  4. docker login occurs for image-publishing-role
  5. docker push the built asset
  6. Restore saved docker login credentials from step 3

Current Behavior

What's happening now appears to be:

  1. Standard docker login happens as a setup step in my CI platform
  2. docker login occurs for image-publishing-role
  3. DockerImageAsset is built using the credentials from step 2 (failure because image-publishing-role can't access the private ECR repo).

Reproduction Steps

blimmer/cdk-bug-reports#2 shows an example. You do need to manually push a latest tag to the repo to make it technically correct. However, you should still see the error even with an empty repo (you'll get a 403 error).

Possible Solution

If possible, system docker logins should be used to build the Docker images, not the image-publishing-role.

It might be challenging, however, to back up docker credentials, since there are a few different ways you can store those values.

Additional Information/Context

You can work around this issue by applying a policy to your private repository that allows the image-publishing-role access to the repo.
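
As a rough illustration of that workaround, here is a minimal sketch that generates a repository policy document granting pull access to the image publishing role. The role ARN pattern is the one quoted above. The helper names, the statement Sid, and the choice of pull actions are assumptions made for this example, and the default bootstrap qualifier (hnb659fds) is assumed as well.

```typescript
// Shape of the ECR repository policy document generated below.
interface RepositoryPolicy {
  Version: string;
  Statement: Array<{
    Sid: string;
    Effect: "Allow";
    Principal: { AWS: string };
    Action: string[];
  }>;
}

// Hypothetical helper. Assumes the default bootstrap qualifier "hnb659fds";
// a custom bootstrap would use a different qualifier.
function imagePublishingRoleArn(account: string, region: string, qualifier = "hnb659fds"): string {
  return `arn:aws:iam::${account}:role/cdk-${qualifier}-image-publishing-role-${account}-${region}`;
}

// Hypothetical helper. Grants only the actions commonly needed to pull an image.
function pullPolicyFor(roleArn: string): RepositoryPolicy {
  return {
    Version: "2012-10-17",
    Statement: [
      {
        Sid: "AllowCdkImagePublishingRolePull",
        Effect: "Allow",
        Principal: { AWS: roleArn },
        Action: [
          "ecr:BatchCheckLayerAvailability",
          "ecr:GetDownloadUrlForLayer",
          "ecr:BatchGetImage",
        ],
      },
    ],
  };
}

const policy = pullPolicyFor(imagePublishingRoleArn("123456789012", "us-west-2"));
console.log(JSON.stringify(policy, null, 2));
```

The printed JSON could be saved to a file and applied to the private repository with aws ecr set-repository-policy --repository-name my-repo --policy-text file://policy.json.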

CDK CLI Version

2.83.0

Framework Version

No response

Node.js Version

18

OS

MacOS

Language

Typescript

Language Version

No response

Other information

No response

@blimmer blimmer added bug This issue is a bug. needs-triage This issue or PR still needs to be triaged. labels Jun 7, 2023
@github-actions github-actions bot added the @aws-cdk/aws-ecr-assets Related to AWS CDK Docker Image Assets label Jun 7, 2023
@peterwoodworth
Contributor

I had to dig for a while to find the answer to this, so we should document it. You're right, login occurs before the asset is built. But that's done intentionally, and there's a way to adjust that default. See the comment in the code here:

// Default behavior is to login before build so that the Dockerfile can reference images in the ECR repo
// However, if we're in a pipelines environment (for example),
// we may have alternative credentials to the default ones to use for the build itself.
// If the special config file is present, delay the login to the default credentials until the push.
// If the config file is present, we will configure and use those credentials for the build.
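
The ordering that comment describes can be sketched as a small model. This is only an illustration of the two orderings, not the cdk-assets implementation; the step names and the publishSteps function are invented for this example.

```typescript
// Step names are hypothetical labels for this sketch only.
type Step = "login-default-credentials" | "configure-creds-file" | "build" | "push";

// Simplified model of the comment above:
// - no config file: log in with the default credentials, then build, then push
// - config file present: build with the configured credentials and delay the
//   default-credentials login until just before the push
function publishSteps(credsFilePresent: boolean): Step[] {
  return credsFilePresent
    ? ["configure-creds-file", "build", "login-default-credentials", "push"]
    : ["login-default-credentials", "build", "push"];
}

console.log(publishSteps(false).join(" -> "));
console.log(publishSteps(true).join(" -> "));
```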

You can configure a file that contains credential information. The CDK expects it to be here:

/** Returns the presumed location of the CDK Docker credentials config file */
export function cdkCredentialsConfigFile(): string {
  return process.env.CDK_DOCKER_CREDS_FILE ?? path.join((os.userInfo().homedir ?? os.homedir()).trim() || '/', '.cdk', 'cdk-docker-creds.json');
}
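
For reference, here is a hedged sketch of what such a file might contain. The field names (version, domainCredentials, ecrRepository, secretsManagerSecretId, assumeRoleArn) are recalled from the cdk-assets source and are not confirmed anywhere in this thread, so treat the whole shape as an assumption and check it against the cdk-assets version you run. The ecrCredsConfig helper is hypothetical.

```typescript
// Assumed shape of cdk-docker-creds.json. Verify against your cdk-assets version.
interface DockerDomainCredentialSource {
  secretsManagerSecretId?: string;
  ecrRepository?: boolean;
  assumeRoleArn?: string;
}

interface DockerCredentialsConfig {
  version: string;
  domainCredentials: Record<string, DockerDomainCredentialSource>;
}

// Hypothetical helper: one entry for a private ECR registry in the given account and region.
function ecrCredsConfig(account: string, region: string): DockerCredentialsConfig {
  const domain = `${account}.dkr.ecr.${region}.amazonaws.com`;
  return {
    version: "1.0",
    domainCredentials: { [domain]: { ecrRepository: true } },
  };
}

console.log(JSON.stringify(ecrCredsConfig("123456789012", "us-west-2"), null, 2));
```

Note that this is a different format from Docker's own ~/.docker/config.json, which stores logins under an auths key.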

I didn't know we could have a config file for this, cool! I don't think we document this anywhere though

@peterwoodworth peterwoodworth added p1 needs-review effort/small Small work item – less than a day of effort documentation This is a problem with documentation. and removed needs-triage This issue or PR still needs to be triaged. labels Jun 8, 2023
@peterwoodworth peterwoodworth changed the title (ecr-assets): docker login to deployment account ECR occurs before asset is built (core): docker login to deployment account ECR occurs before asset is built Jun 8, 2023
@peterwoodworth peterwoodworth added @aws-cdk/core Related to core CDK functionality and removed @aws-cdk/aws-ecr-assets Related to AWS CDK Docker Image Assets labels Jun 8, 2023
@blimmer
Contributor Author

blimmer commented Jun 9, 2023

Interesting - that's helpful that it exists already. I wonder, though, why doesn't it default to ~/.docker/config.json? That's where the default authentication is stored and I think it might "just work" with that default.

@iliapolo
Contributor

@blimmer Were you able to resolve the issue with CDK_DOCKER_CREDS_FILE?

@iliapolo iliapolo added response-requested Waiting on additional info and feedback. Will move to "closing-soon" in 7 days. and removed needs-review labels Jun 14, 2023
@blimmer
Contributor Author

blimmer commented Jun 14, 2023

Hey @iliapolo, I won't have time to check out the suggested workaround for some time due to a few tight deadlines. The workaround I provided in the description (granting the image publishing role access to the ECR repo) unblocked me for now.

I'm still curious to hear the CDK team's response to my question above. Why not default to using system docker credentials vs the publishing role? It feels like the expected behavior is inverted from the reality today.

@github-actions github-actions bot removed the response-requested Waiting on additional info and feedback. Will move to "closing-soon" in 7 days. label Jun 15, 2023
@joshua-haunty

joshua-haunty commented Dec 28, 2023

It took me a while to finally find someone with an issue similar to mine. Thanks for bringing this issue up. I attempted to solve the 403 Forbidden by using the CDK_DOCKER_CREDS_FILE env variable described above but did not have luck (although I didn't pursue it very thoroughly). What actually solved my issue was giving proper ECR permissions to the ECR publishing bootstrap cdk role, as you stated, @blimmer (specifically, access to two new ECR repositories that were outside the scope of our cdk setup and that I switched our Dockerfiles to reference as their base image).

The knowledge gap for me was that the OIDC role (with its policy and trust relationship) I was using for GitHub Actions to execute cdk deploy in a pipeline was not the role actually performing the docker build command when creating a Lambda via lambda_.DockerImageFunction (because that role was using sts:AssumeRole to do all necessary cdk work). This meant my attempt to give that OIDC role ecr:* permissions did nothing to solve my issue, nor did authenticating to ECR with that role earlier in the job.

I would like to mention that I had trouble logging what role was attempting to create the dockerized lambda and execute the docker build command in the first place (which is why it took me this long to find this github issue). I probably would have gotten here sooner if I had changed the log verbosity during the deployment.

@colifran colifran added p2 and removed p1 labels Feb 15, 2024
@will7200

will7200 commented May 17, 2024

Just wanted to say that CDK_DOCKER_CREDS_FILE does not work. My current workaround is to pull the desired image before hitting cdk-assets.

[Container] 2024/05/17 18:34:19.009136 Running on CodeBuild On-demand
[Container] 2024/05/17 18:34:19.009149 Waiting for agent ping
[Container] 2024/05/17 18:34:19.211914 Waiting for DOWNLOAD_SOURCE
[Container] 2024/05/17 18:34:20.751952 Phase is DOWNLOAD_SOURCE
[Container] 2024/05/17 18:34:20.761618 CODEBUILD_SRC_DIR=/codebuild/output/src2092927603/src
[Container] 2024/05/17 18:34:20.762078 YAML location is /codebuild/readonly/buildspec.yml
[Container] 2024/05/17 18:34:20.763711 Setting HTTP client timeout to higher timeout for S3 source
[Container] 2024/05/17 18:34:20.763822 Processing environment variables
[Container] 2024/05/17 18:34:20.972663 No runtime version selected in buildspec.
[Container] 2024/05/17 18:34:21.024315 Moving to directory /codebuild/output/src2092927603/src
[Container] 2024/05/17 18:34:21.025827 Unable to initialize cache download: no paths specified to be cached
[Container] 2024/05/17 18:34:21.131149 Configuring ssm agent with target id: codebuild:-
[Container] 2024/05/17 18:34:21.156487 Successfully updated ssm agent configuration
[Container] 2024/05/17 18:34:21.156769 Registering with agent
[Container] 2024/05/17 18:34:21.199904 Phases found in YAML: 3
[Container] 2024/05/17 18:34:21.199919  PRE_BUILD: 3 commands
[Container] 2024/05/17 18:34:21.199924  INSTALL: 1 commands
[Container] 2024/05/17 18:34:21.199950  BUILD: 1 commands
[Container] 2024/05/17 18:34:21.200189 Phase complete: DOWNLOAD_SOURCE State: SUCCEEDED
[Container] 2024/05/17 18:34:21.200201 Phase context status code:  Message:
[Container] 2024/05/17 18:34:21.288625 Entering phase INSTALL
[Container] 2024/05/17 18:34:21.289039 Running command npm install -g cdk-assets@2
added 109 packages in 8s
[Container] 2024/05/17 18:34:42.134466 Phase complete: INSTALL State: SUCCEEDED
[Container] 2024/05/17 18:34:42.134494 Phase context status code:  Message:
[Container] 2024/05/17 18:34:42.166368 Entering phase PRE_BUILD
[Container] 2024/05/17 18:34:42.166917 Running command ACCOUNT_OWNER=`aws sts get-caller-identity --query 'Account' --output text`
[Container] 2024/05/17 18:34:57.488756 Running command aws ecr get-login-password | docker login -u AWS --password-stdin https://${ACCOUNT_OWNER}.dkr.ecr.${AWS_DEFAULT_REGION}.amazonaws.com
WARNING! Your password will be stored unencrypted in /root/.docker/config.json.
Configure a credential helper to remove this warning. See
https://docs.docker.com/engine/reference/commandline/login/#credentials-store
Login Succeeded
[Container] 2024/05/17 18:34:58.149278 Running command export CDK_DOCKER_CREDS_FILE=~/.docker/config.json
[Container] 2024/05/17 18:34:58.154418 Phase complete: PRE_BUILD State: SUCCEEDED
[Container] 2024/05/17 18:34:58.154442 Phase context status code:  Message:
[Container] 2024/05/17 18:34:58.188559 Entering phase BUILD
[Container] 2024/05/17 18:34:58.189069 Running command cdk-assets --path "assembly--14E3797B.assets.json" --verbose publish "-:--us-west-2"
verbose: Loaded manifest from assembly----/-.assets.json: 6 assets found
verbose: Applied selection: 1 assets selected.
info   : [0%] start: Publishing -:--us-west-2
verbose: [0%] check: Check -.dkr.ecr.us-west-2.amazonaws.com/cdk-hnb659fds-container-assets---us-west-2:-
error  : [100%] fail: Cannot convert undefined or null to object
Failure: TypeError: Cannot convert undefined or null to object
    at Function.keys (<anonymous>)
    at Docker.configureCdkCredentials (/usr/local/lib/node_modules/cdk-assets/lib/private/docker.js:114:32)
    at DockerFactory.forBuild (/usr/local/lib/node_modules/cdk-assets/lib/private/docker.js:180:53)
    at ContainerImageAssetHandler.build (/usr/local/lib/node_modules/cdk-assets/lib/private/handlers/container-images.js:23:65)
    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
    at async AssetPublishing.publishAsset (/usr/local/lib/node_modules/cdk-assets/lib/publishing.js:123:17)
    at async AssetPublishing.publish (/usr/local/lib/node_modules/cdk-assets/lib/publishing.js:41:22)
    at async publish (/usr/local/lib/node_modules/cdk-assets/bin/publish.js:19:5)
    at async /usr/local/lib/node_modules/cdk-assets/bin/cdk-assets.js:32:9
    at async Object.handler (/usr/local/lib/node_modules/cdk-assets/bin/cdk-assets.js:56:9)
[Container] 2024/05/17 18:34:58.728495 Command did not exit successfully cdk-assets --path "assembly---Dev/-.assets.json" --verbose publish "-:--us-west-2" exit status 1
[Container] 2024/05/17 18:34:58.732469 Phase complete: BUILD State: FAILED
[Container] 2024/05/17 18:34:58.732488 Phase context status code: COMMAND_EXECUTION_ERROR Message: Error while executing command: cdk-assets --path "assembly---Dev/-.assets.json" --verbose publish "-:--us-west-2". Reason: exit status 1
[Container] 2024/05/17 18:34:58.760429 Entering phase POST_BUILD
[Container] 2024/05/17 18:34:58.763247 Phase complete: POST_BUILD State: SUCCEEDED
[Container] 2024/05/17 18:34:58.763259 Phase context status code:  Message:
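
The stack trace above fails in Object.keys inside Docker.configureCdkCredentials, right after CDK_DOCKER_CREDS_FILE was pointed at ~/.docker/config.json. One plausible reading, which is an assumption and not confirmed in this thread, is that cdk-assets enumerates a domainCredentials key that Docker's own config.json does not have. Under that assumption, a pre-flight check could look like the sketch below; checkCredsFile is a hypothetical helper, not part of any CDK package.

```typescript
// Hypothetical pre-flight check. Returns a description of the problem, or null
// when the parsed file has a "domainCredentials" object at the top level.
function checkCredsFile(parsed: unknown): string | null {
  if (typeof parsed !== "object" || parsed === null) {
    return "not a JSON object";
  }
  const obj = parsed as Record<string, unknown>;
  const creds = obj["domainCredentials"];
  if (typeof creds !== "object" || creds === null) {
    return "auths" in obj
      ? "looks like Docker's own config.json (has 'auths', no 'domainCredentials')"
      : "missing 'domainCredentials'";
  }
  return null;
}

// What docker login writes (token elided), versus the assumed CDK format.
const dockerConfig: unknown = JSON.parse(
  '{"auths":{"123456789012.dkr.ecr.us-west-2.amazonaws.com":{"auth":"..."}}}'
);
const cdkConfig: unknown = JSON.parse(
  '{"version":"1.0","domainCredentials":{"123456789012.dkr.ecr.us-west-2.amazonaws.com":{"ecrRepository":true}}}'
);

console.log(checkCredsFile(dockerConfig));
console.log(checkCredsFile(cdkConfig));
```

If this reading is right, a Docker config.json passed through CDK_DOCKER_CREDS_FILE would reach Object.keys(undefined), which throws a TypeError in Node.js.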

@sthuber90

Same for me. I cannot get it to work with CDK_DOCKER_CREDS_FILE. A working example in the docs would be really helpful @peterwoodworth

I've tried the DockerCredentials helper and created the file manually, following the structure described here, all without success. The only thing that works is pulling the image before running cdk deploy.

@lorcanoeire

To work around the issue, I had to resort to explicit docker pull commands (as mentioned above). These commands are specified in the CDK pipeline property assetPublishingCodeBuildDefaults and added to the partialBuildSpec.

This is then run for each of the asset builds! Not ideal...

However, it is a workaround for now.
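
A rough sketch of what such a partial buildspec could contain follows. The prePullBuildSpec helper and its parameters are hypothetical, and the commands mirror the docker login shown earlier in this thread. It also assumes the CodeBuild role is allowed to pull from the private repository.

```typescript
// Minimal shape of the partial buildspec generated below.
interface PartialBuildSpec {
  version: string;
  phases: { pre_build: { commands: string[] } };
}

// Hypothetical helper: log in to the private registry, then pull each base
// image so it is already present locally when the asset is built.
function prePullBuildSpec(registry: string, region: string, images: string[]): PartialBuildSpec {
  return {
    version: "0.2",
    phases: {
      pre_build: {
        commands: [
          `aws ecr get-login-password --region ${region} | docker login --username AWS --password-stdin ${registry}`,
          ...images.map((image) => `docker pull ${registry}/${image}`),
        ],
      },
    },
  };
}

const spec = prePullBuildSpec(
  "123456789012.dkr.ecr.us-west-2.amazonaws.com",
  "us-west-2",
  ["my-repo:latest"]
);
console.log(JSON.stringify(spec, null, 2));
```

In a CDK app, the resulting object would be wrapped with codebuild.BuildSpec.fromObject(spec) and passed as partialBuildSpec under assetPublishingCodeBuildDefaults on the pipeline.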

@ErmisCat

ErmisCat commented Sep 7, 2024

I didn't face this issue when testing locally; I only saw this error in an Azure DevOps CI/CD pipeline. Adding the export before the cdk deploy worked.

aws ecr get-login-password --region $(region) | docker login --username AWS --password-stdin $(account).dkr.ecr.$(region).amazonaws.com
export CDK_DOCKER_CREDS_FILE=~/.docker/config.json     
cdk deploy --require-approval never


9 participants