awsx.ecr.RegistryImage deletion fails after credential refresh #1537

Open
rjhuijsman opened this issue Feb 26, 2025 · 1 comment
Labels
awaiting-upstream The issue cannot be resolved without action in another repository (may be owned by Pulumi). kind/bug Some behavior is incorrect or out of spec

Comments

@rjhuijsman

Describe what happened

I'm using https://www.pulumi.com/registry/packages/awsx/api-docs/ecr/registryimage/ to manage Docker images stored in ECR. Normally, I can create and delete these resources correctly by running pulumi up. However, if I've been away long enough to need to refresh my AWS credentials, then awsx.ecr.RegistryImage deletions are (and remain) broken.

Here's an example timeline:

  1. On Monday:
    • I run aws sso login.
    • I run pulumi up. A pulumi_awsx.ecr.RegistryImage is created.
  2. Later on Monday:
    • I edit my code to swap the RegistryImage for a different one (different Pulumi resource name, different image to push).
    • I run pulumi up again. This works: it creates a new pulumi_awsx.ecr.RegistryImage and, at the end of the run, deletes the old one.
  3. On Tuesday:
    • I run aws sso login again, because my credentials have expired.
    • I edit my code to swap the RegistryImage out again, just like on Monday, again creating a whole new Pulumi resource and deleting the old one.
    • I run pulumi up again, just like on Monday. All AWS resources are updated correctly, except that deleting the old pulumi_awsx.ecr.RegistryImage fails with a 403.

So to be explicit:

  • I am correctly authenticated on both days: I run aws sso login on both days, and all other AWS resources work.
  • If I run pulumi up twice on the same day, there's no problem with pulumi_awsx.ecr.RegistryImage: both creation and deletion work. So my Pulumi code seems correct.
  • But on day 2, despite being logged in correctly and having correct Pulumi code, (only) deletion fails for (only) pulumi_awsx.ecr.RegistryImage.

My requirements.txt has up-to-date versions of the Pulumi SDKs:

pulumi>=3.149.0,<4.0.0
pulumi-aws>=6.23.0,<7.0.0
pulumi-awsx>=2.21.0,<3.0.0
pulumi-command>=1.0.1,<2.0.0

This issue is a blocker for our ability to use awsx.ecr.RegistryImage, and thereby makes it really hard for us to use AWS ECR via Pulumi.

As a workaround, we set keep_remotely=True on awsx.ecr.RegistryImage. That avoids the 403, but it leaves the image in the registry, which is not feasible for us long-term.

Sample program

Here is an example of Python code that exhibits this issue:

    import pulumi
    import pulumi_aws
    import pulumi_awsx
    import pulumi_command

    # Excerpt from a method body. `name`, `repository_name`, `parent`,
    # `source_tar_path`, `content_hash`, and `self._account_structure` are
    # defined elsewhere in the program and are not shown here.

    repository = pulumi_aws.ecr.Repository(
        f"ecr-repository-{name}",
        name=repository_name,
        image_tag_mutability="IMMUTABLE",
        # Don't block deletion of the repository when the stack is deleted
        # just because we've pushed images to it.
        force_delete=True,
        opts=pulumi.ResourceOptions(
            parent=parent,
            provider=self._account_structure.provider,
        ),
    )

    # Load the image from its tar file into our local Docker daemon; this is
    # where Pulumi expects to find the image when it pushes it to the
    # registry.
    tar_content_hash = content_hash(source_tar_path)
    load_command = pulumi_command.local.Command(
        # Using the digest in the name ensures that if the image changes, a
        # new version of it will be loaded.
        f"{name}-{tar_content_hash}-load",
        create=pulumi.Output.concat(
            "docker load --input ", source_tar_path
        ),
        opts=pulumi.ResourceOptions(parent=parent),
    )
    # The `loaded_image_name` is the same every time, e.g. `mycoolimage:latest`.
    loaded_image_name = load_command.stdout.apply(
        lambda stdout:
        # Remove the last newline, if any.
        stdout.strip()
        # The last line of the output contains the image name.
        .split("\n")[-1]
        # It is the last word on that line.
        .split(" ")[-1]
    )
    registry_image = pulumi_awsx.ecr.RegistryImage(
        # Using a unique name for this resource means that every version of
        # the image will be a new `RegistryImage` resource, which we prefer
        # over replacing an existing `RegistryImage`: replacing first
        # deletes the old image, then creates the new one, which means that
        # there is a time period where the image is not available in ECR. If
        # the deployment were to fail during that time, the image would be
        # permanently unavailable. By creating a new resource for each
        # image, each image gets pushed, and only at the end of the Pulumi
        # run are the old images deleted (when Pulumi knows that their
        # resources are no longer being created).
        f"ecr-image-{name}-{tar_content_hash}",
        repository_url=repository.repository_url,
        source_image=loaded_image_name,
        # A unique tag that ensures we don't try to push the same tag twice
        # with different content.
        tag=tar_content_hash,
        opts=pulumi.ResourceOptions(
            parent=parent,
            provider=self._account_structure.provider,
        ),
    )
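
The snippet above relies on a `content_hash` helper that the report does not show, and on an inline lambda that parses the output of `docker load`. A minimal sketch of both as plain functions follows. It assumes `content_hash` returns the hex SHA-256 digest of the tar file (the 64-character suffix on the resource name in the log below is consistent with that, but it is a guess) and that the last line `docker load` prints looks like `Loaded image: mycoolimage:latest`. Both function names are hypothetical; moving the parsing out of the `apply` callback lets it be unit-tested without Pulumi.

```python
import hashlib
from pathlib import Path


def content_hash(path: str, chunk_size: int = 1 << 20) -> str:
    """Hypothetical helper: hex SHA-256 digest of a file's contents.

    Reads the file in chunks so a large image tarball is never held in
    memory all at once.
    """
    digest = hashlib.sha256()
    with Path(path).open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def parse_loaded_image_name(stdout: str) -> str:
    """Return the last space-separated word on the last line of
    `docker load` output, mirroring the lambda in the snippet above."""
    last_line = stdout.strip().split("\n")[-1]
    return last_line.split(" ")[-1]
```

With these in place, the `apply` call in the program reduces to `load_command.stdout.apply(parse_loaded_image_name)`.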

Log output

Diagnostics:
  docker:index:RegistryImage (ecr-image-myimage-267884495a165f874573dbf10f0a55d1ad2036099a03505cac7c6e4d4cf1aa9f):
    error:   sdk-v2/provider2.go:515: sdk.helper_schema: Got error deleting registry image: Got bad response from registry: 403 Forbidden: provider=docker@4.6.0
    error: deleting urn:pulumi:aws-test1::reboot-cloud-awsx:ecr:RegistryImage$docker:index/registryImage:RegistryImage::ecr-image-myimage-267884495a165f874573dbf10f0a55d1ad2036099a03505cac7c6e4d4cf1aa9f: 1 error occurred:
    	* Got error deleting registry image: Got bad response from registry: 403 Forbidden

Affected Resource(s)

awsx.ecr.RegistryImage, deletion only

Output of pulumi about

$ pulumi about
CLI
Version 3.100.0
Go Version go1.21.5
Go Compiler gc

Plugins
NAME VERSION
python unknown

Host
OS ubuntu
Version 20.04
Arch x86_64

This project is written in python: executable='/home/vscode/.rye/shims/python3' version='3.10.16'

Current Stack: [REDACTED]

TYPE URN
[REDACTED]

Found no pending operations associated with reboot-dev/aws-test1

Backend
Name pulumi.com
URL https://app.pulumi.com/[REDACTED]
User [REDACTED]
Organizations [REDACTED]
Token type personal

Dependencies:
NAME VERSION
build 1.0.3
certifi 2019.11.28
chardet 3.0.4
dbus-python 1.2.16
idna 2.8.0
isort 5.12.0
mypy 1.2.0
pip 23.0.1
Pygments 2.3.1
PyGObject 3.36.0
python-apt 2.0.1+ubuntu0.20.4.1
PyYAML 5.3.1
requests 2.22.0
requests-unixsocket 0.2.0
ruff 0.1.14
setuptools 65.5.0
six 1.14.0
urllib3 1.25.8
yapf 0.40.2

Pulumi locates its logs in /tmp by default

Additional context

From my naive external view, it looks like awsx.ecr.RegistryImage deletion is still using the old credentials, while e.g. creation uses new credentials. Could that be?

Contributing

Vote on this issue by adding a 👍 reaction.
To contribute a fix for this issue, leave a comment (and link to your pull request, if you've opened one already).

@rjhuijsman rjhuijsman added kind/bug Some behavior is incorrect or out of spec needs-triage Needs attention from the triage team labels Feb 26, 2025
@rquitales
Member

Thank you for reporting this issue. Unfortunately, there is no workaround at the moment due to a design limitation in how short-term credentials are stored.

The ECR registry credentials are generated using a Pulumi invoke function and stored in the nested Docker provider configuration. Since the credentials are fetched via an invoke and embedded in the provider configuration, they are not updated during refresh/destroy operations.

This issue will be resolved as part of the ongoing work to support running Pulumi programs during refresh and destroy operations.
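
The mechanism described above can be illustrated with a toy model in plain Python. This is not Pulumi's code. It assumes only what the comment states (the registry credential is fetched once by an invoke and snapshotted into the stored provider configuration, and a delete reuses that snapshot) plus one outside fact: assuming the invoke returns an ECR authorization token, that token is valid for 12 hours. That would explain why two runs on the same day succeed while a run the next day fails. The class names and the hour-based clock are hypothetical.

```python
from dataclasses import dataclass, field

# Documented validity of an ECR authorization token (assumed to be what the
# invoke returns). Times below are in hours since Monday 00:00.
TOKEN_LIFETIME_HOURS = 12


@dataclass(frozen=True)
class Token:
    """A short-lived registry credential issued at `issued_at`."""

    issued_at: float

    def valid_at(self, now: float) -> bool:
        return now < self.issued_at + TOKEN_LIFETIME_HOURS


@dataclass
class ToyStack:
    """Hypothetical model of stack state: each image keeps a snapshot of the
    provider configuration (including the token) from when it was created."""

    images: dict = field(default_factory=dict)

    def create(self, name: str, now: float) -> None:
        # Creation runs the program, so the invoke fetches a fresh token.
        self.images[name] = Token(issued_at=now)

    def delete(self, name: str, now: float) -> str:
        # Deletion replays the stored snapshot; no fresh token is fetched.
        stored = self.images[name]
        if not stored.valid_at(now):
            return "403 Forbidden"  # the image stays in state, so retries fail too
        del self.images[name]
        return "deleted"


if __name__ == "__main__":
    stack = ToyStack()
    stack.create("image-v1", now=9)  # Monday 09:00
    stack.create("image-v2", now=11)  # Monday 11:00
    print(stack.delete("image-v1", now=11))  # deleted: Monday token still valid
    stack.create("image-v3", now=33)  # Tuesday 09:00, after aws sso login
    print(stack.delete("image-v2", now=33))  # 403 Forbidden: stale Monday token
```

Running aws sso login again has no effect in this model, because the delete never consults the current credentials; that matches the report that deletion is (and remains) broken.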

@rquitales rquitales added awaiting-upstream The issue cannot be resolved without action in another repository (may be owned by Pulumi). and removed needs-triage Needs attention from the triage team labels Mar 3, 2025