Manifest Unknown After Cleanup on Skipped Tag, Amd64 Arch only #43

Closed
corinz opened this issue Aug 30, 2022 · 17 comments

Comments

@corinz

corinz commented Aug 30, 2022

My container retention job meets expectations, except that every 7th day, when it cleans up my "app" container, I am unable to pull the amd64 image. The arm64 image, though, pulls fine. It seems like this cleanup job is deleting a tag that my protected tag depends on? Bizarre behavior, as the tags being deleted are totally unrelated, and deleting one tag shouldn't affect another. Any thoughts or insight here?

Thanks!

I get this error in my kube cluster every 7th day:

 Failed to pull image "ghcr.io/../app:development": rpc error: code = Unknown desc = manifest unknown

I can't replicate the error from my M1 machine (arm64 arch) -- the pull is successful. From an amd64 machine, I am able to replicate the "manifest unknown" error.

docker pull ghcr.io/../app:development
development: Pulling from ../app
manifest unknown

My retention policy is set to every 7 days, and the "development" tag should be skipped. The tag that was cleaned up in the logs was a truncated hash.

name: Delete old unused GHCR container images 
on:
  schedule:
    - cron: '0 0 * * *'  # every day at midnight
  workflow_dispatch:

jobs:
  clean-ghcr:
    name: Delete old unused GHCR container images
    runs-on: ubuntu-latest
    steps:
      - name: Delete containers older than a week, ignore tags
        uses: snok/container-retention-policy@v1
        with:
          image-names: app
          cut-off: A week ago UTC
          account-type: org
          org-name: my-org
          keep-at-least: 3
          untagged-only: false
          skip-tags: latest, v*, dev*, gamma, beta, 1*, 2*, 3*, 4*, 5*, 6*
          token: ${{ secrets.TOKEN }}
@sondrelg
Member

That's less than ideal 🙂 The logs don't indicate that the development image itself is deleted, right? Can't say that I've encountered anything like this myself, unfortunately.

@corinz
Author

corinz commented Aug 30, 2022

@sondrelg thanks for your response.

No, the logs do not show that anything but the image with a sha tag has been deleted. Any ideas for debugging this?

@sondrelg
Member

The action is really just a few API calls to the GitHub API, so if you can, I think the best thing would be to authenticate locally and then replicate the calls manually.

See the GitHub API docs: https://docs.github.com/en/rest/packages#get-a-package-version-for-an-organization
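
To replicate the calls manually, a minimal sketch along these lines should work (the org name, package name, and version ID here are placeholders, and GITHUB_TOKEN is assumed to be a PAT with the read:packages scope):

curl -s \
  -H "Authorization: Bearer $GITHUB_TOKEN" \
  -H "Accept: application/vnd.github+json" \
  "https://api.github.com/orgs/my-org/packages/container/app/versions/PACKAGE_VERSION_ID"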

If you find any issues, a PR would be more than welcome 🙏

@corinz
Author

corinz commented Aug 31, 2022

@sondrelg Thanks for the info. What prevents the retention policy from deleting images that are depended on by multi-platform tagged versions?

If I understand correctly, the GitHub API will return a list of versions of a particular package; these versions will include untagged images that are potentially named in the manifest list of a multi-platform image. The retention policy may skip over a named tag, but it may include (for deletion) an image that's named in its manifest list. For example:

dev:sha:abc123 {    <-- manifest list, dev tag and sha:abc123 image skipped for deletion
  archA: sha:foo,   <-- eligible for deletion?
  archB: sha:bar
}

If the example above represented a multi-platform manifest list, it would be preserved because it's tagged with "dev", but what about the sha:foo and sha:bar images?

@sondrelg
Member

I've never really used multi-platform images, so it's very possible we need to add special handling for this case. If I understand you correctly, it sounds like taking manifest lists into consideration should be the default behavior. Currently no such behavior exists.

Do you have a real data example of what this looks like?

@corinz
Author

corinz commented Aug 31, 2022

@sondrelg Start with this Dockerfile

FROM alpine
RUN mkdir foobar

Execute a multi-platform build using Docker's Buildx builder:

docker buildx build --push --platform linux/arm64,linux/amd64 -t <YOUR_REPO_URL>/multi-arch-build .

Inspect the manifest

docker manifest inspect <YOUR_REPO_URL>/multi-arch-build

This will produce a result that looks like this

{
   "schemaVersion": 2,
   "mediaType": "application/vnd.docker.distribution.manifest.list.v2+json",
   "manifests": [
      {
         "mediaType": "application/vnd.docker.distribution.manifest.v2+json",
         "size": 735,
         "digest": "sha256:6619a5ea49cd7174ded29cf5f1c98c559be59edd862349fc3c6238eb6274d3f0",
         "platform": {
            "architecture": "arm64",
            "os": "linux"
         }
      },
      {
         "mediaType": "application/vnd.docker.distribution.manifest.v2+json",
         "size": 735,
         "digest": "sha256:24c08606be10f8db18e7f463e80fd2dc55a411f10d7a0d0beceab4591e3a6441",
         "platform": {
            "architecture": "amd64",
            "os": "linux"
         }
      }
   ]
}

Notice the manifests array includes 2 objects, one per arch. Each arch has its own container image referenced by the digest.

When we run this cleanup job, we clean up those "child" images/digests because they are untagged. AFAIK, there is a simple solution to this. See this post (consider upvoting, please): docker/buildx#1301
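
As a quick illustration of what breaks (the digest below is just the amd64 entry from the example manifest above), the per-arch "child" image can be pulled directly by its digest, and that is exactly the reference that stops resolving once the untagged child version is deleted:

docker pull <YOUR_REPO_URL>/multi-arch-build@sha256:24c08606be10f8db18e7f463e80fd2dc55a411f10d7a0d0beceab4591e3a6441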

@sondrelg
Member

Upvoted 👍 I won't be able to look at this in depth for a few days, but I'll do a deep dive as soon as I can, if still needed. Certainly seems like I have all the information I need. In the meantime, as mentioned, contributions are always welcome 🙂

@corinz
Author

corinz commented Aug 31, 2022

Thanks @sondrelg. I'm going to track this down with GHCR and will contribute if possible.

@corinz
Author

corinz commented Sep 1, 2022

@sondrelg It seems like GitHub doesn't discriminate between a parent container and a child container when using the Packages LIST API. What LIST fails to reveal is the graph of dependencies that exists behind the scenes in the container registry. Basically, to do a proper delete, the GitHub API should be avoided and the registry API should be used. See these API docs for what GitHub is using behind the scenes to manage GHCR: https://github.com/distribution/distribution/blob/main/docs/spec/api.md#deleting-an-image
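
As a rough sketch of what that registry API exposes (the org and image names are placeholders, and this assumes a PAT with the read:packages scope can be exchanged for a registry pull token), fetching the manifest for a tag returns the per-arch child digests that the Packages LIST API never surfaces:

# exchange the PAT for a registry pull token (anonymous access works for public images)
TOKEN=$(curl -su "USERNAME:$GHCR_PAT" "https://ghcr.io/token?scope=repository:my-org/app:pull" | jq -r .token)

# fetch the manifest list for the tag and print the per-arch digests
curl -s \
  -H "Authorization: Bearer $TOKEN" \
  -H "Accept: application/vnd.docker.distribution.manifest.list.v2+json" \
  "https://ghcr.io/v2/my-org/app/manifests/development" | jq -r '.manifests[].digest'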

@sondrelg
Member

Sorry, I think I missed your last message. I saw the response in the buildx issue, and agree a switch to this API seems like the right choice 👍

I'll be taking my holidays in a few days, so will have very limited capacity in the next 3 weeks. Are you free to work on this? If not, I guess we could create a new issue for this and get back to it when either one of us (or someone else) does have time 🙂

@corinz
Author

corinz commented Oct 7, 2022

@sondrelg I won't have the personal time to do this for a while. But would be good to keep this issue in the backlog!

@Eddman

Eddman commented Dec 6, 2022

Any news on this one? I just hit the same issue. We've disabled the second arch for the moment, but would like to use both in the future...

@sondrelg
Member

sondrelg commented Dec 6, 2022

Haven't looked at this since October, mostly since it doesn't affect me personally yet. It will as soon as GitHub Actions lets me build ARM images on ARM runners 🙃

Would you be interested in implementing a fix @Eddman?

@xfoxfu

xfoxfu commented Jul 18, 2023

A little question regarding the container registry API: it seems there is no API for listing all untagged manifests, right? So it still requires the GitHub Packages API to list all the packages.
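
For reference, a minimal sketch of that listing step (the org and image names are placeholders): the Packages API versions endpoint returns tagged and untagged versions alike, and the untagged ones can be picked out from the container metadata:

curl -s \
  -H "Authorization: Bearer $GITHUB_TOKEN" \
  -H "Accept: application/vnd.github+json" \
  "https://api.github.com/orgs/my-org/packages/container/app/versions?per_page=100" \
  | jq '[.[] | select(.metadata.container.tags | length == 0) | .name]'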

@mering

mering commented Oct 31, 2023

This can be fixed by explicitly excluding untagged images referred to in the manifests of tagged images, similar to https://github.com/Chizkiyahu/delete-untagged-ghcr-action/blob/278ac5c5ae16914324ba447591af23312af6c075/clean_ghcr.py#L137-L138.
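
The same idea as a shell sketch (image names are placeholders; this is not what the action currently does): collect every digest referenced by the manifests of the tagged images you want to keep, then subtract those from the untagged versions before deleting anything:

# digests referenced by a tagged (protected) image
docker manifest inspect ghcr.io/my-org/app:development | jq -r '.manifests[].digest' | sort -u > keep.txt

# untagged versions reported by the Packages API go into untagged.txt (one digest per line)
# only digests that appear in untagged.txt but not in keep.txt are safe to delete
comm -23 <(sort -u untagged.txt) keep.txt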

@sondrelg
Member

I see @corinz, the description in #43 (comment) is really helpful. After looking at this for a little bit, I think this should work as a solution:

- name: Fetch SHAs for all associated multi-platform package versions
  id: multi-arch-digests
  run: |
    foo=$(docker manifest inspect ghcr.io/foo | jq -r '.manifests.[] | .digest' | paste -s -d ', ' -)
    bar=$(docker manifest inspect ghcr.io/bar | jq -r '.manifests.[] | .digest' | paste -s -d ', ' -)
    echo "multi-arch-digests=$foo,$bar" >> $GITHUB_OUTPUT

- uses: snok/container-retention-policy
  with:
    ...
    skip-shas: ${{ steps.multi-arch-digests.outputs.multi-arch-digests }}

This would mean implementing a new input for SHAs to avoid deleting, but that seems OK.

I want to release a v2 of the action soon, and running a (much) smaller Docker container is one of the main things I want to accomplish. Bundling the Docker CLI in the container would be a bit of a nuisance, so I think this solution would solve things nicely while keeping complexity low. Does anyone see any problems with it?

sondrelg mentioned this issue Jun 23, 2024
@sondrelg
Member

The latest release adds a skip-shas input argument, which can be used to protect against deleting multi-platform images. Please see the new section in the readme for details, and let me know if anything is unclear.

The migration guide for v3 is included in the release post 👍

If you run into any issues, please share them in the issue opened for tracking the v3 release ☺️
