Merge branch 'release/0.1.0'
stumpylog committed Apr 30, 2023
2 parents 2422ee1 + cb0c967 commit 9c75a26
Showing 21 changed files with 822 additions and 296 deletions.
4 changes: 4 additions & 0 deletions .gitignore
@@ -47,6 +47,9 @@ ENV/
env.bak/
venv.bak/

# ide
.idea

# mypy
.mypy_cache/
.dmypy.json
@@ -55,4 +58,5 @@ dmypy.json
# ruff linter
.ruff_cache

# secrets
token.txt
5 changes: 5 additions & 0 deletions CHANGELOG.md
@@ -0,0 +1,5 @@
# CHANGELOG

## 0.1.0

- Initial tagged release of the action
4 changes: 3 additions & 1 deletion Pipfile
@@ -4,12 +4,14 @@ verify_ssl = true
name = "pypi"

[packages]
httpx = "*"
httpx = {version = "*", extras = ["http2", "brotli"]}
github-action-utils = "*"

[dev-packages]
ruff = "*"
black = "*"
pre-commit = "*"
mypy = "*"

[requires]
python_version = "3.10"
218 changes: 190 additions & 28 deletions Pipfile.lock

Large diffs are not rendered by default.

65 changes: 58 additions & 7 deletions README.md
@@ -1,7 +1,13 @@
## The Problem
# Image Cleaner Action

This repository contains two actions for solving problems with the GitHub Container Registry.

These

## Ephemeral Image Removal

Many actions in a repository will end up creating a Docker image stored on the GitHub Container Registry
(ghcr.io). This is quite useful, as another developer can pull the image to test your build code or a
(ghcr.io). This is quite useful, as another developer can pull the image to test your code or a
user can confirm the fix worked as expected. The image may be built on each push to a certain named branch or
when a pull request is created or updated.

@@ -10,10 +16,55 @@ need to keep the Docker image around, accessible and cluttering up your packages
user to locate the last released tags. If you're paying for the storage space, each image takes up some
amount of storage as well.

In an ideal world, there would be a retention policy or reaper configuration to easily remove images
based on some configuration.
You end up with a situation like this:

![Repository with many stale images](./imgs/stale-images.png)

That's the problem this action attempts to solve. This action aims to simplify the cleaning of containers
which are meant to be ephemeral. Once their job is completed, they don't need to exist in the registry.

This action correlates an image to its source, either a branch or a pull request. If the branch is
deleted or the pull request has closed, the image is un-versioned using the REST API.

This action plays nicely with the untagged cleaner, as un-versioning doesn't remove the actual image.
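
The correlation described above can be sketched roughly as follows. This is a minimal illustration, not the action's actual code: the function name `find_stale_tags`, its parameters, and the assumption that a branch-scheme tag equals the branch name are all hypothetical.

```python
import re


def find_stale_tags(
    tags: list[str],
    scheme: str,
    match_regex: str,
    live_branches: set[str],
    open_pull_requests: set[int],
) -> list[str]:
    """Return the tags whose source branch or pull request no longer exists."""
    pattern = re.compile(match_regex)
    stale = []
    for tag in tags:
        match = pattern.match(tag)
        if match is None:
            # Tags which do not match the naming scheme are never touched
            continue
        if scheme == "pull_request":
            # Assumption: the first group captures the pull request number
            if int(match.group(1)) not in open_pull_requests:
                stale.append(tag)
        elif scheme == "branch":
            # Assumption: the tag is the branch name itself
            if tag not in live_branches:
                stale.append(tag)
        else:
            raise ValueError(f"Unknown scheme: {scheme}")
    return stale
```

Tags that do not match the expression are skipped entirely, which keeps release tags out of reach of the cleaner.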

### Features

- Safe by default: only takes action after you directly tell it to
- Verbose: Every action taken is logged, including with enough information to restore a deletion
- Flexible: handles branch based or PR based naming

### Usage

For details on using the action, see [the README for the action](./ephemeral/README.md)

## Untagged Images Cleanup

When a new image is built, tagged and pushed to replace an existing tag, the original
image doesn't get removed from the registry.

You end up with a situation like this:

![Repository with many untagged images](./imgs/mang-tags.png)

These untagged images are still accessible using the `sha256:` digest of the image as a tag,
but most people never use such tags. Long-lived tags, such as your releases,
will always remain accessible via their `sha256:` digest.

It's not directly a problem, but it is untidy. If you're paying for the storage space, each
image takes up some amount of storage as well. In an ideal world, there would be a retention
policy or reaper configuration to easily remove images based on some configuration. But as
of now, GitHub doesn't provide such tools.

This action therefore handles deleting the untagged package versions using the REST API.
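
As a rough sketch of what deleting untagged versions involves, the helpers below pick out versions with no tags and build the REST path for deleting one. The helper names are hypothetical, and the sketch deliberately ignores multi-architecture images, whose per-platform images are untagged but still referenced by a tagged index.

```python
from urllib.parse import quote


def untagged_version_ids(versions: list[dict]) -> list[int]:
    """Return the ids of package versions which carry no container tags."""
    return [
        version["id"]
        for version in versions
        if not version["metadata"]["container"]["tags"]
    ]


def version_delete_path(owner: str, is_org: bool, package_name: str, version_id: int) -> str:
    """Build the path for a DELETE request against one package version."""
    owner_kind = "orgs" if is_org else "users"
    # Package names may contain slashes, which must be percent-encoded
    encoded_name = quote(package_name, safe="")
    return f"/{owner_kind}/{owner}/packages/container/{encoded_name}/versions/{version_id}"
```

The path is relative to `https://api.github.com`, and the request needs a token allowed to delete packages.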

### Features

- Safe by default: only takes action after you directly tell it to
- Verbose: Every action taken is logged, including with enough information to restore a deletion
- Handles multi-architecture and regular images, even mixed in the same package
- Handles packages owned by an organization or a user

## The Solution
### Usage

This actions aims to simplify the cleaning of containers which are meant to be ephemeral. Once
their job is completed, they don't need to exist in the registry.
For details on using the action, see [the README for the action](./untagged/README.md)
36 changes: 0 additions & 36 deletions README_untagged.md

This file was deleted.

13 changes: 13 additions & 0 deletions ephemeral/README.md
@@ -0,0 +1,13 @@
# Ephemeral Image Cleaner Action

| Input | Type | Description |
| ------------ | ------- | ------------------------------------------------------------------------------------------------------- |
| token | string | A Personal Access Token with OAuth scope for packages:delete (if delete is set) |
| owner | string | The owner of the package |
| is_org | boolean | If the owner is an organization, this must be set to True |
| package_name | string | The name of the package to run against |
| do_delete | boolean | If set True, the action will actually delete the package |
| log_level | string | The logging level, based on Python log levels (defaults to "info") |
| repo_name | string | The repository which is the source of the images |
| scheme | string | One of "branch" or "pull_request", describing how the images have been named |
| match_regex | string | A regular expression, with matching group(s) to extract a pull request number (if using "pull_request") |
46 changes: 30 additions & 16 deletions ephemeral/action.yml
@@ -1,23 +1,34 @@
name: 'Ephemeral Image Cleaner'
description: 'A GitHub Action that cleans up the ghcr.io registry of old images, once the image source is gone'
inputs:
scheme:
description: 'Choose the image naming scheme either "branch" or "pull_request"'
token:
description: 'The Personal Access Token for deleting packages'
required: true
default: "branch"
package_name:
description: 'The name of the container package to clean'
owner:
description: 'The owner of the package'
required: true
is_org:
description: 'True if the package owner is an organization, False otherwise'
required: true
default: "false"
tag_match_regex:
package_name:
description: 'The name of the container package to clean'
required: true
repo_name:
description: 'Name of the repository for pull requests or branches'
required: true
scheme:
description: 'Choose the image naming scheme either "branch" or "pull_request"'
required: true
default: "branch"
match_regex:
description: 'The regular expression to use for matching your temporary image tagging scheme'
required: true
default: "$^"
do_delete:
description: 'If true, actually delete packages'
required: true
default: "false"
log_level:
description: 'Control the log level'
default: "info"
@@ -34,32 +45,35 @@ runs:
name: Install pipenv
shell: bash
run: |
pip install --user pipenv==2023.3.20
pip3 --quiet install --user pipenv==2023.4.20
-
name: Install dependencies
shell: bash
run: |
cd ${{ github.action_path }}/..
pipenv --python ${{ steps.setup-python.outputs.python-version }} sync
pipenv --quiet --python ${{ steps.setup-python.outputs.python-version }} sync
-
name: List installed dependencies
shell: bash
run: |
cd ${{ github.action_path }}/..
pipenv --python ${{ steps.setup-python.outputs.python-version }} run pip list
pipenv --quiet --python ${{ steps.setup-python.outputs.python-version }} run pip list
-
name: Clean the images
shell: bash
id: get-square
run: |
cd ${{ github.action_path }}/..
pipenv \
pipenv --quiet \
--python ${{ steps.setup-python.outputs.python-version }} \
run \
${{ github.action_path }}/../main_ephemeral.py \
--token ${{ inputs.token }} \
--owner ${{ inputs.owner }} \
--is-org ${{ inputs.is_org }} \
--name ${{ inputs.package_name }} \
--delete ${{ inputs.do_delete }} \
--loglevel ${{ inputs.log_level }}
--token "${{ inputs.token }}" \
--owner "${{ inputs.owner }}" \
--scheme "${{ inputs.scheme }}" \
--match-regex "${{ inputs.match_regex }}" \
--is-org "${{ inputs.is_org }}" \
--name "${{ inputs.package_name }}" \
--delete "${{ inputs.do_delete }}" \
--loglevel "${{ inputs.log_level }}" \
--repo "${{ inputs.repo_name }}"
33 changes: 19 additions & 14 deletions github/base.py
@@ -8,34 +8,35 @@
"""
import logging

import github_action_utils as gha_utils
import httpx

logger = logging.getLogger(__name__)


class GithubApiBase:
"""
A base class for interacting with the Github API. It
A base class for interacting with the GitHub API. It
will handle the session and setting authorization headers.
"""

API_BASE_URL = "https://api.github.com"

def __init__(self, token: str) -> None:
self._token = token
self._client: httpx.Client | None = None

def __enter__(self):
"""
Sets up the required headers for auth and response
type from the API
"""
self._client = httpx.Client()
self._client.headers.update(
{
# Create the client for connection pooling, add headers for type
# version and authorization
self._client: httpx.Client = httpx.Client(
base_url=self.API_BASE_URL,
timeout=30.0,
headers={
"Accept": "application/vnd.github.v3+json",
"Authorization": f"token {self._token}",
"X-GitHub-Api-Version": "2022-11-28",
},
)

def __enter__(self):
return self

def __exit__(self, exc_type, exc_val, exc_tb):
@@ -52,15 +53,17 @@ def __exit__(self, exc_type, exc_val, exc_tb):
self._client.close()
self._client = None

def _read_all_pages(self, endpoint):
def _read_all_pages(self, endpoint: str, query_params: dict | None = None):
"""
Helper function to read all pages of an endpoint, utilizing the
next.url until exhausted. Assumes the endpoint returns a list
"""
internal_data = []
if query_params is None:
query_params = {}

while True:
resp = self._client.get(endpoint)
resp = self._client.get(endpoint, params=query_params)
if resp.status_code == 200:
internal_data += resp.json()
if "next" in resp.links:
@@ -69,7 +72,9 @@ def _read_all_pages(self, endpoint):
logger.debug("Exiting pagination loop")
break
else:
logger.warning(f"Request to {endpoint} return HTTP {resp.status_code}")
msg = f"Request to {endpoint} return HTTP {resp.status_code}"
gha_utils.error(message=msg, title=f"HTTP Error {resp.status_code}")
logger.error(msg)
resp.raise_for_status()

return internal_data
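
Note on the pagination loop: the pattern in `_read_all_pages` can be shown in isolation with the HTTP call injected, so it runs without network access. This is a sketch only. `fetch_page` and the `Page` tuple are hypothetical stand-ins for the `httpx` request and the `next` link read from `resp.links`.

```python
from typing import Callable, Optional

# Hypothetical page shape: (HTTP status, decoded JSON list, next page URL or None)
Page = tuple[int, list, Optional[str]]


def read_all_pages(fetch_page: Callable[[str], Page], endpoint: str) -> list:
    """Follow "next" links until exhausted, collecting every item."""
    collected: list = []
    url: Optional[str] = endpoint
    while url is not None:
        status, items, next_url = fetch_page(url)
        if status != 200:
            # Mirrors the diff: any non-200 response aborts the whole read
            raise RuntimeError(f"Request to {url} returned HTTP {status}")
        collected += items
        url = next_url
    return collected
```

Injecting the fetcher also makes the loop easy to exercise against a dictionary of canned pages.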
15 changes: 11 additions & 4 deletions github/branches.py
@@ -1,4 +1,6 @@
import functools
import logging
import re

from github.base import GithubApiBase
from github.base import GithubEndpointResponse
@@ -19,6 +21,10 @@ def __init__(self, data: dict) -> None:
def __str__(self) -> str:
return f"Branch {self.name}"

@functools.cache
def matches(self, pattern: str) -> bool:
return re.match(pattern, self.name) is not None


class GithubBranchApi(GithubApiBase):
"""
@@ -28,17 +34,18 @@ class GithubBranchApi(GithubApiBase):
"""

API_ENDPOINT = "/repos/{OWNER}/{REPO}/branches"

def __init__(self, token: str) -> None:
super().__init__(token)

self._ENDPOINT = "https://api.github.com/repos/{OWNER}/{REPO}/branches"

def branches(self, owner: str, repo: str) -> list[GithubBranch]:
"""
Returns all current branches of the given repository owned by the given
owner or organization.
"""
# The environment GITHUB_REPOSITORY already contains the owner in the correct location
endpoint = self._ENDPOINT.format(OWNER=owner, REPO=repo)
internal_data = self._read_all_pages(endpoint)
internal_data = self._read_all_pages(
self.API_ENDPOINT.format(OWNER=owner, REPO=repo),
)
return [GithubBranch(branch) for branch in internal_data]