ci-automation/garbage_collect.sh: Add min age, remove orphan directories #1608

Merged
merged 3 commits into from
Jan 29, 2024
125 changes: 113 additions & 12 deletions ci-automation/garbage_collect.sh
@@ -8,10 +8,29 @@
#
# garbage_collect() should be called after sourcing.
#
# The garbage collector will remove artifacts of all NON-RELEASE versions from the build cache
# which BOTH
# * exceed the number of builds to keep (defaults to 50)
# AND
# * are older than the minimum purge age (14 days by default)
#
# Note that the min age threshold can lead to MORE than 50 builds being kept if this script
# is run with its default values.
#
# Additionally, the garbage collector will remove all artifacts and directories that do not have
# a version TAG in the scripts repository.
#
# OPTIONAL INPUT
# - Number of (recent) versions to keep. Defaults to 50.
# Explicitly setting this value will reset the minimum age (see below) to 0 days.
# - Minimum age of version tag to be purged, in days. Defaults to 14.
# Only artifacts older than this AND exceeding the builds to keep threshold
# will be removed.
# - PURGE_VERSIONS (Env variable). Space-separated list of versions to purge
# instead of all but the 50 most recent ones.
# Setting this will IGNORE minimum age and number of versions to keep.
# NOTE that only dev versions (not official releases) may be specified.
# This is to prevent accidental deletion of official release tags from the git repo.
# - DRY_RUN (Env variable). Set to "y" to just list what would be done but not
# actually purge anything.

@@ -38,26 +57,60 @@ function garbage_collect() {
# --

function _garbage_collect_impl() {
local keep="${1:-50}"
local keep="${1:-}"
local min_age_days="${2:-}"
local dry_run="${DRY_RUN:-}"
local purge_versions="${PURGE_VERSIONS:-}"

local versions_detected="$(git tag -l --sort=-committerdate \
| grep -E '(main|alpha|beta|stable|lts)-[0-9]+\.[0-9]+\.[0-9]+\-.*' \
| grep -vE '(-pro)$')"
# Set defaults; user-provided 'keep' has priority over default 'min_age_days'
if [ -n "${keep}" -a -z "${min_age_days}" ] ; then
min_age_days="0"
elif [ -z "${keep}" ] ; then
keep="50"
fi
if [ -z "${min_age_days}" ] ; then
min_age_days="14"
fi

echo "######## Full list of version(s) found ########"
echo "${versions_detected}" | awk '{printf "%5d %s\n", NR, $0}'
local min_age_date="$(date -d "${min_age_days} days ago" +'%Y-%m-%d')"
echo "######## Garbage collector starting ########"
echo
if [ -z "${purge_versions}" ] ; then
echo "Number of versions to keep: '${keep}'"
echo "Keep newer than: '${min_age_date}' (overrides number of versions to keep)"
fi
echo

if [ -z "${purge_versions}" ] ; then
keep="$((keep + 1))" # for tail -n+...
# Generate a list "<timestamp> | <tagname>" from all repo tags that look like dev versions
local versions_detected="$(git tag -l --sort=-committerdate \
--format="%(creatordate:format:%Y-%m-%d) | %(refname:strip=2)" \
| grep -E '.*\| (main|alpha|beta|stable|lts)-[0-9]+\.[0-9]+\.[0-9]+-.*' \
| grep -vE '(-pro)$')"
Member

Why exclude the -pro build tag? Is it an artifact from the past?

Member Author

Yes, to prevent deleting the release tags of the -pro versions.

Member

This is a bit too broad, as it will prevent GC of any dev build whose tag ends with -pro (so a build tag like weekly-updates-like-a-pro won't be GC'd). OTOH, we tend not to put -pro into our build tags, so I suppose it's fine.

Member Author

Good point, the match could be more precise.
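
One way to tighten the match discussed above, as a minimal sketch: exclude only tags where `-pro` immediately follows the version number. This assumes release tags of the -pro versions look like `stable-3510.2.0-pro`, which the thread does not confirm. The helper name `filter_dev_versions` and the sample tags are hypothetical. The anchored pattern fits a plain list of tags (as in the PURGE_VERSIONS branch); the `<date> | <tag>` list would need the `.*\| ` prefix instead of `^`.

```shell
# Hypothetical helper, not part of the PR: keep dev-version tags, but drop
# only tags where "-pro" directly follows the version number.
filter_dev_versions() {
    grep -E '^(main|alpha|beta|stable|lts)-[0-9]+\.[0-9]+\.[0-9]+-.+' \
        | grep -vE '^(main|alpha|beta|stable|lts)-[0-9]+\.[0-9]+\.[0-9]+-pro$'
}

# Made-up sample tags: the first is assumed to be a -pro release tag,
# the other two are dev builds and should pass the filter.
printf '%s\n' \
    "stable-3510.2.0-pro" \
    "main-3598.0.0-weekly-updates-like-a-pro" \
    "alpha-3598.0.0-nightly-20230508-2100" \
    | filter_dev_versions
```

With this pattern, a dev build tag that merely ends in `-pro` is no longer shielded from garbage collection.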


echo "######## Full list of version(s) and their creation dates ########"
echo
echo "${versions_detected}" | awk '{printf "%5d %s\n", NR, $0}'

# Filter minimum number of versions to keep, min age
purge_versions="$(echo "${versions_detected}" \
| tail -n+"${keep}")"
| awk -v keep="${keep}" -v min_age="${min_age_date}" '{
if (keep > 0) {
keep = keep - 1
next
}

if ($1 > min_age)
next

print $3
}')"
else
# make sure we only accept dev versions
# User-provided version list, make sure we only accept dev versions
purge_versions="$(echo "${purge_versions}" | sed 's/ /\n/g' \
| grep -E '(main|alpha|beta|stable|lts)-[0-9]+\.[0-9]+\.[0-9]+\-.*' \
| grep -vE '(-pro)$')"
Member

Same here for excluding -pro from purges.

Member Author

We would need to rework tag removal and exclude any pattern that matches an official release version. Note that versions that look like official releases are skipped entirely - the garbage collector only cares about dev builds and nightlies. (Releases are supposed to be cleaned up manually, after pushing release binaries to the official mirrors.)
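
The "dev builds only" rule described above can be illustrated with a small classifier. This is a sketch under assumptions: `is_dev_version` is a hypothetical helper, not a function from the scripts repository, and the version strings are invented. It does not handle the -pro case.

```shell
# Hypothetical helper (not in the PR): succeeds only for dev-build tags,
# i.e. "<channel>-<major>.<minor>.<patch>-<suffix>" with a non-empty suffix.
is_dev_version() {
    printf '%s\n' "$1" \
        | grep -qE '^(main|alpha|beta|stable|lts)-[0-9]+\.[0-9]+\.[0-9]+-.+$'
}

# Assumed example input in the style of PURGE_VERSIONS (space-separated).
example_versions="alpha-3598.0.0-nightly-20230508-2100 stable-3510.2.0"

for version in ${example_versions}; do
    if is_dev_version "${version}"; then
        echo "purge: ${version}"
    else
        echo "skip (looks like a release): ${version}"
    fi
done
```

A version without a suffix after the version number never reaches the purge list, so its git tag stays untouched.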

keep=0
fi

source ci-automation/ci_automation_common.sh
@@ -71,7 +124,7 @@ function _garbage_collect_impl() {
echo "(NOTE this is just a dry run since DRY_RUN=y)"
echo
fi
echo "${purge_versions}" | awk -v keep="${keep}" '{if ($0 == "") next; printf "%5d %s\n", NR + keep - 1, $0}'
echo "${purge_versions}" | awk '{if ($0 == "") next; printf "%5d %s\n", NR, $0}'
echo
echo

@@ -90,7 +143,7 @@ function _garbage_collect_impl() {
local os_docker_vernum="$(vernum_to_docker_image_version "${FLATCAR_VERSION}")"

# Remove container image tarballs and SDK tarball (if applicable)
#
# Keep in sync with "orphaned directories" clean-up below.
local rmpat=""
rmpat="${BUILDCACHE_PATH_PREFIX}/sdk/*/${os_vernum}/"
rmpat="${rmpat} ${BUILDCACHE_PATH_PREFIX}/containers/${os_docker_vernum}/flatcar-sdk-*"
@@ -144,6 +197,54 @@ function _garbage_collect_impl() {
fi
done

echo
echo "########################################"
echo
echo Checking for orphaned directories
echo

local dir=""
for dir in "sdk/amd64" \
"containers" \
"boards/amd64-usr" \
"boards/arm64-usr" \
"images/amd64" \
"images/arm64" \
"testing" \
; do
local fullpath="${BUILDCACHE_PATH_PREFIX}/${dir}"
echo
echo "## Processing '${fullpath}'"
echo "---------------------------"
local version=""
for version in $($sshcmd "${BUILDCACHE_USER}@${BUILDCACHE_SERVER}" "ls -1 ${BUILDCACHE_PATH_PREFIX}/${dir}"); do
if [ "${dir}" = "containers" ] && echo "${version/+/-}" | grep -qE '.*-github-.*'; then
echo "Ignoring github CI SDK container in '${fullpath}/${version}'."
echo "Github CI SDK artifacts are handled by 'garbage_collect_github_ci_sdk.sh'"
echo " in a later step".
continue
fi
if ! git tag -l | grep -q "${version/+/-}"; then
local o_fullpath="${fullpath}/${version}"
echo
echo "## No tag '${version/+/-}' for orphan directory '${o_fullpath}'; removing."
echo "## The following files will be removed ##"
$sshcmd "${BUILDCACHE_USER}@${BUILDCACHE_SERVER}" \
"ls -la ${o_fullpath} || true"

if [ "$dry_run" != "y" ] ; then
set -x
$sshcmd "${BUILDCACHE_USER}@${BUILDCACHE_SERVER}" \
"rm -rf ${o_fullpath} || true"
set +x
else
echo "## (DRY_RUN=y so not doing anything) ##"
fi
echo
fi
done
done

echo
echo "########################################"
echo
@@ -170,6 +271,6 @@ function _garbage_collect_impl() {
echo

source ci-automation/garbage_collect_github_ci_sdk.sh
garbage_collect_github_ci
garbage_collect_github_ci 1 "${min_age_days}"
Member

Just one SDK build to keep? Asking to make sure, especially since the default was 20. I do realize that more will probably be kept until they are older than 14 days or so.

Member Author (@t-lo, Jan 29, 2024)

The "1" just means that at least one is kept; for everything else the 14-day limit will apply. We don't build the SDK from GitHub actions that often so I thought that would be sensible.
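
The interplay of "keep at least N" and the minimum age can be sketched with the same awk logic the diff uses, run on fixed sample data. The function name `filter_purge_candidates`, the dates, and the tags are made up for illustration; input lines follow the `<date> | <tag>` format, newest first.

```shell
# Sketch of the "keep N, then purge only if old enough" filter.
# $1: number of newest entries to keep unconditionally
# $2: cut-off date (YYYY-MM-DD); entries newer than this are kept as well
filter_purge_candidates() {
    awk -v keep="$1" -v min_age="$2" '{
        if (keep > 0) { keep = keep - 1; next }   # always keep the newest N
        if ($1 > min_age) next                    # too recent: keep as well
        print $3                                  # old and beyond N: purge
    }'
}

# Hypothetical tags and dates, newest first.
printf '%s\n' \
    "2024-01-28 | main-3850.0.0-nightly-20240128-2100" \
    "2024-01-20 | main-3850.0.0-nightly-20240120-2100" \
    "2024-01-02 | main-3815.0.0-nightly-20240102-2100" \
    | filter_purge_candidates 1 "2024-01-15"
```

With `keep=1` and a cut-off of 2024-01-15, the newest entry is kept by count, the second is kept because it is newer than the cut-off, and only the third is listed for purging. The dates compare correctly as plain strings because the format is zero-padded year-month-day.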

}
# --
28 changes: 24 additions & 4 deletions ci-automation/garbage_collect_github_ci_sdk.sh
@@ -10,6 +10,9 @@
#
# OPTIONAL INPUT
# - Number of (recent) Github SDK builds to keep. Defaults to 20.
# - Minimum age of version tag to be purged, in days. Defaults to 14.
# Only artifacts older than this AND exceeding the builds to keep threshold
# will be removed.
# - DRY_RUN (Env variable). Set to "y" to just list what would be done but not
# actually purge anything.

@@ -34,8 +37,10 @@ function garbage_collect_github_ci() {

function _garbage_collect_github_ci_impl() {
local keep="${1:-20}"
local min_age_days="${2:-14}"
local dry_run="${DRY_RUN:-}"

local min_age_date="$(date -d "${min_age_days} days ago" +'%Y_%m_%d')"
# Example version string
# <a href="./3598.0.0-nightly-20230508-2100-github-2023_05_09__08_06_54/">
# <a href="./3598.0.0-nightly-20230508-2100-github-pr-12345-2023_05_09__08_06_54/">
@@ -49,15 +54,30 @@ function _garbage_collect_github_ci_impl() {
# 3. remove the "/"
local versions_sorted="$(echo "${versions_detected}" \
| sed 's/\(-github\(-pr-[0-9]*\)*-\)/\1\//' \
| sort -k 2 -t / \
| sort -k 2 -t / -r \
| sed 's:/::')"

echo
echo "Number of versions to keep: '${keep}'"
echo "Keep newer than: '${min_age_date}'"
echo

echo "######## Full list of version(s) found ########"
echo "${versions_sorted}" | awk '{printf "%5d %s\n", NR, $0}'

local tail_keep="$((keep + 1))" # for tail -n+...
local purge_versions
mapfile -t purge_versions < <(tail -n+"${tail_keep}" <<<"${versions_sorted}")
mapfile -t purge_versions < <(echo "${versions_sorted}" \
| awk -v keep="${keep}" -v min_age="${min_age_date}" '{
if (keep > 0) {
keep = keep - 1
next
}
ts = gensub(".*-github-([0-9_]+)__.*","\\1","g",$1)
if (ts > min_age)
next

print $1
}')

source ci-automation/ci_automation_common.sh
local sshcmd="$(gen_sshcmd)"
@@ -69,7 +89,7 @@ function _garbage_collect_github_ci_impl() {
echo "(NOTE this is just a dry run since DRY_RUN=y)"
echo
fi
printf '%s\n' "${purge_versions[@]}" | awk -v keep="${keep}" '{if ($0 == "") next; printf "%5d %s\n", NR + keep, $0}'
printf '%s\n' "${purge_versions[@]}" | awk '{if ($0 == "") next; printf "%5d %s\n", NR, $0}'
echo
echo
