chore: automate clean-acr with github action workflow (#1735)
* hello workflow

* make pre-commit executable

* clean acr

* gh-set-secret

* pat->github

* echo

* env

* Create gh-set-secret.yml

* Update gh-set-secret.yml

* Update gh-set-secret.yml

* Create manual.yml

* Update manual.yml

* Update manual.yml

* Update manual.yml

* Update manual.yml

* use azurecli for keyvault access

* remove pip cache

* remove columns

* fix indent

* fix env var name

* split off script file

* azurecli@v1

* shorten path

* lengthen path

* add query option to az cmd

* re-indent

* re-indent again

* echo

* print

* test maniehtestkv

* back to azure kv task

* back to mmlspark-keys

* quot arg

* typo

* use popen for pipeline-run

* run through deletions in whatif mode

* print result code

* delete result code

* format changes and check result of transfer

* remove tqdn

* remove tqdn

* restore actual deletion

* formatize prints

* delete cruft

* switch from manual to cron

* sundays at 1am

* chmod pre-commit

Co-authored-by: Mark Hamilton <mhamilton723@gmail.com>
niehaus59 and mhamilton723 authored Nov 21, 2022
1 parent 952d1bd commit 1de2d55
Showing 2 changed files with 98 additions and 0 deletions.
44 changes: 44 additions & 0 deletions .github/workflows/clean-acr.yml
@@ -0,0 +1,44 @@
# Notes: To access key vault and grab the connection string, we first need a service principal.
# We need to add that service principal as a Reader in the RBAC for the key vault in question,
# as well as adding it with Get and List permissions in the key vault's access policies.
# Then we need to store that service principal's info as a GitHub secret.
# We then use that secret here as the credentials for logging into Azure.
# Instructions are here: https://learn.microsoft.com/en-us/azure/developer/github/github-key-vault
# In our case, the service principal is called synapseml-clean-acr.
# The GitHub secret is a repository secret called clean_acr.
# It is backed up in the mmlspark-keys vault by secret clean-acr-github-actions-info.
# The secret has an expiration date (currently 11/20/2024), so it will need to be renewed at some point.

name: Clean ACR

on:
  schedule:
    - cron: "0 1 * * 0" # every Sunday at 1am

jobs:
  clean-acr:
    name: Clean ACR
    runs-on: ubuntu-latest
    steps:
      - name: Azure Login
        uses: azure/login@v1
        with:
          creds: ${{ secrets.clean_acr }}
      # TODO: The docs say that Azure/get-keyvault-secrets@v1 is deprecated but are vague on what to use instead.
      # Keep an eye on how this continues to work.
      - name: Get connection string
        uses: Azure/get-keyvault-secrets@v1
        with:
          keyvault: "mmlspark-keys"
          secrets: "clean-acr-connection-string"
        id: getSecret
      - name: checkout repo content
        uses: actions/checkout@v2 # checkout the repo
      - name: setup python
        uses: actions/setup-python@v4
        with:
          python-version: '3.x'
      - run: pip install azure-storage-blob azure-identity
      - name: execute clean acr
        run: python .github/workflows/scripts/clean-acr.py "${{ steps.getSecret.outputs.clean-acr-connection-string }}"
        shell: sh
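The workflow hands the key-vault secret to the script as a single positional argument. A minimal sketch of how a script could accept that handoff a bit more defensively than a bare `sys.argv[1]`; the environment-variable fallback and its name `CLEAN_ACR_CONNECTION_STRING` are hypothetical, not part of the workflow above:

```python
import argparse
import os


def parse_connection_string(argv):
    """Take the connection string as a positional argument, falling back to a
    (hypothetical) environment variable for local testing."""
    parser = argparse.ArgumentParser(description="Clean backed-up ACR images")
    parser.add_argument(
        "conn_string",
        nargs="?",
        default=os.environ.get("CLEAN_ACR_CONNECTION_STRING"),
        help="Azure Storage connection string for the backup container",
    )
    args = parser.parse_args(argv)
    if not args.conn_string:
        # Fail with a usage message instead of an IndexError on sys.argv[1].
        parser.error("no connection string given")
    return args.conn_string
```

Called as `parse_connection_string(sys.argv[1:])`, this keeps the workflow's calling convention intact while giving a clear error when the argument is missing.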
54 changes: 54 additions & 0 deletions .github/workflows/scripts/clean-acr.py
@@ -0,0 +1,54 @@
import json
import os
import sys

from azure.identity import DefaultAzureCredential
from azure.storage.blob import BlobClient

credential = DefaultAzureCredential()
"""
Run this if the SAS expires, and place the result in the key vault under the secret name:
IMPORT_SAS=?$(az storage container generate-sas \
    --name acrbackup \
    --account-name mmlspark \
    --expiry 2023-01-01 \
    --permissions rawdl \
    --https-only \
    --output tsv)
echo $IMPORT_SAS
"""

acr = "mmlsparkmcr"
container = "acrbackup"
rg = "marhamil-mmlspark"
pipeline = "mmlsparkacrexport3"

conn_string = sys.argv[1]

os.system("az extension add --name acrtransfer")

repos = json.loads(os.popen(f"az acr repository list -n {acr}").read())
for repo in repos:
    tags = json.loads(
        os.popen(
            f"az acr repository show-tags -n {acr} --repository {repo} --orderby time_desc"
        ).read()
    )

    for tag in tags:
        target_blob = repo + "/" + tag + ".tar"
        image = repo + ":" + tag

        # Export the image to blob storage unless a backup already exists.
        backup_exists = BlobClient.from_connection_string(
            conn_string, container_name=container, blob_name=target_blob
        ).exists()
        if not backup_exists:
            result = os.system(
                f"az acr pipeline-run create --resource-group {rg} "
                f"--registry {acr} --pipeline {pipeline} "
                f"--name {abs(hash(target_blob))} --pipeline-type export "
                f"--storage-blob {target_blob} -a {image}"
            )
            assert result == 0
            print(f"Transferred {target_blob}")
        else:
            print(f"Skipped existing {image}")

        # Only delete the image from the registry once the backup is confirmed.
        backup_exists = BlobClient.from_connection_string(
            conn_string, container_name=container, blob_name=target_blob
        ).exists()
        if backup_exists:
            print(f"Deleting {image}")
            result = os.system(
                f"az acr repository delete --name {acr} --image {image} --yes"
            )
            assert result == 0
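The script shells out via `os.system`/`os.popen`, which discards stderr and only surfaces failures through exit codes, and it names pipeline runs with Python's `hash()`, which varies between processes (`PYTHONHASHSEED`). A minimal sketch of two alternative helpers, under the assumption the same `az` commands are used; `run_json` and `blob_run_name` are hypothetical names, not part of the script:

```python
import hashlib
import json
import subprocess


def run_json(cmd):
    """Run a shell command, raise on a nonzero exit (with stderr captured),
    and parse its stdout as JSON."""
    completed = subprocess.run(
        cmd, shell=True, check=True, capture_output=True, text=True
    )
    return json.loads(completed.stdout)


def blob_run_name(target_blob):
    """Deterministic pipeline-run name derived from the blob path, so reruns
    of the script produce the same name for the same image."""
    return hashlib.sha1(target_blob.encode()).hexdigest()[:16]
```

For example, `repos = run_json(f"az acr repository list -n {acr}")` would replace the `os.popen(...).read()` pattern, and `--name {blob_run_name(target_blob)}` would replace the `abs(hash(...))` expression.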
