Docker release CT-3 #4616

Merged
merged 33 commits into from
Feb 1, 2022
33 commits
560dd18
new docker setup
iknox-fa Dec 14, 2021
60a2425
formatting
iknox-fa Dec 14, 2021
1c2f58b
Updated spark: support for extras
iknox-fa Dec 15, 2021
708fadb
Added third-party adapter support
iknox-fa Dec 15, 2021
4a76009
More selective lib installs for spark
iknox-fa Dec 15, 2021
7628c31
added docker to bumpversion
iknox-fa Dec 15, 2021
b2354af
Updated refs to be tag-based because bumpversion doesn't understand '…
iknox-fa Dec 15, 2021
f95d5fe
Updated docs per PR feedback
iknox-fa Dec 17, 2021
52e054a
reducing RUNs and formatting/pip best practices changes
iknox-fa Dec 17, 2021
530d9c8
Added multi-architecture support and small test script, updated docs
iknox-fa Dec 20, 2021
5cea6c5
typo
iknox-fa Dec 20, 2021
9ef952a
Added a few more tests
iknox-fa Dec 20, 2021
db3241e
fixed tests output, clarified dbt-postgres special case-ness
iknox-fa Dec 20, 2021
408a70f
Fix merge conflicts
iknox-fa Dec 14, 2021
2770e17
formatting
iknox-fa Dec 14, 2021
5694131
Updated spark: support for extras
iknox-fa Dec 15, 2021
1567909
Added third-party adapter support
iknox-fa Dec 15, 2021
6950c03
More selective lib installs for spark
iknox-fa Dec 15, 2021
81d1ae5
added docker to bumpversion
iknox-fa Dec 15, 2021
0a05cf0
Updated refs to be tag-based because bumpversion doesn't understand '…
iknox-fa Dec 15, 2021
265c93d
Updated docs per PR feedback
iknox-fa Dec 17, 2021
72df891
reducing RUNs and formatting/pip best practices changes
iknox-fa Dec 17, 2021
b8ff688
Added multi-architecture support and small test script, updated docs
iknox-fa Dec 20, 2021
becc4ba
typo
iknox-fa Dec 20, 2021
b94a4de
Added a few more tests
iknox-fa Dec 20, 2021
e07e2f0
fixed tests output, clarified dbt-postgres special case-ness
iknox-fa Dec 20, 2021
3c19338
Merge branch 'update-dockerfile' of github.com:dbt-labs/dbt-core into…
iknox-fa Dec 22, 2021
5e013e7
changelog
iknox-fa Dec 22, 2021
2c36713
basic framework
iknox-fa Jan 4, 2022
c71b2d3
PR ready excepts docs
iknox-fa Jan 24, 2022
9caa4c4
merge conflict
iknox-fa Jan 24, 2022
1a2a855
PR feedback
iknox-fa Jan 27, 2022
1a55d02
merge to main
iknox-fa Jan 28, 2022
2 changes: 2 additions & 0 deletions .bumpversion.cfg
@@ -35,3 +35,5 @@ first_value = 1
[bumpversion:file:plugins/postgres/setup.py]

[bumpversion:file:plugins/postgres/dbt/adapters/postgres/__version__.py]

[bumpversion:file:docker/Dockerfile]
14 changes: 14 additions & 0 deletions .github/actions/latest-wrangler/Dockerfile
@@ -0,0 +1,14 @@
FROM python:3-slim AS builder
ADD . /app
WORKDIR /app

# We are installing a dependency here directly into our app source dir
RUN pip install --target=/app requests packaging

# A distroless container image with Python and some basics like SSL certificates
# https://github.com/GoogleContainerTools/distroless
FROM gcr.io/distroless/python3-debian10
COPY --from=builder /app /app
WORKDIR /app
ENV PYTHONPATH /app
CMD ["/app/main.py"]
50 changes: 50 additions & 0 deletions .github/actions/latest-wrangler/README.md
@@ -0,0 +1,50 @@
# Github package 'latest' tag wrangler for containers
## Usage

Plug in the necessary inputs to determine if the container being built should be tagged 'latest' at the package level, for example `dbt-redshift:latest`.
Contributor

Could you add some more about latest vs minor_latest? I want to make sure we capture why we need both tags and not just 1 (I could see this coming up as a question from a new hire or somebody outside of the team)

Contributor Author

@iknox-fa iknox-fa Jan 27, 2022

TBH, I'm not 100% sure why we need the minor_latest beyond the idea that we might have breaking changes between a 1.0.latest and a 1.1.latest? @jtcohen6 would be a better person to answer that question.

Contributor

Sorry I missed this comment a few days ago! Rationale is to provide consistency with how we're thinking about patch + minor upgrades elsewhere (e.g. in dbt Cloud).

There are two premises after v1.0:

  • Use patches as soon as they're available
  • Minor releases won't have breaking changes (we've committed to this!), but they will touch significantly more code than patch releases

As such, we should let users decide when they want to perform the minor version upgrade. Each minor version will be officially supported for 12 months after its initial release.

So I wouldn't be surprised if some users want to stick with 1.0.latest, for a few weeks or a month, even after the 1.1.0 release. The inclusion of a 1.0.latest tag will give them the opportunity to coordinate upgrades at a time of their choosing. During that period, if we have to cut a "security" patch for v1.0 (security bug, dependency/installation issue), they'll still get it automatically, as they should.
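
To make the two-tag rationale concrete, here is a minimal Python sketch of how the floating tags would move as releases are published. The `publish` helper and the in-memory `registry` dict are hypothetical and exist only for illustration; versions are simplified to `(major, minor, patch)` tuples and pre-releases are ignored.

```python
from typing import Dict, Tuple

Version = Tuple[int, int, int]


def publish(registry: Dict[str, Version], version: Version) -> None:
    """Record a release and move the floating tags.

    `registry` maps a tag name to the version it currently points at.
    It is a hypothetical in-memory stand-in for a container registry.
    """
    major, minor, patch = version
    # Every release gets its own major.minor.patch tag.
    registry[f"{major}.{minor}.{patch}"] = version

    # major.minor.latest only moves forward within its own minor series.
    minor_tag = f"{major}.{minor}.latest"
    if minor_tag not in registry or registry[minor_tag] <= version:
        registry[minor_tag] = version

    # Bare 'latest' moves forward across all series.
    if "latest" not in registry or registry["latest"] <= version:
        registry["latest"] = version


registry: Dict[str, Version] = {}
for release in [(1, 0, 0), (1, 0, 1), (1, 1, 0), (1, 0, 2)]:
    publish(registry, release)

# A user pinned to 1.0.latest picks up the 1.0.2 patch but never 1.1.0.
print(registry["1.0.latest"])  # (1, 0, 2)
# A user on bare 'latest' follows the newest release overall.
print(registry["latest"])      # (1, 1, 0)
```

In this sketch a late security patch to the 1.0 series moves `1.0.latest` without disturbing `latest`, which is the behaviour described in the comment above.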


## Inputs
| Input | Description |
| - | - |
| `package` | Name of the GH package to check against |
| `new_version` | Semver of new container |
| `gh_token` | GH token with package read scope|
| `halt_on_missing` | Return non-zero exit code if requested package does not exist. (defaults to false)|


## Outputs
| Output | Description |
| - | - |
| `latest` | Whether or not the new container should be tagged 'latest' |
| `minor_latest` | Whether or not the new container should be tagged major.minor.latest |

## Example workflow
```yaml
name: Ship it!
on:
  workflow_dispatch:
    inputs:
      package:
        description: The package to publish
        required: true
      version_number:
        description: The version number
        required: true

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v1
      - name: Wrangle latest tag
        id: is_latest
        uses: ./.github/actions/latest-wrangler
        with:
          package: ${{ github.event.inputs.package }}
          new_version: ${{ github.event.inputs.version_number }}
          gh_token: ${{ secrets.GITHUB_TOKEN }}
      - name: Print the results
        run: |
          echo "Is it latest? Survey says: ${{ steps.is_latest.outputs.latest }} !"
          echo "Is it minor.latest? Survey says: ${{ steps.is_latest.outputs.minor_latest }} !"
```
20 changes: 20 additions & 0 deletions .github/actions/latest-wrangler/action.yml
@@ -0,0 +1,20 @@
name: "Github package 'latest' tag wrangler for containers"
Contributor

This is awesome! I created a place to put actions that can be shared across repositories. We should move this action over there at some point, I can see this being handy in many places. https://github.com/dbt-labs/actions

Contributor Author

Ohhh.. good call.

description: "Determines whether or not a given dbt container should be given a bare 'latest' tag (e.g. dbt-core:latest)"
inputs:
  package:
    description: "Package to check (e.g. dbt-core, dbt-redshift, etc)"
    required: true
  new_version:
    description: "Semver of the container being built (e.g. 1.0.4)"
    required: true
  gh_token:
    description: "Auth token for github (must have view packages scope)"
    required: true
outputs:
  latest:
    description: "Whether or not built container should be tagged latest (bool)"
  minor_latest:
    description: "Whether or not built container should be tagged minor.latest (bool)"
Comment on lines +14 to +17
Contributor

General question.. I've never known what to use as return types for GHA & scripting things. This returns True | False, but does it make sense to return 0 | 1 or other bool representations?

Contributor Author

Ok, so this is something that took a while to sort out, but the TL;DR is that it doesn't matter. GHA is all written in javascript, which types everything dynamically so any variable you set as an output operates in the same way.

I've defaulted to using string representations by default because all values coming from bash are untyped and treated as strings. It's one less thing to have to remember. :)
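
As a small illustration of the string-representation point: the `::set-output` lines that `main.py` prints are plain text, so a Python `bool` reaches later steps as the text `True` or `False`. The sketch below uses a hypothetical `format_output` helper (not part of the action) to show what is emitted.

```python
def format_output(name: str, value: object) -> str:
    """Build a workflow-command line like the ones main.py prints.

    Hypothetical helper for illustration; the action itself uses f-strings inline.
    """
    return f"::set-output name={name}::{value}"


line = format_output("latest", True)
print(line)  # ::set-output name=latest::True

# Everything after the final '::' is what a later step receives, as text.
emitted = line.rsplit("::", 1)[1]
print(type(emitted).__name__)  # str
print(emitted == "True")       # True
```

This matches how the release workflow in this PR gates its tagging steps on a comparison with the string `'True'`.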

runs:
  using: "docker"
  image: "Dockerfile"
26 changes: 26 additions & 0 deletions .github/actions/latest-wrangler/examples/example_workflow.yml
@@ -0,0 +1,26 @@
name: Ship it!
on:
  workflow_dispatch:
    inputs:
      package:
        description: The package to publish
        required: true
      version_number:
        description: The version number
        required: true

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v1
      - name: Wrangle latest tag
        id: is_latest
        uses: ./.github/actions/latest-wrangler
        with:
          package: ${{ github.event.inputs.package }}
          new_version: ${{ github.event.inputs.version_number }}
          gh_token: ${{ secrets.GITHUB_TOKEN }}
      - name: Print the results
        run: |
          echo "Is it latest? Survey says: ${{ steps.is_latest.outputs.latest }} !"
@@ -0,0 +1,6 @@
{
  "inputs": {
    "version_number": "1.0.1",
    "package": "dbt-redshift"
  }
}
98 changes: 98 additions & 0 deletions .github/actions/latest-wrangler/main.py
@@ -0,0 +1,98 @@
import os
import sys
import requests
from distutils.util import strtobool
from typing import Union
from packaging.version import parse, Version

if __name__ == "__main__":

    # get inputs
    package = os.environ["INPUT_PACKAGE"]
    new_version = parse(os.environ["INPUT_NEW_VERSION"])
    gh_token = os.environ["INPUT_GH_TOKEN"]
    halt_on_missing = strtobool(os.environ.get("INPUT_HALT_ON_MISSING", "False"))

    # get package metadata from github
    package_request = requests.get(
        f"https://api.github.com/orgs/dbt-labs/packages/container/{package}/versions",
        auth=("", gh_token),
    )
Comment on lines +17 to +20
Contributor

This looks like it works as expected. This uses packages that have been released to github as the source of truth. If we wanted to use this action to determine the latest version of a dbt python package, for example, we might get an incorrect answer if a Docker image doesn't exist yet for it. I don't think it should be part of this PR, but what are your thoughts on making this action a little more generic and using PyPI as the source of truth? There is some prior art from the ole release process to get this info.
https://github.com/dbt-labs/dbt-release/blob/b6efbb06cc52b84f6f935350046a737e9ad76c38/scripts/release-pypath/builder/common.py#L210-L222
https://github.com/dbt-labs/dbt-release/blob/b6efbb06cc52b84f6f935350046a737e9ad76c38/scripts/release-pypath/builder/common.py#L62-L75

Contributor Author

I'd rather leave docker dependent only on what docker images already exist, but I hold that opinion very loosely. In the long term I'd actually like to use GH releases as the source of truth for this, but it won't make sense to do that until we have a more complete release system (as opposed to this work which was meant to only address how Docker gets released)

    package_meta = package_request.json()

    # Log info if we don't get a 200
    if package_request.status_code != 200:
        print(f"Call to GH API failed: {package_request.status_code} {package_meta['message']}")

    # Make an early exit if there is no matching package in github
    if package_request.status_code == 404:
        if halt_on_missing:
            sys.exit(1)
        else:
            # everything is the latest if the package doesn't exist
            print(f"::set-output name=latest::{True}")
            print(f"::set-output name=minor_latest::{True}")
            sys.exit(0)

    # TODO: verify package meta is "correct"
leahwicz marked this conversation as resolved.
    # https://github.com/dbt-labs/dbt-core/issues/4640

    # map versions and tags
    version_tag_map = {
        version["id"]: version["metadata"]["container"]["tags"]
        for version in package_meta
    }

    # is pre-release
    pre_rel = any(x in str(new_version) for x in ["a", "b", "rc"])

    # semver of current latest
    # Start from False and stop at the first match so that a later version
    # without the tag cannot overwrite the value that was found.
    current_latest = False
    for version, tags in version_tag_map.items():
        if "latest" in tags:
            # N.B. This seems counterintuitive, but we expect any version tagged
            # 'latest' to have exactly three associated tags:
            # latest, major.minor.latest, and major.minor.patch.
            # Subtracting everything that contains the string 'latest' gets us
            # the major.minor.patch which is what's needed for comparison.
            current_latest = parse([tag for tag in tags if "latest" not in tag][0])
            break

    # semver of current_minor_latest
    current_minor_latest = False
    for version, tags in version_tag_map.items():
        if f"{new_version.major}.{new_version.minor}.latest" in tags:
            # Similar to above, only now we expect exactly two tags:
            # major.minor.patch and major.minor.latest
            current_minor_latest = parse(
                [tag for tag in tags if "latest" not in tag][0]
            )
            break

    def is_latest(
        pre_rel: bool, new_version: Version, remote_latest: Union[bool, Version]
    ) -> bool:
        """Determine if a given container should be tagged 'latest' based on:
         - its pre-release status
         - its version
         - the version of a previously identified container tagged 'latest'

        :param pre_rel: Whether or not the version of the new container is a pre-release
        :param new_version: The version of the new container
        :param remote_latest: The version of the previously identified container that's already tagged latest or False
        """
        # is a pre-release = not latest
        if pre_rel:
            return False
        # + no latest tag found = is latest
        if not remote_latest:
            return True
        # + if remote version is lower than current = is latest, else not latest
        return remote_latest <= new_version

    latest = is_latest(pre_rel, new_version, current_latest)
    minor_latest = is_latest(pre_rel, new_version, current_minor_latest)

    print(f"::set-output name=latest::{latest}")
    print(f"::set-output name=minor_latest::{minor_latest}")
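
The tag lookup in `main.py` can be tried in isolation. The sketch below is a hedged illustration, not the action's code: `sample_package_meta` is made-up data shaped like the fields the script reads (`id` and `metadata.container.tags`), and `concrete_version_for` is a hypothetical helper that returns the major.minor.patch tag sharing an image with a floating tag.

```python
from typing import Dict, List, Optional

# Hypothetical sample shaped like the entries main.py reads from the
# package-versions response: an id plus metadata.container.tags.
sample_package_meta = [
    {"id": 103, "metadata": {"container": {"tags": ["1.1.0b1"]}}},
    {"id": 102, "metadata": {"container": {"tags": ["latest", "1.0.latest", "1.0.1"]}}},
    {"id": 101, "metadata": {"container": {"tags": ["1.0.0"]}}},
]


def concrete_version_for(tag: str, version_tag_map: Dict[int, List[str]]) -> Optional[str]:
    """Return the major.minor.patch tag that shares an image with `tag`, if any."""
    for tags in version_tag_map.values():
        if tag in tags:
            # Drop every floating tag; what remains is the concrete version.
            concrete = [t for t in tags if "latest" not in t]
            return concrete[0] if concrete else None
    return None


version_tag_map = {
    v["id"]: v["metadata"]["container"]["tags"] for v in sample_package_meta
}

print(concrete_version_for("latest", version_tag_map))      # 1.0.1
print(concrete_version_for("1.0.latest", version_tag_map))  # 1.0.1
print(concrete_version_for("1.1.latest", version_tag_map))  # None
```

Returning on the first match means entries that appear later in the map cannot change the result, and a missing tag yields `None` rather than an exception.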

113 changes: 113 additions & 0 deletions .github/workflows/release_docker.yml
@@ -0,0 +1,113 @@
# **what?**
# This workflow will generate a series of docker images for dbt and push them to the github container registry

# **why?**
# Docker images for dbt are used in a number of important places throughout the dbt ecosystem. This is how we keep those images up-to-date.

# **when?**
# This is triggered manually

# **next steps**
# - build this into the release workflow (or conversely, break out the different release methods into their own workflow files)

name: Docker release

on:
  workflow_dispatch:
    inputs:
      package:
        description: The package to release. _One_ of [dbt-core, dbt-redshift, dbt-bigquery, dbt-snowflake, dbt-spark, dbt-postgres]
        required: true
      version_number:
        description: The release version number (e.g. 1.0.0b1). Do not include `latest` tags or a leading `v`!
        required: true

jobs:
  get_version_meta:
    name: Get version meta
    runs-on: ubuntu-latest
    outputs:
      major: ${{ steps.version.outputs.major }}
      minor: ${{ steps.version.outputs.minor }}
      patch: ${{ steps.version.outputs.patch }}
      latest: ${{ steps.latest.outputs.latest }}
      minor_latest: ${{ steps.latest.outputs.minor_latest }}
    steps:
      - uses: actions/checkout@v1
      - name: Split version
        id: version
        run: |
          IFS="." read -r MAJOR MINOR PATCH <<< ${{ github.event.inputs.version_number }}
          echo "::set-output name=major::$MAJOR"
          echo "::set-output name=minor::$MINOR"
          echo "::set-output name=patch::$PATCH"

      - name: Is pkg 'latest'
        id: latest
        uses: ./.github/actions/latest-wrangler
        with:
          package: ${{ github.event.inputs.package }}
          new_version: ${{ github.event.inputs.version_number }}
          gh_token: ${{ secrets.GITHUB_TOKEN }}
          halt_on_missing: False

  setup_image_builder:
    name: Set up docker image builder
    runs-on: ubuntu-latest
    needs: [get_version_meta]
    steps:
      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v1

  build_and_push:
    name: Build images and push to GHCR
    runs-on: ubuntu-latest
    needs: [setup_image_builder, get_version_meta]
    steps:
      - name: Get docker build arg
        id: build_arg
        run: |
          echo "::set-output name=build_arg_name::"$(echo ${{ github.event.inputs.package }} | sed 's/\-/_/g')
          echo "::set-output name=build_arg_value::"$(echo ${{ github.event.inputs.package }} | sed 's/postgres/core/g')

      - name: Log in to the GHCR
        uses: docker/login-action@v1
        with:
          registry: ghcr.io
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}

      - name: Build and push MAJOR.MINOR.PATCH tag
        uses: docker/build-push-action@v2
        with:
          file: docker/Dockerfile
          push: True
          target: ${{ github.event.inputs.package }}
          build-args: |
            ${{ steps.build_arg.outputs.build_arg_name }}_ref=${{ steps.build_arg.outputs.build_arg_value }}@v${{ github.event.inputs.version_number }}
          tags: |
            ghcr.io/dbt-labs/${{ github.event.inputs.package }}:${{ github.event.inputs.version_number }}

      - name: Build and push MINOR.latest tag
        uses: docker/build-push-action@v2
        if: ${{ needs.get_version_meta.outputs.minor_latest == 'True' }}
        with:
          file: docker/Dockerfile
          push: True
          target: ${{ github.event.inputs.package }}
          build-args: |
            ${{ steps.build_arg.outputs.build_arg_name }}_ref=${{ steps.build_arg.outputs.build_arg_value }}@v${{ github.event.inputs.version_number }}
          tags: |
            ghcr.io/dbt-labs/${{ github.event.inputs.package }}:${{ needs.get_version_meta.outputs.major }}.${{ needs.get_version_meta.outputs.minor }}.latest

      - name: Build and push latest tag
        uses: docker/build-push-action@v2
        if: ${{ needs.get_version_meta.outputs.latest == 'True' }}
        with:
          file: docker/Dockerfile
          push: True
          target: ${{ github.event.inputs.package }}
          build-args: |
            ${{ steps.build_arg.outputs.build_arg_name }}_ref=${{ steps.build_arg.outputs.build_arg_value }}@v${{ github.event.inputs.version_number }}
          tags: |
            ghcr.io/dbt-labs/${{ github.event.inputs.package }}:latest
leahwicz marked this conversation as resolved.
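
The two shell snippets in the workflow (the version split and the `sed` substitutions that build the Docker build arg) can be mirrored in a few lines of Python to check what they produce for a given input. This is only a sketch under stated assumptions: `split_version` and `docker_build_arg` are hypothetical names that do not appear in the PR, and `split_version` assumes the version has at least three dot-separated parts.

```python
from typing import Tuple


def split_version(version_number: str) -> Tuple[str, str, str]:
    """Split on the first two dots, like `IFS="." read -r MAJOR MINOR PATCH`.

    Anything after the second dot stays in the patch component.
    """
    major, minor, patch = version_number.split(".", 2)
    return major, minor, patch


def docker_build_arg(package: str, version_number: str) -> str:
    """Compose the `<name>_ref=<repo>@v<version>` build arg used by the workflow."""
    arg_name = package.replace("-", "_")             # mirrors sed 's/\-/_/g'
    arg_value = package.replace("postgres", "core")  # mirrors sed 's/postgres/core/g'
    return f"{arg_name}_ref={arg_value}@v{version_number}"


print(split_version("1.0.0b1"))                   # ('1', '0', '0b1')
print(docker_build_arg("dbt-postgres", "1.0.1"))  # dbt_postgres_ref=dbt-core@v1.0.1
print(docker_build_arg("dbt-redshift", "1.0.1"))  # dbt_redshift_ref=dbt-redshift@v1.0.1
```

The `dbt-postgres` case shows the special-casing noted in the Dockerfile: its ref points at `dbt-core` because the postgres adapter lives in the core codebase.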
1 change: 1 addition & 0 deletions .github/workflows/test/.actrc
@@ -0,0 +1 @@
-P ubuntu-latest=ghcr.io/catthehacker/ubuntu:act-latest
1 change: 1 addition & 0 deletions .github/workflows/test/.gitignore
@@ -0,0 +1 @@
.secrets
1 change: 1 addition & 0 deletions .github/workflows/test/.secrets.EXAMPLE
@@ -0,0 +1 @@
GITHUB_TOKEN=GH_PERSONAL_ACCESS_TOKEN_GOES_HERE
6 changes: 6 additions & 0 deletions .github/workflows/test/inputs/release_docker.json
@@ -0,0 +1,6 @@
{
  "inputs": {
    "version_number": "1.0.1",
    "package": "dbt-postgres"
  }
}
4 changes: 2 additions & 2 deletions docker/Dockerfile
@@ -14,8 +14,8 @@ FROM --platform=$build_for python:3.9.9-slim-bullseye as base
# N.B. The refs updated automagically every release via bumpversion
# N.B. dbt-postgres is currently found in the core codebase so a value of dbt-core@<some_version> is correct

-ARG dbt_core_ref=dbt-core@v1.0.0
-ARG dbt_postgres_ref=dbt-core@v1.0.0
+ARG dbt_core_ref=dbt-core@v1.0.1
+ARG dbt_postgres_ref=dbt-core@v1.0.1
ARG dbt_redshift_ref=dbt-redshift@v1.0.0
ARG dbt_bigquery_ref=dbt-bigquery@v1.0.0
ARG dbt_snowflake_ref=dbt-snowflake@v1.0.0