Add Github Action deploy #75

Merged (1 commit), Jan 30, 2025
27 changes: 27 additions & 0 deletions .github/workflows/deploy-prod.yml
@@ -0,0 +1,27 @@
# Build and deploy a Docker image to the production AWS environment
# when a new release has been created.

name: Deploy to Production

on:
  release:
    types:
      published

jobs:
  deploy-prod:
    runs-on: ubuntu-latest
    steps:

      - name: checkout
        uses: actions/checkout@v3

      - name: Build and push Docker image to production
        env:
          AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID_PRODUCTION }}
          AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY_PRODUCTION }}
          AWS_ECR_DOCKER_REPO: ${{ secrets.AWS_ECR_DOCKER_REPO_PRODUCTION }}
        run: |
          echo "production deploy not yet enabled"
          # uncomment this when the keys are available!
          # ./deploy.sh
31 changes: 31 additions & 0 deletions .github/workflows/deploy.yml
@@ -0,0 +1,31 @@
# Build and deploy a Docker image to development and staging AWS environments
# when a tagged version is created during weekly dependency updates.

name: Deploy

on:
  push:
    tags:
      - 'rel-*-*-*'

jobs:
  deploy-stage-qa:
    runs-on: ubuntu-latest
    steps:

      - name: checkout
        uses: actions/checkout@v3

      - name: Build and push Docker image to development (qa in SDR)
        env:
          AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID_DEVELOPMENT }}
          AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY_DEVELOPMENT }}
          AWS_ECR_DOCKER_REPO: ${{ secrets.AWS_ECR_DOCKER_REPO_DEVELOPMENT }}
        run: ./deploy.sh

      - name: Build and push Docker image to staging
        env:
          AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID_STAGING }}
          AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY_STAGING }}
          AWS_ECR_DOCKER_REPO: ${{ secrets.AWS_ECR_DOCKER_REPO_STAGING }}
        run: ./deploy.sh
59 changes: 28 additions & 31 deletions README.md
@@ -27,45 +27,38 @@ terraform validate
terraform apply
```

## Build Docker Image
## Build and Deploy

To build the container you will first need to download the PyTorch models that Whisper uses. This is about 13GB of data and can take some time! The idea is to bake the models into the Docker image so they don't need to be fetched every time the container runs (which would add to the runtime). If you know you only need one model size and want to include just that one, edit the `whisper_models/urls.txt` file accordingly before running the `wget` command.
To use the service you will need to build the speech-to-text Docker image and deploy it to ECR, where it will be picked up by Batch. You can use the provided `deploy.sh` script to do this.

```shell
wget --directory-prefix whisper_models --input-file whisper_models/urls.txt
```

Then you can build the image:
Before running it you will need to define three environment variables using the values that Terraform has created for you, which you can inspect by running `terraform output`:

```shell
docker build --tag sul-speech-to-text .
```

## Push Docker Image

You will need to push your Docker image to the ECR repository that Terraform created. You can ask Terraform for the URL of that repository. For example, mine is:

```shell
terraform output docker_repository
"482101366956.dkr.ecr.us-east-1.amazonaws.com/edsu-speech-to-text-qa"
$ terraform output

batch_job_definition = "arn:aws:batch:us-west-2:1234567890123:job-definition/sul-speech-to-text-qa"
batch_job_queue = "arn:aws:batch:us-west-2:1234567890123:job-queue/sul-speech-to-text-qa"
docker_repository = "1234567890123.dkr.ecr.us-west-2.amazonaws.com/sul-speech-to-text-qa"
ecs_instance_role = "sul-speech-to-text-qa-ecs-instance-role"
s3_bucket = "arn:aws:s3:::sul-speech-to-text-qa"
sqs_done_queue = "https://sqs.us-west-2.amazonaws.com/1234567890123/sul-speech-to-text-done-qa"
text_to_speech_access_key_id = "XXXXXXXXXXXXXX"
text_to_speech_secret_access_key = <sensitive>

$ terraform output text_to_speech_secret_access_key
"XXXXXXXXXXXXXXXXXXXXXXXX"
```

Review comment (Member): ah, good to know you can do this, definitely more convenient than doing `terraform show -json` and then text searching the terminal output.

Tag your Docker image with the ECR URL:
You will want to set these in your environment:

```shell
docker tag speech-to-text YOUR-ECR-URL
```

Ensure your Docker client is logged in:

```shell
aws ecr get-login-password | docker login --username AWS --password-stdin YOUR-ECR-URL
```
- AWS_ACCESS_KEY_ID: the `text_to_speech_access_key_id` value
- AWS_SECRET_ACCESS_KEY: the `text_to_speech_secret_access_key` value
- AWS_ECR_DOCKER_REPO: the `docker_repository` value

And then you can push the Docker image:
Then you can run the deploy:

```shell
docker push YOUR-ECR-URL
```bash
$ ./deploy.sh
```
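
One way to set the three variables without copying values by hand is to read them directly from Terraform. The sketch below assumes it is run from the directory that holds the Terraform configuration and that the output names match the ones shown above. `load_deploy_env` is a hypothetical helper name, not something this repository provides. `terraform output -raw NAME` prints a single output value without the surrounding quotes.

```shell
# Hypothetical helper (not part of this repository): export the variables
# that deploy.sh expects, using values read from Terraform outputs.
load_deploy_env() {
  # Assign first and export afterwards so that a failing terraform call
  # is not masked by the exit status of `export`.
  AWS_ACCESS_KEY_ID="$(terraform output -raw text_to_speech_access_key_id)" || return 1
  AWS_SECRET_ACCESS_KEY="$(terraform output -raw text_to_speech_secret_access_key)" || return 1
  AWS_ECR_DOCKER_REPO="$(terraform output -raw docker_repository)" || return 1
  export AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY AWS_ECR_DOCKER_REPO
}

# Usage, from the Terraform configuration directory for the environment:
#   load_deploy_env && /path/to/deploy.sh
```

Keeping the secret in a shell variable rather than pasting it into a command line should also keep it out of your shell history.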

## Run
@@ -282,7 +275,11 @@ If you get no result, install with:

`brew install ffmpeg`

## Updating Docker Image
## Continuous Integration

This GitHub repository is set up with a GitHub Action that will automatically deploy tagged releases (e.g. `rel-2025-01-01`) to the DLSS development and staging AWS environments. When a GitHub release is published, it will automatically be deployed to the production AWS environment.
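
As a rough sketch of how a deploy to development and staging could be triggered by hand, you could push a tag that matches the `rel-*-*-*` pattern used by the workflow. `is_deploy_tag` is a hypothetical helper, not part of this repository, and its shell pattern only approximates how GitHub matches tag filters.

```shell
# Hypothetical helper (not part of this repository): succeed only when the
# given tag name looks like the rel-*-*-* pattern the deploy workflow uses.
is_deploy_tag() {
  case "$1" in
    rel-*-*-*) return 0 ;;
    *) return 1 ;;
  esac
}

# Usage, with an example tag name:
#   tag="rel-2025-01-01"
#   is_deploy_tag "$tag" && git tag "$tag" && git push origin "$tag"
```

Checking the tag name before pushing avoids creating a tag that the workflow would silently ignore.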

## Development Notes

When updating the base Docker image, in order to prevent random segmentation faults you will want to make sure that:

35 changes: 35 additions & 0 deletions deploy.sh
@@ -0,0 +1,35 @@
#!/bin/bash

# The following environment variables will need to be set in order to push the
# new speech-to-text Docker image:
#
# - AWS_ACCESS_KEY_ID: the access key for the speech-to-text user
# - AWS_SECRET_ACCESS_KEY: the secret key for the speech-to-text user
# - AWS_ECR_DOCKER_REPO: the Elastic Container Registry URL for the Docker repository
#
# The values can be obtained by running `terraform output` in the relevant portion of
# the Terraform configuration.

# Exit immediately if something doesn't work

set -e

# Download the Whisper large-v3 model, which is what we use by default. Building
# the image with the model in it already will speed up processing since whisper
# won't need to pull it dynamically.

wget --timestamping --directory whisper_models https://openaipublic.azureedge.net/main/whisper/models/e5b1a55b89c1367dacf97e3e19bfd829a01529dbfdeefa8caeb59b3f1b81dadb/large-v3.pt

# Log in to ECR

aws ecr get-login-password --region us-west-2 | docker login --username AWS --password-stdin "$AWS_ECR_DOCKER_REPO"

# Build the image for Linux (not really needed when running in GitHub Actions)

docker build -t speech-to-text --platform="linux/amd64" .

# Tag and push the image to ECR

docker tag speech-to-text "$AWS_ECR_DOCKER_REPO"

docker push "$AWS_ECR_DOCKER_REPO"
2 changes: 1 addition & 1 deletion speech_to_text.py
@@ -280,7 +280,7 @@ def load_whisper_model(model_name) -> whisper.model.Whisper:
def create(media_path: Path):
"""
Create a job for a given media file by placing the media file in S3 and then
creating a batch job which can be picked up ot perform transcription using
creating a batch job which can be picked up to perform transcription using
boilerplate options.
"""
job_id = str(uuid.uuid4())