Build and deploy speech-to-text #46

Closed
Tracked by #1 ...
edsu opened this issue Nov 18, 2024 · 6 comments · Fixed by #75
edsu (Contributor) commented Nov 18, 2024

Depends on https://github.com/sul-dlss/terraform-aws/issues/1177

The GitHub CI action should be updated so that when a new release is created:

  1. A new Docker image is built and pushed to the GitHub Docker Registry.
  2. The new Docker image is deployed to the speech-to-text ECS cluster. NOTE: there is a little uncertainty at the moment about using ECS vs. AWS Batch, so step 2 might be blocked or obviated by https://github.com/sul-dlss/terraform-aws/issues/1177 (we can break out a separate ticket, or do step 1 now and backlog step 2, if this gets picked up before 1177 is done). A rough sketch of step 1 follows below.
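A minimal sketch of what step 1 might look like as a GitHub Actions workflow, assuming the image is published to ghcr.io under an illustrative `sul-dlss/speech-to-text` name whenever a release is published (action versions, the image name, and the trigger are assumptions, not what ended up in the repo):

```yaml
name: publish

on:
  release:
    types: [published]

jobs:
  build-and-push:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      packages: write   # needed to push to the GitHub Container Registry
    steps:
      - uses: actions/checkout@v4

      # Authenticate against ghcr.io with the workflow's built-in token.
      - uses: docker/login-action@v3
        with:
          registry: ghcr.io
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}

      # Build the image and tag it with the release tag that triggered the run.
      - uses: docker/build-push-action@v6
        with:
          context: .
          push: true
          tags: ghcr.io/sul-dlss/speech-to-text:${{ github.event.release.tag_name }}
```

Step 2 (the actual deploy target) would hang off a job like this once the ECS vs. AWS Batch question is settled.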
edsu changed the title from "Build and deploy" to "Build and deploy speech-to-text" Nov 18, 2024
jmartin-sul (Member) commented:
iirc, sinopia_editor has an example of this approach?

jmartin-sul (Member) commented:
Might need to break out the second part into a separate ticket if this gets picked up before the ECS terraform code is in place.

edsu (Contributor, Author) commented Dec 19, 2024

AWS Batch will automatically start using the new container once it is pushed to ECR, which will be nice once it's working.

edsu self-assigned this Jan 24, 2025
edsu (Contributor, Author) commented Jan 27, 2025

We can set up the GitHub Action to build and push a Docker image. We currently have three ECR repositories, one each for qa, stage, and prod. Should we:

  1. push to qa and stage when a dependency update branch is created
  2. push to prod when a dependency update branch is merged

The images are quite big (~10GB) because of all the pytorch code and large-v3 model. So I find myself almost wishing there was one repo. But I guess it makes sense for them to be separate?

edsu (Contributor, Author) commented Jan 27, 2025

Actually I don't think the above will really work because the publish to prod in step 2 will happen once the dependency-updates have been merged to main, prior to the integration tests running. Ideally the image would only be pushed to production when things have passed integration tests and the developer has decided to deploy to production?

I wonder if instead we can get cap deploy to trigger the right build/push in Github?

I do kind of wish that we could follow the Access Team pattern of creating Releases for code we want to go to production.

edsu added a commit that referenced this issue Jan 27, 2025
This script will build and push the Docker image to a given ECR repository. There needs to be logic in the GitHub Action about when it is run and which target it pushes to (dev, stage, prod).

Closes #46
jmartin-sul (Member) commented:
> We can set up the GitHub Action to build and push a Docker image. We currently have three ECR repositories, one each for qa, stage, and prod. Should we:
>
> 1. push to qa and stage when a dependency update branch is created
> 2. push to prod when a dependency update branch is merged
>
> The images are quite big (~10GB) because of all the pytorch code and large-v3 model. So I find myself almost wishing there was one repo. But I guess it makes sense for them to be separate?

My only idea for combining the ECR repos: is it possible to have an environment use a particular tag to pick an image? E.g. QA and stage use whatever is latest, but prod uses some other tag?

> Actually I don't think the above will really work because the publish to prod in step 2 will happen once the dependency-updates have been merged to main, prior to the integration tests running. Ideally the image would only be pushed to production when things have passed integration tests and the developer has decided to deploy to production?
>
> I wonder if instead we can get cap deploy to trigger the right build/push in Github?

I wonder if it's possible to have a GitHub Action do stuff selectively using branch filters, similar to what we do with CircleCI for Sinopia? e.g. https://github.com/LD4P/sinopia_editor/blob/43b38fb5ea3833e6dc9952b68605748dbf73718b/.circleci/config.yml#L30-L36 (iirc, it's more towards the end of that config where the filters are used to selectively build and tag images)
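For what it's worth, GitHub Actions can filter on branches and tags directly in the `on:` trigger and with job-level `if:` expressions. A hypothetical sketch of that kind of selective behavior (the job names and the `rel-*` tag pattern are just illustrative):

```yaml
on:
  push:
    branches: [main]
    tags: ["rel-*"]

jobs:
  qa-and-stage:
    # Runs for pushes to main (e.g. the weekly dependency-update merge).
    if: github.ref == 'refs/heads/main'
    runs-on: ubuntu-latest
    steps:
      - run: echo "build and push the qa/stage image here"

  prod:
    # Runs only when a release-style tag is pushed.
    if: startsWith(github.ref, 'refs/tags/rel-')
    runs-on: ubuntu-latest
    steps:
      - run: echo "build and push the prod image here"
```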

One rough idea that comes to mind:

  • In GitHub Actions: When we create a Git tag that looks like a release tag, build the Docker image for that Git tag, and push it with a Docker tag like latest-release. Or, if multiple tags are possible for one Docker image, maybe the Docker tag is the actual weekly Git tag (e.g. rel-2025-01-27), so that latest-release can automatically get pointed to the latest image, but it's also easy to revert to the old Docker tag to A/B test if it seems like a release that's being tested is problematic, without having to manually rebuild and re-push the old image?
  • When it's time to put dep updates on prod, Docker tag a current-prod from the latest Docker release tag. And prod would just always use current-prod? (See the re-tagging sketch after this list.)
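A hypothetical sketch of how that current-prod promotion could work without rebuilding or re-pushing the ~10GB image: re-tag the existing image in ECR by copying its manifest. The repository name, AWS region, choice of credentials, and manually-triggered workflow are all illustrative assumptions:

```yaml
name: promote-to-prod

on:
  workflow_dispatch:
    inputs:
      release_tag:
        description: "Weekly release tag to promote (e.g. rel-2025-01-27)"
        required: true

jobs:
  retag:
    runs-on: ubuntu-latest
    steps:
      - uses: aws-actions/configure-aws-credentials@v4
        with:
          aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID_PRODUCTION }}
          aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY_PRODUCTION }}
          aws-region: us-west-2

      # Point the current-prod Docker tag at the chosen release's image by
      # copying its manifest; no image layers are rebuilt or re-pushed.
      - run: |
          MANIFEST=$(aws ecr batch-get-image \
            --repository-name speech-to-text \
            --image-ids imageTag="${{ github.event.inputs.release_tag }}" \
            --query 'images[].imageManifest' --output text)
          aws ecr put-image \
            --repository-name speech-to-text \
            --image-tag current-prod \
            --image-manifest "$MANIFEST"
```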

I'm a little hesitant to have cap deploy trigger an image build/tag, just because it feels like such a surprising use of cap deploy... but idk, maybe that's not a well founded reservation? In the past (when infra had Sinopia), we had to update Terraform with the weekly release tag for the Sinopia apps, in addition to the cap deploys. Maybe we'd just have a new process of Github release tagging alongside sdr-deploy? Upside of a separate process would seem to be clarity, downside would be more stuff for FR to do/forget.

> I do kind of wish that we could follow the Access Team pattern of creating Releases for code we want to go to production.

What mechanism does Access use for this? It seems to me like we have the parts to do something like this if we can control Docker image tagging and image pushing with Github tag creation, and if we can control which instance runs which image by having the instance always use a particular Docker image tag. Which might result in even more waste if an image can only have one tag? But that feels to me like it should get us there. Though I can't quite imagine the whole thing in detail off the top of my head.

edsu added a commit that referenced this issue Jan 29, 2025
With this configuration a GitHub Action will run when a release tag
`release-YYYY-MM-DD` has been created during weekly dependency updates.
The action will build the Docker container and deploy it to the development
(qa in SDR) and staging AWS environments. This will allow the First
Responder for the week to test it using the speech-to-text integration tests.

When the tag has been tested and is ready for production, a developer
will need to create a release in GitHub using the release tag. This will
cause a build and deploy to the production AWS environment. We may want
to think about automated ways for this to happen, but the "serverless"
nature of AWS Batch means there really isn't a server for Capistrano
(what we use to do other infra deploys) to talk to.

The keys for the different environments need to be set as
[GitHub Actions secrets](https://docs.github.com/en/actions/security-for-github-actions/security-guides/using-secrets-in-github-actions):

- AWS_ACCESS_KEY_ID_DEVELOPMENT
- AWS_SECRET_ACCESS_KEY_DEVELOPMENT
- AWS_ECR_DOCKER_REPO_DEVELOPMENT
- AWS_ACCESS_KEY_ID_STAGING
- AWS_SECRET_ACCESS_KEY_STAGING
- AWS_ECR_DOCKER_REPO_STAGING
- AWS_ACCESS_KEY_ID_PRODUCTION
- AWS_SECRET_ACCESS_KEY_PRODUCTION
- AWS_ECR_DOCKER_REPO_PRODUCTION

Note: only dlss-ops has permission to see the keys for the speech-to-text
user in production, so for now this is commented out until we actually
need to run in production. Maybe this could be ticketed as follow-on
work?

Closes #46
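A rough sketch of a workflow along the lines this commit message describes, using the secret names listed above; the AWS region, job layout, and tag patterns are assumptions rather than the actual configuration:

```yaml
name: deploy

on:
  push:
    tags:
      - "release-*"         # weekly dependency-update tag -> dev (qa) and stage
  release:
    types: [published]      # GitHub release made from that tag -> production

jobs:
  dev-and-stage:
    if: github.event_name == 'push'
    runs-on: ubuntu-latest
    strategy:
      matrix:
        env: [DEVELOPMENT, STAGING]
    steps:
      - uses: actions/checkout@v4
      - uses: aws-actions/configure-aws-credentials@v4
        with:
          aws-access-key-id: ${{ secrets[format('AWS_ACCESS_KEY_ID_{0}', matrix.env)] }}
          aws-secret-access-key: ${{ secrets[format('AWS_SECRET_ACCESS_KEY_{0}', matrix.env)] }}
          aws-region: us-west-2
      - name: Build and push image
        env:
          REPO: ${{ secrets[format('AWS_ECR_DOCKER_REPO_{0}', matrix.env)] }}
        run: |
          # Log Docker in to the registry host portion of the repository URI,
          # then build and push an image tagged with the release tag.
          aws ecr get-login-password --region us-west-2 \
            | docker login --username AWS --password-stdin "${REPO%%/*}"
          docker build -t "$REPO:$GITHUB_REF_NAME" .
          docker push "$REPO:$GITHUB_REF_NAME"

  production:
    if: github.event_name == 'release'
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: aws-actions/configure-aws-credentials@v4
        with:
          aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID_PRODUCTION }}
          aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY_PRODUCTION }}
          aws-region: us-west-2
      - name: Build and push image
        env:
          REPO: ${{ secrets.AWS_ECR_DOCKER_REPO_PRODUCTION }}
        run: |
          aws ecr get-login-password --region us-west-2 \
            | docker login --username AWS --password-stdin "${REPO%%/*}"
          docker build -t "$REPO:$GITHUB_REF_NAME" .
          docker push "$REPO:$GITHUB_REF_NAME"
```

The tag push covers dev (qa) and stage, and publishing a GitHub release from the same tag covers production.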