Add initial Docker container
This commit adds an initial Whisper Docker container, along with a program,
run.py, that pulls job files and media from a "todo" AWS SQS queue and an S3
bucket respectively, writes the Whisper output back to the bucket, and places
a "done" message in another queue. See README.md for details.
edsu committed Sep 24, 2024
1 parent dcd2d05 commit 5098187
Showing 12 changed files with 489 additions and 2 deletions.
4 changes: 4 additions & 0 deletions .gitignore
@@ -0,0 +1,4 @@
.venv
.env
__pycache__/
whisper_models
26 changes: 26 additions & 0 deletions Dockerfile
@@ -0,0 +1,26 @@
FROM ubuntu:22.04

ENV AWS_ACCESS_KEY_ID=$AWS_ACCESS_KEY_ID
ENV AWS_SECRET_ACCESS_KEY=$AWS_SECRET_ACCESS_KEY
ENV AWS_REGION=$AWS_REGION
ENV AWS_ROLE_ARN=$AWS_ROLE_ARN
ENV SPEECH_TO_TEXT_S3_BUCKET=$SPEECH_TO_TEXT_S3_BUCKET

RUN apt-get update && DEBIAN_FRONTEND=noninteractive apt-get install -y \
    sudo \
    python3.11 \
    python3-distutils \
    python3-pip \
    ffmpeg

WORKDIR /app

ADD ./whisper_models whisper_models
ADD ./requirements.txt requirements.txt

RUN python3.11 -m pip install --upgrade pip
RUN python3.11 -m pip install -r requirements.txt

ADD ./run.py run.py

CMD ["python3.11", "run.py"]
123 changes: 121 additions & 2 deletions README.md
@@ -1,3 +1,122 @@
# SUL Speech to Text Tools
# speech-to-text

For now, this is a placeholder repo where we can ticket some things that don't yet have a natural home (and which may end up living in this repo after prototyping/implementation, e.g. a definition for the Docker container we run on cloud based GPU instances, supporting tools and docs, etc).
This repository contains a Docker configuration for performing serverless speech-to-text processing with Whisper, using Amazon SQS queues and an S3 bucket to coordinate work.

## Build

To build the container you first need to download the PyTorch models that Whisper uses. This is about 13GB of data and can take some time! The idea is to bake the models into the container, so it doesn't need to fetch them every time it runs. If you only need one model size, edit `whisper_models/urls.txt` to list just that model before running the `wget` command.

```shell
wget --directory-prefix whisper_models --input-file whisper_models/urls.txt
```
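If you would rather not edit `whisper_models/urls.txt` by hand, a small Python helper can filter it. This is a sketch that assumes the file lists one checkpoint URL per line and that each URL ends in `<model>.pt` (for example `large-v3.pt`), as OpenAI's published Whisper checkpoint URLs do; `model_name` and `filter_urls` are hypothetical names, not part of this repository. At run time, openai-whisper's `whisper.load_model()` accepts a `download_root` argument and reuses a checkpoint already present in that directory, which is presumably how the baked-in `whisper_models` directory gets picked up.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse


def model_name(url: str) -> str:
    """Return the model name encoded in a checkpoint URL, e.g. 'base.en' for .../base.en.pt."""
    # Assumes the URL path ends in '<model>.pt'; .stem drops only the final '.pt'.
    return PurePosixPath(urlparse(url).path).stem


def filter_urls(lines: list[str], wanted: set[str]) -> list[str]:
    """Keep only the non-blank URLs whose model name is in `wanted`."""
    urls = [line.strip() for line in lines if line.strip()]
    return [url for url in urls if model_name(url) in wanted]
```

For example, `filter_urls(open("whisper_models/urls.txt").readlines(), {"large-v3"})` returns just the large-v3 URL, which you could write back to the file before running `wget`.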

Then you can build the image:

```shell
docker build --tag sul-speech-to-text .
```

## Configure AWS

Create two queues, one for new jobs, and one for completed jobs:

```shell
aws sqs create-queue --queue-name sul-speech-to-text-todo
aws sqs create-queue --queue-name sul-speech-to-text-done
```

Create a bucket:

```shell
aws s3 mb s3://sul-speech-to-text
```
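The same resources can be created from Python with boto3, which can be handy in test setup. A minimal sketch, assuming the queue and bucket names used above; `create_resources` is a hypothetical helper, and the SQS and S3 clients are passed in so the function can be exercised without AWS. Note that S3's `create_bucket` needs an explicit `LocationConstraint` in any region other than `us-east-1`, such as the `us-west-2` region in `env-example`.

```python
def create_resources(sqs, s3, bucket: str, region: str,
                     prefix: str = "sul-speech-to-text") -> dict:
    """Create the todo/done queues and the bucket; return the queue URLs keyed by role."""
    urls = {}
    for role in ("todo", "done"):
        # create_queue returns the existing URL if a queue with this name and
        # the same attributes already exists, so re-running is harmless.
        resp = sqs.create_queue(QueueName=f"{prefix}-{role}")
        urls[role] = resp["QueueUrl"]
    # Outside us-east-1, S3 requires the region as a LocationConstraint.
    kwargs = {"Bucket": bucket}
    if region != "us-east-1":
        kwargs["CreateBucketConfiguration"] = {"LocationConstraint": region}
    s3.create_bucket(**kwargs)
    return urls
```

With real clients this would be called as, e.g., `create_resources(boto3.client("sqs", region_name="us-west-2"), boto3.client("s3", region_name="us-west-2"), "sul-speech-to-text", "us-west-2")`.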

Configure `.env` with your AWS credentials so the Docker container can find them:

```shell
cp env-example .env
vi .env
```
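`docker run --env-file .env` injects these values into the container as environment variables, so code inside it can read them from `os.environ`. A minimal fail-fast check might look like the following; the variable names come from `env-example`, but which of them `run.py` actually requires is an assumption, and `load_config` is a hypothetical helper.

```python
import os
from typing import Mapping

# Names taken from env-example; treating all four as required is an assumption.
REQUIRED = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY", "AWS_REGION", "SPEECH_TO_TEXT_S3_BUCKET")


def load_config(env: Mapping[str, str] = os.environ) -> dict:
    """Return the required settings, failing fast if any are missing or still placeholders."""
    problems = [name for name in REQUIRED if env.get(name, "") in ("", "CHANGE_ME")]
    if problems:
        raise RuntimeError("missing or placeholder settings in .env: " + ", ".join(problems))
    return {name: env[name] for name in REQUIRED}
```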

## Create a Job

Typically common-accessioning robots will initiate new work by:

1. minting a new job ID
2. copying the media file to the S3 bucket
3. putting a job in the todo queue

For testing you can simulate these things by running:

```shell
python3 run.py create
```
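Those three steps might look roughly like this with boto3. This is a sketch, not what `run.py create` actually does: it assumes the SQS message body is the job JSON itself, and the `media/` key prefix in the bucket is a hypothetical choice (the README only specifies `out/<job id>/` for results). `create_job` is a hypothetical name, and the clients are injected so it can be tested without AWS.

```python
import json
import uuid
from pathlib import Path
from typing import Optional


def create_job(s3, sqs, bucket: str, todo_queue_url: str, media_path: str,
               druid: str, options: Optional[dict] = None) -> dict:
    """Mint a job ID, upload the media, and enqueue the job; returns the job dict."""
    # 1. Mint a new job ID (an uppercase UUID, matching the README's examples).
    job_id = str(uuid.uuid4()).upper()
    # 2. Copy the media to the bucket as <job id><extension>, as in the examples.
    #    The "media/" prefix is hypothetical; run.py may expect a different layout.
    media_name = job_id + Path(media_path).suffix
    s3.upload_file(media_path, bucket, f"media/{media_name}")
    # 3. Put the job on the todo queue, assuming the message body is the job JSON.
    job = {"id": job_id, "druid": druid, "media": [media_name]}
    if options:
        job["options"] = options
    sqs.send_message(QueueUrl=todo_queue_url, MessageBody=json.dumps(job))
    return job
```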

## Run

Now you can run the container and have it pick up the job you placed into the queue:

```shell
docker run --env-file .env sul-speech-to-text
```
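Per the commit message, the container pulls a job from the todo queue, fetches its media from S3, writes Whisper's output back to the bucket, and then posts a "done" message. One iteration of that loop could be sketched as below. `process_next_job` and the `handle_job` callback (which would download the media, run Whisper, and upload the results) are hypothetical, and the shape of the done message is an assumption.

```python
import json
from typing import Callable


def process_next_job(sqs, todo_url: str, done_url: str,
                     handle_job: Callable[[dict], dict], wait_seconds: int = 20) -> bool:
    """Receive one job from the todo queue, run it, and report it on the done queue.

    Returns False if no message arrived within the long-poll window.
    """
    # Long polling: SQS waits up to wait_seconds (max 20) for a message to arrive.
    resp = sqs.receive_message(QueueUrl=todo_url, MaxNumberOfMessages=1,
                               WaitTimeSeconds=wait_seconds)
    messages = resp.get("Messages", [])
    if not messages:
        return False
    msg = messages[0]
    job = json.loads(msg["Body"])
    # Hypothetical callback: download media, run Whisper, upload to out/<job id>/.
    result = handle_job(job)
    sqs.send_message(QueueUrl=done_url, MessageBody=json.dumps({**job, **result}))
    # Delete only after the done message is sent, so a crash mid-job leaves the
    # message to reappear on the todo queue once its visibility timeout expires.
    sqs.delete_message(QueueUrl=todo_url, ReceiptHandle=msg["ReceiptHandle"])
    return True
```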

Wait for the results to appear, with `JOB_ID` set to the ID of the job you created:

```shell
aws s3 ls s3://sul-speech-to-text/out/${JOB_ID}/
```
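Rather than re-running the listing by hand, a script can poll the output prefix. A minimal sketch using boto3's `list_objects_v2`, assuming outputs land under `out/<job id>/` as above; `wait_for_outputs` is hypothetical, and the timeout and polling interval are arbitrary. Watching the done queue would be an alternative signal.

```python
import time
from typing import Callable, List


def wait_for_outputs(s3, bucket: str, job_id: str, timeout: float = 600,
                     interval: float = 10,
                     sleep: Callable[[float], None] = time.sleep) -> List[str]:
    """Poll s3://<bucket>/out/<job_id>/ until objects appear; return their keys."""
    prefix = f"out/{job_id}/"
    deadline = time.monotonic() + timeout
    while True:
        resp = s3.list_objects_v2(Bucket=bucket, Prefix=prefix)
        # "Contents" is absent from the response when nothing matches the prefix.
        keys = [obj["Key"] for obj in resp.get("Contents", [])]
        if keys:
            return keys
        if time.monotonic() >= deadline:
            raise TimeoutError(f"no output under s3://{bucket}/{prefix} after {timeout}s")
        sleep(interval)
```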

## The Job File

The job file is a JSON object that tells the service how to run Whisper. At minimum it contains the job ID and the list of media files to process, which are then transcribed with the service defaults:

```json
{
  "id": "8EB51B59-BDFF-4507-B1AA-0DE91ACA388F",
  "druid": "gy983cn1444",
  "media": [
    "8EB51B59-BDFF-4507-B1AA-0DE91ACA388F.mp4"
  ]
}
```

You can also pass in options for Whisper:

```json
{
  "id": "8EB51B59-BDFF-4507-B1AA-0DE91ACA388F",
  "druid": "gy983cn1444",
  "media": [
    "8EB51B59-BDFF-4507-B1AA-0DE91ACA388F.mp4"
  ],
  "options": {
    "model": "large",
    "max_line_count": 80,
    "beam_size": 10
  }
}
```
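Whisper's options do not all go to the same place: the model name selects the checkpoint passed to `whisper.load_model()`, decoding options such as `beam_size` are accepted as keyword arguments by `transcribe()`, and line-layout options such as `max_line_count` are read by the subtitle writers in `whisper.utils` (in recent openai-whisper releases). A sketch of how a worker might split a job's `options` accordingly; `split_options` is hypothetical, and `DEFAULT_MODEL` is a placeholder because the README does not say what the service defaults are.

```python
from typing import Any, Dict, Tuple

# Placeholder: the README only says jobs without options use "the service defaults".
DEFAULT_MODEL = "large"
# Options read by Whisper's subtitle writers rather than by transcribe().
WRITER_OPTIONS = {"max_line_width", "max_line_count", "highlight_words", "max_words_per_line"}


def split_options(job: Dict[str, Any]) -> Tuple[str, Dict[str, Any], Dict[str, Any]]:
    """Split a job's "options" into (model name, transcribe() kwargs, writer options)."""
    # Copy so the caller's job dict is left untouched.
    options = dict(job.get("options", {}))
    model = options.pop("model", DEFAULT_MODEL)
    writer = {key: options.pop(key) for key in list(options) if key in WRITER_OPTIONS}
    # Whatever remains (e.g. beam_size) is passed through to transcribe().
    return model, options, writer
```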

## Testing

To run the tests, first create a virtual environment and activate it:

```shell
python3 -m venv .venv
source .venv/bin/activate
```

Install the dependencies:

```shell
pip install -r requirements.txt
pip install -r requirements-dev.txt
```

Run the tests, which will also build and run the Docker container:

```shell
pytest
```
5 changes: 5 additions & 0 deletions env-example
@@ -0,0 +1,5 @@
AWS_ACCESS_KEY_ID=CHANGE_ME
AWS_SECRET_ACCESS_KEY=CHANGE_ME
AWS_REGION=us-west-2
AWS_ROLE_ARN=arn:aws:iam::418214828013:role/DevelopersRole
SPEECH_TO_TEXT_S3_BUCKET=sul-speech-to-text-dev-YOUR-USERNAME
2 changes: 2 additions & 0 deletions pytest.ini
@@ -0,0 +1,2 @@
[pytest]
pythonpath = .
2 changes: 2 additions & 0 deletions requirements-dev.txt
@@ -0,0 +1,2 @@
pytest
python-dotenv
3 changes: 3 additions & 0 deletions requirements.txt
@@ -0,0 +1,3 @@
boto3
openai-whisper
python-dotenv[cli]