Add initial Docker container #9

Merged
merged 2 commits into from
Sep 30, 2024
4 changes: 4 additions & 0 deletions .gitignore
@@ -0,0 +1,4 @@
.venv
.env
__pycache__/
whisper_models
19 changes: 19 additions & 0 deletions Dockerfile
@@ -0,0 +1,19 @@
FROM ubuntu:22.04

RUN apt-get update && DEBIAN_FRONTEND=noninteractive apt-get install -y \
python3.11 \
python3-distutils \
python3-pip \
ffmpeg

WORKDIR /app

ADD ./whisper_models whisper_models
ADD ./requirements.txt requirements.txt

RUN python3.11 -m pip install --upgrade pip
RUN python3.11 -m pip install -r requirements.txt

ADD ./speech_to_text.py speech_to_text.py

ENTRYPOINT ["python3.11", "speech_to_text.py"]
123 changes: 121 additions & 2 deletions README.md
@@ -1,3 +1,122 @@
# SUL Speech to Text Tools
# speech-to-text

For now, this is a placeholder repo where we can ticket some things that don't yet have a natural home (and which may end up living in this repo after prototyping/implementation, e.g. a definition for the Docker container we run on cloud based GPU instances, supporting tools and docs, etc).
This repository contains a Docker configuration for performing serverless speech-to-text processing with Whisper, using an Amazon Simple Storage Service (S3) bucket for media files and Amazon Simple Queue Service (SQS) for coordinating work.

## Build

To build the container you will first need to download the PyTorch models that Whisper uses. This is about 13GB of data and can take some time! The idea is to bake the models into the Docker image so they don't need to be fetched every time the container runs (which would add to the runtime). If you know you only need one model size and want to include just that one, edit the `whisper_models/urls.txt` file accordingly before running the `wget` command.

```shell
wget --directory-prefix whisper_models --input-file whisper_models/urls.txt
```

Then you can build the image:

```shell
docker build --tag sul-speech-to-text .
```

## Configure AWS

Create two queues, one for new jobs, and one for completed jobs:

```shell
aws sqs create-queue --queue-name sul-speech-to-text-todo-dev-your-username
aws sqs create-queue --queue-name sul-speech-to-text-done-dev-your-username
```

Create a bucket:

```shell
aws s3 mb s3://sul-speech-to-text-dev-your-username
```

Configure `.env` with your AWS credentials so the Docker container can find them:

```shell
cp env-example .env
vi .env
```
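
As a rough illustration of how a script could pick up this configuration, here is a minimal sketch that reads the variable names listed in `env-example` from the environment. The `Settings` class and `load_settings` function are hypothetical names used only for this example; they are not taken from `speech_to_text.py`.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    """Settings read from the variables listed in env-example (hypothetical class)."""

    region: str
    bucket: str
    todo_queue: str
    done_queue: str


def load_settings(env=None):
    """Build Settings from a mapping, defaulting to os.environ.

    Raises KeyError naming the first missing variable.
    """
    env = os.environ if env is None else env
    return Settings(
        region=env["AWS_REGION"],
        bucket=env["SPEECH_TO_TEXT_S3_BUCKET"],
        todo_queue=env["SPEECH_TO_TEXT_TODO_SQS_QUEUE"],
        done_queue=env["SPEECH_TO_TEXT_DONE_SQS_QUEUE"],
    )
```

Failing early with a `KeyError` makes a missing or misspelled variable in `.env` obvious before any AWS call is attempted.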

## Run

### Create a Job

Usually common-accessioning robots will initiate new speech-to-text work by:

1. minting a new job ID
2. copying a media file to the S3 bucket
3. putting a job in the TODO queue
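
The three steps above can be sketched in Python roughly as follows. This is only an illustration under stated assumptions: `create_job` is a hypothetical helper, the `media/<job id>/<filename>` key layout is made up for the example, and `s3` and `sqs` are expected to be boto3-style clients (only `upload_file` and `send_message` are used).

```python
import json
import uuid
from pathlib import Path


def create_job(s3, sqs, bucket, queue_url, media_path, options=None):
    """Mint a job ID, upload the media file, and enqueue the job (a sketch)."""
    # 1. mint a new job ID
    job_id = str(uuid.uuid4())

    # 2. copy the media file to the S3 bucket
    #    (the key layout here is an assumption, not the project's actual layout)
    key = f"media/{job_id}/{Path(media_path).name}"
    s3.upload_file(str(media_path), bucket, key)

    # 3. put a job in the TODO queue
    job = {"id": job_id}
    if options:
        job["options"] = options
    sqs.send_message(QueueUrl=queue_url, MessageBody=json.dumps(job))
    return job
```

With real clients from `boto3.client("s3")` and `boto3.client("sqs")`, the queue URL can be looked up from the queue name with `sqs.get_queue_url(QueueName=...)["QueueUrl"]`. Passing the clients in as arguments also makes the function easy to exercise with stand-ins in tests.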

For testing you can simulate these steps by running the Docker container with the `--create` flag. For example, to create a job for a file named `file.mp4`:

```shell
docker run --rm --tty --volume .:/app --env-file .env sul-speech-to-text --create file.mp4
```

### Run the Job

Now you can run the container and have it pick up the job you placed into the queue:

```shell
docker run --rm --tty --env-file .env sul-speech-to-text --no-daemon
```

Wait for the results to appear:

```shell
aws s3 ls s3://sul-speech-to-text-dev-your-username/out/${JOB_ID}/
```
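
If you would rather wait from a script than re-run `aws s3 ls` by hand, a polling loop along these lines may help. It is a sketch, not part of this repository: `wait_for_output` is a hypothetical helper, `s3` is assumed to be a boto3-style S3 client, and the `out/<job id>/` prefix is taken from the command above.

```python
import time


def wait_for_output(s3, bucket, job_id, timeout=600.0, interval=10.0, sleep=time.sleep):
    """Poll the job's output prefix until at least one object appears.

    Returns the list of keys found, or raises TimeoutError after `timeout` seconds.
    """
    prefix = f"out/{job_id}/"
    deadline = time.monotonic() + timeout
    while True:
        response = s3.list_objects_v2(Bucket=bucket, Prefix=prefix)
        # "Contents" is absent from the response when nothing matches the prefix.
        keys = [obj["Key"] for obj in response.get("Contents", [])]
        if keys:
            return keys
        if time.monotonic() >= deadline:
            raise TimeoutError(f"no objects under s3://{bucket}/{prefix}")
        sleep(interval)
```

The `sleep` parameter exists only so the wait can be replaced in tests; in normal use the default `time.sleep` is fine.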

Usually the message on the DONE queue will be processed by captionWF in common-accessioning, but you can also pop messages off the queue manually:

```shell
docker run --rm --tty --env-file .env sul-speech-to-text --receive
```
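
For reference, popping a message off the DONE queue with a boto3-style SQS client looks roughly like the sketch below. `receive_done` is a hypothetical helper and is not necessarily how the `--receive` flag is implemented; it only uses the `receive_message` and `delete_message` client calls.

```python
import json


def receive_done(sqs, queue_url, wait_seconds=20):
    """Receive one message from the DONE queue, delete it, and return the job.

    Returns None if no message arrives within `wait_seconds` (long polling).
    """
    response = sqs.receive_message(
        QueueUrl=queue_url,
        MaxNumberOfMessages=1,
        WaitTimeSeconds=wait_seconds,
    )
    # "Messages" is absent from the response when the queue is empty.
    messages = response.get("Messages", [])
    if not messages:
        return None
    message = messages[0]
    job = json.loads(message["Body"])
    # Delete only after the body parsed, so a malformed message stays on the queue.
    sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=message["ReceiptHandle"])
    return job
```

SQS does not remove a message when it is received; it becomes visible again after the visibility timeout unless it is explicitly deleted, which is why the sketch calls `delete_message`.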

## The Job File

The job file is a JSON object that contains information about how to run Whisper. At a minimum it contains the job ID, which is used to locate the media files in S3 that need to be processed.

```json
{
"id": "8EB51B59-BDFF-4507-B1AA-0DE91ACA388F"
}
```

You can also pass in options for Whisper:

```json
{
"id": "8EB51B59-BDFF-4507-B1AA-0DE91ACA388F",
"options": {
"model": "large",
"max_line_count": 80,
"beam_size": 10
}
}
```
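
A consumer of the job file might validate it along these lines before doing any work. This is a minimal sketch under the assumption that `id` is required and `options` is optional, as described above; `parse_job` is a hypothetical name, and the sketch does not check which option names Whisper accepts.

```python
import json


def parse_job(body):
    """Parse a job message body and return (job_id, options).

    Raises ValueError if the body is not valid JSON or lacks a usable "id".
    """
    job = json.loads(body)  # json.JSONDecodeError is a subclass of ValueError
    if not isinstance(job, dict):
        raise ValueError("job must be a JSON object")
    job_id = job.get("id")
    if not isinstance(job_id, str) or not job_id:
        raise ValueError("job is missing a string 'id'")
    options = job.get("options", {})
    if not isinstance(options, dict):
        raise ValueError("'options' must be a JSON object")
    return job_id, options
```

Returning an empty dictionary when `options` is missing lets the caller treat both forms of the job file the same way.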

The message you receive on the DONE SQS queue will contain the job JSON:

```json
{
"id": "8EB51B59-BDFF-4507-B1AA-0DE91ACA388F",
"options": {
"model": "large",
"max_line_count": 80,
"beam_size": 10
}
}
```

## Testing

The easiest way to run the tests is probably to create a virtual environment and run pytest:

```shell
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
pytest
```
7 changes: 7 additions & 0 deletions env-example
@@ -0,0 +1,7 @@
AWS_ACCESS_KEY_ID=CHANGE_ME
AWS_SECRET_ACCESS_KEY=CHANGE_ME
AWS_REGION=us-west-2
AWS_ROLE_ARN=arn:aws:iam::418214828013:role/DevelopersRole
SPEECH_TO_TEXT_S3_BUCKET=sul-speech-to-text-dev-your-username
SPEECH_TO_TEXT_TODO_SQS_QUEUE=sul-speech-to-text-todo-dev-your-username
SPEECH_TO_TEXT_DONE_SQS_QUEUE=sul-speech-to-text-done-dev-your-username
2 changes: 2 additions & 0 deletions pytest.ini
@@ -0,0 +1,2 @@
[pytest]
pythonpath = .
4 changes: 4 additions & 0 deletions requirements.txt
@@ -0,0 +1,4 @@
boto3
openai-whisper
python-dotenv
pytest