Add initial Docker container #9

edsu · 2024-09-16T19:46:58Z

This commit adds an initial Whisper Docker container, along with a program that pulls jobs from SQS and media from S3, then pushes results back to S3 and SQS. The README has more details about how it works, how to run an example, and the tests.

Closes #3

This commit adds an initial Whisper Docker container, along with a program, run.py, that pulls job files from a "todo" AWS SQS queue and media from an S3 bucket, then writes the Whisper output back to the bucket and places a "done" message in another queue. See README.md for the details.
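The flow described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the function name `process_one_job`, the JSON message shape (`id` and `media` keys), the output key layout, and the `transcribe` callable standing in for Whisper are all assumptions. The `sqs` and `s3` arguments are expected to behave like boto3 SQS and S3 clients.

```python
import json
import os
import tempfile


def process_one_job(sqs, s3, transcribe, todo_queue_url, done_queue_url, bucket):
    """Handle at most one job; return the job dict, or None if the queue was empty."""
    resp = sqs.receive_message(
        QueueUrl=todo_queue_url, MaxNumberOfMessages=1, WaitTimeSeconds=10
    )
    messages = resp.get("Messages", [])
    if not messages:
        return None

    message = messages[0]
    # Assumed (hypothetical) message shape: {"id": "...", "media": ["<s3 key>", ...]}
    job = json.loads(message["Body"])

    with tempfile.TemporaryDirectory() as workdir:
        for key in job["media"]:
            local_path = os.path.join(workdir, os.path.basename(key))
            s3.download_file(bucket, key, local_path)

            # `transcribe` is a stand-in for running Whisper; it returns an output file path.
            output_path = transcribe(local_path)

            # Hypothetical key layout for results in the same bucket.
            output_key = f"{job['id']}/output/{os.path.basename(output_path)}"
            s3.upload_file(output_path, bucket, output_key)
            job.setdefault("output", []).append(output_key)

    # Announce completion, then remove the job from the todo queue.
    sqs.send_message(QueueUrl=done_queue_url, MessageBody=json.dumps(job))
    sqs.delete_message(QueueUrl=todo_queue_url, ReceiptHandle=message["ReceiptHandle"])
    return job
```

Deleting the todo message only after the done message is sent means a crash mid-job leaves the job visible for a retry, at the cost of possible duplicate processing.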

jmartin-sul

a bunch of small questions and suggestions. the suggested docker command didn't work for me (it hung without processing any jobs or producing any output -- idk if this was a result of the env var stuff, didn't look into the issue much). but just running python speech_to_text.py --create test_video.mp4 and python speech_to_text.py to grab that one job worked for me, doing the processing directly on my laptop instead of the container, of course. i pulled the output from the S3 bucket, and it looked good! i did also add SPEECH_TO_TEXT_TODO_SQS_QUEUE and SPEECH_TO_TEXT_DONE_SQS_QUEUE values to my .env.

i think i'd be fine to merge this as-is and do follow-up changes, since this is still at a pretty early stage, but fine with whatever.

- tightened up documentation
- stopped the secret environment files from getting cooked into the image!
- added a --receive option to fetch messages off the DONE queue.
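For illustration, the command-line options mentioned in this thread (`--create` with a media file, `--receive`, and a bare invocation that picks up a job) could be wired up with `argparse` along these lines. This is a sketch only: the helper names `build_parser` and `choose_action`, the help text, and treating `--receive` as a boolean flag that is mutually exclusive with `--create` are assumptions, not the PR's actual code.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(prog="speech_to_text.py")
    # Assumption: only one mode is selected per invocation.
    group = parser.add_mutually_exclusive_group()
    group.add_argument(
        "--create",
        metavar="MEDIA_FILE",
        help="create a job for the given media file",
    )
    group.add_argument(
        "--receive",
        action="store_true",
        help="fetch messages off the DONE queue",
    )
    return parser


def choose_action(argv):
    """Map parsed arguments to an (action, argument) pair."""
    args = build_parser().parse_args(argv)
    if args.create:
        return ("create", args.create)
    if args.receive:
        return ("receive", None)
    # With no options, fall through to processing jobs from the TODO queue.
    return ("process", None)
```

For example, `choose_action(["--create", "test_video.mp4"])` selects the create mode, while `choose_action([])` selects job processing.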

edsu force-pushed the docker branch 4 times, most recently from d99c759 to d9d3b83, on September 20, 2024 19:35

edsu mentioned this pull request Sep 23, 2024

[investigate/prototype] speech_to_text_generation_service approach 1: Define a Docker container for running open source Whisper in a container that we define and for which we manage deployment (lives in this repo?) #3

Closed

7 tasks

edsu force-pushed the docker branch 2 times, most recently from a45b7d1 to 6f60075, on September 24, 2024 11:09

edsu marked this pull request as ready for review September 24, 2024 21:27

edsu force-pushed the docker branch from 6f60075 to 5098187 on September 24, 2024 21:53

edsu mentioned this pull request Sep 24, 2024

Run tests as Github Action #12

Closed

edsu force-pushed the docker branch from 5098187 to 7e3c997 on September 25, 2024 17:17

This was referenced Sep 25, 2024

common-accessioning daemon watches SQS for notifications that content has been STTed, or that an error was encountered in processing sul-dlss/common-accessioning#1358

Closed

Send sqs message for speech to text create sul-dlss/common-accessioning#1367

Merged

jmartin-sul approved these changes Sep 26, 2024

Address John's review

56e3cdf

edsu force-pushed the docker branch from 07ffde2 to 56e3cdf on September 30, 2024 16:24

edsu merged commit 1fd0eb3 into main Sep 30, 2024

edsu deleted the docker branch September 30, 2024 16:26

jmartin-sul mentioned this pull request Oct 1, 2024

small readme and comment touchups #22

Merged
