
Add audio to text pipeline #103

Merged — 19 commits into livepeer:main on Jul 16, 2024

Conversation

@eliteprox (Collaborator) commented Jun 12, 2024

This change adds the audio-to-text pipeline to the AI Runner

rickstaa and others added 3 commits April 27, 2024 08:54
This commit contains a quick proof of concept to showcase how easy it is
to add a new pipeline.
@eliteprox changed the title from "Speech to text pipeline" to "Add speech to text pipeline" on Jun 12, 2024
@eliteprox eliteprox marked this pull request as ready for review June 19, 2024 07:33
@eliteprox eliteprox requested a review from rickstaa as a code owner June 19, 2024 07:33
@ad-astra-video (Collaborator) commented Jul 2, 2024

I did a review of this PR. Initial comments below.

My tests included: speech-mp3-lowbitrate.mp3 (worked), speech-aac-lowbitrate.m4a (worked), vp8-opus.webm (worked), vp8-vorbis.webm (worked), vp9-vorbis.webm (worked). The h264 and h265 variants in mp4 files did not work for some reason; audio extracted from those mp4 files was processed successfully.

  • Communicate that processing was not successful when the input is in the wrong format or a failure occurs
    • I was getting an error on some files that were intentionally transcoded to a different input format. The result right now is a null/empty text response: {"chunks":null,"text":""}. I think this should be an error response code, or at least an error field in the response (error details, or "ok" when there is none). The last line in the traceback below is where transformers tries to use ffmpeg in the container to parse the file, I believe.

    • I checked and found that FFmpeg is installed in the runner container, but it is an ancient version (4.2.7).

      File "/root/.pyenv/versions/3.11.9/lib/python3.11/site-packages/transformers/pipelines/pt_utils.py", line 186, in __next__
        processed = next(self.subiterator)
      File "/root/.pyenv/versions/3.11.9/lib/python3.11/site-packages/transformers/pipelines/automatic_speech_recognition.py", line 362, in preprocess
        inputs = ffmpeg_read(inputs, self.feature_extractor.sampling_rate)
      File "/root/.pyenv/versions/3.11.9/lib/python3.11/site-packages/transformers/pipelines/audio_utils.py", line 41, in ffmpeg_read
        raise ValueError(
      ValueError: Soundfile is either not in the correct format or is malformed. Ensure that the soundfile has a valid audio file extension (e.g. wav, flac or mp3) and is not corrupted. If reading from a remote URL, ensure that the URL is the full address to **download** the audio file.

  • Is seed a parameter for the speech-to-text pipeline? I was not seeing it used by Whisper; maybe keep it in case another model would use it?
  • Add the download cmd to dl_checkpoints.sh
    • huggingface-cli download openai/whisper-large-v3 --include "*.safetensors" "*.json" --cache-dir models
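The first review point above — returning an error instead of a silent {"chunks":null,"text":""} body — can be sketched as a thin wrapper around the pipeline call. This is an illustrative sketch only: `safe_transcribe`, the callable `asr_pipeline` argument, and the response shape are hypothetical and not the AI Runner's actual API; the only grounded detail is that transformers' `ffmpeg_read` raises `ValueError` on malformed input, as shown in the traceback.

```python
def safe_transcribe(asr_pipeline, audio_bytes):
    """Run an ASR pipeline and surface decode failures explicitly,
    instead of letting a failed parse come back as an empty
    {"chunks": None, "text": ""} body with a 200 status.

    `asr_pipeline` is any callable taking raw audio bytes; the names
    and response shape here are illustrative, not the AI Runner's API.
    """
    try:
        return {"ok": True, "result": asr_pipeline(audio_bytes)}
    except ValueError as exc:
        # transformers' ffmpeg_read raises ValueError for malformed or
        # unsupported soundfiles; report it as a client error.
        return {
            "ok": False,
            "status": 400,
            "error": f"Error processing audio file: {exc}",
        }
```

The caller can then branch on `ok` and return the `status` code to the client rather than an empty transcription.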

@eliteprox (Collaborator, Author) commented
I've updated the runner to handle these errors.

The ai-runner responds with a 400 Bad Request and logs:
2024-07-02 15:52:18,181 INFO: 172.17.0.1:38806 - "POST /speech-to-text HTTP/1.1" 400 Bad Request

The gateway logs:
I0702 11:50:40.632077 2527064 ai_process.go:561] clientIP=192.168.10.71 request_id=dd12552c Error submitting request cap=31 modelID=openai/whisper-large-v3 try=6 orch=https://0.0.0.0:8936 err=speech-to-text container returned 400

The orchestrator logs:
2024/07/02 11:50:40 ERROR speech-to-text container returned 400 err="{\"detail\":{\"msg\":\"Error processing audio file: Soundfile is either not in the correct format or is malformed. Ensure that the soundfile has a valid audio file extension (e.g. wav, flac or mp3) and is not corrupted. If reading from a remote URL, ensure that the URL is the full address to **download** the audio file.\"}}"

A screenshot of the 400 Bad Request response from Swagger UI is attached.

@eliteprox changed the title from "Add speech to text pipeline" to "Add audio to text pipeline" on Jul 5, 2024
@eliteprox (Collaborator, Author) commented
  • I was getting an error on some files that were intentionally transcoded to a different input format. The result right now is a null/empty text response: {"chunks":null,"text":""}. I think this should be an error response code, or at least an error field in the response (error details, or "ok" when there is none). The last line in the traceback below is where transformers tries to use ffmpeg in the container to parse the file, I believe.

I've added error handling to the AI Runner for any error the model raises while processing the file. It specifically checks for "invalid soundfile" errors and returns them as a 400 Bad Request.
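The error-to-status mapping described here could look something like the sketch below. The function name and the exact substring checks are illustrative assumptions, not the AI Runner's actual code; it only reflects the behavior stated in the comment (malformed audio becomes a 400, other failures stay server-side errors).

```python
def http_status_for_error(exc: Exception) -> int:
    """Map a pipeline exception to an HTTP status code (sketch only).

    Malformed or invalid audio is treated as a client error (400);
    anything else is treated as a server-side failure (500). The
    substring checks are illustrative, modeled on the ValueError
    message transformers raises for bad soundfiles.
    """
    message = str(exc).lower()
    if "soundfile" in message or "invalid" in message:
        return 400
    return 500
```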

  • Is seed a parameter for the speech-to-text pipeline? I was not seeing it used by Whisper; maybe keep it in case another model would use it?

That's correct; I don't see a purpose for the seed parameter with this model, and I think it's unlikely this pipeline will need it.

  • Add the download cmd to dl_checkpoints.sh

    • huggingface-cli download openai/whisper-large-v3 --include "*.safetensors" "*.json" --cache-dir models

I've added this to dl_checkpoints.sh.

rickstaa and others added 2 commits July 15, 2024 11:31
This commit introduces support for the Stable Diffusion 3 Medium model
from Hugging Face:
https://huggingface.co/stabilityai/stable-diffusion-3-medium.

Please be aware that this model has restrictive licensing at the time of
writing and is not yet advised for public use. Ensure you read and
understand the [licensing
terms](https://huggingface.co/stabilityai/stable-diffusion-3-medium/blob/main/LICENSE)
before enabling this model on your orchestrator.
@rickstaa rickstaa force-pushed the main branch 2 times, most recently from cd1feb4 to 0d03040 Compare July 16, 2024 13:10
eliteprox and others added 4 commits July 16, 2024 09:37
This commit applies several code improvements to the audio-to-text
codebase. It also restructures the utility functions in the pipelines
module.
This commit ensures that both audio-to-text routes have known responses.
@eliteprox eliteprox merged commit 9fc476e into livepeer:main Jul 16, 2024
1 check passed
@rickstaa rickstaa deleted the speech_to_text_pipeline_poc_n branch July 16, 2024 14:13
eliteprox added a commit to eliteprox/ai-worker that referenced this pull request Jul 26, 2024
Add audio to text pipeline
---------
Co-authored-by: Rick Staa