
Faster whisper serverless template #10

Open
aaa3334 opened this issue Nov 12, 2024 · 4 comments

aaa3334 commented Nov 12, 2024

Hi!

I happily stumbled into your video on faster-whisper and learnt runpod is a thing and that they have serverless. I am wondering if you have a guide or template on how to set up faster whisper serverless? Or if it is the same as eg the Ministral one you set up?

@RonanKMcGovern (Contributor)

Yeah, so you can run serverless on RunPod like this:

[screenshot: RunPod serverless setup]

However, this doesn't do dynamic batching; it spins up a new worker for each request. I haven't found an open-source library yet that easily allows dynamic batching.

But if you're only sending small batches anyway, this will do what you need.
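For reference, a minimal sketch of what such a serverless worker can look like. The handler wiring below is plain Python; the `runpod` and `faster-whisper` calls (shown in comments) are assumptions about how you'd wire it up, not the template's actual code, and `make_handler`/`transcribe_fn` are illustrative names.

```python
# Hypothetical sketch of a RunPod serverless handler wrapping faster-whisper.
# `make_handler` and `transcribe_fn` are illustrative names, not template APIs.

def make_handler(transcribe_fn):
    """Build a handler that transcribes the audio referenced in the job input."""
    def handler(job):
        audio = job["input"]["audio"]      # e.g. a URL or mounted file path
        segments = transcribe_fn(audio)    # yields text chunks
        return {"transcription": " ".join(s.strip() for s in segments)}
    return handler

# In a real worker this would roughly be:
#   import runpod
#   from faster_whisper import WhisperModel
#   model = WhisperModel("large-v3-turbo")
#   def transcribe(audio):
#       segments, _info = model.transcribe(audio)
#       return (seg.text for seg in segments)
#   runpod.serverless.start({"handler": make_handler(transcribe)})

# Quick local check with a stub transcriber:
print(make_handler(lambda a: ["Hello", "world"])({"input": {"audio": "x.mp3"}}))
# {'transcription': 'Hello world'}
```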

The other issue is that there's a default model set, and it's not the turbo model :( so you also have to figure out how to swap that (which may require either adding env variables OR possibly rebuilding the Docker image).
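If the template does read the model name from the environment, swapping to turbo could be as simple as overriding one variable. A sketch of that pattern follows; note that `WHISPER_MODEL` and the fallback default are assumptions, not the template's actual configuration keys:

```python
import os

# Hypothetical: pick the model from an env var, falling back to a default.
# "WHISPER_MODEL" and the "large-v2" fallback are assumed names, not the
# template's real settings.
def resolve_model_name(env=None):
    env = os.environ if env is None else env
    return env.get("WHISPER_MODEL", "large-v2")

print(resolve_model_name({"WHISPER_MODEL": "large-v3-turbo"}))  # large-v3-turbo
print(resolve_model_name({}))                                   # large-v2
```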

I'm going to do a video soon on setting up endpoints. I'll see if I can do something on serverless; it depends how much work/digging it takes.


aaa3334 commented Nov 13, 2024

Thanks for your reply!
I tried the default FasterWhisper template, but as you mentioned, it doesn't have the turbo model. I was looking at rebuilding it using the Hugging Face checkpoint at https://huggingface.co/mobiuslabsgmbh/faster-whisper-large-v3-turbo/tree/main
but I'm unsure what the settings for that would look like. The RunPod team says they make it easy, but their documentation seems aimed at people already used to setting up VMs. I'm familiar with Hugging Face, DigitalOcean, etc., but I'm no expert and only got those set up by following guides. (I am very familiar with Docker, though it doesn't always seem like the best solution for endpoints: e.g. on Hugging Face you have Gradio, which is much easier and more lightweight than setting up a full Docker container, which feels like overkill for one endpoint.)
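For what it's worth, faster-whisper can load a model from a Hugging Face repo id directly, so a rebuilt worker's model setup might look roughly like the sketch below. The `WhisperModel` calls are shown as comments since they need the package and a GPU; the segment-joining helper underneath is plain, purely illustrative Python:

```python
# Hypothetical sketch (assumes the faster-whisper package and a CUDA GPU):
#
#   from faster_whisper import WhisperModel
#   model = WhisperModel(
#       "mobiuslabsgmbh/faster-whisper-large-v3-turbo",
#       device="cuda",
#       compute_type="float16",
#   )
#   segments, info = model.transcribe("audio.mp3")

def join_segments(segments):
    """Collapse (start, end, text) tuples into a single transcript string."""
    return " ".join(text.strip() for _start, _end, text in segments)

print(join_segments([(0.0, 1.2, " Hello"), (1.2, 2.4, "world ")]))  # Hello world
```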

That would be really cool! For me, right now the serverless options seem like the way to go (otherwise I feel I could just use Hugging Face's interface, which I already know how to set up).
It's really cool to see all these different ways to do things more easily, and I'm so happy I ran into RunPod on your channel :)

@RonanKMcGovern (Contributor)

RonanKMcGovern commented Nov 13, 2024 via email

@RonanKMcGovern changed the title from "Faster whisper template" to "Faster whisper serverless template" on Jan 11, 2025
@RonanKMcGovern (Contributor)

Just a note that I likely won't have time to get around to this, but I'll leave it open and have renamed it in case someone else finds this and can spend the time to build a faster-whisper RunPod worker that supports turbo.

FWIW, Fireworks provides a very fast transcription service that, AFAIK, can take files up to 1 GB. So that may be an interim option.
