
support for BatchedInferencePipeline #169

Closed
Eve-146T opened this issue Dec 2, 2024 · 1 comment

Eve-146T commented Dec 2, 2024

faster-whisper recently added a BatchedInferencePipeline in version 1.1.0, allowing up to 4x faster transcription of large files.

Is there any plan to add this to this server?
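For reference, a minimal sketch of how the batched pipeline is used in faster-whisper 1.1.0 (the model size, audio path, and batch size below are just placeholders):

```python
from faster_whisper import WhisperModel, BatchedInferencePipeline

# Load the regular model first, then wrap it in the batched pipeline.
model = WhisperModel("large-v2", device="cuda", compute_type="float16")
batched_model = BatchedInferencePipeline(model=model)

# batch_size controls how many audio chunks are decoded at once;
# "audio.mp3" is a placeholder path.
segments, info = batched_model.transcribe("audio.mp3", batch_size=16)
for segment in segments:
    print(f"[{segment.start:.2f} -> {segment.end:.2f}] {segment.text}")
```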

Here are my test results, on my RTX 3090:

| Model | Batch Size | Processing Time (s) | Total Time (s) | Speedup vs. No Batch |
|---|---|---|---|---|
| Large v2 | 16 | 51.59 | 53.39 | 3.29x |
| Large v2 | 9 | 54.34 | 56.10 | 3.12x |
| Large v2 | 8 | 55.21 | 56.99 | 3.07x |
| Large v2 | 6 | 61.41 | 63.35 | 2.76x |
| Large v2 | No batch | 169.75 | 171.50 | 1.00x |
| Turbo | 16 | 33.37 | 34.65 | 2.48x |
| Turbo | 8 | 33.39 | 34.52 | 2.48x |
| Turbo | No batch | 82.75 | 83.93 | 1.00x |

(This was for a 1-hour audio file; Total Time includes the time it takes to load the model.)

I did some testing with faster-whisper-server, and its speed matches Large v2 with no batching.
With smaller audio files (30 seconds), the speedup was around 24% at batch sizes 8 and 16.

Additionally: could requests from multiple different users be treated as one batch?

fedirz (Owner) commented Dec 4, 2024

  1. Yep, I'll definitely be bumping the faster-whisper version and adding support for batched inference
  2. I like the idea. I'll take a look at how to implement this.
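
As a rough illustration of the second point (not the server's actual implementation), one possible approach is a micro-batching queue that collects requests arriving within a short window and hands them to the model together; the window length, `Request` class, and `process_batch` handler below are all hypothetical names:

```python
import asyncio
from dataclasses import dataclass, field

BATCH_WINDOW_S = 0.1  # hypothetical: how long to wait for more requests to arrive

@dataclass
class Request:
    audio: str  # path or buffer of the user's audio; created inside a running event loop
    future: asyncio.Future = field(default_factory=asyncio.Future)

request_queue: "asyncio.Queue[Request]" = asyncio.Queue()

async def batch_worker(process_batch):
    """Collect requests that arrive close together and run them as one batch."""
    loop = asyncio.get_running_loop()
    while True:
        first = await request_queue.get()
        batch = [first]
        deadline = loop.time() + BATCH_WINDOW_S
        # Keep pulling requests until the batching window closes.
        while (timeout := deadline - loop.time()) > 0:
            try:
                batch.append(await asyncio.wait_for(request_queue.get(), timeout))
            except asyncio.TimeoutError:
                break
        # process_batch is a placeholder for whatever runs the batched model call.
        results = process_batch([req.audio for req in batch])
        for req, result in zip(batch, results):
            req.future.set_result(result)
```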
