Hey!
The HuggingFace text-generation-inference server can automatically batch concurrent HTTP requests: if one request is already being processed when another arrives, it adapts the in-progress batch to include the new one.
I want to build an inference solution based on faster-whisper.
Is manual batching supported? I am not confident enough to implement it safely on my own, but I would like to build on top of it, if possible.
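For reference, here is a minimal sketch of the delta-window pattern I mean, assuming an asyncio server loop; `transcribe_batch`, `MAX_BATCH`, and `DELTA_S` are hypothetical placeholders, not faster-whisper or TGI APIs:

```python
import asyncio

MAX_BATCH = 8    # hypothetical cap on how many requests share a batch
DELTA_S = 0.05   # hypothetical window to wait for additional requests

def transcribe_batch(audios):
    # Hypothetical placeholder: a real implementation would call a batched
    # faster-whisper pipeline here.
    return [f"transcript {i}" for i, _ in enumerate(audios)]

async def submit(queue: asyncio.Queue, audio) -> str:
    # Callers enqueue their audio together with a future and await the result.
    fut = asyncio.get_running_loop().create_future()
    await queue.put({"audio": audio, "future": fut})
    return await fut

async def batching_loop(queue: asyncio.Queue):
    while True:
        # Block until at least one request arrives, then open the delta window.
        batch = [await queue.get()]
        deadline = asyncio.get_running_loop().time() + DELTA_S
        while len(batch) < MAX_BATCH:
            timeout = deadline - asyncio.get_running_loop().time()
            if timeout <= 0:
                break
            try:
                batch.append(await asyncio.wait_for(queue.get(), timeout))
            except asyncio.TimeoutError:
                break
        # One batched inference call; since it blocks, a real server would
        # run it in a thread executor so the event loop stays responsive.
        results = transcribe_batch([r["audio"] for r in batch])
        for req, result in zip(batch, results):
            req["future"].set_result(result)
```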
Faster-whisper performs batching internally across the vad_segments of a single audio file. I would be curious whether batching multiple audio files together yields further speedups.
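If your faster-whisper version ships the `BatchedInferencePipeline` wrapper, that internal batching is exposed directly; a sketch (model name and `batch_size` are example values only):

```python
from faster_whisper import WhisperModel, BatchedInferencePipeline

# Wrap the model in the batched pipeline: it runs VAD, cuts the audio into
# segments, and decodes several segments per forward pass.
model = WhisperModel("large-v3", device="cuda", compute_type="float16")
batched = BatchedInferencePipeline(model=model)

# batch_size controls how many VAD segments are decoded together.
segments, info = batched.transcribe("audio.mp3", batch_size=16)
for segment in segments:
    print("[%.2fs -> %.2fs] %s" % (segment.start, segment.end, segment.text))
```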
AFAIK, the HuggingFace server uses a delta window: if multiple requests arrive within that window, they are batched and run together; otherwise each request runs on its own.