It really depends on how you do the cost-benefit analysis. I assume the server keeps the model hot in RAM, since with small.en loaded it sits at 1.1 GB of idle memory. On the other hand, say you only use it a few times per day: does it make sense to keep it in memory all day long? Maybe, if you have plenty of RAM. But other people probably need that RAM for other interesting things they do.
And really, what exactly are we saving here? Even a garbage-tier SSD can do a 1 GB sequential read to load the model in less than a second. Is that really what you want to optimise for?
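For a rough sanity check, here's a minimal sketch that times a sequential read of the model file (the path is an assumption; point it at wherever your ggml model actually lives):

```python
import time

# Hypothetical path to the ggml small.en model (~1 GB on disk);
# adjust to your setup.
MODEL_PATH = "models/ggml-small.en.bin"

start = time.perf_counter()
with open(MODEL_PATH, "rb") as f:
    # Read in 64 MiB chunks to approximate a bulk sequential read.
    while f.read(64 * 1024 * 1024):
        pass

print(f"sequential read took {time.perf_counter() - start:.2f}s")
```

Two caveats: this measures raw disk I/O only, not whatever tensor setup whisper.cpp does on top of it, and after the first run the OS page cache will make the read near-instant anyway, which further weakens the case for keeping a resident server just to skip the load.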
The program "server" as the name suggests is appropriate for people who self-hosts a transcription service on a dedicated hardware for lots of customers, I don't really see how it makes any sense for a normal desktop user to do it this way.
Also, I don't see what this has to do with whisperlive. That issue is for people who want real-time transcription. The "server" is not real-time; it does a single request/response loop, and the only real difference is the IPC transport (TCP vs. unix pipes).
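For reference, one such request/response cycle looks roughly like this. This is a sketch assuming the HTTP defaults of whisper.cpp's `examples/server` (host, port, and the `/inference` endpoint); check your build's `--help` output for the actual flags:

```python
import requests

# Single request/response cycle against a running whisper.cpp server.
# Host, port, endpoint, and form fields assume the example's defaults.
with open("sample.wav", "rb") as f:
    resp = requests.post(
        "http://127.0.0.1:8080/inference",
        files={"file": f},
        data={"response_format": "json"},
    )

resp.raise_for_status()
print(resp.json())
```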
whisper.cpp ships with a server. Isn't using that faster than loading the model again for each request?
Doing this should be much easier than #22.