[FR] Use server to make inference faster #26

Open
NightMachinery opened this issue Jul 2, 2024 · 2 comments

@NightMachinery

whisper.cpp ships with a server. Isn't using that faster than loading the model again for each request?

Doing this should be much easier than #22.
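
To make the idea concrete, here is a rough sketch of what a client request against the whisper.cpp server would look like, assuming Python's requests library and the stock server example on its default port; the audio file name is hypothetical:

```python
# One transcription request against an already-running whisper.cpp
# server (e.g. started with: ./server -m models/ggml-small.en.bin).
# Assumes the default 127.0.0.1:8080 address and /inference endpoint.
import requests

with open("recording.wav", "rb") as f:  # hypothetical 16 kHz WAV file
    resp = requests.post(
        "http://127.0.0.1:8080/inference",
        files={"file": f},
        data={"response_format": "json"},
    )
resp.raise_for_status()
print(resp.json()["text"])  # transcribed text
```

The model stays loaded between requests, so only the audio upload and decoding are paid per call.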

@natrys
Owner

natrys commented Jul 2, 2024

It really depends on how you do the cost-benefit analysis. I assume the server keeps the model hot in RAM, since loading it with small.en sits at 1.1G of idle memory. On the other hand, if you only use it a few times per day, does it make sense to keep it in memory all day long? Maybe it does if you have a lot of RAM. But other people probably need that RAM for the other interesting things they do.

And really, what exactly are we saving here? Even a garbage-tier SSD should be able to do a 1GB sequential read to load the model in about a second. Is this really what you want to optimise for?

The "server" program, as the name suggests, is appropriate for people who self-host a transcription service on dedicated hardware for lots of customers. I don't really see how it makes any sense for a normal desktop user to do it this way.

Also, I don't see how this has anything to do with whisperlive. That issue is for people who want real-time transcription. The "server" is not real-time: it does a single request/response loop, and the only difference is the IPC transport (TCP vs. Unix pipes).
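
For comparison, the non-server path being weighed here is just spawning the CLI per request and paying the model load from disk each time. A minimal sketch, assuming the stock whisper.cpp `main` binary and hypothetical paths:

```python
# One-shot transcription: run whisper.cpp's CLI for a single file.
# The model is re-read from disk on every invocation; -nt suppresses
# timestamps so stdout is just the transcript.
import subprocess

result = subprocess.run(
    ["./main", "-m", "models/ggml-small.en.bin", "-f", "recording.wav", "-nt"],
    capture_output=True,
    text=True,
    check=True,
)
print(result.stdout.strip())
```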

@ArthurHeymans
Contributor

#35 implements server mode with both local and remote backends via the OpenAI API. It comes in handy if the local machine is less powerful.
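
For reference, the remote path boils down to a standard OpenAI transcription call; a minimal sketch with Python's requests library (the audio file name is hypothetical, and OPENAI_API_KEY must be set in the environment):

```python
# Remote transcription via the OpenAI audio API, for when the local
# machine is too weak to run the model itself.
import os
import requests

with open("recording.wav", "rb") as f:  # hypothetical input file
    resp = requests.post(
        "https://api.openai.com/v1/audio/transcriptions",
        headers={"Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}"},
        files={"file": f},
        data={"model": "whisper-1"},
    )
resp.raise_for_status()
print(resp.json()["text"])
```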
