Server example? #1369
For what it's worth, I already have a very rudimentary server example working. It's a bit of a Frankenstein copy-paste job. It supports configuring the server in exactly the same way as the llama server, and it supports (untested) these params:
But not diarization, language, or any output options; my goal was to get a working server for my own application. Anyone interested in a PR?
Maybe. When I finish optimizing stable-diffusion.cpp and adding a server to it, I could create a server example for whisper.cpp.
I posted the code as PR #1375.
Hey all, I notice several server examples being proposed. This is super cool! I'm planning to do a major update to
Hi again! I think we should restart the server efforts. I like both #1375 and #1380, so I'm not sure how to decide which one to integrate. Also, I think we should aim to support the OpenAI Audio API for speech-to-text: https://platform.openai.com/docs/api-reference/audio
The approach in #1418 is also interesting, so it can be merged as an alternative solution to the REST-based server example.
Hello! I'm keen on fixing and merging my changes for the server. I've seen that the server in llama.cpp has enabled projects such as ollama and others, so I think it's an important application to have so that users can easily build interfaces against it. I have also started to create a similar server solution for bark.cpp, because in my use case I would like to have some sort of voice (a bit more granular than espeak). That would complete the full LLM robot: a brain (llama.cpp), ears (whisper.cpp), and a voice (bark.cpp).
Yup, I agree that a server can find many interesting applications.
Yes! Great idea - we are getting close :)
Also agree. To hawk my proposal #1418 (that fork is a bit messy, but something like it): I think it'd be really great to have the ability to create many types of servers. For instance, I might want a gRPC server, or a REST server, or a ROS pub-sub node. Likewise, many types of encodings for the result: maybe JSON, maybe BSON, maybe protobuf, etc. I think it'd require very little refactoring - basically, just the core stream server as a class with a method
First pass of a server example has been merged (#1380). Looks like streaming and diarization are two of the most requested features for the server. Not sure if we can do something meaningful for diarization, but we should be able to provide a streaming API relatively easily.
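For anyone who wants to script against the merged server example, here is a minimal sketch that POSTs a wav file using only the Python standard library. The endpoint URL, the `file` field name, and the `response_format` parameter are assumptions on my part; check the server example's README for the actual API.

```python
import io
import urllib.request
import uuid

def multipart_body(field, filename, data, extra=None):
    """Build a multipart/form-data body: one file part plus optional text fields."""
    boundary = uuid.uuid4().hex
    buf = io.BytesIO()
    for name, value in (extra or {}).items():
        buf.write((f'--{boundary}\r\n'
                   f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
                   f'{value}\r\n').encode())
    buf.write((f'--{boundary}\r\n'
               f'Content-Disposition: form-data; name="{field}"; '
               f'filename="{filename}"\r\n'
               f'Content-Type: audio/wav\r\n\r\n').encode())
    buf.write(data)
    buf.write(f'\r\n--{boundary}--\r\n'.encode())
    return buf.getvalue(), f'multipart/form-data; boundary={boundary}'

def transcribe(url, wav_path):
    """POST a wav file to a (hypothetical) inference endpoint; return the raw response."""
    with open(wav_path, 'rb') as f:
        body, ctype = multipart_body('file', wav_path, f.read(),
                                     {'response_format': 'json'})
    req = urllib.request.Request(url, data=body,
                                 headers={'Content-Type': ctype})
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode()
```

The upside of building the body by hand is zero third-party dependencies; in a real client you would likely use `requests` and its `files=` parameter instead.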
I left the diarization parameters in there, so it might be working; I didn't know how it worked or how to test it.
The server works well, but when the speech is short, like "lights on" or "lights off", it doesn't produce any text. I suspect
I think a way to provide a context for the server (like |
Ok, I've tried sending single-word .wav files to the server and it responds with the correct word. Did you try using the prompt flag? It should do something similar to what you describe.
It should work with short audio too. The prompt can help in some situations to make the transcript more robust, but it is not required in general.
I haven't tested this specific server implementation, but the server implementation I was using previously definitely did work with short commands; I made it specifically for that purpose. So either
Have you tried it with longer audio?
Longer audio always works well; the same problem happens with
Edit: I am using sox to make the wav file. @Azeirah, what do you use to make the wav file?
From @felrock
From @ggerganov
I am using the following (based on https://stackoverflow.com/questions/30006609/using-sox-for-voice-detection-and-streaming) to generate the wav:
I play the wav and it is okay. How do you generate your wavs?
I'm using a Python script that uses PyAudio. I record about three seconds per .wav file.
Thanks, how do you detect voice?
Okay, I solved the problem; posting it here for those interested. I simply padded the wav file with 500 ms of silence at the beginning and 1 second of silence at the end, and everything works fine now.
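That padding fix can be scripted. Below is a sketch using only Python's standard `wave` module that prepends and appends silence to a PCM wav file; the 500 ms / 1 s defaults are the amounts that worked above, and the function name is mine.

```python
import wave

def pad_wav(src, dst, front_s=0.5, back_s=1.0):
    """Copy a PCM wav, adding front_s seconds of silence before the
    audio and back_s seconds after it."""
    with wave.open(src, "rb") as r:
        params = r.getparams()
        frames = r.readframes(r.getnframes())
    # One frame of digital silence: all-zero bytes for every channel.
    sil_frame = b"\x00" * params.sampwidth * params.nchannels
    front = sil_frame * int(front_s * params.framerate)
    back = sil_frame * int(back_s * params.framerate)
    with wave.open(dst, "wb") as w:
        w.setparams(params)
        w.writeframes(front + frames + back)
```

Note this assumes integer PCM, where silence is all-zero samples; for float wavs you would need a different representation.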
It would be really useful to add the grammar/commands.txt functionality in the
Ah, sorry about that - I forgot there is logic to ignore sub-one-second audio (lines 5193 to 5198 in a5881d6):
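In other words, audio shorter than roughly one second is dropped before inference, which is why the padding trick above works. A hedged sketch of an equivalent client-side check (the constant and function name are mine, not whisper.cpp's):

```python
WHISPER_SAMPLE_RATE = 16000  # whisper.cpp expects 16 kHz mono input

def long_enough(n_samples, min_seconds=1.0):
    """Approximate the check that skips sub-one-second audio:
    return True only if the clip meets the minimum duration."""
    return n_samples >= int(min_seconds * WHISPER_SAMPLE_RATE)
```

A client could call this before uploading and pad the clip (as above) when it returns False.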
|
I'm working on a voice-controlled application and I want to run small .wav files through whisper fairly often.
What I noticed is that almost 50% of the total time goes to loading the model, every single time I run:
`./main -m ... "my-short-spoken-command.wav"`
I think it'd be nice if, like llama.cpp, this project included a server example, so the model only has to be loaded once and stays in memory after loading.