Super Simple Whisper Server #1380

Merged: 11 commits into ggerganov:master on Nov 20, 2023

Conversation

@felrock (Collaborator) commented Oct 19, 2023

I made this because I need it for a project, but if other people would also like to have it, I could extend it and clean it up a bit for you guys.

@eschmidbauer (Contributor)

Thanks for sharing @felrock

ggerganov mentioned this pull request on Nov 16, 2023

@felrock (Collaborator, Author) commented Nov 17, 2023

Hey @ggerganov, I would like to add support for the OpenAI API request format, as you mentioned in the issue ticket.

file: currently I have this as the first item in the API, but it's called audio_file; I can just change that.
model: not sure if we can support this, since the server loads a model before handling requests, but perhaps down the line we could add a load request that changes which model is being run.
prompt: pretty easy to add, since we can pass it to the model.
response_format: currently we only return JSON; would you like support for the others as well? I can add them.
temperature: same as prompt.

I've also added a few more of whisper's parameters as request parameters, for example duration and offset.

I have some time during the weekend to implement this. Please add any comments if you think I should do something differently. Thanks!

@ggerganov (Owner)

@felrock Awesome - let's do this!

Not sure if we can support this, since the server loads a model before handling requests, but perhaps down the line we could add a load request that changes which model is being run.

Maybe in the future.

Currently we only return JSON; would you like support for the others as well? I can add them.

json is fine. I guess adding at least a plain text format would also be nice, since in some applications parsing JSON is not always an option. And later we can extend it to support more formats if necessary.

@felrock (Collaborator, Author) commented Nov 18, 2023

The POST request to /inference should now support the format specified by OpenAI, except for parameters such as model (since we are using a load method instead), and no auth tokens are needed. Output can be in JSON or plain text format. It's now also possible to load a new model using the /load POST endpoint, passing a file path to a model on the machine that runs the server.

Edit: I forgot to mention that I also added some simple error messages on the API responses.

Request examples:
/inference

➜  whisper.cpp git:(master) ✗ curl 127.0.0.1:8080/inference \
  -H "Content-Type: multipart/form-data" \
  -F file="<file-path>" \
  -F temperature="0.2"

/load

➜  whisper.cpp git:(master) ✗ curl 127.0.0.1:8080/load \
  -H "Content-Type: multipart/form-data" \
  -F model="<path-to-model-file>"

Let me know what you think @ggerganov.
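
For illustration, a minimal sketch of how the server side of such requests can read the multipart fields, assuming cpp-httplib (the httplib.h library the example server builds on); the handler body, defaults, and error strings here are placeholders, not the actual server.cpp code:

```cpp
#include "httplib.h"
#include <string>

int main() {
    httplib::Server svr;

    svr.Post("/inference", [](const httplib::Request & req, httplib::Response & res) {
        // the audio arrives as a multipart file part named "file"
        if (!req.has_file("file")) {
            res.set_content("{\"error\":\"no audio file\"}", "application/json");
            return;
        }
        const auto audio_file = req.get_file_value("file");

        // optional OpenAI-style parameters arrive as plain form fields
        float temperature = 0.0f;
        if (req.has_file("temperature")) {
            temperature = std::stof(req.get_file_value("temperature").content);
        }
        std::string response_format = "json";
        if (req.has_file("response_format")) {
            response_format = req.get_file_value("response_format").content;
        }

        // ... run whisper on audio_file.content with the chosen parameters
        //     and format the transcription according to response_format ...
        res.set_content("{\"text\":\"...\"}", "application/json");
    });

    svr.listen("127.0.0.1", 8080);
}
```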

@ggerganov (Owner) commented Nov 20, 2023

I'm trying:

curl 127.0.0.1:8080/inference -H "Content-Type: multipart/form-data" -F file="./samples/jfk.wav" -F temperature="0.2" -F response-format="json"

And I get:

whisper server listening at http://127.0.0.1:8080
Received request: 
error: failed to open '' as WAV file
error: failed to read WAV file ''

Any ideas?

Looks like:

audio_file.filename == "";
audio_file.content == "./samples/jfk.wav";

Is this expected? I'm on macOS


Edit: Ok, got it. I have to use -F file=@./samples/jfk.wav 👍 (the @ makes curl upload the file contents rather than the literal path string).

@ggerganov (Owner) left a review comment:

Nice!

(Review thread on examples/server/server.cpp: outdated, resolved)

@ggerganov (Owner)

Apologies for the force-push just now. I think we should be ready to merge

@felrock (Collaborator, Author) commented Nov 20, 2023

No worries, sorry for not updating the Makefile!

ggerganov merged commit eff3570 into ggerganov:master on Nov 20, 2023
35 of 36 checks passed

@teddybear082 commented Nov 21, 2023

Hi, I have tried this with the release artifacts and I keep getting the following error (I'm including the working main.exe test as well). Running on Windows 11:

C:\Other programs\whisper\whisper-blas-bin-Win32>main.exe -debug -f jfk.wav -m ggml-base-q5_1.bin
whisper_init_from_file_with_params_no_state: loading model from 'ggml-base-q5_1.bin'
whisper_model_load: loading model
whisper_model_load: n_vocab       = 51865
whisper_model_load: n_audio_ctx   = 1500
whisper_model_load: n_audio_state = 512
whisper_model_load: n_audio_head  = 8
whisper_model_load: n_audio_layer = 6
whisper_model_load: n_text_ctx    = 448
whisper_model_load: n_text_state  = 512
whisper_model_load: n_text_head   = 8
whisper_model_load: n_text_layer  = 6
whisper_model_load: n_mels        = 80
whisper_model_load: ftype         = 9
whisper_model_load: qntvr         = 1
whisper_model_load: type          = 2 (base)
whisper_model_load: adding 1608 extra tokens
whisper_model_load: n_langs       = 99
whisper_model_load:      CPU buffer size =    68.41 MB
whisper_model_load: model size    =   68.33 MB
whisper_init_state: kv self size  =   16.52 MB
whisper_init_state: kv cross size =   18.43 MB
whisper_init_state: compute buffer (conv)   =   14.41 MB
whisper_init_state: compute buffer (encode) =   85.55 MB
whisper_init_state: compute buffer (cross)  =    4.33 MB
whisper_init_state: compute buffer (decode) =   96.04 MB

system_info: n_threads = 4 / 12 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | METAL = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 0 | VSX = 0 | CUDA = 0 | COREML = 0 | OPENVINO = 0 |

main: processing 'jfk.wav' (176000 samples, 11.0 sec), 4 threads, 1 processors, 5 beams + best of 5, lang = en, task = transcribe, timestamps = 1 ...


[00:00:00.000 --> 00:00:10.560]   And so, my fellow Americans, ask not what your country can do for you, ask what you can do for your country.


whisper_print_timings:     load time =    57.88 ms
whisper_print_timings:     fallbacks =   0 p /   0 h
whisper_print_timings:      mel time =     9.75 ms
whisper_print_timings:   sample time =   111.76 ms /   137 runs (    0.82 ms per run)
whisper_print_timings:   encode time =  1372.04 ms /     1 runs ( 1372.04 ms per run)
whisper_print_timings:   decode time =     2.70 ms /     1 runs (    2.70 ms per run)
whisper_print_timings:   batchd time =   225.82 ms /   134 runs (    1.69 ms per run)
whisper_print_timings:   prompt time =     0.00 ms /     1 runs (    0.00 ms per run)
whisper_print_timings:    total time =  1788.01 ms

C:\Other programs\whisper\whisper-blas-bin-Win32>server.exe -m ggml-base-q5_1.bin
whisper_init_from_file_with_params_no_state: loading model from 'ggml-base-q5_1.bin'
whisper_model_load: loading model
whisper_model_load: n_vocab       = 51865
whisper_model_load: n_audio_ctx   = 1500
whisper_model_load: n_audio_state = 512
whisper_model_load: n_audio_head  = 8
whisper_model_load: n_audio_layer = 6
whisper_model_load: n_text_ctx    = 448
whisper_model_load: n_text_state  = 512
whisper_model_load: n_text_head   = 8
whisper_model_load: n_text_layer  = 6
whisper_model_load: n_mels        = 80
whisper_model_load: ftype         = 9
whisper_model_load: qntvr         = 1
whisper_model_load: type          = 2 (base)
whisper_model_load: adding 1608 extra tokens
whisper_model_load: n_langs       = 99
whisper_model_load:      CPU buffer size =    68.41 MB
whisper_model_load: model size    =   68.33 MB
whisper_init_state: kv self size  =   16.52 MB
whisper_init_state: kv cross size =   18.43 MB
whisper_init_state: compute buffer (conv)   =   14.41 MB
whisper_init_state: compute buffer (encode) =   85.55 MB
whisper_init_state: compute buffer (cross)  =    4.33 MB
whisper_init_state: compute buffer (decode) =   96.04 MB

whisper server listening at http://127.0.0.1:8080


Received request: jfk.wav
error: failed to open 'jfk.wav' as WAV file
error: failed to read WAV file 'jfk.wav' 

Not sure what I'm doing wrong. Thanks!! Also tried with curl 127.0.0.1:8080/inference -H "Content-Type: multipart/form-data" -F file=@./jfk.wav -F temperature="0.2" -F response-format="json"

@felrock (Collaborator, Author) commented Nov 21, 2023

I think it's because the temp file that is written is not being closed. I'll add a follow-up commit later today closing the ofstream, and I think that will solve the issue.
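
For anyone following along, this is roughly the kind of fix being described; a minimal illustrative sketch (the helper name is made up, this is not the actual server.cpp code) that flushes and closes the temporary copy of the upload before whisper reads it back as a WAV file:

```cpp
#include <fstream>
#include <string>

// Write the uploaded multipart content to disk and make sure the stream is
// flushed and closed before anything tries to read the file back.
static bool save_upload(const std::string & path, const std::string & content) {
    std::ofstream out(path, std::ios::binary);
    if (!out) {
        return false;
    }
    out.write(content.data(), content.size());
    out.close(); // without this (or scope exit), a later read can see an empty or partial file
    return out.good();
}
```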

@teddybear082

Thank you, and please don't rule out my incompetence, so if there's something I might be doing wrong feel free to let me know. I don't typically run curl commands, but when my normal Python request code to the server failed, I wanted to replicate the command above as closely as possible.

@felrock (Collaborator, Author) commented Nov 21, 2023

No worries! Try this out and please let me know if it works afterwards; your curl syntax looks correct.

#1533

@teddybear082 commented Nov 21, 2023

curl worked correctly now!!! Thank you!!!

whisper server listening at http://127.0.0.1:8080

Received request: jfk.wav
Successfully loaded jfk.wav

system_info: n_threads = 4 / 12 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | METAL = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 0 | VSX = 0 | CUDA = 0 | COREML = 0 | OPENVINO = 0 |

operator (): processing 'jfk.wav' (176000 samples, 11.0 sec), 4 threads, 1 processors, lang = en, task = transcribe, timestamps = 1 ...

Running whisper.cpp inference on jfk.wav

[00:00:00.000 --> 00:00:10.640]   And so my fellow Americans ask not what your country can do for you, ask what you can do for your country.

@teddybear082

ehhhh sorry, one new "problem": this version deletes jfk.wav when it's done with it. I noticed when I ran the command again "just because" that the directory no longer has jfk.wav in it.

@felrock (Collaborator, Author) commented Nov 21, 2023

Great! Haha, well that isn't good. I took for granted that the file wouldn't exist in the current working directory of the server, but if it does, it will be removed. Maybe if we just add some random temp name to it instead we wouldn't have this problem.
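
A minimal sketch of that idea, purely illustrative (the helper name and suffix scheme are assumptions, not the actual fix in #1535): store the upload under a unique temporary name so an identically named file in the server's working directory is never overwritten or deleted.

```cpp
#include <cstdio>
#include <filesystem>
#include <random>
#include <string>

// Derive a unique temporary path for the uploaded audio, e.g.
// "/tmp/jfk.wav.3fa2c1.tmp", instead of writing to the original name.
static std::string make_temp_filename(const std::string & original_name) {
    static std::mt19937 gen{std::random_device{}()};
    std::uniform_int_distribution<unsigned> dist(0, 0xFFFFFF);

    char suffix[16];
    std::snprintf(suffix, sizeof(suffix), "%06x", dist(gen));

    const auto path = std::filesystem::temp_directory_path() /
                      (original_name + "." + suffix + ".tmp");
    return path.string();
}
```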

@teddybear082 commented Nov 21, 2023

ohhhhh ok. Well, sometimes you need an idiot testing to do something that no one else would do. I stuck the file in there because I was having issues with paths and such, and wanted to eliminate as many variables as possible of what might be going wrong. That's probably not something most users would do. :) Thanks again, having this feature is such a game changer (hence why I'm testing it so quickly lol)!

EDIT: I can 100% confirm that if you're a normal person and have the file in samples/file.wav, everything works well and the file is not deleted, so I would not consider this high priority at all.

@felrock (Collaborator, Author) commented Nov 21, 2023

Haha, it's great that you find bugs I would never have found! I added the changes here: #1535

@teddybear082 commented Nov 21, 2023

You're quick, thank you! By the way, I'm blown away by the performance of server + cuBLAS + distil-whisper large-v2 GGML right now. It's near instant.

@karanbangia

Is there any support for streaming realtime speech to text?

@karanbangia

@felrock

@felrock (Collaborator, Author) commented Nov 24, 2023

Not by default, no, and it also depends on what you mean by realtime. If you add some logic on the client side you can make it realtime.
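
To illustrate what that client-side logic could look like (only a sketch, not part of this PR; the chunk file names are made up, and the endpoint and field names follow the earlier examples in this thread), one option is to cut the audio into short chunks as it is captured and POST each chunk to /inference with libcurl:

```cpp
#include <curl/curl.h>
#include <iostream>
#include <string>
#include <vector>

// Append the server's response body to a std::string.
static size_t collect(char * ptr, size_t size, size_t nmemb, void * userdata) {
    static_cast<std::string *>(userdata)->append(ptr, size * nmemb);
    return size * nmemb;
}

// POST a single WAV chunk to the whisper server and return the JSON response.
static std::string transcribe_chunk(const std::string & wav_path) {
    std::string response;
    CURL * curl = curl_easy_init();
    if (!curl) return response;

    curl_mime * form = curl_mime_init(curl);

    curl_mimepart * part = curl_mime_addpart(form);
    curl_mime_name(part, "file");
    curl_mime_filedata(part, wav_path.c_str());

    part = curl_mime_addpart(form);
    curl_mime_name(part, "response_format");
    curl_mime_data(part, "json", CURL_ZERO_TERMINATED);

    curl_easy_setopt(curl, CURLOPT_URL, "http://127.0.0.1:8080/inference");
    curl_easy_setopt(curl, CURLOPT_MIMEPOST, form);
    curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, collect);
    curl_easy_setopt(curl, CURLOPT_WRITEDATA, &response);

    curl_easy_perform(curl);

    curl_mime_free(form);
    curl_easy_cleanup(curl);
    return response;
}

int main() {
    // hypothetical pre-cut chunks; a real client would produce these
    // continuously from the microphone and post each one as it is written
    std::vector<std::string> chunks = { "chunk000.wav", "chunk001.wav" };
    for (const auto & c : chunks) {
        std::cout << transcribe_chunk(c) << std::endl;
    }
    return 0;
}
```

The transcript only becomes "realtime" to the extent that the chunks are short and posted as soon as they exist; the server itself still processes one complete file per request.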

@vtmmm commented Dec 15, 2023

This server is very useful! Happy to see it added.

Would it be possible to add an endpoint that returns the name of the currently loaded model? I'd like my client to be able to check the model without instructing the server which one to /load.

landtanin pushed a commit to landtanin/whisper.cpp that referenced this pull request on Dec 16, 2023 (…ov#1380):

* Add first draft of server

* Added json support and base funcs for server.cpp

* Add more user input via api-request

also some clean up

* Add reqest params and load post function

Also some general clean up

* Remove unused function

* Add readme

* Add exception handlers

* Update examples/server/server.cpp

* make : add server target

* Add magic curl syntax

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
@kustomzone

I just built and ran the Whisper server and found it uses the same default port as a running Llama server (so using both defaults at once is blocked). Whisper's default port could be offset by a small amount to avoid the initial adjustment.

@ggerganov (Owner)

Both are good ideas! PRs welcome

@vtmmm commented Feb 12, 2024

Both are good ideas! PRs welcome

I don't know if returning the model with every transcription is desired as default behavior or not, so I created an issue to discuss instead, which shows how I implemented it for my use case.

#1848

@squizzster

@felrock Works like a dream... but I am not getting timestamps. I see in ./server that you can disable them, but how can I enable them? Or is this WIP?
Thank you.

Example:

time curl 192.168.1.205:8080/inference -H "Content-Type: multipart/form-data" -F temperature="0.1" -F response-format="json" -F file=@/unique_audio/28d99065444b8f27d8a59372ae87232f.wav
{"text":" Good morning. Yeah, can I book a taxi for 4 o'clock p.m. today? So, can you come please to the 27 zzzzzz Avenue, xxxxxxxx?\n So, we're gonna be going to pick up someone in xxxxxx place in yyyyyy as well.\n And then we're gonna be going to yyyyyyy Boulevard in ggggggg.\n And the last stop is gonna be, we're going for skiing and we're gonna go to Escape in kkkkkk rrrrrr.\n And it's, yeah, 4 o'clock please. Could you come to pick up us?\n"}

@felrock (Collaborator, Author) commented Jun 10, 2024

Nice that you like it! Perhaps try out the vjson_format instead, I think that should have the timestamps you are looking for.

@squizzster

Nice that you like it! Perhaps try out the vjson_format instead, I think that should have the timestamps you are looking for.

`-F response_format="verbose_json"` 

:-) CHEERS!

iThalay pushed a commit to iThalay/whisper.cpp that referenced this pull request on Sep 23, 2024 (…ov#1380).
