Super Simple Whisper Server #1380

Merged: 11 commits into ggerganov:master on Nov 20, 2023

Conversation

@felrock (Collaborator) commented Oct 19, 2023

I made this because I need it for a project, but if other people would also like to have it, I could extend it and clean it up a bit for you guys.

@eschmidbauer (Contributor)

Thanks for sharing @felrock

ggerganov mentioned this pull request on Nov 16, 2023

@felrock (Collaborator, Author) commented Nov 17, 2023

Hey @ggerganov, I would like to add support for the OpenAI API request format, as you mentioned in the issue ticket.

file: currently I have this as the first item in the API, but it's called audio_file; I can just change that.
model: not sure if we can support this, since the server loads a model before handling requests, but perhaps down the line we could add a load request that changes which model is being run.
prompt: pretty easy to add, since we can pass it to the model.
response_format: currently we only return JSON; would you like support for the others as well? I can add them.
temperature: same as prompt.

I've also added a few more of whisper's parameters as request parameters, for example duration and offset.

I have some time during the weekend to implement this. Please add any comments if you think I should do something differently. Thanks!

@ggerganov (Owner)

@felrock Awesome - let's do this!

Not sure if we can support this, since the server loads a model before handling requests, but perhaps down the line we could add a load request that changes which model is being run.

Maybe in the future.

Currently we only return JSON; would you like support for the others as well? I can add them.

json is fine. I guess adding at least a plain text format would also be nice, since in some applications parsing JSON is not always an option. And later we can extend it to support more formats if necessary.

@felrock (Collaborator, Author) commented Nov 18, 2023

The POST request to /inference should now support the format specified by OpenAI, except for parameters such as model (since we are using a load method instead), and no auth tokens are needed. Output can be in JSON or plain text format. It's now also possible to load a new model using the /load POST endpoint, passing a file path to a model on the machine that runs the server.

Edit: I forgot to mention that I also added some simple error messages on the API responses.

Request examples:
/inference

➜  whisper.cpp git:(master) ✗ curl 127.0.0.1:8080/inference \
  -H "Content-Type: multipart/form-data" \
  -F file="<file-path>" \
  -F temperature="0.2"

/load

➜  whisper.cpp git:(master) ✗ curl 127.0.0.1:8080/load \
  -H "Content-Type: multipart/form-data" \
  -F model="<path-to-model-file>"

Let me know what you think @ggerganov.
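
For illustration, a minimal sketch of how the server side of such requests can read the multipart fields, assuming cpp-httplib (the httplib.h library the example server builds on); the handler body, defaults, and error strings here are placeholders, not the actual server.cpp code:

```cpp
#include "httplib.h"
#include <string>

int main() {
    httplib::Server svr;

    svr.Post("/inference", [](const httplib::Request & req, httplib::Response & res) {
        // the audio arrives as a multipart file part named "file"
        if (!req.has_file("file")) {
            res.set_content("{\"error\":\"no audio file\"}", "application/json");
            return;
        }
        const auto audio_file = req.get_file_value("file");

        // optional OpenAI-style parameters arrive as plain form fields
        float temperature = 0.0f;
        if (req.has_file("temperature")) {
            temperature = std::stof(req.get_file_value("temperature").content);
        }
        std::string response_format = "json";
        if (req.has_file("response_format")) {
            response_format = req.get_file_value("response_format").content;
        }

        // ... run whisper on audio_file.content with the chosen parameters
        //     and format the transcription according to response_format ...
        res.set_content("{\"text\":\"...\"}", "application/json");
    });

    svr.listen("127.0.0.1", 8080);
}
```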

@ggerganov (Owner) commented Nov 20, 2023

I'm trying:

curl 127.0.0.1:8080/inference -H "Content-Type: multipart/form-data" -F file="./samples/jfk.wav" -F temperature="0.2" -F response-format="json"

And I get:

whisper server listening at http://127.0.0.1:8080
Received request: 
error: failed to open '' as WAV file
error: failed to read WAV file ''

Any ideas?

Looks like:

audio_file.filename == "";
audio_file.content == "./samples/jfk.wav";

Is this expected? I'm on macOS


Edit: Ok, got it. I have to use -F file=@./samples/jfk.wav 👍 (the @ makes curl upload the file contents rather than the literal path string).

@ggerganov (Owner) left a review comment:

Nice!

(Review thread on examples/server/server.cpp: outdated, resolved)

@ggerganov (Owner)

Apologies for the force-push just now. I think we should be ready to merge

@felrock (Collaborator, Author) commented Nov 20, 2023

No worries, sorry for not updating the Makefile!

ggerganov merged commit eff3570 into ggerganov:master on Nov 20, 2023
35 of 36 checks passed

@teddybear082 commented Nov 21, 2023

Hi, I have tried this with the release artifacts and I keep getting the following error (I'm including the working main.exe test as well). Running on Windows 11:

C:\Other programs\whisper\whisper-blas-bin-Win32>main.exe -debug -f jfk.wav -m ggml-base-q5_1.bin
whisper_init_from_file_with_params_no_state: loading model from 'ggml-base-q5_1.bin'
whisper_model_load: loading model
whisper_model_load: n_vocab       = 51865
whisper_model_load: n_audio_ctx   = 1500
whisper_model_load: n_audio_state = 512
whisper_model_load: n_audio_head  = 8
whisper_model_load: n_audio_layer = 6
whisper_model_load: n_text_ctx    = 448
whisper_model_load: n_text_state  = 512
whisper_model_load: n_text_head   = 8
whisper_model_load: n_text_layer  = 6
whisper_model_load: n_mels        = 80
whisper_model_load: ftype         = 9
whisper_model_load: qntvr         = 1
whisper_model_load: type          = 2 (base)
whisper_model_load: adding 1608 extra tokens
whisper_model_load: n_langs       = 99
whisper_model_load:      CPU buffer size =    68.41 MB
whisper_model_load: model size    =   68.33 MB
whisper_init_state: kv self size  =   16.52 MB
whisper_init_state: kv cross size =   18.43 MB
whisper_init_state: compute buffer (conv)   =   14.41 MB
whisper_init_state: compute buffer (encode) =   85.55 MB
whisper_init_state: compute buffer (cross)  =    4.33 MB
whisper_init_state: compute buffer (decode) =   96.04 MB

system_info: n_threads = 4 / 12 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | METAL = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 0 | VSX = 0 | CUDA = 0 | COREML = 0 | OPENVINO = 0 |

main: processing 'jfk.wav' (176000 samples, 11.0 sec), 4 threads, 1 processors, 5 beams + best of 5, lang = en, task = transcribe, timestamps = 1 ...


[00:00:00.000 --> 00:00:10.560]   And so, my fellow Americans, ask not what your country can do for you, ask what you can do for your country.


whisper_print_timings:     load time =    57.88 ms
whisper_print_timings:     fallbacks =   0 p /   0 h
whisper_print_timings:      mel time =     9.75 ms
whisper_print_timings:   sample time =   111.76 ms /   137 runs (    0.82 ms per run)
whisper_print_timings:   encode time =  1372.04 ms /     1 runs ( 1372.04 ms per run)
whisper_print_timings:   decode time =     2.70 ms /     1 runs (    2.70 ms per run)
whisper_print_timings:   batchd time =   225.82 ms /   134 runs (    1.69 ms per run)
whisper_print_timings:   prompt time =     0.00 ms /     1 runs (    0.00 ms per run)
whisper_print_timings:    total time =  1788.01 ms

C:\Other programs\whisper\whisper-blas-bin-Win32>server.exe -m ggml-base-q5_1.bin
whisper_init_from_file_with_params_no_state: loading model from 'ggml-base-q5_1.bin'
whisper_model_load: loading model
whisper_model_load: n_vocab       = 51865
whisper_model_load: n_audio_ctx   = 1500
whisper_model_load: n_audio_state = 512
whisper_model_load: n_audio_head  = 8
whisper_model_load: n_audio_layer = 6
whisper_model_load: n_text_ctx    = 448
whisper_model_load: n_text_state  = 512
whisper_model_load: n_text_head   = 8
whisper_model_load: n_text_layer  = 6
whisper_model_load: n_mels        = 80
whisper_model_load: ftype         = 9
whisper_model_load: qntvr         = 1
whisper_model_load: type          = 2 (base)
whisper_model_load: adding 1608 extra tokens
whisper_model_load: n_langs       = 99
whisper_model_load:      CPU buffer size =    68.41 MB
whisper_model_load: model size    =   68.33 MB
whisper_init_state: kv self size  =   16.52 MB
whisper_init_state: kv cross size =   18.43 MB
whisper_init_state: compute buffer (conv)   =   14.41 MB
whisper_init_state: compute buffer (encode) =   85.55 MB
whisper_init_state: compute buffer (cross)  =    4.33 MB
whisper_init_state: compute buffer (decode) =   96.04 MB

whisper server listening at http://127.0.0.1:8080


Received request: jfk.wav
error: failed to open 'jfk.wav' as WAV file
error: failed to read WAV file 'jfk.wav' 

Not sure what I'm doing wrong. Thanks!! Also tried with curl 127.0.0.1:8080/inference -H "Content-Type: multipart/form-data" -F file=@./jfk.wav -F temperature="0.2" -F response-format="json"

@felrock (Collaborator, Author) commented Nov 21, 2023

I think it's because the temp file that is written is not being closed. I'll add a follow-up commit later today closing the ofstream, and I think that will solve the issue.
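
For anyone following along, this is roughly the kind of fix being described; a minimal illustrative sketch (the helper name is made up, this is not the actual server.cpp code) that flushes and closes the temporary copy of the upload before whisper reads it back as a WAV file:

```cpp
#include <fstream>
#include <string>

// Write the uploaded multipart content to disk and make sure the stream is
// flushed and closed before anything tries to read the file back.
static bool save_upload(const std::string & path, const std::string & content) {
    std::ofstream out(path, std::ios::binary);
    if (!out) {
        return false;
    }
    out.write(content.data(), content.size());
    out.close(); // without this (or scope exit), a later read can see an empty or partial file
    return out.good();
}
```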

@teddybear082

Thank you, and please don't rule out my incompetence, so if there's something I might be doing wrong feel free to let me know. I don't typically run curl commands, but when my normal Python request code to the server failed, I wanted to replicate the command above as closely as possible.

@felrock (Collaborator, Author) commented Nov 21, 2023

No worries! Try this out and please let me know if it works afterwards; your curl syntax looks correct.

#1533

@teddybear082 commented Nov 21, 2023

curl worked correctly now!!! Thank you!!!

whisper server listening at http://127.0.0.1:8080

Received request: jfk.wav
Successfully loaded jfk.wav

system_info: n_threads = 4 / 12 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | METAL = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 0 | VSX = 0 | CUDA = 0 | COREML = 0 | OPENVINO = 0 |

operator (): processing 'jfk.wav' (176000 samples, 11.0 sec), 4 threads, 1 processors, lang = en, task = transcribe, timestamps = 1 ...

Running whisper.cpp inference on jfk.wav

[00:00:00.000 --> 00:00:10.640]   And so my fellow Americans ask not what your country can do for you, ask what you can do for your country.

@teddybear082

ehhhh sorry, one new "problem": this version deletes jfk.wav when it's done with it. I noticed when I ran the command again "just because" that the directory no longer has jfk.wav in it.

@felrock (Collaborator, Author) commented Nov 21, 2023

Great! Haha, well that isn't good. I took for granted that the file wouldn't exist in the current working directory of the server, but if it does, it will be removed. Maybe if we just add some random temp name to it instead we wouldn't have this problem.
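
A minimal sketch of that idea, purely illustrative (the helper name and suffix scheme are assumptions, not the actual fix in #1535): store the upload under a unique temporary name so an identically named file in the server's working directory is never overwritten or deleted.

```cpp
#include <cstdio>
#include <filesystem>
#include <random>
#include <string>

// Derive a unique temporary path for the uploaded audio, e.g.
// "/tmp/jfk.wav.3fa2c1.tmp", instead of writing to the original name.
static std::string make_temp_filename(const std::string & original_name) {
    static std::mt19937 gen{std::random_device{}()};
    std::uniform_int_distribution<unsigned> dist(0, 0xFFFFFF);

    char suffix[16];
    std::snprintf(suffix, sizeof(suffix), "%06x", dist(gen));

    const auto path = std::filesystem::temp_directory_path() /
                      (original_name + "." + suffix + ".tmp");
    return path.string();
}
```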

@teddybear082 commented Nov 21, 2023

ohhhhh ok. Well, sometimes you need an idiot testing to do something that no one else would do. I stuck the file in there because I was having issues with paths and such, and wanted to eliminate as many variables as possible of what might be going wrong. That's probably not something most users would do. :) Thanks again, having this feature is such a game changer (hence why I'm testing it so quickly lol)!

EDIT: I can 100% confirm that if you're a normal person and have the file in samples/file.wav, everything works well and the file is not deleted, so I would not consider this high priority at all.

@felrock (Collaborator, Author) commented Nov 21, 2023

Haha, it's great that you find bugs I would never have found! I added the changes here: #1535

@teddybear082 commented Nov 21, 2023

You're quick, thank you! By the way, I'm blown away by the performance of server + cuBLAS + distil-whisper large-v2 GGML right now. It's near instant.

@karanbangia

Is there any support for streaming realtime speech to text?

@karanbangia

@felrock

@felrock (Collaborator, Author) commented Nov 24, 2023

Not by default, no, and it also depends on what you mean by realtime. If you add some logic on the client side you can make it realtime.
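
To illustrate what that client-side logic could look like (only a sketch, not part of this PR; the chunk file names are made up, and the endpoint and field names follow the earlier examples in this thread), one option is to cut the audio into short chunks as it is captured and POST each chunk to /inference with libcurl:

```cpp
#include <curl/curl.h>
#include <iostream>
#include <string>
#include <vector>

// Append the server's response body to a std::string.
static size_t collect(char * ptr, size_t size, size_t nmemb, void * userdata) {
    static_cast<std::string *>(userdata)->append(ptr, size * nmemb);
    return size * nmemb;
}

// POST a single WAV chunk to the whisper server and return the JSON response.
static std::string transcribe_chunk(const std::string & wav_path) {
    std::string response;
    CURL * curl = curl_easy_init();
    if (!curl) return response;

    curl_mime * form = curl_mime_init(curl);

    curl_mimepart * part = curl_mime_addpart(form);
    curl_mime_name(part, "file");
    curl_mime_filedata(part, wav_path.c_str());

    part = curl_mime_addpart(form);
    curl_mime_name(part, "response_format");
    curl_mime_data(part, "json", CURL_ZERO_TERMINATED);

    curl_easy_setopt(curl, CURLOPT_URL, "http://127.0.0.1:8080/inference");
    curl_easy_setopt(curl, CURLOPT_MIMEPOST, form);
    curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, collect);
    curl_easy_setopt(curl, CURLOPT_WRITEDATA, &response);

    curl_easy_perform(curl);

    curl_mime_free(form);
    curl_easy_cleanup(curl);
    return response;
}

int main() {
    // hypothetical pre-cut chunks; a real client would produce these
    // continuously from the microphone and post each one as it is written
    std::vector<std::string> chunks = { "chunk000.wav", "chunk001.wav" };
    for (const auto & c : chunks) {
        std::cout << transcribe_chunk(c) << std::endl;
    }
    return 0;
}
```

The transcript only becomes "realtime" to the extent that the chunks are short and posted as soon as they exist; the server itself still processes one complete file per request.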

@vtmmm commented Dec 15, 2023

This server is very useful! Happy to see it added.

Would it be possible to add an endpoint that returns the name of the currently loaded model? I'd like my client to be able to check the model without instructing the server which one to /load.

landtanin pushed a commit to landtanin/whisper.cpp that referenced this pull request on Dec 16, 2023 (…ov#1380):

* Add first draft of server

* Added json support and base funcs for server.cpp

* Add more user input via api-request

also some clean up

* Add reqest params and load post function

Also some general clean up

* Remove unused function

* Add readme

* Add exception handlers

* Update examples/server/server.cpp

* make : add server target

* Add magic curl syntax

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
@kustomzone

I just built and ran the Whisper server and found it uses the same default port as a running Llama server (so using both defaults at once is blocked). Whisper's default port could be offset by a small amount to avoid the initial adjustment.

@ggerganov (Owner)

Both are good ideas! PRs welcome

@vtmmm commented Feb 12, 2024

Both are good ideas! PRs welcome

I don't know if returning the model with every transcription is desired as default behavior or not, so I created an issue to discuss instead, which shows how I implemented it for my use case.

#1848

@squizzster

@felrock Works like a dream... but I am not getting timestamps. I see in ./server that you can disable them, but how can I enable them? Or is this WIP?
Thank you.

Example:

time curl 192.168.1.205:8080/inference -H "Content-Type: multipart/form-data" -F temperature="0.1" -F response-format="json" -F file=@/unique_audio/28d99065444b8f27d8a59372ae87232f.wav
{"text":" Good morning. Yeah, can I book a taxi for 4 o'clock p.m. today? So, can you come please to the 27 zzzzzz Avenue, xxxxxxxx?\n So, we're gonna be going to pick up someone in xxxxxx place in yyyyyy as well.\n And then we're gonna be going to yyyyyyy Boulevard in ggggggg.\n And the last stop is gonna be, we're going for skiing and we're gonna go to Escape in kkkkkk rrrrrr.\n And it's, yeah, 4 o'clock please. Could you come to pick up us?\n"}

@felrock (Collaborator, Author) commented Jun 10, 2024

Nice that you like it! Perhaps try out the vjson_format instead, I think that should have the timestamps you are looking for.

@squizzster

Nice that you like it! Perhaps try out the vjson_format instead, I think that should have the timestamps you are looking for.

`-F response_format="verbose_json"` 

:-) CHEERS!

iThalay pushed a commit to iThalay/whisper.cpp that referenced this pull request on Sep 23, 2024 (…ov#1380).
