Support for Kokoro-FastAPI TTS Model #8685

Omni-NexusAI · 2025-01-19T16:17:55Z

Omni-NexusAI
Jan 19, 2025

Feature Request

Issue: Kokoro-FastAPI is designed to be compatible with OpenAI's speech endpoints, so in theory it should work with Open WebUI out of the box. However, when I enter its values in the corresponding OpenAI TTS fields in Open WebUI, there is no audio output after the LLM generates its response. This suggests that a backend adjustment is needed before Kokoro-FastAPI can be used in Open WebUI's advanced voice mode.

Benefit: Kokoro-FastAPI provides high-quality, fast, and locally runnable text-to-speech. This feature would expand Open WebUI's functionality, enhancing user experience.

khronimo · 2025-01-21T13:46:37Z

khronimo
Jan 21, 2025

I'd like to add that the ONNX version of Kokoro is fantastic: the models are pretty small, sound really good, and run fast even on CPU. I would personally love to see this built into Open WebUI.
https://github.com/thewh1teagle/kokoro-onnx

0 replies

jtourt · 2025-01-23T17:35:26Z

jtourt
Jan 23, 2025

Seems to be working fine in v0.5.6.

Settings:

Text-to-Speech Engine: OpenAI
API Base URL: http://localhost:8880/v1
API Key: not-needed
TTS Voice: af_bella
TTS Model: kokoro
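
To check the Kokoro-FastAPI server independently of Open WebUI, the settings above can be turned into a direct request. This is a minimal sketch, assuming the server accepts an OpenAI-style POST to `/v1/audio/speech` (the path that appears in the 422 error reported in this thread) with `model`, `voice`, and `input` JSON fields. The helper names and the `speech.mp3` output file name are hypothetical, and the actual audio format depends on the server's default.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8880/v1"  # API Base URL from the settings above


def build_speech_request(text, voice="af_bella", model="kokoro", api_key="not-needed"):
    """Build (but do not send) a POST request for the speech endpoint."""
    # Assumption: OpenAI-style field names; adjust if the server expects others.
    payload = {"model": model, "voice": voice, "input": text}
    return urllib.request.Request(
        f"{BASE_URL}/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def synthesize(text, out_path="speech.mp3"):
    """Send the request and write the returned audio bytes to out_path."""
    # Assumption: the response body is the raw audio; the extension is a guess.
    with urllib.request.urlopen(build_speech_request(text), timeout=60) as resp:
        audio = resp.read()
    with open(out_path, "wb") as f:
        f.write(audio)
    return out_path
```

Calling `synthesize("Hello from Kokoro")` while the server is running should leave an audio file on disk. If that works but Open WebUI stays silent, the problem is more likely in the Open WebUI settings or networking than in Kokoro itself.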

16 replies

jtourt Feb 8, 2025

It's not waiting for a response from the server. It's not attempting to play the audio or send to the Kokoro server. Here is the code where you are....

```
if (audioCache.has(content)) {
	// If content is available in the cache, play it
} else {
	// If not available in the cache, push it back to the queue and delay
	console.log(`Audio for "${content}" not yet available in the cache, re-queued...`);
}
```

But at this point, we are troubleshooting why the call feature is not working as expected -- not TTS.

Keep it simple. Take the call feature out of the equation. If you click the speaker icon under the response (Read Aloud), what happens?

Omni-NexusAI Feb 9, 2025
Author

That was one of the first things I tried. Pressing the speaker icon does nothing there either; I just get POST errors.

Omni-NexusAI Feb 13, 2025
Author

Kokoro is now integrated directly into Open WebUI, so thankfully this is no longer required to get it working. I ran some tests and it's working great!

RBEmerson970 Feb 13, 2025

Updating Open WebUI 90% broke Kokoro for me. Only one voice (am_michael) works; any other voice just paints the rippling busy icon over the speaker. Unfortunately, I don't see error messages anywhere.

RBEmerson970 Feb 13, 2025

I found the needed log for this problem. Please see the attached .log, collected from Docker:
Kokoro-Open WebUI fail.log

oculairmedia · 2025-01-23T18:41:55Z

oculairmedia
Jan 23, 2025

Settings:

Text-to-Speech Engine: OpenAI
API Base URL: http://localhost:8880/v1
API Key: not-needed
TTS Voice: af_bella
TTS Model: kokoro
returns
External: 422, message='Unprocessable Entity', url='http://100.81.139.20:8880/v1/audio/speech'

5 replies

jtourt Jan 23, 2025

The error message is saying that there are characters in the input that it cannot process. Carefully check that you do not have any unseen extra spaces before or after each setting. I can recreate the same error if I add extra spaces after the model name.
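
The whitespace check described above can be automated before the values are pasted into the settings page. This is a minimal sketch with hypothetical helper names and a hypothetical settings dictionary; it only detects and strips leading or trailing whitespace and is not part of Open WebUI or Kokoro-FastAPI.

```python
def find_padded_fields(settings):
    """List the setting names whose values carry stray leading/trailing whitespace."""
    return [k for k, v in settings.items() if isinstance(v, str) and v != v.strip()]


def clean_tts_settings(settings):
    """Return a copy with leading/trailing whitespace stripped from string values."""
    return {k: v.strip() if isinstance(v, str) else v for k, v in settings.items()}
```

For example, a model name entered as `"kokoro "` (with a trailing space) is reported by `find_padded_fields` and comes back as `"kokoro"` from `clean_tts_settings`.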

oculairmedia Jan 25, 2025

You were right, this resolved it for me.

Opeth66 Jan 25, 2025

this worked for me https://www.youtube.com/watch?v=UzpGgC2SmzI

ALBIHANY Feb 14, 2025

this worked for me https://www.youtube.com/watch?v=UzpGgC2SmzI

Thank you, this actually made it work for me. Using docker.host instead of localhost makes the connection from Open WebUI to the Kokoro-FastAPI TTS server work.

sorryjack Feb 17, 2025

this worked for me https://www.youtube.com/watch?v=UzpGgC2SmzI

Works, thanks.

jwickers · 2025-01-30T02:35:02Z

jwickers
Jan 30, 2025

Is there any way to set the speech speed? It is a parameter like "voice" so it should probably be in the settings page as well.

1 reply

RBEmerson970 Feb 2, 2025

From the Chat page: Settings > Audio > Speech Playback Speed, which is a pulldown with speeds.

RBEmerson970 · 2025-02-06T15:55:36Z

RBEmerson970
Feb 6, 2025

Can you be more specific about when you expect audio output, please. Attached are screen grabs of what's working here. All I do is click on the speaker icon to get an LLM response spoken. Cheers, Rick Emerson

On Wed, Feb 5, 2025 at 3:51 PM Omni-NexusAI wrote: Did any of you guys who had problems getting it to work manage to find a way to get audio output?

1 reply

Omni-NexusAI Feb 8, 2025
Author

When I start the advanced voice mode I can talk and the model receives the STT data, but then after the LLM generates the text back it just keeps waiting for the TTS and there is no audio output in the end.

RBEmerson970 · 2025-02-07T00:03:31Z

RBEmerson970
Feb 7, 2025

FWIW, I moved to Kokoro FastAPI v0.1.5-pre (current release is v0.1.4), and it seems to be working well.

There are two points in Open WebUI Audio which need attention:

1. While v0.1.5-pre has many voices, Open WebUI's Audio settings (user or admin) don't show a meaningful list in the form l(m|f)_name, where l is the language code (a=American, b=British, etc.), m=male, f=female, and name is the voice's name (e.g., sky, kore, alloy, george, etc.). All parameters are lower case and, AFAIK, case-sensitive (they should be case-insensitive, IMHO).
2. The playback speed should be more granular. Some voices are inherently faster than others. At present the speed delta is +/- .25, .50, and "why bother". Most web UIs use .10 increments and decrements. IMHO, .05 would be more helpful for fine-tuning playback.
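
The naming scheme described in the first point can be sketched as a small parser. This assumes voice ids always consist of a two-letter prefix, an underscore, and a name. The function name and the lookup tables are hypothetical, and only the language codes named above are filled in.

```python
LANGUAGES = {"a": "American", "b": "British"}  # only the codes named above
GENDERS = {"m": "male", "f": "female"}


def parse_voice_id(voice_id):
    """Split an id such as 'af_bella' into (language, gender, name)."""
    prefix, sep, name = voice_id.partition("_")
    if not sep or len(prefix) != 2 or not name:
        raise ValueError(f"unexpected voice id: {voice_id!r}")
    lang_code, gender_code = prefix
    # Codes not listed in the tables are reported as unknown rather than guessed.
    return (
        LANGUAGES.get(lang_code, f"unknown ({lang_code})"),
        GENDERS.get(gender_code, f"unknown ({gender_code})"),
        name,
    )
```

With this, `parse_voice_id("af_bella")` yields the American, female, bella triple, which is the kind of label a voice picker could show instead of the raw id.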

1 reply

RBEmerson970 Feb 13, 2025

BUMP on points 1 & 2 above.

bet0x · 2025-02-13T22:22:39Z

bet0x
Feb 13, 2025

I've been using https://github.com/remsky/Kokoro-FastAPI since before the integration was even done, and it was already working. The CPU and GPU versions both work like a charm.

0 replies

iChristGit · 2025-02-14T15:03:45Z

iChristGit
Feb 14, 2025

When using the included Kokoro integration, it seems to only use the CPU: my VRAM usage stays very low and RAM sits at around 20 GB (no LLM loaded in Ollama yet, I'm just trying to listen to old messages).

0 replies

RBEmerson970 · 2025-02-14T20:43:16Z

RBEmerson970
Feb 14, 2025

At this point anything touching TTS, from Open WebUI to Win 11 has been updated with no change.
2025-02-14 150107 INFO ('172.17.txt

1 reply

RBEmerson970 Feb 15, 2025

I changed the Admin:Audio setting for the Text-to-Speech engine to
http://host.docker.internal:8880/v1
and Open WebUI is back to talking to me.
This replaces
http://localhost:8880/v1
At this point I have no idea whether the localhost URL was entered by mistake at some point, or it worked earlier, then stopped working. Be that as it may, if Kokoro-FastAPI's running on Docker, Open WebUI speaks.
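
The likely reason for the change above: inside a container, `localhost` refers to the container itself, so a Kokoro-FastAPI server published on the host is not reachable under that name from a Dockerized Open WebUI. Docker Desktop provides `host.docker.internal` as an alias for the host. The rewrite can be sketched as follows; the function name is hypothetical, and on plain Linux Docker the alias may need to be configured separately.

```python
from urllib.parse import urlsplit, urlunsplit

LOOPBACK_HOSTS = {"localhost", "127.0.0.1"}


def docker_reachable_url(base_url, host_alias="host.docker.internal"):
    """Swap a loopback hostname for the Docker host alias; leave other URLs unchanged."""
    parts = urlsplit(base_url)
    if parts.hostname not in LOOPBACK_HOSTS:
        return base_url
    # Keep the original port, if one was given.
    netloc = host_alias if parts.port is None else f"{host_alias}:{parts.port}"
    return urlunsplit(parts._replace(netloc=netloc))
```

Passing `http://localhost:8880/v1` gives back the `host.docker.internal` URL shown above, while an address such as `http://100.81.139.20:8880/v1` is returned untouched.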

Credit goes to this YT video for the idea
Run Free Text-to-Speech Locally on Open WebUI: Kokoro TTS Setup Guide (Windows)
