talk and talk-llama: Pass text_to_speak as a file #1865

tamo · 2024-02-13T14:28:10Z

Fix win 11 / linux - talk-llama voice interface is broken. Commit d6b9be2 from 19.01.2204 #1796
Make speak.sh work without modification in more cases
Enable eleven-labs.py to play audio
Turn as much of the information in the comments into code as possible

(Drop some commits if you think I made too many changes in one PR.)

1:
Currently talk and talk-llama pass text_to_speak as an argument to the TTS command in a system() call.
That's a little scary.

True, we could escape dangerous characters confidently if we only had to support bash.
But we also have to take care of Windows, where pwsh once changed how it handles quotes.
I'm not sure that what is safe today will be safe tomorrow.

So I suggest we output text_to_speak to a file, and then pass it to TTS.

A pipe would be better, but I don't know how to implement piping in a portable way (and I guess we prefer to avoid #ifdefs in cpp).
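
To make the concern in item 1 concrete, here is a minimal sketch that only builds the two command strings and does not execute anything. The function names are hypothetical, and the exact quoting used by the old code is an assumption; the point is only that in the first form the spoken text becomes part of the shell command line, while in the second form only a file path does.

```cpp
#include <string>

// Argument-based form (assumed shape of the old call): the text itself is
// pasted into the command line, so any quote or ';' inside it reaches the shell.
std::string command_with_text_arg(const std::string & speak, int voice_id, const std::string & text) {
    return speak + " " + std::to_string(voice_id) + " \"" + text + "\"";
}

// File-based form (what this PR proposes): only the path of the text file
// appears on the command line; the text is read from the file by the TTS script.
std::string command_with_text_file(const std::string & speak, int voice_id, const std::string & path) {
    return speak + " " + std::to_string(voice_id) + " " + path;
}
```

With a text such as `hi"; echo pwned; "`, the first function yields a command line in which `echo pwned` is a separate shell command, whereas the second is unaffected by the text content.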

2:
Testing several TTS programs is hard because we have to edit speak.sh every time.
A simple "export PATH=somewhere" would be better.
So I suggest we use "command -v" in speak.sh to check which TTS commands are installed.
You can change the order if you want (for example, to raise the priority of "say").
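
The change itself lives in speak.sh and uses the shell's "command -v". Purely as an illustration of the same "first installed command wins" idea, here is a rough C++17 sketch. Both function names are hypothetical, the PATH string is assumed to be colon-separated (POSIX style), and unlike "command -v" it does not check the executable bit, shell builtins, or aliases.

```cpp
#include <filesystem>
#include <sstream>
#include <string>
#include <system_error>
#include <vector>

namespace fs = std::filesystem;

// Return the first "<dir>/<name>" that is a regular file, scanning the
// colon-separated directories of path_env in order; empty path if none matches.
fs::path find_in_path(const std::string & name, const std::string & path_env) {
    std::istringstream dirs(path_env);
    std::string dir;
    while (std::getline(dirs, dir, ':')) {
        if (dir.empty()) {
            continue;
        }
        std::error_code ec; // swallow errors such as unreadable directories
        const fs::path candidate = fs::path(dir) / name;
        if (fs::is_regular_file(candidate, ec)) {
            return candidate;
        }
    }
    return {};
}

// Return the first candidate command found on path_env, or "" if none is installed.
// The order of `candidates` is the priority order.
std::string pick_tts(const std::vector<std::string> & candidates, const std::string & path_env) {
    for (const auto & name : candidates) {
        if (!find_in_path(name, path_env).empty()) {
            return name;
        }
    }
    return "";
}
```

Reordering the candidate list changes the priority, which mirrors the suggestion above about moving "say" up in speak.sh.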

3:
The current elevenlabs example in speak.sh saves audio.mp3 and then plays it with ffplay.
But elevenlabs itself plays the audio with ffplay if we call "play" instead of "save".
https://github.com/elevenlabs/elevenlabs-python/blob/v0.2.27/elevenlabs/utils.py

Also, when I tested "save", ffplay miscalculated the audio duration. So I think "play" is more stable.

4:
Comments have much useful information, such as how to install TTS programs and voice options.
It would be better to code it than keep it hidden in comments.

ggerganov · 2024-02-19T08:59:54Z

examples/talk-llama/talk-llama.cpp

+                    std::ofstream speak_file(params.speak_file.c_str());
+                    if (speak_file.fail()) {
+                        fprintf(stderr, "%s: failed to open speak_file\n", __func__);
+                    } else {
+                        speak_file.write(params.heard_ok.c_str(), params.heard_ok.size());
+                        speak_file.close();
+                        int ret = system((params.speak + " " + std::to_string(voice_id) + " " + params.speak_file).c_str());
+                        if (ret != 0) {
+                            fprintf(stderr, "%s: failed to speak\n", __func__);
+                        }


Let's factor this out as a helper function speak_with_file() in common and reuse it

Yes, let's! 3366e2a
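
For readers following along, a helper along these lines could look like the sketch below. It is derived from the diff hunk quoted above; the parameter list and the bool return value are assumptions, and the helper actually committed in 3366e2a may differ.

```cpp
#include <cstdio>
#include <cstdlib>
#include <fstream>
#include <string>

// Write `text` to the file at `path`, then run "<command> <voice_id> <path>".
// Returns true only if the file was written and the command exited with 0.
// (Signature is an assumption for illustration, not taken from the PR.)
bool speak_with_file(const std::string & command, const std::string & text,
                     const std::string & path, int voice_id) {
    std::ofstream speak_file(path.c_str());
    if (speak_file.fail()) {
        std::fprintf(stderr, "%s: failed to open speak_file\n", __func__);
        return false;
    }
    speak_file.write(text.c_str(), static_cast<std::streamsize>(text.size()));
    speak_file.close();

    // Only the file path reaches the shell, never the text itself.
    const int ret = std::system((command + " " + std::to_string(voice_id) + " " + path).c_str());
    if (ret != 0) {
        std::fprintf(stderr, "%s: failed to speak\n", __func__);
        return false;
    }
    return true;
}
```

Both talk and talk-llama could then call it with their own speak command, text, file path and voice_id instead of repeating the open/write/system sequence.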

* talk-llama: pass file instead of arg

  it is too hard to quote text in a portable way

* talk-llama: pass heard_ok as a file

* talk-llama: let eleven-labs.py accept options

  Options: -v voice, -s savefile, -p (--play)

* talk-llama: check installed commands in "speak"

  Pass "-q" to eleven-labs.py to skip checking whether elevenlabs is installed

* talk-llama: pass voice_id again

  in order to sync talk with talk-llama

* talk: sync with talk-llama

  Passing text_to_speak as a file is safer and more portable
  cf. https://stackoverflow.com/a/59036879/45375

* talk and talk-llama: get all installed voices in speak.ps1

* talk and talk-llama: get voices from api

* talk and talk-llama: add more options to eleven-labs.py

  and remove DEFAULT_VOICE because it is deprecated (https://www.reddit.com/r/ElevenLabs/comments/1830abt/what_happened_to_bella/)

  ```
  usage: eleven-labs.py [-q] [-l] [-h] [-n NAME | -v NUMBER] [-f KEY=VAL] [-s FILE | -p] [TEXTFILE]

  options:
    -q, --quick           skip checking the required library

  action:
    TEXTFILE              read the text file (default: stdin)
    -l, --list            show the list of voices and exit
    -h, --help            show this help and exit

  voice selection:
    -n NAME, --name NAME  get a voice object by name (default: Arnold)
    -v NUMBER, --voice NUMBER
                          get a voice object by number (see --list)
    -f KEY=VAL, --filter KEY=VAL
                          filter voices by labels (default: "use case=narration")
                          this option can be used multiple times
                          filtering will be disabled if the first -f has no "=" (e.g. -f "any")

  output:
    -s FILE, --save FILE  save the TTS to a file (default: audio.mp3)
    -p, --play            play the TTS with ffplay
  ```

* examples: add speak_with_file()

  as suggested in the review

* talk and talk-llama: ignore to_speak.txt

bobqianic approved these changes Feb 14, 2024

ggerganov reviewed Feb 19, 2024

tamo force-pushed the passfile branch from e6c7cc4 to 3366e2a on February 19, 2024 16:49

tamo added 11 commits February 24, 2024 16:08

talk-llama: pass file instead of arg

df26676

it is too hard to quote text in a portable way

talk-llama: pass heard_ok as a file

83d48ee

talk-llama: let eleven-labs.py accept options

766c8cb

Options: -v voice, -s savefile, -p (--play)

talk-llama: check installed commands in "speak"

f5ec91c

Pass "-q" to eleven-labs.py to skip checking whether elevenlabs is installed

talk-llama: pass voice_id again

c72857b

in order to sync talk with talk-llama

talk: sync with talk-llama

6584e63

Passing text_to_speak as a file is safer and more portable cf. https://stackoverflow.com/a/59036879/45375

talk and talk-llama: get all installed voices in speak.ps1

fd8ee90

talk and talk-llama: get voices from api

650a966

examples: add speak_with_file()

4e8cce7

as suggested in the review

talk and talk-llama: ignore to_speak.txt

0e4977a

tamo force-pushed the passfile branch from 3366e2a to 0e4977a on February 24, 2024 07:16

ggerganov approved these changes Feb 24, 2024

ggerganov merged commit f18738f into ggerganov:master Feb 24, 2024
46 checks passed
