Add Text-to-Speech Implementations & CLI App #57
src/stretch/audio/text_to_speech.py
Outdated
DEFAULT_LOGGER = logging.getLogger(__name__)


class TextToSpeechEngineType(Enum):
I would usually prefer enums in a separate file, and I would split these 2 as well. That makes it easier to make them optional dependencies.
I removed this enum and put `pyttsx3` and `gtts` in separate files. I don't agree that enums should be in separate files as a rule; for example, in `text_to_speech/executor.py` I feel it's appropriate to keep the enum `TextToSpeechOverrideBehavior` in the same file as `TextToSpeechExecutor`. LMK if you feel otherwise (for that specific case).
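One way to read the optional-dependency point in this thread: once each engine lives in its own module, the rest of the code can check which engines are usable without importing any of them. The following is only a sketch under that assumption; `ENGINE_PACKAGES` and `available_engines` are hypothetical names and are not part of this PR.

```python
import importlib.util
from typing import Dict, List

# Hypothetical mapping: engine name -> package that engine's module would import.
ENGINE_PACKAGES: Dict[str, str] = {
    "gtts": "gtts",
    "pyttsx3": "pyttsx3",
}


def available_engines(packages: Dict[str, str] = ENGINE_PACKAGES) -> List[str]:
    """Return the engine names whose backing package is installed.

    find_spec() only locates a top-level package; it does not import it, so
    checking for one engine never pulls in another engine's dependencies.
    """
    return [
        name
        for name, package in packages.items()
        if importlib.util.find_spec(package) is not None
    ]
```

A caller could then pick the first available engine and import only that engine's module.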
# Adapted from https://github.com/markstent/audio-similarity/blob/main/audio_similarity/audio_similarity.py
# Note that that script has other audio similarity metrics as well
def spectral_contrast_similarity(ground_truth_filepath, comparison_filepath, sample_rate=16000):
I could see this being really useful for e.g. wake words. Would it make sense in audio/utils or utils/audio or something?
maybe that would not be using filepaths though
Done. Filepaths were the most convenient way to load into `librosa`, and imo we can generalize it if/when we find another use for the function.
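If the function is generalized later, one option is to separate loading from comparing, so that the comparison step takes already-computed features instead of filepaths. The sketch below assumes the features arrive as a bands-by-frames matrix of floats; `feature_similarity` is a hypothetical name, and the normalization shown (one minus the mean absolute difference divided by the largest magnitude) is one possible choice, not necessarily what the PR's function does.

```python
from typing import Sequence


def feature_similarity(
    reference: Sequence[Sequence[float]],
    comparison: Sequence[Sequence[float]],
) -> float:
    """Compare two feature matrices (rows = bands, columns = frames).

    Returns 1.0 for identical inputs and lower values as they diverge.
    """
    if not reference or len(reference) != len(comparison):
        raise ValueError("both inputs need the same, non-zero number of bands")

    # Clips rarely have the same length, so compare only the shared frames.
    n_frames = min(
        min(len(row) for row in reference),
        min(len(row) for row in comparison),
    )
    if n_frames == 0:
        raise ValueError("both inputs need at least one frame")

    total_diff = 0.0
    max_abs = 0.0
    for ref_row, cmp_row in zip(reference, comparison):
        for ref_val, cmp_val in zip(ref_row[:n_frames], cmp_row[:n_frames]):
            total_diff += abs(ref_val - cmp_val)
            max_abs = max(max_abs, abs(ref_val), abs(cmp_val))

    if max_abs == 0.0:
        return 1.0  # both inputs are all zeros, so they are identical
    mean_diff = total_diff / (len(reference) * n_frames)
    return 1.0 - mean_diff / max_abs
```

A filepath-based wrapper could still exist on top of this: it would load both files, compute the features, and pass them in.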
Should we add an application for TTS inside `stretch.apps`? Currently there is no main entry point inside `<path/to/stretch_ai>/src/stretch/audio/text_to_speech_cli.py`
Can you add the mp3 files to .gitattributes and make sure they are under git-lfs @hello-amal?
We want to make sure large files are never added to git history!
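The commit list for this PR mentions a pre-commit check for large files. As a rough illustration of what such a check does (not the hook actually configured in this repository), a script could flag any candidate file above a size threshold. The function name and the 500 KiB default below are assumptions chosen for the example.

```python
import os
from typing import Iterable, List


def find_large_files(paths: Iterable[str], max_bytes: int = 500 * 1024) -> List[str]:
    """Return the paths that are regular files larger than max_bytes.

    The 500 KiB default is an arbitrary value picked for this sketch.
    """
    return [
        path
        for path in paths
        if os.path.isfile(path) and os.path.getsize(path) > max_bytes
    ]
```

A hook built on this would exit with a non-zero status whenever the returned list is non-empty, which blocks the commit until the file is removed or tracked with Git LFS.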
@hello-cpaxton @hello-atharva Done with all suggested changes from this PR and
I did put
@hello-cpaxton could you add
"openai-whisper",
"overrides", # better inheritance of docstrings
this one may have caused issues during installation - did you try it out?
Yup, I uninstalled `stretchpy` and `overrides`, and then re-installed from `src` and it worked on my machine.
LGTM pending minor comments
LGTM
* Added TTS
* Update setup.py
* Install `libasound2-dev` in workflow
* Github workflows require `sudo apt-get` installs
* Add portaudio to the `apt-get` installs
* Fixes from pre-commits
* Add espeak to github actions installation
* Remove mp3s
* Configure LFS to track MP3s
* Added a check for large files in the pre-commit
* Changes from PR review
* Update github actions dep to fake audio capabilities
* Update the apt install
* updates to docker
* workflow updates
* Add espeak to README audio deps
* Add ffmpeg
* [WIP] list audioread backends in github actions
* Refactor available formats
* Implemented GoogleCloudTTS
* [WIP] list audioread backends in github actions
* [WIP] add verbose logs to failing test case
* Remove GoogleCloudTTS on GithubActions
* [WIP] verify the named temp file has size > 0
* [WIP] check if FFMPeg gets a decoder error on the mp3s
* Pull LFS files in Github Action
* Add Git LFS to the action workflow
* Mark the git directory as safe before pulling LFS files
* Move git-lfs from action workflow to docker file
* Re-trigger github actions
---------
Co-authored-by: Amal Nanavati <amaln@uw.edu>
Co-authored-by: Chris Paxton <cpaxton@hello-robot.com>
Co-authored-by: Chris Paxton <165678659+hello-cpaxton@users.noreply.github.com>
Description
This PR adds a generic text-to-speech (TTS) abstract class, `TextToSpeechEngine`, as well as two implementations of that abstract class, one using `gTTS` (preferred) and the other using `pyttsx3` (worse voice quality, but can be used offline). It also adds test cases for each of the engines, using ground-truth saved files. Finally, it adds a command-line interface (CLI) to allow users to easily use text-to-speech (with convenient features like storing history, loading pre-saved utterances, and tab completion).
Testing
* `cd src; pip3 install .`
* `python3 test/audio/test_text_to_speech.py`
* `python3 test/audio/manual_test_text_to_speech.py`. Verify the first utterance is in an American accent and completes, and the second is in a British accent and gets interrupted.
* `python3 -m stretch.app.text_to_speech`. Verify the following:
  * `S`, verify the robot stops.
  * `--history_file <path/to/new/file>.txt` to the script. Type a few utterances. Quit the CLI by typing `Q`. Verify the history was saved in the history file.
Checklist
Additional context
This is a copy of `stretchpy` #61, so that `stretch_ai` also has TTS capabilities. Eventually, we should store this code in only one place.
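For readers unfamiliar with the abstract-class design mentioned in the Description, here is a minimal sketch of the pattern. Only the name `TextToSpeechEngine` comes from the PR; the `say` method, `RecordingEngine`, and `speak_all` are hypothetical and stand in for whatever interface the real class defines.

```python
from abc import ABC, abstractmethod
from typing import List


class TextToSpeechEngine(ABC):
    """Stand-in for the PR's abstract class; the real interface is assumed."""

    @abstractmethod
    def say(self, text: str) -> None:
        """Speak the given text (hypothetical method name)."""


class RecordingEngine(TextToSpeechEngine):
    """Hypothetical engine that records utterances instead of playing audio."""

    def __init__(self) -> None:
        self.spoken: List[str] = []

    def say(self, text: str) -> None:
        self.spoken.append(text)


def speak_all(engine: TextToSpeechEngine, utterances: List[str]) -> None:
    """Callers depend only on the abstract interface, so engines are swappable."""
    for utterance in utterances:
        engine.say(utterance)
```

Because callers such as `speak_all` depend only on the abstract interface, an online engine and an offline engine can be exchanged without touching the calling code, and a recording stand-in like the one above can be used in tests.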