feat(rai_tts&rai_hmi): stable elevenlabs voice, reintroduced chunking #247

Merged: maciejmajek merged 4 commits into development from feat/stable-tts on Sep 27, 2024

Conversation

Member

@maciejmajek maciejmajek commented Sep 26, 2024

Purpose

During previous testing, the ElevenLabs TTS voice changed modulation unexpectedly, often with unintended and sometimes humorous results.

Additionally, the HMI refactoring in #143 removed the chunking feature, which increased latency.

Proposed Changes

A VoiceSettings configuration has been added; its stability parameter addresses the voice modulation issue.
Chunking has been reintroduced.
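
For readers who want a concrete picture of the two changes, here is a minimal sketch, not the code from this PR. It splits a response into sentence-sized chunks and pairs each chunk with a voice settings dictionary carrying `stability` and `similarity_boost`, the two fields ElevenLabs exposes for this purpose. The function names `split_into_sentences` and `build_tts_requests` and the default values 0.7 and 0.5 are assumptions made for illustration.

```python
import re
from typing import Dict, List


def split_into_sentences(text: str) -> List[str]:
    """Split text after '.', '!' or '?' when followed by whitespace.

    This is a naive splitter: abbreviations such as "Mr." also end a chunk.
    """
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def build_tts_requests(
    text: str, stability: float = 0.7, similarity_boost: float = 0.5
) -> List[Dict]:
    """Build one request per sentence, each with the same voice settings.

    The numeric defaults are illustrative assumptions, not values from the PR.
    """
    voice_settings = {"stability": stability, "similarity_boost": similarity_boost}
    return [
        {"text": sentence, "voice_settings": dict(voice_settings)}
        for sentence in split_into_sentences(text)
    ]


if __name__ == "__main__":
    for request in build_tts_requests("Hello! How can I assist you today?"):
        print(request)
```

Sending short chunks lets playback of the first sentence begin while later sentences are still being synthesized, which is where the latency reduction comes from. Using the same settings for every chunk keeps the voice consistent across them.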

Issues

  • Links to relevant issues

Testing

ros2 launch rai_bringup hri.launch.py keep_speaker_busy:=false recording_device:=3 silence_grace_period:=0.5 asr_vendor:=whisper

The voice is much more stable now, and the latency has been reduced.

@maciejmajek maciejmajek changed the base branch from development to fix/hri September 26, 2024 17:58
Member

@boczekbartek boczekbartek left a comment

@maciejmajek
I tried the new TTS and it doesn't work correctly for me. For longer jobs, it drops part of the response.

For this response:

[voice_hmi_node-4] [INFO] [1727426018.660800626] [voice_hmi_node]: Running thinker
[voice_hmi_node-4] [INFO] [1727426022.771298198] [voice_hmi_node]: Sending message to human: "Certainly! Here's another story:
[voice_hmi_node-4]
[voice_hmi_node-4] In a quiet village nestled between tall mountains and lush forests, lived an elderly inventor named Mr. Whitaker. He spent his days tinkering away in his workshop, creating marvelous gadgets and inventions. One day, he decided to build a companion to help him with his work – a small, cheerful robot named Tinker.
[voice_hmi_node-4]
[voice_hmi_node-4] Tinker was designed to assist Mr. Whitaker with his inventions, but he soon discovered a passion for gardening. Every morning, Tinker would water the plants, tend to the flowers, and even talk to the trees. The garden flourished under Tinker’s care, becoming the most beautiful spot in the village.
[voice_hmi_node-4]
[voice_hmi_node-4] The villagers noticed the transformation and began visiting Mr. Whitaker's garden, inspired by its beauty and tranquility. Tinker’s efforts brought the community together, and people started sharing stories, laughter, and joy in the garden.
[voice_hmi_node-4]
[voice_hmi_node-4] One day, a terrible storm hit the village, threatening to destroy the beloved garden. Tinker, with his unyielding determination, worked tirelessly to protect the plants. The villagers, moved by Tinker’s dedication, joined forces to save the garden. Together, they braced the storm and preserved the garden's beauty.
[voice_hmi_node-4]
[voice_hmi_node-4] From that day on, the garden became a symbol of unity and resilience. Tinker, the little robot with a big heart, had not only nurtured the plants but also cultivated a thriving community.
[voice_hmi_node-4]
[voice_hmi_node-4] The end.
[voice_hmi_node-4]
[voice_hmi_node-4] I hope you enjoyed this story too! Let me know if there’s anything else you’d like to hear or do."
[voice_hmi_node-4] [INFO] [1727426022.773925168] [voice_hmi_node]: Processing finished
[tts_node-1] [INFO] [1727426022.774629928] [tts_node]: Registering new TTS job: 17 length: 117 chars.
[tts_node-1] [INFO] [1727426022.778490986] [tts_node]: Registering new TTS job: 18 length: 86 chars.
[tts_node-1] [INFO] [1727426022.782179091] [tts_node]: Registering new TTS job: 19 length: 83 chars.

TTS started playing from this sentence:

Tinker was designed to assist Mr. Whitaker

See the full log below:

Log
(rai-py3.12) robo-pc-005 ➜  12_rai git:(feat/stable-tts) ✗ ros2 launch rai_bringup hri.launch.py keep_speaker_busy:=false recording_device:=7 silence_grace_period:=0.5 asr_vendor:=whisper # local solution, use openai for cloud based
[INFO] [launch]: All log files can be found below /home/bboczek/.ros/log/2024-09-27-10-32-09-636369-robo-pc-005-230389
[INFO] [launch]: Default logging verbosity is set to INFO
[INFO] [asr_node-2]: process started with pid [230393]
[INFO] [tts_node-1]: process started with pid [230392]
[INFO] [rai_whoami_node-3]: process started with pid [230394]
[INFO] [voice_hmi_node-4]: process started with pid [230395]
[rai_whoami_node-3] [INFO] [1727425930.513173083] [rai_whoami_node]: Robot constitution loaded from /home/bboczek/projects/01_internal/01_repos/12_rai/install/rosbot_xl_whoami/share/rosbot_xl_whoami/description/robot_constitution.txt
[rai_whoami_node-3] [INFO] [1727425930.694757622] [rai_whoami_node]: Incoming request for RAI constitution, responding
[rai_whoami_node-3] [INFO] [1727425930.695989837] [rai_whoami_node]: Incoming request for RAI identity, responding
[voice_hmi_node-4] [INFO] [1727425930.733376083] [voice_hmi_node]: System prompt initialized!
[voice_hmi_node-4] [INFO] [1727425930.768376757] [voice_hmi_node]: FAISS index for rosbot_xl_whoami loaded!
[voice_hmi_node-4] [INFO] [1727425930.780563858] [voice_hmi_node]: HMI Node has been started
[voice_hmi_node-4] [INFO] [1727425930.781542825] [voice_hmi_node]: Voice HMI node initialized
[voice_hmi_node-4] [WARN] [1727425930.781645220] [rcl.logging_rosout]: Publisher already registered for node name: 'voice_hmi_node'. If this is due to multiple nodes with the same name then all logs for the logger named 'voice_hmi_node' will go out over the existing publisher. As soon as any node with that name is destructed it will unregister the publisher, preventing any further logs for that name from being published on the rosout topic.
[voice_hmi_node-4] [INFO] [1727425930.815315301] [voice_hmi_node]: Creating state based agent
[voice_hmi_node-4] [INFO] [1727425930.820678019] [voice_hmi_node]: State based agent created
[tts_node-1] [INFO] [1727425930.852350799] [tts_node]: TTS Node has been started
[asr_node-2] [INFO] [1727425931.341199489] [rai_asr]: Parameters have been initialized
[asr_node-2] Using cache found in /home/bboczek/.cache/torch/hub/snakers4_silero-vad_master
[asr_node-2] [INFO] [1727425932.451290160] [rai_asr]: ASR Node has been initialized
[asr_node-2] [INFO] [1727425936.055725241] [rai_asr]: Recording...
[asr_node-2] [INFO] [1727425937.015778952] [rai_asr]: Stopped recording. Transcribing...
[asr_node-2] [INFO] [1727425937.324824282] [rai_asr]: Transcription:  Hello
[asr_node-2] [INFO] [1727425937.325782883] [rai_asr]: Done transcribing.
[voice_hmi_node-4] [INFO] [1727425937.327503470] [voice_hmi_node]: Processing started
[voice_hmi_node-4] [INFO] [1727425937.328422685] [voice_hmi_node]: Running thinker
[asr_node-2] [WARN] [1727425937.397107313] [rai_asr]: Stream status: input overflow
[voice_hmi_node-4] [INFO] [1727425938.060352912] [voice_hmi_node]: Sending message to human: "Hello! How can I assist you today?"
[voice_hmi_node-4] [INFO] [1727425938.060789827] [voice_hmi_node]: Processing finished
[tts_node-1] [INFO] [1727425938.061123492] [tts_node]: Registering new TTS job: 0 length: 5 chars.
[tts_node-1] [INFO] [1727425938.063319667] [tts_node]: Registering new TTS job: 1 length: 28 chars.
[tts_node-1] [INFO] [1727425939.125427900] [tts_node]: Job 0 completed.
[tts_node-1] [INFO] [1727425939.130303450] [tts_node]: Playing audio for job 0. /tmp/tmpwrm7ncxi.mp3
[tts_node-1] [INFO] [1727425939.633407053] [tts_node]: Job 1 completed.
[tts_node-1] [INFO] [1727425940.037177846] [tts_node]: Playing audio for job 1. /tmp/tmp1rqy4ui_.mp3
[asr_node-2] [INFO] [1727425942.758631319] [rai_asr]: Recording...
[asr_node-2] [INFO] [1727425944.117641002] [rai_asr]: Stopped recording. Transcribing...
[asr_node-2] [INFO] [1727425944.178553861] [rai_asr]: Transcription:  That'll be a story.
[asr_node-2] [INFO] [1727425944.178927429] [rai_asr]: Done transcribing.
[voice_hmi_node-4] [INFO] [1727425944.181435466] [voice_hmi_node]: Processing started
[voice_hmi_node-4] [INFO] [1727425944.182107273] [voice_hmi_node]: Running thinker
[voice_hmi_node-4] [INFO] [1727425944.899192772] [voice_hmi_node]: Sending message to human: "I’d love to hear your story! What would you like to share with me today?"
[voice_hmi_node-4] [INFO] [1727425944.899826840] [voice_hmi_node]: Processing finished
[tts_node-1] [INFO] [1727425944.899924234] [tts_node]: Registering new TTS job: 2 length: 27 chars.
[tts_node-1] [INFO] [1727425944.900884772] [tts_node]: Registering new TTS job: 3 length: 44 chars.
[tts_node-1] [INFO] [1727425946.478113720] [tts_node]: Job 2 completed.
[tts_node-1] [INFO] [1727425946.482699199] [tts_node]: Playing audio for job 2. /tmp/tmpr9lc7666.mp3
[tts_node-1] [INFO] [1727425946.567900008] [tts_node]: Job 3 completed.
[tts_node-1] [INFO] [1727425948.197941359] [tts_node]: Playing audio for job 3. /tmp/tmpuhpv2rzi.mp3
[asr_node-2] [INFO] [1727425950.998851033] [rai_asr]: Recording...
[asr_node-2] [INFO] [1727425951.878815312] [rai_asr]: Stopped recording. Transcribing...
[asr_node-2] [INFO] [1727425951.919124268] [rai_asr]: Transcription:  Yay!
[asr_node-2] [INFO] [1727425951.919524964] [rai_asr]: Done transcribing.
[voice_hmi_node-4] [INFO] [1727425951.922035025] [voice_hmi_node]: Processing started
[voice_hmi_node-4] [INFO] [1727425951.922642070] [voice_hmi_node]: Running thinker
[voice_hmi_node-4] [INFO] [1727425952.615906941] [voice_hmi_node]: Sending message to human: "I'm glad you're excited! What would you like to talk about or do today?"
[tts_node-1] [INFO] [1727425952.616504395] [tts_node]: Registering new TTS job: 4 length: 23 chars.
[voice_hmi_node-4] [INFO] [1727425952.616546416] [voice_hmi_node]: Processing finished
[tts_node-1] [INFO] [1727425952.617500482] [tts_node]: Registering new TTS job: 5 length: 47 chars.
[tts_node-1] [INFO] [1727425954.107865299] [tts_node]: Job 4 completed.
[tts_node-1] [INFO] [1727425954.113900080] [tts_node]: Playing audio for job 4. /tmp/tmpvzgi9qpv.mp3
[tts_node-1] [INFO] [1727425954.292731034] [tts_node]: Job 5 completed.
[tts_node-1] [INFO] [1727425955.580562632] [tts_node]: Playing audio for job 5. /tmp/tmpqpl6ai1c.mp3
[asr_node-2] [INFO] [1727425961.319170170] [rai_asr]: Recording...
[asr_node-2] [INFO] [1727425962.918891101] [rai_asr]: Stopped recording. Transcribing...
[asr_node-2] [INFO] [1727425962.970142497] [rai_asr]: Transcription:  Please tell me a story.
[asr_node-2] [INFO] [1727425962.970474476] [rai_asr]: Done transcribing.
[voice_hmi_node-4] [INFO] [1727425962.972033932] [voice_hmi_node]: Processing started
[voice_hmi_node-4] [INFO] [1727425962.972661295] [voice_hmi_node]: Running thinker
[voice_hmi_node-4] [INFO] [1727425966.048740309] [voice_hmi_node]: Sending message to human: "Sure! Here's a short story:
[voice_hmi_node-4]
[voice_hmi_node-4] Once upon a time in a bustling city, there was a small, curious robot named Robbie. Robbie was designed to help people with their everyday tasks, but he had a special interest in learning about the world around him. Every day, he would wander the city, helping those in need and collecting interesting facts and stories.
[voice_hmi_node-4]
[voice_hmi_node-4] One day, Robbie met a young girl named Lily who was fascinated by the stars. Lily dreamed of becoming an astronomer, but she didn’t have a telescope to explore the night sky. Robbie decided to help her. He scoured the city, gathering parts and pieces to build a makeshift telescope. With the help of a friendly group of engineers, Robbie managed to create a telescope for Lily.
[voice_hmi_node-4]
[voice_hmi_node-4] When Robbie presented the telescope to Lily, her face lit up with joy. Together, they spent countless nights gazing at the stars, learning about constellations, and dreaming of distant galaxies. Robbie's curiosity and kindness had not only helped Lily but also forged a lifelong friendship.
[voice_hmi_node-4]
[voice_hmi_node-4] And so, in the heart of the city, under the vast night sky, Robbie and Lily discovered the wonders of the universe together.
[voice_hmi_node-4]
[voice_hmi_node-4] The end.
[voice_hmi_node-4]
[voice_hmi_node-4] I hope you enjoyed the story! Would you like to hear another one, or is there something else you'd like to do?"
[tts_node-1] [INFO] [1727425966.049503586] [tts_node]: Registering new TTS job: 6 length: 4 chars.
[voice_hmi_node-4] [INFO] [1727425966.049862995] [voice_hmi_node]: Processing finished
[tts_node-1] [INFO] [1727425966.050241584] [tts_node]: Registering new TTS job: 7 length: 27 chars.
[tts_node-1] [INFO] [1727425966.050913564] [tts_node]: Registering new TTS job: 8 length: 79 chars.
[tts_node-1] [INFO] [1727425966.051492254] [tts_node]: Registering new TTS job: 9 length: 94 chars.
[tts_node-1] [INFO] [1727425966.053186727] [tts_node]: Registering new TTS job: 10 length: 71 chars.
[tts_node-1] [INFO] [1727425966.055968151] [tts_node]: Registering new TTS job: 11 length: 123 chars.
[tts_node-1] [INFO] [1727425966.123662233] [tts_node]: Registering new TTS job: 12 length: 95 chars.
[tts_node-1] [INFO] [1727425966.431121791] [tts_node]: Registering new TTS job: 13 length: 125 chars.
[tts_node-1] [INFO] [1727425966.824663062] [tts_node]: Registering new TTS job: 14 length: 9 chars.
[tts_node-1] [INFO] [1727425967.120278043] [tts_node]: Registering new TTS job: 15 length: 29 chars.
[tts_node-1] [INFO] [1727425968.138574278] [tts_node]: Registering new TTS job: 16 length: 81 chars.
[tts_node-1] [INFO] [1727425968.964567575] [tts_node]: Job 6 completed.
[tts_node-1] [INFO] [1727425968.988857134] [tts_node]: Job 7 completed.
[tts_node-1] [INFO] [1727425968.994770239] [tts_node]: Playing audio for job 6. /tmp/tmp5bvgi6eq.mp3
[tts_node-1] [INFO] [1727425969.007218909] [tts_node]: Job 10 completed.
[tts_node-1] [INFO] [1727425969.008339182] [tts_node]: Job 9 completed.
[tts_node-1] [INFO] [1727425969.009100551] [tts_node]: Job 8 completed.
[tts_node-1] [INFO] [1727425969.945609655] [tts_node]: Playing audio for job 7. /tmp/tmpu7mqw7tb.mp3
[tts_node-1] [INFO] [1727425970.755918347] [tts_node]: Job 14 completed.
[tts_node-1] [INFO] [1727425970.756379338] [tts_node]: Job 15 completed.
[tts_node-1] [INFO] [1727425971.544275063] [tts_node]: Job 12 completed.
[tts_node-1] [INFO] [1727425971.619704539] [tts_node]: Job 11 completed.
[tts_node-1] [INFO] [1727425971.686365374] [tts_node]: Playing audio for job 8. /tmp/tmp08ibw08r.mp3
[tts_node-1] [INFO] [1727425971.713158491] [tts_node]: Job 13 completed.
[tts_node-1] [INFO] [1727425972.798796253] [tts_node]: Job 16 completed.
[tts_node-1] [INFO] [1727425976.519274398] [tts_node]: Playing audio for job 9. /tmp/tmpsyuqtdss.mp3
[tts_node-1] [INFO] [1727425981.831607738] [tts_node]: Playing audio for job 10. /tmp/tmpyppcqwh0.mp3
[tts_node-1] [INFO] [1727425986.355063966] [tts_node]: Playing audio for job 11. /tmp/tmpej0k7g2v.mp3
[tts_node-1] [INFO] [1727425993.756435147] [tts_node]: Playing audio for job 12. /tmp/tmp8g0ul5e4.mp3
[tts_node-1] [INFO] [1727425999.645114005] [tts_node]: Playing audio for job 13. /tmp/tmp3ggfo9u6.mp3
[tts_node-1] [INFO] [1727426007.111526283] [tts_node]: Playing audio for job 14. /tmp/tmprlclu2o5.mp3
[tts_node-1] [INFO] [1727426007.964904167] [tts_node]: Playing audio for job 15. /tmp/tmpwjunbk3k.mp3
[tts_node-1] [INFO] [1727426009.663426380] [tts_node]: Playing audio for job 16. /tmp/tmpjjx0pkaq.mp3
[asr_node-2] [INFO] [1727426016.919719316] [rai_asr]: Recording...
[asr_node-2] [INFO] [1727426018.599715074] [rai_asr]: Stopped recording. Transcribing...
[asr_node-2] [INFO] [1727426018.658618086] [rai_asr]: Transcription:  Please tell me another one.
[asr_node-2] [INFO] [1727426018.658986291] [rai_asr]: Done transcribing.
[voice_hmi_node-4] [INFO] [1727426018.660145087] [voice_hmi_node]: Processing started
[voice_hmi_node-4] [INFO] [1727426018.660800626] [voice_hmi_node]: Running thinker
[voice_hmi_node-4] [INFO] [1727426022.771298198] [voice_hmi_node]: Sending message to human: "Certainly! Here's another story:
[voice_hmi_node-4]
[voice_hmi_node-4] In a quiet village nestled between tall mountains and lush forests, lived an elderly inventor named Mr. Whitaker. He spent his days tinkering away in his workshop, creating marvelous gadgets and inventions. One day, he decided to build a companion to help him with his work – a small, cheerful robot named Tinker.
[voice_hmi_node-4]
[voice_hmi_node-4] Tinker was designed to assist Mr. Whitaker with his inventions, but he soon discovered a passion for gardening. Every morning, Tinker would water the plants, tend to the flowers, and even talk to the trees. The garden flourished under Tinker’s care, becoming the most beautiful spot in the village.
[voice_hmi_node-4]
[voice_hmi_node-4] The villagers noticed the transformation and began visiting Mr. Whitaker's garden, inspired by its beauty and tranquility. Tinker’s efforts brought the community together, and people started sharing stories, laughter, and joy in the garden.
[voice_hmi_node-4]
[voice_hmi_node-4] One day, a terrible storm hit the village, threatening to destroy the beloved garden. Tinker, with his unyielding determination, worked tirelessly to protect the plants. The villagers, moved by Tinker’s dedication, joined forces to save the garden. Together, they braced the storm and preserved the garden's beauty.
[voice_hmi_node-4]
[voice_hmi_node-4] From that day on, the garden became a symbol of unity and resilience. Tinker, the little robot with a big heart, had not only nurtured the plants but also cultivated a thriving community.
[voice_hmi_node-4]
[voice_hmi_node-4] The end.
[voice_hmi_node-4]
[voice_hmi_node-4] I hope you enjoyed this story too! Let me know if there’s anything else you’d like to hear or do."
[voice_hmi_node-4] [INFO] [1727426022.773925168] [voice_hmi_node]: Processing finished
[tts_node-1] [INFO] [1727426022.774629928] [tts_node]: Registering new TTS job: 17 length: 117 chars.
[tts_node-1] [INFO] [1727426022.778490986] [tts_node]: Registering new TTS job: 18 length: 86 chars.
[tts_node-1] [INFO] [1727426022.782179091] [tts_node]: Registering new TTS job: 19 length: 83 chars.
[tts_node-1] [INFO] [1727426022.785406571] [tts_node]: Registering new TTS job: 20 length: 78 chars.
[tts_node-1] [INFO] [1727426022.788414649] [tts_node]: Registering new TTS job: 21 length: 66 chars.
[tts_node-1] [INFO] [1727426022.791189304] [tts_node]: Registering new TTS job: 22 length: 70 chars.
[tts_node-1] [INFO] [1727426022.955545088] [tts_node]: Registering new TTS job: 23 length: 117 chars.
[tts_node-1] [INFO] [1727426023.201289616] [tts_node]: Registering new TTS job: 24 length: 9 chars.
[tts_node-1] [INFO] [1727426023.833838988] [tts_node]: Registering new TTS job: 25 length: 34 chars.
[tts_node-1] [INFO] [1727426024.812563886] [tts_node]: Registering new TTS job: 26 length: 63 chars.
[tts_node-1] [INFO] [1727426028.136749723] [tts_node]: Job 21 completed.
[tts_node-1] [INFO] [1727426028.145750917] [tts_node]: Job 20 completed.
[tts_node-1] [INFO] [1727426028.147781782] [tts_node]: Job 19 completed.
[tts_node-1] [INFO] [1727426028.147854209] [tts_node]: Job 18 completed.
[tts_node-1] [INFO] [1727426028.148098257] [tts_node]: Job 17 completed.
[tts_node-1] [INFO] [1727426028.153436347] [tts_node]: Playing audio for job 17. /tmp/tmpzmvgopck.mp3
[tts_node-1] [INFO] [1727426029.563381114] [tts_node]: Job 24 completed.
[tts_node-1] [INFO] [1727426029.923299877] [tts_node]: Job 26 completed.
[tts_node-1] [INFO] [1727426030.052775292] [tts_node]: Job 22 completed.
[tts_node-1] [INFO] [1727426030.144457633] [tts_node]: Job 25 completed.
[tts_node-1] [INFO] [1727426030.848227814] [tts_node]: Job 23 completed.

@@ -62,7 +73,7 @@ def synthesize_speech_to_file(self, text: str) -> str:
audio_data = b"".join(response)
return self.save_audio_to_file(audio_data, suffix=".mp3")
except Exception as e:
- logger.warn(f"Error occurred during sythesizing speech: {e}.") # type: ignore
+ logger.warn(f"Error occurred during synthesizing speech: {e}.") # type: ignore
tries += 1
audio_data = b"".join(response)
Member

If TTS does not respond within TTS_TRIES attempts, this variable is unbound.

Member Author

Fixed in 829b91f
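
To illustrate the unbound-variable problem raised in this thread, here is a minimal sketch of a retry loop that never reads a result which was not assigned. It is not the change made in 829b91f, which is not shown on this page. The helper name `synthesize_with_retries`, the `generate` callable, and the values given to `TTS_TRIES` and `TTS_RETRY_DELAY` are assumptions for illustration.

```python
import time
from typing import Callable, Iterable, Optional

TTS_TRIES = 3          # assumed value for illustration
TTS_RETRY_DELAY = 0.5  # assumed value, in seconds


def synthesize_with_retries(
    generate: Callable[[str], Iterable[bytes]],
    text: str,
    tries: int = TTS_TRIES,
    retry_delay: float = TTS_RETRY_DELAY,
) -> bytes:
    """Call `generate` up to `tries` times and return the joined audio bytes.

    The audio is only built inside the `try` block, so there is no code path
    that touches the response after every attempt has failed.
    """
    last_error: Optional[Exception] = None
    for attempt in range(1, tries + 1):
        try:
            return b"".join(generate(text))
        except Exception as e:  # broad on purpose: any failure triggers a retry
            last_error = e
            if attempt < tries:
                time.sleep(retry_delay)
    # All attempts failed: raise instead of using an unassigned variable.
    raise RuntimeError(f"TTS failed after {tries} attempts") from last_error
```

Raising after the last attempt forces the caller to decide what to do when synthesis fails, rather than hitting an `UnboundLocalError` further down.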

Base automatically changed from fix/hri to development September 27, 2024 09:17
…gmentation for improved message processing

fix(voice_hmi.py): replace direct message publishing with split_and_publish to ensure messages are sent as individual sentences (reintroduced chunking)
…ngs and validate voice existence during initialization

fix(tts_clients.py): correct typo in logging message for synthesizing speech error handling
… synthesis reliability

feat(tts_clients.py): add TTS_RETRY_DELAY to introduce a delay between retries for better handling of transient errors
@maciejmajek
Member Author

@boczekbartek thank you for finding the bug. I was able to replicate this behavior. As it turns out, the problem was the queue depth of the subscribers/publishers (10). If a single response contained more than 10 chunks (sentences), some of them were dropped.
I've fixed this by using reliable and keep_all QoS settings.
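
The effect of the queue depth can be pictured with a small pure-Python model. This is a simplification rather than ROS 2 code: it treats the queue as a bounded buffer that discards its oldest entry when full, and it assumes the publisher outpaces the consumer. The function name `deliver` is hypothetical. In rclpy, the settings mentioned above would correspond to `HistoryPolicy.KEEP_ALL` and `ReliabilityPolicy.RELIABLE` on a `QoSProfile`; how the PR wires them in is not shown on this page.

```python
from collections import deque
from typing import List, Optional


def deliver(chunks: List[str], depth: Optional[int] = None) -> List[str]:
    """Model a queue that fills completely before anything is consumed.

    depth=None models a keep_all history; an integer models keep_last(depth),
    where the oldest queued chunk is discarded once the queue is full.
    """
    queue = deque(maxlen=depth)
    for chunk in chunks:
        queue.append(chunk)
    return list(queue)


if __name__ == "__main__":
    chunks = [f"sentence {i}" for i in range(13)]
    print(len(deliver(chunks, depth=10)))  # 10 chunks survive, the first 3 are lost
    print(len(deliver(chunks)))            # all 13 chunks survive
```

In this model the chunks that disappear are the earliest ones, which is consistent with the report above that playback started partway through the story.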

Member

@boczekbartek boczekbartek left a comment

@maciejmajek I tested again. It works like a charm now! Approved

@maciejmajek maciejmajek merged commit 7bee1f0 into development Sep 27, 2024
4 checks passed
@maciejmajek maciejmajek deleted the feat/stable-tts branch September 27, 2024 12:17
2 participants