Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Playback issue #271

Open
haginator opened this issue Jan 5, 2025 · 5 comments
Open

Playback issue #271

haginator opened this issue Jan 5, 2025 · 5 comments

Comments

@haginator
Copy link

I'm experiencing a playback issue with my Voice PE. I have tested both firmware version 24.10.15 and the latest manually installed version (Firmware: 25.1.0 (ESPHome 2024.12.2)) from the official site .

The problem arises during audio playback. The playback stops after about 30-40 seconds with a brief hiccup, displaying the error message "The media pipeline's file reader encountered an error." in the ESP console. I have tested multiple sources, including Music Assistant, external MP3 files, and TTS, and all function correctly with other media players on my network. However, with TTS, long responses do not play at all, while shorter ones work perfectly. The Voice PE responds quickly on my end.

Here are two examples of playback failures:

Music Assistant & External MP3 File : Stops after a few seconds.


[17:58:41][D][media_player:080]: 'Media Player' - Setting
[17:58:41][D][media_player:087]:   Media URL: http://192.168.10.8:8097/flow/media_player.home_assistant_voice_092295_media_player/6fac555f15844e36bf8579683e70c09a.flac?ts=1735923521
[17:58:41][D][nabu_media_player.pipeline:173]: Reading FLAC file type
[17:58:43][D][nabu_media_player.pipeline:184]: Decoded audio has 2 channels, 48000 Hz sample rate, and 16 bits per sample
[17:59:27][E][nabu_media_player:297]: The media pipeline's file reader encountered an error.
[18:01:40][D][media_player:080]: 'Media Player' - Setting
[18:01:40][D][media_player:087]:   Media URL: http://192.168.10.8:8123/api/esphome/ffmpeg_proxy/949340ec89ebf409682a7bb60bb6c7f8/ysFKDDARw1a9hJPglhqzYg.flac
[18:01:40][D][nabu_media_player.pipeline:173]: Reading FLAC file type
[18:01:40][D][nabu_media_player.pipeline:184]: Decoded audio has 2 channels, 48000 Hz sample rate, and 16 bits per sample
[18:02:30][E][nabu_media_player:297]: The media pipeline's file reader encountered an error.

TTS (Text-to-Speech) : Long TTS responses do not play fully or at all.

[18:16:15][D][voice_assistant:641]: Event Type: 7
[18:16:15][D][voice_assistant:697]: Response: "Es war einmal in der pulsierenden Stadt Hamburg, wo der junge Olaf Scholz aufwuchs. Schon in seiner Kindheit zeigte er ein großes Interesse an Politik und sozialer Gerechtigkeit. In der Schule engagierte er sich in verschiedenen Projekten und trat der Schülervertretung bei, wo er lernte, die Stimmen seiner Mitschüler zu hören und ihre Anliegen zu vertreten.
[18:16:15]
[18:16:15]Nach dem Abitur entschied sich Olaf, Rechtswissenschaften zu studieren. Seine Leidenschaft für die Po
[18:16:15][D][light:036]: 'voice_assistant_leds' Setting:
[18:16:15][D][light:051]:   Brightness: 66%
[18:16:15][D][light:109]:   Effect: 'Replying'
[18:16:15][D][voice_assistant:641]: Event Type: 8
[18:16:15][D][voice_assistant:719]: Response URL: "http://192.168.10.8:8123/api/tts_proxy/hnWkYqVz89ZlOj8YWJHAyg.flac"
[18:16:15][D][voice_assistant:515]: State changed from AWAITING_RESPONSE to STREAMING_RESPONSE
[18:16:15][D][voice_assistant:522]: Desired state set to STREAMING_RESPONSE
[18:16:15][D][media_player:080]: 'Media Player' - Setting
[18:16:15][D][media_player:087]:   Media URL: http://192.168.10.8:8123/api/tts_proxy/hnWkYqVz89ZlOj8YWJHAyg.flac
[18:16:15][D][media_player:093]:  Announcement: yes
[18:16:15][D][power_supply:033]: Enabling power supply.
[18:16:15][D][voice_assistant:641]: Event Type: 2
[18:16:15][D][voice_assistant:733]: Assist Pipeline ended
[18:16:20][D][esp-idf:000][ann_read]: W (1839751) HTTP_CLIENT: Connection timed out before data was ready!
[18:16:20]
[18:16:20][D][nabu_media_player.pipeline:173]: Reading FLAC file type
[18:16:20][D][voice_assistant:515]: State changed from STREAMING_RESPONSE to IDLE
[18:16:20][D][voice_assistant:522]: Desired state set to IDLE

For reference regarding connection quality, I have observed good Wi-Fi performance (-49 dB) and minimal ping issues:

PING 192.168.20.59 (192.168.20.59): 56 data bytes
64 bytes from 192.168.20.59: icmp_seq=0 ttl=63 time=102.758 ms
64 bytes from 192.168.20.59: icmp_seq=1 ttl=63 time=17.109 ms
64 bytes from 192.168.20.59: icmp_seq=2 ttl=63 time=38.218 ms
64 bytes from 192.168.20.59: icmp_seq=3 ttl=63 time=59.010 ms
64 bytes from 192.168.20.59: icmp_seq=4 ttl=63 time=75.650 ms
64 bytes from 192.168.20.59: icmp_seq=5 ttl=63 time=108.168 ms
64 bytes from 192.168.20.59: icmp_seq=6 ttl=63 time=16.855 ms
64 bytes from 192.168.20.59: icmp_seq=7 ttl=63 time=38.400 ms
64 bytes from 192.168.20.59: icmp_seq=8 ttl=63 time=59.649 ms
64 bytes from 192.168.20.59: icmp_seq=9 ttl=63 time=79.076 ms
64 bytes from 192.168.20.59: icmp_seq=10 ttl=63 time=99.795 ms
64 bytes from 192.168.20.59: icmp_seq=11 ttl=63 time=17.057 ms
64 bytes from 192.168.20.59: icmp_seq=12 ttl=63 time=39.325 ms
64 bytes from 192.168.20.59: icmp_seq=13 ttl=63 time=64.371 ms
64 bytes from 192.168.20.59: icmp_seq=14 ttl=63 time=82.828 ms
64 bytes from 192.168.20.59: icmp_seq=15 ttl=63 time=106.872 ms
64 bytes from 192.168.20.59: icmp_seq=16 ttl=63 time=24.074 ms
64 bytes from 192.168.20.59: icmp_seq=17 ttl=63 time=41.324 ms
--- 192.168.20.59 ping statistics ---
18 packets transmitted, 18 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 16.855/59.474/108.168/31.294 ms

Thank you for your assistance!

Originally posted by @haginator in #255 (comment)

@mph070770
Copy link

Yep. I get the same issue. Mine is when playing a BBC radio stream, described below.

This issue happens when I play a BBC radio stream from Music Assistant to Voice PE.

If I play the same stream to a Google speaker, the problem doesn't happen.

If I play the same URL from RadioBrowser within Home Assistant, the stream will play on the Voice PE H/W.

It seems that this problem is a combination of using PE hardware, with MA radio stations, and the BBC streams.

The BBC stream is created in MA using the add custom station option:

Image

When the stream is played, this is the error on the PE hardware:

[21:00:05][D][media_player:080]: 'Media Player' - Setting
[21:00:05][D][media_player:087]:   Media URL: http://192.168.1.137:8097/flow/media_player.music_room_speaker_media_player/12a7511d3a0040c3849aa6550def475c.mp3?ts=1737493204
[21:00:05][D][ring_buffer:034]: Created ring buffer with size 48000
[21:00:05][D][ring_buffer:034]: Created ring buffer with size 48000
[21:00:05][D][ring_buffer:034]: Created ring buffer with size 65536
[21:00:05][D][ring_buffer:034]: Created ring buffer with size 65536
[21:00:05][D][nabu_media_player.pipeline:173]: Reading MP3 file type
[21:00:08][D][nabu_media_player.pipeline:184]: Decoded audio has 2 channels, 48000 Hz sample rate, and 16 bits per sample
[21:00:08][D][ring_buffer:034][speaker_task]: Created ring buffer with size 19200
[21:00:08][D][i2s_audio.speaker:111]: Starting Speaker
[21:00:08][D][i2s_audio.speaker:116]: Started Speaker
[21:00:20][E][nabu_media_player:299]: The media pipeline's audio decoder encountered an error.
[21:00:48][D][media_player:080]: 'Media Player' - Setting
[21:00:48][D][media_player:087]:   Media URL: http://192.168.1.137:8097/flow/media_player.music_room_speaker_media_player/f946a588cc4c4276bdab8856699db1fa.mp3?ts=1737493247
[21:00:48][D][nabu_media_player.pipeline:173]: Reading MP3 file type
[21:00:50][D][nabu_media_player.pipeline:184]: Decoded audio has 2 channels, 48000 Hz sample rate, and 16 bits per sample
[21:01:07][E][nabu_media_player:299]: The media pipeline's audio decoder encountered an error.

@eslavnov
Copy link

I also get "HTTP_CLIENT: Connection timed out before data was ready!" when trying to announce a long response with Google Cloud TTS. It seems there is a hard-coded 5-second timeout somewhere in HAVPE and it fails if the TTS takes longer than that to return a response:

[20:49:31][D][media_player:093]:  Announcement: yes
[20:49:31][D][voice_assistant:641]: Event Type: 2
[20:49:31][D][voice_assistant:733]: Assist Pipeline ended
[20:49:36][D][esp-idf:000][ann_read]: W (552662) HTTP_CLIENT: Connection timed out before data was ready!

I looked in the HASS debug, and if I take the output text and run it through the same TTS via home assistant, then it works perfectly fine (but it takes longer than 5 seconds). So it seems there is some kind of a timeout in the HAVPE config, but I had no luck finding where it might be looking at the code :( I've tried changing the default timeout in http_client but also had no luck.

@eslavnov
Copy link

I've investigated it more and traced the timeout to this line: client_config.timeout_ms = 5000;

I've increased it and now HAVPE properly waits for TTS to finish processing long responses. Would be really nice to have this as a configurable option someday!

@eslavnov
Copy link

eslavnov commented Jan 27, 2025

Sorry for the triple-posting, I'll shut up after this one :)

I wonder what was the reason behind the 5s timeout - was it a random value or was there any specific reason to land on such a low value? The problem here (besides not being able to generate longer responses) is that HASS TTS integrations have different timeouts - for example, it's 10s for Google Cloud TTS. Let's say you use it and it takes 6 seconds to generate the audio:

  1. You can generate the audio fine inside home assistant (debug interface, actions, etc.).
  2. But if you try to have a convo with HAVPE using the same text, it would fail.

So you end up with the same component (your TTS engine) behaving differently with the same input depending on if HAVPE is in the mix or not. To make things even worse, this problem will occur seemingly at random, as shorter responses will play fine. This is not a great user experience, since it creates a (wrong) impression that HAVPE is unstable. It's also hard to debug because of this implicit timeout, so it's difficult to understand what's going on, especially given the error which is too cryptic/generic for an average user.. Here is a good example of a person likely hitting the same issue and being confused about the reasons: https://community.home-assistant.io/t/http-timeout-for-voice-assistant-pe-even-though-the-response-is-recieved/834200.

At the very least, I would propose increasing the timeout. But I think the best solution is making it a configurable parameter: power users could tweak it to fit their use cases, and it will also make it explicit, reducing the number of confused users thinking there is something wrong with their HAVPEs. Perhaps better error handling with a more actionable error could be a nice cherry on top (something like "Your TTS request timed out after X seconds! If this happens on longer responses, try increasing the timeout in the configuration!").

@eslavnov
Copy link

I lied - I won't shut up, I guess :)

Here is an alternative workaround for streaming LLMs responses directly into TTS: https://github.com/eslavnov/llm-stream-tts.
With this approach I get back responses in under 3 seconds, even for large outputs.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants