Some devices slow to play audio #257

Open
HenryDParker opened this issue Dec 31, 2024 · 10 comments

@HenryDParker

Mirrored from Home Assistant forum here

One of my two devices is slow to play audio. I initially thought this was just a slow response from the assistant or the STT/TTS processing, but looking at the timings, these are relatively fast. It processes everything within 3 seconds, but takes a good 20+ seconds to actually play the response.

Voice Processing: [screenshot: Debug Flow]

When looking at the actual device in HA, I noticed that it sits in the “Responding” phase for a long time (that 20s gap) before moving to “Playing”.

Here are the relevant logs for the two devices, where you can see the massive delay in one compared to the other, even when asking a local question such as “What is the time?”.

Device 1 - Normal Response Speed

[00:29:40][D][micro_wake_word:417]: State changed from IDLE to DETECTING_WAKE_WORD
[00:29:50][D][power_supply:048]: Disabling power supply.
[00:29:59][D][micro_wake_word:355]: Detected 'Okay Nabu' with sliding average probability is 0.98 and max probability is 1.00
[00:29:59][D][media_player:080]: 'Media Player' - Setting
[00:29:59][D][media_player:084]:   Command: STOP
[00:29:59][D][media_player:093]:  Announcement: yes
[00:29:59][D][media_player:080]: 'Media Player' - Setting
[00:29:59][D][media_player:093]:  Announcement: yes
[00:29:59][D][ring_buffer:034]: Created ring buffer with size 48000
[00:29:59][D][ring_buffer:034]: Created ring buffer with size 48000
[00:29:59][D][ring_buffer:034]: Created ring buffer with size 65536
[00:29:59][D][ring_buffer:034]: Created ring buffer with size 65536
[00:29:59][D][nabu_media_player.pipeline:173]: Reading FLAC file type
[00:29:59][D][nabu_media_player.pipeline:184]: Decoded audio has 1 channels, 48000 Hz sample rate, and 16 bits per sample
[00:29:59][D][nabu_media_player.pipeline:211]: Converting mono channel audio to stereo channel audio
[00:29:59][D][ring_buffer:034][speaker_task]: Created ring buffer with size 19200
[00:29:59][D][i2s_audio.speaker:111]: Starting Speaker
[00:29:59][D][i2s_audio.speaker:116]: Started Speaker
[00:29:59][D][voice_assistant:515]: State changed from IDLE to START_MICROPHONE
[00:29:59][D][voice_assistant:522]: Desired state set to START_PIPELINE
[00:29:59][D][voice_assistant:225]: Starting Microphone
[00:29:59][D][ring_buffer:034]: Created ring buffer with size 16384
[00:29:59][D][voice_assistant:515]: State changed from START_MICROPHONE to STARTING_MICROPHONE
[00:29:59][D][voice_assistant:515]: State changed from STARTING_MICROPHONE to START_PIPELINE
[00:29:59][D][voice_assistant:280]: Requesting start...
[00:29:59][D][voice_assistant:515]: State changed from START_PIPELINE to STARTING_PIPELINE
[00:29:59][D][voice_assistant:537]: Client started, streaming microphone
[00:29:59][D][voice_assistant:515]: State changed from STARTING_PIPELINE to STREAMING_MICROPHONE
[00:29:59][D][voice_assistant:522]: Desired state set to STREAMING_MICROPHONE
[00:29:59][D][voice_assistant:641]: Event Type: 1
[00:29:59][D][voice_assistant:644]: Assist Pipeline running
[00:29:59][D][voice_assistant:641]: Event Type: 3
[00:29:59][D][voice_assistant:655]: STT started
[00:29:59][D][light:036]: 'voice_assistant_leds' Setting:
[00:29:59][D][light:047]:   State: ON
[00:29:59][D][light:051]:   Brightness: 66%
[00:29:59][D][light:109]:   Effect: 'Waiting for Command'
[00:29:59][D][power_supply:033]: Enabling power supply.
[00:30:00][D][esp32.preferences:114]: Saving 4 preferences to flash...
[00:30:00][D][esp32.preferences:142]: Saving 4 preferences to flash: 3 cached, 1 written, 0 failed
[00:30:01][D][voice_assistant:641]: Event Type: 11
[00:30:01][D][voice_assistant:804]: Starting STT by VAD
[00:30:01][D][light:036]: 'voice_assistant_leds' Setting:
[00:30:01][D][light:051]:   Brightness: 66%
[00:30:01][D][light:109]:   Effect: 'Listening For Command'
[00:30:03][D][voice_assistant:641]: Event Type: 12
[00:30:03][D][voice_assistant:808]: STT by VAD end
[00:30:03][D][voice_assistant:515]: State changed from STREAMING_MICROPHONE to STOP_MICROPHONE
[00:30:03][D][voice_assistant:522]: Desired state set to AWAITING_RESPONSE
[00:30:03][D][voice_assistant:515]: State changed from STOP_MICROPHONE to STOPPING_MICROPHONE
[00:30:03][D][light:036]: 'voice_assistant_leds' Setting:
[00:30:03][D][light:051]:   Brightness: 66%
[00:30:03][D][light:109]:   Effect: 'Thinking'
[00:30:03][D][voice_assistant:515]: State changed from STOPPING_MICROPHONE to AWAITING_RESPONSE
[00:30:03][D][voice_assistant:515]: State changed from AWAITING_RESPONSE to AWAITING_RESPONSE
[00:30:03][D][power_supply:033]: Enabling power supply.
[00:30:04][D][power_supply:033]: Enabling power supply.
[00:30:04][D][power_supply:033]: Enabling power supply.
[00:30:04][D][power_supply:033]: Enabling power supply.
[00:30:05][D][power_supply:033]: Enabling power supply.
[00:30:05][D][power_supply:033]: Enabling power supply.
[00:30:06][D][power_supply:033]: Enabling power supply.
[00:30:06][D][power_supply:033]: Enabling power supply.
[00:30:06][D][power_supply:033]: Enabling power supply.
[00:30:06][D][voice_assistant:641]: Event Type: 4
[00:30:06][D][voice_assistant:669]: Speech recognised as: " What's the time?"
[00:30:06][D][voice_assistant:641]: Event Type: 5
[00:30:06][D][voice_assistant:674]: Intent started
[00:30:06][D][voice_assistant:641]: Event Type: 6
[00:30:06][D][voice_assistant:641]: Event Type: 7
[00:30:06][D][voice_assistant:697]: Response: "0:30 AM"
[00:30:06][D][light:036]: 'voice_assistant_leds' Setting:
[00:30:06][D][light:051]:   Brightness: 66%
[00:30:06][D][light:109]:   Effect: 'Replying'
[00:30:06][D][voice_assistant:641]: Event Type: 8
[00:30:06][D][voice_assistant:719]: Response URL: "http://192.168.100.5:9123/api/tts_proxy/0iJDBoepgauiQul-S_SvFQ.flac"
[00:30:06][D][voice_assistant:515]: State changed from AWAITING_RESPONSE to STREAMING_RESPONSE
[00:30:06][D][voice_assistant:522]: Desired state set to STREAMING_RESPONSE
[00:30:06][D][media_player:080]: 'Media Player' - Setting
[00:30:06][D][media_player:087]:   Media URL: http://192.168.100.5:9123/api/tts_proxy/0iJDBoepgauiQul-S_SvFQ.flac
[00:30:06][D][media_player:093]:  Announcement: yes
[00:30:06][D][voice_assistant:641]: Event Type: 2
[00:30:06][D][voice_assistant:733]: Assist Pipeline ended
[00:30:07][D][nabu_media_player.pipeline:173]: Reading FLAC file type
[00:30:07][D][nabu_media_player.pipeline:184]: Decoded audio has 1 channels, 48000 Hz sample rate, and 16 bits per sample
[00:30:07][D][nabu_media_player.pipeline:211]: Converting mono channel audio to stereo channel audio
[00:30:08][D][voice_assistant:515]: State changed from STREAMING_RESPONSE to IDLE
[00:30:08][D][voice_assistant:522]: Desired state set to IDLE
[00:30:08][D][light:036]: 'voice_assistant_leds' Setting:
[00:30:08][D][light:047]:   State: OFF
[00:30:08][D][light:109]:   Effect: 'None'
[00:30:16][I][safe_mode:041]: Boot seems successful; resetting boot loop counter
[00:30:16][D][esp32.preferences:114]: Saving 1 preferences to flash...
[00:30:17][D][esp32.preferences:142]: Saving 1 preferences to flash: 0 cached, 1 written, 0 failed
[00:30:18][D][power_supply:048]: Disabling power supply.

Device 2 - Slow Response Speed

[00:32:14][D][micro_wake_word:417]: State changed from IDLE to DETECTING_WAKE_WORD
[00:32:19][D][micro_wake_word:355]: Detected 'Okay Nabu' with sliding average probability is 0.98 and max probability is 1.00
[00:32:19][D][media_player:080]: 'Media Player' - Setting
[00:32:19][D][media_player:084]:   Command: STOP
[00:32:19][D][media_player:093]:  Announcement: yes
[00:32:19][D][media_player:080]: 'Media Player' - Setting
[00:32:19][D][media_player:093]:  Announcement: yes
[00:32:19][D][ring_buffer:034]: Created ring buffer with size 48000
[00:32:19][D][ring_buffer:034]: Created ring buffer with size 48000
[00:32:19][D][ring_buffer:034]: Created ring buffer with size 65536
[00:32:19][D][ring_buffer:034]: Created ring buffer with size 65536
[00:32:19][D][nabu_media_player.pipeline:173]: Reading FLAC file type
[00:32:19][D][nabu_media_player.pipeline:184]: Decoded audio has 1 channels, 48000 Hz sample rate, and 16 bits per sample
[00:32:19][D][nabu_media_player.pipeline:211]: Converting mono channel audio to stereo channel audio
[00:32:19][D][ring_buffer:034][speaker_task]: Created ring buffer with size 19200
[00:32:19][D][i2s_audio.speaker:111]: Starting Speaker
[00:32:19][D][i2s_audio.speaker:116]: Started Speaker
[00:32:19][D][voice_assistant:515]: State changed from IDLE to START_MICROPHONE
[00:32:19][D][voice_assistant:522]: Desired state set to START_PIPELINE
[00:32:19][D][voice_assistant:225]: Starting Microphone
[00:32:19][D][ring_buffer:034]: Created ring buffer with size 16384
[00:32:19][D][voice_assistant:515]: State changed from START_MICROPHONE to STARTING_MICROPHONE
[00:32:19][D][voice_assistant:515]: State changed from STARTING_MICROPHONE to START_PIPELINE
[00:32:19][D][voice_assistant:280]: Requesting start...
[00:32:19][D][voice_assistant:515]: State changed from START_PIPELINE to STARTING_PIPELINE
[00:32:19][D][voice_assistant:537]: Client started, streaming microphone
[00:32:19][D][voice_assistant:515]: State changed from STARTING_PIPELINE to STREAMING_MICROPHONE
[00:32:19][D][voice_assistant:522]: Desired state set to STREAMING_MICROPHONE
[00:32:19][D][voice_assistant:641]: Event Type: 1
[00:32:19][D][voice_assistant:644]: Assist Pipeline running
[00:32:19][D][voice_assistant:641]: Event Type: 3
[00:32:19][D][voice_assistant:655]: STT started
[00:32:19][D][light:036]: 'voice_assistant_leds' Setting:
[00:32:19][D][light:047]:   State: ON
[00:32:19][D][light:051]:   Brightness: 66%
[00:32:19][D][light:109]:   Effect: 'Waiting for Command'
[00:32:19][D][power_supply:033]: Enabling power supply.
[00:32:20][D][voice_assistant:641]: Event Type: 11
[00:32:20][D][voice_assistant:804]: Starting STT by VAD
[00:32:20][D][light:036]: 'voice_assistant_leds' Setting:
[00:32:20][D][light:051]:   Brightness: 66%
[00:32:20][D][light:109]:   Effect: 'Listening For Command'
[00:32:22][D][voice_assistant:641]: Event Type: 12
[00:32:22][D][voice_assistant:808]: STT by VAD end
[00:32:22][D][voice_assistant:515]: State changed from STREAMING_MICROPHONE to STOP_MICROPHONE
[00:32:22][D][voice_assistant:522]: Desired state set to AWAITING_RESPONSE
[00:32:22][D][voice_assistant:515]: State changed from STOP_MICROPHONE to STOPPING_MICROPHONE
[00:32:22][D][light:036]: 'voice_assistant_leds' Setting:
[00:32:22][D][light:051]:   Brightness: 66%
[00:32:22][D][light:109]:   Effect: 'Thinking'
[00:32:22][D][voice_assistant:515]: State changed from STOPPING_MICROPHONE to AWAITING_RESPONSE
[00:32:22][D][voice_assistant:515]: State changed from AWAITING_RESPONSE to AWAITING_RESPONSE
[00:32:22][D][power_supply:033]: Enabling power supply.
[00:32:23][D][power_supply:033]: Enabling power supply.
[00:32:23][D][power_supply:033]: Enabling power supply.
[00:32:23][D][power_supply:033]: Enabling power supply.
[00:32:24][D][power_supply:033]: Enabling power supply.
[00:32:24][D][power_supply:033]: Enabling power supply.
[00:32:24][D][power_supply:033]: Enabling power supply.
[00:32:25][D][power_supply:033]: Enabling power supply.
[00:32:25][D][power_supply:033]: Enabling power supply.
[00:32:25][D][power_supply:033]: Enabling power supply.
[00:32:26][D][power_supply:033]: Enabling power supply.
[00:32:26][D][power_supply:033]: Enabling power supply.
[00:32:26][D][power_supply:033]: Enabling power supply.
[00:32:26][D][esp32.preferences:114]: Saving 4 preferences to flash...
[00:32:26][D][esp32.preferences:142]: Saving 4 preferences to flash: 3 cached, 1 written, 0 failed
[00:32:27][D][power_supply:033]: Enabling power supply.
[00:32:27][D][power_supply:033]: Enabling power supply.
[00:32:27][D][power_supply:033]: Enabling power supply.
[00:32:28][D][power_supply:033]: Enabling power supply.
[00:32:28][D][power_supply:033]: Enabling power supply.
[00:32:28][D][power_supply:033]: Enabling power supply.
[00:32:29][D][power_supply:033]: Enabling power supply.
[00:32:29][D][power_supply:033]: Enabling power supply.
[00:32:29][D][power_supply:033]: Enabling power supply.
[00:32:30][D][power_supply:033]: Enabling power supply.
[00:32:30][D][power_supply:033]: Enabling power supply.
[00:32:30][D][power_supply:033]: Enabling power supply.
[00:32:31][D][power_supply:033]: Enabling power supply.
[00:32:31][D][power_supply:033]: Enabling power supply.
[00:32:31][D][power_supply:033]: Enabling power supply.
[00:32:32][D][power_supply:033]: Enabling power supply.
[00:32:32][D][power_supply:033]: Enabling power supply.
[00:32:32][D][power_supply:033]: Enabling power supply.
[00:32:33][D][power_supply:033]: Enabling power supply.
[00:32:33][D][power_supply:033]: Enabling power supply.
[00:32:33][D][power_supply:033]: Enabling power supply.
[00:32:34][D][power_supply:033]: Enabling power supply.
[00:32:34][D][power_supply:033]: Enabling power supply.
[00:32:34][D][power_supply:033]: Enabling power supply.
[00:32:35][D][power_supply:033]: Enabling power supply.
[00:32:35][D][power_supply:033]: Enabling power supply.
[00:32:35][D][power_supply:033]: Enabling power supply.
[00:32:36][D][power_supply:033]: Enabling power supply.
[00:32:36][D][power_supply:033]: Enabling power supply.
[00:32:36][D][power_supply:033]: Enabling power supply.
[00:32:37][D][power_supply:033]: Enabling power supply.
[00:32:37][D][power_supply:033]: Enabling power supply.
[00:32:37][D][power_supply:033]: Enabling power supply.
[00:32:38][D][power_supply:033]: Enabling power supply.
[00:32:38][D][power_supply:033]: Enabling power supply.
[00:32:38][D][power_supply:033]: Enabling power supply.
[00:32:39][D][power_supply:033]: Enabling power supply.
[00:32:39][D][power_supply:033]: Enabling power supply.
[00:32:39][D][power_supply:033]: Enabling power supply.
[00:32:40][D][power_supply:033]: Enabling power supply.
[00:32:40][D][power_supply:033]: Enabling power supply.
[00:32:40][D][power_supply:033]: Enabling power supply.
[00:32:41][D][power_supply:033]: Enabling power supply.
[00:32:41][D][power_supply:033]: Enabling power supply.
[00:32:41][D][power_supply:033]: Enabling power supply.
[00:32:42][D][power_supply:033]: Enabling power supply.
[00:32:42][D][power_supply:033]: Enabling power supply.
[00:32:42][D][power_supply:033]: Enabling power supply.
[00:32:43][D][power_supply:033]: Enabling power supply.
[00:32:43][D][power_supply:033]: Enabling power supply.
[00:32:43][D][power_supply:033]: Enabling power supply.
[00:32:44][D][power_supply:033]: Enabling power supply.
[00:32:44][D][power_supply:033]: Enabling power supply.
[00:32:44][D][power_supply:033]: Enabling power supply.
[00:32:45][D][power_supply:033]: Enabling power supply.
[00:32:45][D][power_supply:033]: Enabling power supply.
[00:32:46][D][power_supply:033]: Enabling power supply.
[00:32:46][D][power_supply:033]: Enabling power supply.
[00:32:46][D][power_supply:033]: Enabling power supply.
[00:32:47][D][power_supply:033]: Enabling power supply.
[00:32:47][D][power_supply:033]: Enabling power supply.
[00:32:47][D][power_supply:033]: Enabling power supply.
[00:32:47][I][safe_mode:041]: Boot seems successful; resetting boot loop counter
[00:32:47][D][esp32.preferences:114]: Saving 1 preferences to flash...
[00:32:47][D][esp32.preferences:142]: Saving 1 preferences to flash: 0 cached, 1 written, 0 failed
[00:32:48][D][power_supply:033]: Enabling power supply.
[00:32:48][D][power_supply:033]: Enabling power supply.
[00:32:48][D][power_supply:033]: Enabling power supply.
[00:32:49][D][power_supply:033]: Enabling power supply.
[00:32:49][D][power_supply:033]: Enabling power supply.
[00:32:49][D][power_supply:033]: Enabling power supply.
[00:32:50][D][power_supply:033]: Enabling power supply.
[00:32:50][D][power_supply:033]: Enabling power supply.
[00:32:50][D][power_supply:033]: Enabling power supply.
[00:32:50][D][voice_assistant:641]: Event Type: 4
[00:32:50][D][voice_assistant:669]: Speech recognised as: " What's the time?"
[00:32:50][D][voice_assistant:641]: Event Type: 5
[00:32:50][D][voice_assistant:674]: Intent started
[00:32:50][D][voice_assistant:641]: Event Type: 6
[00:32:50][D][voice_assistant:641]: Event Type: 7
[00:32:51][D][voice_assistant:697]: Response: "0:32 AM"
[00:32:51][D][light:036]: 'voice_assistant_leds' Setting:
[00:32:51][D][light:051]:   Brightness: 66%
[00:32:51][D][light:109]:   Effect: 'Replying'
[00:32:51][D][voice_assistant:641]: Event Type: 8
[00:32:51][D][voice_assistant:719]: Response URL: "http://192.168.100.5:9123/api/tts_proxy/HqM16bKI3X0OZuMWvdTBXQ.flac"
[00:32:51][D][voice_assistant:515]: State changed from AWAITING_RESPONSE to STREAMING_RESPONSE
[00:32:51][D][voice_assistant:522]: Desired state set to STREAMING_RESPONSE
[00:32:51][D][media_player:080]: 'Media Player' - Setting
[00:32:51][D][media_player:087]:   Media URL: http://192.168.100.5:9123/api/tts_proxy/HqM16bKI3X0OZuMWvdTBXQ.flac
[00:32:51][D][media_player:093]:  Announcement: yes
[00:32:51][D][voice_assistant:641]: Event Type: 2
[00:32:51][D][voice_assistant:733]: Assist Pipeline ended
[00:32:51][D][nabu_media_player.pipeline:173]: Reading FLAC file type
[00:32:51][D][nabu_media_player.pipeline:184]: Decoded audio has 1 channels, 48000 Hz sample rate, and 16 bits per sample
[00:32:51][D][nabu_media_player.pipeline:211]: Converting mono channel audio to stereo channel audio
[00:32:53][D][voice_assistant:515]: State changed from STREAMING_RESPONSE to IDLE
[00:32:53][D][voice_assistant:522]: Desired state set to IDLE
[00:32:53][D][light:036]: 'voice_assistant_leds' Setting:
[00:32:53][D][light:047]:   State: OFF
[00:32:53][D][light:109]:   Effect: 'None'
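
For anyone comparing their own logs: in the two logs above, the gap between `STT by VAD end` and `Speech recognised as` is 3 seconds on Device 1 and 28 seconds on Device 2. Below is a minimal Python sketch for measuring that gap from a saved log. The function name `stage_gap` and the regular expression are illustrative assumptions based only on the log format shown above; they are not part of ESPHome.

```python
import re
from datetime import datetime, timedelta

# Assumed line shape, taken from the logs in this thread, e.g.
# [00:30:03][D][voice_assistant:808]: STT by VAD end
# An optional fourth bracket (such as [speaker_task]) is tolerated.
LINE_RE = re.compile(
    r"^\[(\d{2}:\d{2}:\d{2})\]\[\w\]\[[^\]]+\](?:\[[^\]]+\])?: (.*)$"
)


def stage_gap(log_text, start_marker, end_marker):
    """Seconds between the first line containing start_marker and the
    first later line containing end_marker; None if either is missing."""
    start = None
    for raw in log_text.splitlines():
        match = LINE_RE.match(raw.strip())
        if not match:
            continue  # skip anything that is not a log line
        stamp = datetime.strptime(match.group(1), "%H:%M:%S")
        message = match.group(2)
        if start is None and start_marker in message:
            start = stamp
        elif start is not None and end_marker in message:
            delta = stamp - start
            if delta < timedelta(0):
                delta += timedelta(days=1)  # log crossed midnight
            return delta.total_seconds()
    return None


# Excerpts copied from the two device logs above.
DEVICE_1_EXCERPT = """
[00:30:03][D][voice_assistant:808]: STT by VAD end
[00:30:06][D][voice_assistant:669]: Speech recognised as: " What's the time?"
[00:30:06][D][voice_assistant:674]: Intent started
"""

DEVICE_2_EXCERPT = """
[00:32:22][D][voice_assistant:808]: STT by VAD end
[00:32:50][D][voice_assistant:669]: Speech recognised as: " What's the time?"
[00:32:50][D][voice_assistant:674]: Intent started
"""

if __name__ == "__main__":
    for name, text in [("Device 1", DEVICE_1_EXCERPT),
                       ("Device 2", DEVICE_2_EXCERPT)]:
        gap = stage_gap(text, "STT by VAD end", "Speech recognised as")
        print(f"{name}: {gap} s between VAD end and recognised speech")
```

The same function can be pointed at other marker pairs, for example `Response URL` and `State changed from STREAMING_RESPONSE to IDLE`, to see which stage the time is spent in.
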
@StrandmonYellow

As mentioned already, I just received my unit and seem to be having the same issue. The STT and processing are very quick, and the TTS in my pipeline is also very quick, but the actual response from the speaker is delayed by roughly 20 to 50 seconds. Tomorrow I will have some time to take a look at the logs.

@hille721

I got my device today and have the same issue. At first I thought it might be due to a bad Wi-Fi signal, but the behavior is the same right next to the router. The audio response is not only slow but also gets stuck in the middle from time to time.

This is really annoying, because it means the device is not "production ready" for our home.

@cryptk

cryptk commented Dec 31, 2024

I also have two Voice PE devices: one works great, the other is VERY slow to respond. Same issue as above: the TTS is generated quickly, but there is a lengthy delay before I get the audio response from the unit. Interestingly, I can see some pretty poor behavior when pinging the unit:

PING 192.168.20.153 (192.168.20.153) 56(84) bytes of data.
64 bytes from 192.168.20.153: icmp_seq=1 ttl=62 time=3097 ms
64 bytes from 192.168.20.153: icmp_seq=2 ttl=62 time=2098 ms
64 bytes from 192.168.20.153: icmp_seq=3 ttl=62 time=1058 ms
64 bytes from 192.168.20.153: icmp_seq=4 ttl=62 time=18.0 ms
64 bytes from 192.168.20.153: icmp_seq=5 ttl=62 time=3116 ms
64 bytes from 192.168.20.153: icmp_seq=6 ttl=62 time=2037 ms
64 bytes from 192.168.20.153: icmp_seq=7 ttl=62 time=999 ms
64 bytes from 192.168.20.153: icmp_seq=8 ttl=62 time=1.39 ms
64 bytes from 192.168.20.153: icmp_seq=9 ttl=62 time=1918 ms
64 bytes from 192.168.20.153: icmp_seq=10 ttl=62 time=878 ms
64 bytes from 192.168.20.153: icmp_seq=11 ttl=62 time=23.3 ms
64 bytes from 192.168.20.153: icmp_seq=12 ttl=62 time=3089 ms
64 bytes from 192.168.20.153: icmp_seq=13 ttl=62 time=2050 ms
64 bytes from 192.168.20.153: icmp_seq=14 ttl=62 time=1010 ms
64 bytes from 192.168.20.153: icmp_seq=15 ttl=62 time=4.10 ms
64 bytes from 192.168.20.153: icmp_seq=16 ttl=62 time=1808 ms
64 bytes from 192.168.20.153: icmp_seq=17 ttl=62 time=779 ms
64 bytes from 192.168.20.153: icmp_seq=19 ttl=62 time=1961 ms
64 bytes from 192.168.20.153: icmp_seq=20 ttl=62 time=920 ms

You can see the response times are all over the place, with several pings being in the multiple-second range. I have tried factory resetting it as well as completely re-flashing it within ESPHome and the behavior persists. My other unit responds consistently in the 100ms range with no dropped pings (subsequent tests with the bad unit did drop several pings).
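
To compare units more systematically than by eyeballing the output, a short Python sketch like the following can summarize saved `ping` output. The function name `summarize_ping` and the 500 ms threshold for a "slow" reply are arbitrary assumptions chosen for illustration. The sample is the tail of the output above, where `icmp_seq=18` never came back.

```python
import re
from statistics import median

# Matches the reply lines printed by Linux ping, as shown in this thread.
PING_RE = re.compile(r"icmp_seq=(\d+) ttl=\d+ time=([\d.]+) ms")


def summarize_ping(output, slow_ms=500.0):
    """Summarize latency and gaps in the sequence numbers of ping replies."""
    replies = [(int(seq), float(ms)) for seq, ms in PING_RE.findall(output)]
    if not replies:
        return None
    seqs = [seq for seq, _ in replies]
    times = [ms for _, ms in replies]
    # Sequence numbers absent between the first and last reply were lost.
    missing = sorted(set(range(min(seqs), max(seqs) + 1)) - set(seqs))
    return {
        "replies": len(replies),
        "missing_seq": missing,
        "min_ms": min(times),
        "median_ms": median(times),
        "max_ms": max(times),
        "slow_replies": sum(ms >= slow_ms for ms in times),
    }


SAMPLE = """
64 bytes from 192.168.20.153: icmp_seq=15 ttl=62 time=4.10 ms
64 bytes from 192.168.20.153: icmp_seq=16 ttl=62 time=1808 ms
64 bytes from 192.168.20.153: icmp_seq=17 ttl=62 time=779 ms
64 bytes from 192.168.20.153: icmp_seq=19 ttl=62 time=1961 ms
64 bytes from 192.168.20.153: icmp_seq=20 ttl=62 time=920 ms
"""

if __name__ == "__main__":
    print(summarize_ping(SAMPLE))
```

Running the same summary against a known-good unit gives a baseline to compare the slow unit against.
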

@cryptk

cryptk commented Dec 31, 2024

I'm not sure why, but my slow unit is working great now. Here is what I did; let's see if anyone else can reproduce it:

  1. Inside ESPHome, I chose to Take Control of the slow unit.
  2. As part of this process, ESPHome builds the firmware and updates the unit. The initial build worked, but the update failed because of a network timeout (go figure, the unit had horrible networking).
  3. I plugged the unit into my desktop and redid the install over the USB cable. The unit still had very slow networking.
  4. I chose to clean the build files in ESPHome and redid the build/install over the USB cable.
  5. My unit now has great networking and responds quickly, just like my other unit!

@StrandmonYellow

Does the automatic OTA remain intact with this method?

@StrandmonYellow

I have also noticed that the device sometimes takes longer to actually execute commands. When I use my Samsung phone, the lights, for example, turn off immediately. Sometimes the Voice PE is also quick to respond, but sometimes the command is executed 2 to 3 seconds later than I am used to with my phone. Could this have something to do with the bad ping results? Most of the time I am also getting values above 3000 ms.

@hille721

hille721 commented Jan 1, 2025

Hi @cryptk,

> I can see some pretty poor behavior when pinging the unit:

I also see slow pings. They are not as poor as yours, only 50 to 100 ms, but that is slow compared to other devices in my network.

> I'm not sure why, but my slow unit is working great now. Here is what I did, let's see if anyone else can reproduce:

I tried the same; unfortunately it does not solve my issue :(

> Does the automatic OTA remain intact with this method?

@StrandmonYellow: no, as documented here: https://voice-pe.home-assistant.io/guides/update/

@hille721

hille721 commented Jan 1, 2025

I found some settings in another issue, #255 (comment), that solve the "slow ping" issue for me. With these settings the ping response is normal, but the actual issue of slow and stuttering responses is still there, so I'm not sure whether the two are really related.

EDIT: After testing a bit more, I have the feeling that the response is quicker than before. Long responses still hang in the middle sometimes, but TTS actions now play immediately, whereas before they were delayed by a few seconds.

@cryptk

cryptk commented Jan 2, 2025

> Does the automatic OTA remain intact with this method?

As @hille721 mentioned, with the device "taken over" in the ESPHome Addon, no, the normal OTA mechanism will not work, but you can update it via the ESPHome addon. After taking over the device and re-flashing it to get things working, I was able to factory reset the device, delete it from Home Assistant and delete it from ESPHome, then re-flash it again using the official page at https://esphome.github.io/home-assistant-voice-pe/ and re-adopt. This has resulted in a properly working Voice PE which is NOT taken over in ESPHome and should support the standard OTA mechanisms.

It is entirely possible that the take-over process is not necessary at all and that you could resolve the issue just by re-flashing the device using the page above. I went straight to taking over the device because I am quite familiar and comfortable with ESPHome. That said, if you aren't looking to modify the device's firmware at all, maybe start with just a user data wipe and/or a re-install of the firmware from the page linked above.

@StrandmonYellow

StrandmonYellow commented Jan 2, 2025

@cryptk thank you for the detailed description. If I take over the device using the ESPHome builder addon, it works great without any delay. I can also implement this change: #255. In this state, the device works fantastically. However, when I then delete the device from the addon, reset it to factory settings, and flash it again with the stock firmware, the delay is back and the device does not respond to any control input. I don't understand why there is no delay when I adopt it using the addon but there is a delay with the stock firmware.

Edit: If anyone is still struggling with this, for the time being I have adopted the device using ESPHome builder, where it works much better, until a version is released that fixes this. Hopefully that will be soon!
