[ZHA] Integration randomly stops working, sits in 'initialising' state. (still) #107490
Comments
Hey there @dmulcahey, @Adminiuga, @puddly, @TheJulianJES, mind taking a look at this issue as it has been labeled with an integration (zha) you are listed as a code owner for?
(message by CodeOwnersMention) |
Encountered the same issue on 2023.12.4. I tried to reconfigure the network and reboot; now the integration is no longer showing "initializing", but no Zigbee device is working. Diagnostics: |
@cdalexndr you can do a full system reboot: Settings > System > Hardware > Advanced options > Reboot system. Or, less recommended: unplug it and plug it back in. You do not need to re-pair your devices (I have not needed to). After rebooting, ZHA should work again. For me, it works anywhere between 3 hours and 24 hours before it needs another reboot. Here are my logs; I am experiencing the same issue. I am running Core 2023.1.2, Supervisor 2023.12.0, OS 11.3, on a Raspberry Pi 4. This issue is probably a duplicate of #105506. |
I'm experiencing this issue as well (after seeing the same issue start in the 2023.12.x releases as documented in #105445 and related tickets). ZHA was stable for me through 2023.11.3, and has not worked well since. I avoided the 2023.12.x releases altogether due to these bugs, but updated to 2024.1.1 earlier this week. ZHA worked for a few days, but as of this afternoon, it has started falling into the "Initializing..." state. As of now, I can't get it to recover, even with a full system reboot and power off. I'm using an HA Yellow with the built-in Zigbee radio. Here are the logs with debugging enabled since the last reboot. ZHA never comes online and stays stuck in initializing: home-assistant_2024-01-09T06-09-23.858Z.log I'll likely need to revert to 2023.11.3 again, but I'm not sure how long I can stay on that old of a version. I don't suppose there's been any discussion of reverting ZHA to the 2023.11.3 code base until these issues can be resolved? |
I can confirm the ZHA instability issues are still present even on the latest version. I had a stable 2024.1.2 for a few days, but since yesterday ZHA has randomly restarted twice, 6 hours apart :( The second time the system was unstable for an hour before recovering. I will try to capture logs. |
Same issue here. |
Same thing here: I run HAOS on a NUC and have the SkyConnect connected via a USB extension cable (like you're supposed to). I got the Silicon Labs Multiprotocol update to 2.4.0 and things started breaking... hours later I updated to 2.4.1... still broken... another few hours later 2.4.2 was pushed and I upgraded. Since 2.4.2 it has been up for a day, then the ZHA integration randomly goes back to "Failed Setup Will Retry". What's worse, I'm running both Zigbee AND Thread on the SkyConnect... so BOTH types of devices (85 of them) are broken... including lights. Wife Acceptance Factor is dropping rapidly. How do I downgrade back to 2.3.2?? |
Had the same issue, although my error is specifically:
Downgrading to 2023.12.4 seems to have caused fewer problems after I restarted the Zigbee router-based devices. |
Know what you mean. Home automations (including things we have come to rely on) being broken for months is not winning me any points. I had to revert a bunch of things to failsafe mode and find workarounds for a bunch of other things. Overall this is causing me a significant amount of work and effort. |
@cdalexndr: upgrade to 2024.1.2. There were many bugs fixed between 2023.12.4 and then. Multi-PAN has issues independent of ZHA, some of which will be addressed in an update scheduled for release very soon. If you're having reloads and using multi-PAN, this isn't a ZHA issue, nor something ZHA can fix. Be aware that multi-PAN is still in the experimental phase (though improving), so if you need stability, I strongly suggest using separate sticks for Zigbee and Thread (or using an external Thread border router). |
so was there anything obvious in my logs? do you want/need me to do anything to help get to the bottom of this? were the libraries which were changed (reverted) in 1.1 changed back again in 1.2 or something? |
2024.1.1 to 2024.1.2 was a very tiny change and there would be no difference between how the two behave network-wise. What exact coordinator are you using? |
Can confirm issues since 2023.12.x as well. The SkyConnect seems to be crashing or losing the connection; my logs from the multiprotocol integration are full of messages about trying to connect with baud rate X, cycling through different baud rates before it stops altogether. Physically reconnecting the SkyConnect and starting the integration again fixes the issue temporarily. It mostly crashes at 2-3 AM. With 2024.0 it was stable for a couple of days; now we are back to 24h. |
My coordinator is currently a Sonoff bridge flashed with Tasmota, unlike others here. |
Let me set one up to test. I've been running my home network on a Silvercrest gateway without issues for the past day so perhaps it's something specific to the Sonoff. |
I'm also seeing these issues since the 2023.12.x update using an HA Yellow,
so it's not just Sonoff. The coordinator built into the Yellow hardware
seems to trigger the issue as well. I uploaded my debug logs previously in
this thread before migrating back to 2023.11.3 since the network was
largely unusable, but let me know if you need more logs and I can try to
upgrade again.
|
There were a lot of changes between 2023.12.4 and 2024.1.0 so please try the latest version. If you still have issues, post a debug log of the integration reload. Multi-PAN issues are not related to ZHA unless downgrading solves the problem. Keep in mind that ZHA prior to 2023.12.0 did not notify you when your coordinator was offline or unresponsive so it's very possible that you're not actually seeing any new issues that were not present in the past. |
I tested 2024.1.2 and that's where I had the most recent issues on my HA
Yellow. The logs above are from that version. Just noting that the issues
started in 2023.12.x. Prior to that, ZHA was rock solid. Ever since, it's
been very flaky.
|
If this is the case are you willing to try something drastic to help identify this? Would you be willing to try running the most recent version with all other integrations disabled? Just for a bit to see if the stability issue goes away? |
The challenge is that it often takes a day or more for the issue to crop up
(but then it tends to stay -- even full system reboots wouldn't bring it
back last time -- I had to downgrade to get it working again). Would
it work to wait to disable the other integrations until the issue crops up,
and then turn the other integrations off? If so, I may be able to do that,
but this is also my house, and not a test site, so my ability to have
extended downtime is a bit limited (hence why I had to revert to 2023.11.3
where things are at least stable).
|
It’s worth a shot and I completely understand the impact this would have. No worries either way. |
I have the same problem, I have to reboot 2-3 times for ZHA to startup correctly. I am using a Sonoff Zigbee 3.0 dongle. These are the related log messages:
|
Same problem
|
whatever was changed in 2024.1.3 has made it even worse. what was once a week has happened about 4 times in 2 days |
For what it's worth, here is what it is mostly saying in the debug logs (other than the other ZHA log entries reporting DELIVERY_FAILED errors, understandably): 2024-01-14 23:20:46.009 ERROR (MainThread) [zigpy.zcl] [0xA01A:1:0x0b04] Traceback (most recent call last): |
My network went on a restarting spree today, with a lot of errors and timeouts. Devices dropped off but started rejoining when I launched add devices. Apologies, I can't pinpoint when things started to hit the fan, but while trying to fix it I had to turn on debug a few times. Hopefully something jumps out. |
There were no ZHA or library changes between 2024.1.2 and 2024.1.3 so I think the problem you're having is just randomly manifesting. The repeated restart issue will be fixed by #107963.
@harvindhillon I don't believe your issue is related to this one. Your log is littered with
|
Thanks, @puddly. To your hardware question: I'm using the built-in Silicon Labs radio on an HA Yellow with a 4GB CM4 RPi driving it. It's in the normal Zigbee-only mode (not multiprotocol). My processor usage hovers around 5%-10%, so it's not like the system is overloaded (although I'm not sure how many things are bound by single-core speed, which those multi-core usage percentages may not reflect). I attached logs above and in the previous iteration of this ticket. I have reverted back to 2023.11.3, which was the last stable version of ZHA prior to this run of issues that started in 2023.12. I can give the latest brain a try again later this week if you need more logs. I did update the radio firmware recently, and haven't tested that against the latest releases yet. |
Hmmm... as I said in my description, I am using Multi-PAN. So is my problem NOT a ZHA problem? I'm not using SkyConnect however, but Home Assistant Yellow, so home-assistant/addons#3408 is not an exact match either. |
As it happens - I do have a SkyConnect dongle. So I could go one way and set up a separate radio for Matter. As Zigbee is much more important to me right now, I would also be willing to disable Matter/Multi-PAN and just use ZHA if I can figure out how to do it without breaking anything. |
The firmware is identical for both so it's very likely the same issue.
There are documented steps here: https://yellow.home-assistant.io/guides/disable-multiprotocol/. It will migrate your network back to Zigbee-only. Once that's done, plug in the SkyConnect, install the OpenThread Border Router addon: it'll flash your SkyConnect with Thread firmware. You can then push your preferred dataset to the same border router from the Thread configuration and replicate your multi-PAN setup with two stable radios. |
Thanks @puddly, the energy scan warning is a weird one because it does not always appear, and I have my dongle on a 5m powered USB2 cable in the middle of my house, far from any radios. It fluctuates from as low as 15 to as high as 88. It is usually hovering around 76 though.
Apologies for the logs, as I don't remember the timestamps, but the integration was restarting multiple times on its own and I was trying to capture that. There are errors that I've not seen before.
|
So I disabled Multi-PAN and my system is working now. Yay! Going to try setting up a separate radio for Thread next as suggested. Thanks! |
Any solution for non-SkyConnect users in the works? ZHA is now reinitializing 5 minutes after reboot which takes about 10 minutes each time so it's basically unusable. It used to only have this issue once or twice a day. |
Is there a way of restarting HA as part of an automation? I.e. if it detects that the integration isn't available (or a device isn't), then restart HA completely? (I don't seem to be able to restart the integration manually even when I have access to the GUI, because it needs to be "up" to be able to restart... so I'm guessing it wouldn't work with restarting the integration itself... although I don't mind trying.) Thanks |
I've set up an automation. Trigger: one of my plugged-in Zigbee devices becomes unavailable. But I've since disabled it because ZHA keeps reinitializing every 5 minutes and it takes 10 to boot up again. |
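For anyone who wants to try the same idea outside the automation editor, here is a minimal sketch of such a watchdog in Python. It is not taken from this thread: the instance URL, the token, the entity ID, and both timing constants are assumptions you would replace. It polls one mains-powered Zigbee device through the Home Assistant REST API and calls the `homeassistant.restart` service only after the device has been unavailable for a grace period, with a cooldown between restarts so that a slow ZHA start-up does not cause a restart loop like the one described above.

```python
import json
import time
import urllib.request

# Hypothetical values: replace with your own instance URL, token and entity.
HA_URL = "http://homeassistant.local:8123"
TOKEN = "YOUR_LONG_LIVED_ACCESS_TOKEN"
ENTITY_ID = "switch.zigbee_plug"  # any mains-powered Zigbee device

GRACE_SECONDS = 300      # how long the device may stay unavailable
COOLDOWN_SECONDS = 1800  # minimum time between two restarts


def should_restart(state, unavailable_since, last_restart, now):
    """Return True when the device has been unavailable longer than the
    grace period and the previous restart is older than the cooldown."""
    if state != "unavailable" or unavailable_since is None:
        return False
    if now - unavailable_since < GRACE_SECONDS:
        return False
    if last_restart is not None and now - last_restart < COOLDOWN_SECONDS:
        return False
    return True


def call_api(method, path):
    """Send one authenticated request to the Home Assistant REST API."""
    request = urllib.request.Request(
        HA_URL + path,
        method=method,
        data=b"{}" if method == "POST" else None,
        headers={
            "Authorization": "Bearer " + TOKEN,
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.loads(response.read() or b"null")


def watch(poll_seconds=60):
    """Poll the entity forever and restart Home Assistant when warranted."""
    unavailable_since = None
    last_restart = None
    while True:
        now = time.time()
        try:
            state = call_api("GET", "/api/states/" + ENTITY_ID)["state"]
        except OSError:
            state = None  # HA itself is unreachable, e.g. while restarting
        if state == "unavailable":
            if unavailable_since is None:
                unavailable_since = now
        else:
            unavailable_since = None
        if should_restart(state, unavailable_since, last_restart, now):
            try:
                call_api("POST", "/api/services/homeassistant/restart")
            except OSError:
                pass  # the connection may drop while HA shuts down
            last_restart = now
            unavailable_since = None
        time.sleep(poll_seconds)
```

Calling `watch()` starts the loop. If ZHA takes about 10 minutes to come back, as reported above, the cooldown should be set well above that. Note that this only works around the symptom and does not address whatever is causing the coordinator to drop.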
great. ok i've just created that. mine doesnt take that long (at all) to restart HA (more like a minute max). still a PITA but at least hopefully it shouldn't affect the family too much at home. thanks for the tips |
Thank you very much - my Zigbee is working again without too much of a hassle. |
I also have a Home Assistant Yellow, out of the box, nothing added. I also had to disable multiprotocol. |
Hi, I'm using a SLB-06N coordinator (ezsp) over ethernet. When ZHA goes into its 'initializing' state, the dongle is still up and running and I can reach its admin console perfectly over HTTPS, so the issue is clearly on the ZHA side. A reboot of HA (without touching the dongle at all) fixes the issue. |
@ddeconin-gh Please enable ZHA debug logging, reload the integration, and leave debug logging on until ZHA enters the "initializing" state. After it recovers, disable debug logging, ZIP the log, and upload it. |
There hasn't been any activity on this issue recently. Due to the high number of incoming GitHub notifications, we have to clean some of the old issues, as many of them have already been resolved with the latest updates. |
I've been experiencing this issue consistently of late. I'll see if I can get debug logs. |
It has been a constant issue for my setup as well. Recently everything got worse and had a non-responsive Zigbee network. I bought a second Skyconnect, disabled multi protocol, and I believe that solved the issues I had for both things, haven’t had a ZHA stuck in initializing since! |
I'm not sure of the best way to pull the logs off the instance. At any rate, here are some of the errors that I'm seeing in the logs
Some logs with initialization in it
|
|
There hasn't been any activity on this issue recently. Due to the high number of incoming GitHub notifications, we have to clean some of the old issues, as many of them have already been resolved with the latest updates. |
The problem
As per the previous issue (#105445), my ZHA randomly becomes completely unresponsive, and the integration sits in the "initialising" state.
What version of Home Assistant Core has the issue?
core-2024.1.2
What was the last working version of Home Assistant Core?
core-2024.1.1
What type of installation are you running?
Home Assistant Container
Integration causing the issue
ZHA
Link to integration documentation on our website
No response
Diagnostics information
config_entry-zha-5fb366dc2478313fb3cb2b29c52254af.json.txt
Example YAML snippet
No response
Anything in the logs that might be useful for us?
Additional information
No response