Feedback development firmware 2022/07 #383
Comments
Also running Edit: Had another lockup after ~22 hours |
I've been running it for ~12h without issues on both my networks, running z2m with Sonoff cc2652p. Will report if things change. |
Update: |
Currently running 20220726, and it runs fine for me with z2m. Let's keep monitoring it. |
I am able to reproduce this when I call the light.toggle service with a transition of a few seconds (12) and 2 areas selected, totalling 12 light bulbs. Wild guess: maybe the flood of incoming messages causes memory issues? |
Correction: This happens as described above ONLY WHEN I have the Home Assistant Companion App open...? I then get a bunch of these warnings and my network freezes: @dmulcahey I believe once said that one of the ZHA issues is that it is limited to the HA event loop. Could it be, that..?:
|
I've been running it for 4-5 days without any issue. I have an 82-node network, of which 30 are routers. |
@MattL0 You're running Zigbee2MQTT, right? (and not Home Assistant's ZHA integration) |
Yes I am on zigbee2mqtt |
It seems like this issue only causes the controller to lock up with ZHA. Is there a way to test if the controller locks up when there is some kind of backing up of messages? |
No issues on Z2M here since this firmware dropped on a USB CC2652P2 coordinator with a mixture of about 45 devices on mains and battery. |
Also still running fine in my case. @dumpfheimer I suggest opening an issue in the ZHA issue tracker |
I believe this may be caused by zigpy-znp registering callbacks for all known ZDO commands at startup, which increases its runtime RAM usage beyond Z2M's: https://github.com/zigpy/zigpy-znp/blob/695ebf50a86117daeec92ac08c256fc1bfd11e60/zigpy_znp/zigbee/application.py#L212-L218 |
I will. But there still seems to be some kind of regression with the firmware. IMHO it should not freeze like this and I have never had this issue before updating the firmware. |
@dumpfheimer if you have a second, try applying this patch to see if it still crashes. Only five of the thirty defined ZDO commands will be registered:

diff --git a/zigpy_znp/zigbee/application.py b/zigpy_znp/zigbee/application.py
index 354edb7..382120e 100644
--- a/zigpy_znp/zigbee/application.py
+++ b/zigpy_znp/zigbee/application.py
@@ -214,6 +214,14 @@ class ControllerApplication(zigpy.application.ControllerApplication):
             # Ignore outgoing ZDO requests, only receive announcements and responses
             if cluster_id.name.endswith(("_req", "_set")):
                 continue
+            elif cluster_id not in (
+                zdo_t.ZDOCmd.Node_Desc_rsp,
+                zdo_t.ZDOCmd.Simple_Desc_rsp,
+                zdo_t.ZDOCmd.Active_EP_rsp,
+                zdo_t.ZDOCmd.Mgmt_Lqi_rsp,
+                zdo_t.ZDOCmd.Mgmt_Permit_Joining_rsp,
+            ):
+                continue
             await self._znp.request(c.ZDO.MsgCallbackRegister.Req(ClusterId=cluster_id))

It won't be possible to bind/unbind things or properly join new devices, since those all use additional ZDO commands, but this should be enough to "run" ZHA once it's set up. |
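For readers who want to see the effect of the patch's filter without a coordinator attached, here is a minimal standalone sketch. It assumes a small hypothetical `ZDOCmd` enum (a stand-in, not the real zigpy enum, and far shorter than the thirty commands mentioned above) and a hypothetical helper `callbacks_to_register`; it only mirrors the selection logic and does not talk to zigpy-znp.

```python
from enum import Enum


class ZDOCmd(Enum):
    """Hypothetical stand-in for a handful of ZDO commands (not the zigpy enum)."""

    Node_Desc_req = 0x0002
    Simple_Desc_req = 0x0004
    Active_EP_req = 0x0005
    Bind_req = 0x0021
    Mgmt_Lqi_req = 0x0031
    Mgmt_Permit_Joining_req = 0x0036
    Node_Desc_rsp = 0x8002
    Simple_Desc_rsp = 0x8004
    Active_EP_rsp = 0x8005
    Bind_rsp = 0x8021
    Unbind_rsp = 0x8022
    Mgmt_Lqi_rsp = 0x8031
    Mgmt_Permit_Joining_rsp = 0x8036


# The five responses the test patch keeps.
PATCH_ALLOWED = frozenset(
    {
        ZDOCmd.Node_Desc_rsp,
        ZDOCmd.Simple_Desc_rsp,
        ZDOCmd.Active_EP_rsp,
        ZDOCmd.Mgmt_Lqi_rsp,
        ZDOCmd.Mgmt_Permit_Joining_rsp,
    }
)


def callbacks_to_register(commands, allowed=None):
    """Return the commands a callback would be registered for.

    allowed=None mimics the unpatched behaviour (every non-request command);
    passing a set mimics the patched behaviour (only the listed commands).
    """
    selected = []
    for cmd in commands:
        # Ignore outgoing ZDO requests, as in the original loop.
        if cmd.name.endswith(("_req", "_set")):
            continue
        # Extra filter added by the test patch.
        if allowed is not None and cmd not in allowed:
            continue
        selected.append(cmd)
    return selected


unpatched = callbacks_to_register(ZDOCmd)
patched = callbacks_to_register(ZDOCmd, PATCH_ALLOWED)
print(len(unpatched), len(patched))
```

In this toy enum, `Bind_rsp` and `Unbind_rsp` are among the commands that drop out under the patch, which matches the note that binding and unbinding will not work while it is applied.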
I'll try as soon as possible, thanks! |
So, a first quick test did not crash the coordinator. This to me seems promising but not conclusive. |
@dumpfheimer The attribute reports during light transitions are ignored on the HA side (but still sent by the light, so they still travel through the network). (It can be disabled in the ZHA settings too.) Also, before/after you’ve applied the patch, you could also try to do a topology scan. (puddly mentioned that could cause higher memory usage and possibly the crash.) |
The number of messages might only be part of the issue. If memory on the controller is the problem, it might be fine as long as zigpy reads the messages at the same speed as they are received. But my best guess was that something like this is happening:
If the brightness updates during a transition do not cause Hass to update the state, the websocket does not block the loop and the messages are read in time, so the controller does not back up. There are a few wild guesses here, but two things make me believe this just might be the case: it only happens when my Hass companion app is open (which causes warning messages that events took too long), and it seems to be some kind of memory issue. I am, however, a complete noob with python/asyncio |
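The event-loop part of that guess can be illustrated with plain asyncio, independent of ZHA or zigpy. The sketch below is only a toy: `reader` stands in for whatever task drains the serial port, and `slow_state_update` stands in for any handler that blocks the loop; both names and the timings are made up for illustration. It shows that while one coroutine blocks, a task sharing the same loop cannot wake up, so its reads are delayed by at least the blocking time.

```python
import asyncio
import time


async def reader(gaps, stop):
    """Toy 'serial reader': wakes every 10 ms and records how late it ran."""
    last = time.monotonic()
    while not stop.is_set():
        await asyncio.sleep(0.01)
        now = time.monotonic()
        gaps.append(now - last)
        last = now


async def slow_state_update(blocking_s):
    """Toy handler that blocks the event loop (no await while it works)."""
    time.sleep(blocking_s)


async def main(blocking_s=0.2):
    gaps = []
    stop = asyncio.Event()
    task = asyncio.create_task(reader(gaps, stop))

    await asyncio.sleep(0.05)            # reader runs normally
    await slow_state_update(blocking_s)  # loop is blocked, reader starves
    await asyncio.sleep(0.05)            # reader recovers

    stop.set()
    await task
    return max(gaps)


worst_gap = asyncio.run(main(0.2))
print(f"longest gap between reads: {worst_gap:.3f} s")
```

Whether a delay like this is what actually makes the coordinator run out of memory is not established in this thread; the sketch only shows the starvation mechanism on the host side.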
I tried it with a topology scan. But it seems like the interruptions went from .4 down to .1 seconds. |
Mhm right, puddly's test-fix also breaks topology scans then. Might be worth seeing if a topology scan without the fix triggers the crash. |
Right now I cannot make it crash deliberately by transitions or topology scans. (Without test fix) |
Running 20220726 here on a LAUNCHXL-CC26X2R1 and it's been very stable (81 Zigbee devices, groups, ...). Topology scans don't crash. edit: still rock solid. In the past, some of my LED2005r5 would stop reporting/fail to respond to direct commands (group kept working!); so far, this hasn't occurred on this firmware. edit2: after a week, still solid, no timeouts from reporting. Everything seems to behave better. |
Trying to figure out why I keep having so many zigbee issues I updated my coordinator to this release of the firmware. Something new now started... my devices seem to stop updating. I have to reload ZHA and they work for a bit then issues happen all over again. In the logs I found this:
I don't know whether this is a ZHA issue, or coordinator issue. I see it is zigpy erroring out but I assume an issue with the coordinator could lead to that too. Before I go there and get blamed for the issue because I am using a development firmware, I'd like to rule this out. Thanks! |
I am completely uneducatedly convinced that this is primarily a firmware issue (bad memory management?) and probably needs to be fixed by Texas Instruments. But in a similar fashion I believe there are things that could be done in zigpy/ZHA to prevent this bug from happening. Lastly, I also think that the zigpy/ZHA developers are aware of the "issue". I have been playing around with python event loops/threads, which might be the reason this is happening with ZHA but not with z2m. I am too inexperienced with python and zigpy to do anything productive myself, but I might start a discussion over at zigpy to get some guidance on if and how it could be possible to decouple zigpy from the Home Assistant event loop, which in the worst case should decrease delays and in the best case might prevent this bug from being triggered. |
I also have error messages with the dev firmware. Coordinator type: zStack3x0
|
I'll add my experience to the thread:
I upgraded three days ago and haven't had any issues at all. |
After the issues in my post above 9 days ago, I wiped the stick and reformed a network on ch25. It has been working great since then. My coordinator is now also linking directly to 40ish devices instead of being limited to 20ish as before this latest network reforming. Using this CC1352P2_CC2652P_launchpad_coordinator_20220726 on my Sonoff ZBDongle-P |
I finally got zigpy_znp working in a dedicated thread. Will let you know if it has any impact. |
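For anyone wondering what "a dedicated thread" can look like in general asyncio terms, here is a minimal sketch of the standard-library pattern: a second event loop running in its own thread, with coroutines handed over via `asyncio.run_coroutine_threadsafe`. The class `LoopThread` and the coroutine `fake_radio_request` are hypothetical names for illustration; this is not the code used above and says nothing about how zigpy_znp was actually wired up.

```python
import asyncio
import threading


class LoopThread:
    """Runs a private asyncio event loop in a background thread."""

    def __init__(self, name="radio-loop"):
        self.loop = asyncio.new_event_loop()
        self._thread = threading.Thread(target=self._run, name=name, daemon=True)

    def _run(self):
        asyncio.set_event_loop(self.loop)
        self.loop.run_forever()

    def start(self):
        self._thread.start()

    def submit(self, coro):
        """Schedule a coroutine on the private loop from any other thread."""
        return asyncio.run_coroutine_threadsafe(coro, self.loop)

    def stop(self):
        self.loop.call_soon_threadsafe(self.loop.stop)
        self._thread.join()
        self.loop.close()


async def fake_radio_request(payload):
    """Hypothetical stand-in for radio I/O; reports which thread served it."""
    await asyncio.sleep(0.01)
    return payload.upper(), threading.current_thread().name


worker = LoopThread()
worker.start()
reply, served_by = worker.submit(fake_radio_request("ping")).result(timeout=5)
worker.stop()
print(reply, served_by)
```

With this layout, a slow callback on the main loop no longer delays the coroutines on the private loop, at the cost of having to cross threads (and handle thread safety) for every call.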
Can the people using ZHA check if the issue is fixed with the following fw? fws.zip Changes in this fw compared to 20220726:
|
Do you think that version could have a positive impact on issue zigbeefordomoticz/Domoticz-Zigbee#1341 ? For reference: the Zigbee for Domoticz plugin is based on the zigpy libraries |
Yes I regenerate those files on every SDK update. All my changes to these files are annotated with a
I don't expect that. |
Fwiw, still running 20221220 with nymea without a single issue. So from my point of view that's a good one. I've only seen now that there is a newer one too. Will try to update asap and report back with that too. |
Just for information: I tried 20221220 too with my CC2652RB and tested it for approx. 6 days. It runs without any problems in my environment. I have now been running 20221226 since 30.12. with no problems so far. I have 14 routers and 33 end devices in my network |
Not really dependent on the latest version 20221220, but I don't want to open a separate issue for this crazy finding.
|
@boris1000 probably better to jump to a discussion on this and off this issue. But do remember increasing TX power doesn't increase RX power and Zigbee is 2 way comms. Sometimes increasing power makes stuff worse. Just build a bigger mesh of mains powered devices. |
About as bad a conflict as can be had. Change it now before you have a lot of devices. Agree with @digiblur, increasing tx power at the coordinator gives mixed results at best, often only confusing end devices and causing them to prefer the more remote coordinator when they would be better off connecting to a nearer router. That hue motion sensor is only transmitting at whatever its normal power level is. It may be able to hear the coordinator shouting at it, but the coordinator probably can't hear the hue if it's only answering at a relative whisper. |
I agree with @jerrm: increasing TX power gains you nothing in most cases. Most Zigbee battery devices cannot reach your coordinator on the return route. It makes more sense to install more routers to make your Zigbee mesh stronger and more reliable |
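The asymmetry described in the last three comments can be put into a back-of-the-envelope link budget. All numbers below are made-up illustrative values (not measurements and not datasheet figures for any device mentioned here), and the model ignores antenna gain, fading and interference; the helper names are hypothetical.

```python
def received_dbm(tx_dbm, path_loss_db):
    """Received power for a given transmit power and path loss."""
    return tx_dbm - path_loss_db


def link_ok(tx_dbm, path_loss_db, rx_sensitivity_dbm):
    """A direction works if the received power reaches the receiver's sensitivity."""
    return received_dbm(tx_dbm, path_loss_db) >= rx_sensitivity_dbm


# Illustrative assumptions only.
PATH_LOSS_DB = 105        # same path in both directions
SENSITIVITY_DBM = -100    # assumed identical for both radios
COORDINATOR_TX_DBM = 20   # coordinator with TX power turned up
SENSOR_TX_DBM = 3         # battery sensor at its fixed power

downlink = link_ok(COORDINATOR_TX_DBM, PATH_LOSS_DB, SENSITIVITY_DBM)
uplink = link_ok(SENSOR_TX_DBM, PATH_LOSS_DB, SENSITIVITY_DBM)
print(f"coordinator -> sensor: {downlink}, sensor -> coordinator: {uplink}")
```

Under these assumed numbers the sensor hears the coordinator but not the other way round, and raising `COORDINATOR_TX_DBM` further changes nothing about the uplink. A nearer router reduces the path loss in both directions, which is why the advice above is to add mains-powered devices.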
Hello! I've upgraded from After upgrading to I will do more testing and report back. For now, this release looks like a major performance improvement compared to the latest release on master for my case with Zigbee2MQTT and over 100 devices. I will not need to split my network to have decent performance, which I was about to do. Router: 102 |
Quick feedback: I ran 20221220 over the holidays with no problems, except for one crash of z2m. But I didn't do much zigbee work as everybody here was sick (kids, yeah!). Been running 20221226 for a few days and reorganized some lights in the house. I'm having a lot of trouble pairing devices, it sometimes takes many tries to pair a light. After my reorg I've also lost five other devices that dropped from the network that I need to recover. Performance of the network has always been solid for me and seems unchanged. Router: 51 |
It seems to me the latest feedback is overwhelmingly positive and the original issue was addressed and seems fixed (or at least mitigated/worked around). If nobody disagrees, I would close this issue as resolved. To keep the discussion going, I was going to propose opening a discussion, as is possible in zigpy, but I was not able to find that option here. |
No issue to report here too, I have been on 20221226 since day one. 78 devices (45 routers) |
@dumpfheimer see https://github.com/Koenkk/Z-Stack-firmware/discussions Great, I'll release this fw with the 1 February z2m release. |
CC1352P2_CC2652P_launchpad_coordinator_20221226 works well.
Does anyone have the same experience? |
I flashed three router sticks with 20221102 yesterday using cc2538-bsl. |
Been running solid for two weeks now, none of my TRV has dropped (I am having an issue with one Aqara temp sensor but will delete and re-add and see if that solves it) |
@jerrm You are absolutely right. If you use cc2538-bsl, you can flash router_20221102. |
|
The CC1352P2_CC2652P_launchpad_coordinator_20221226.hex firmware works without flaws for 3 weeks now with 90+ devices. |
Fourth attempt on writing this 🙈 |
Thanks for all the feedback! I've just released the |
Better later than never 🤭 |
Hi, will there be an updated router firmware as well? 😃
|
Hi,
|
Forget TI Flash Programmer 2 and use the Python script instead. It works |
Indeed, it was a little bit more complicated, but it worked |
Follow up: #439 |
After seeing in the changelog that the routing table sizes have increased I wanted to test the latest DEVELOPMENT firmware.
I am having issues which I believe are caused by the firmware update.
It seems to me that the firmware crashes after a few hours or after a certain number of requests.
Unfortunately I cannot provide detailed feedback, but am glad to try with some guidance.
The first time it got stuck I did not pay a lot of attention and simply restarted everything. The second time I un- and replugged the coordinator and things recovered without any issues worth mentioning. The logs were full of messages as shown below (1). Later it changed to other error messages (2).
On the positive side:
I do feel like the larger routing table might have had a positive effect on my environment. I have ~120 Zigbee devices, of which probably 2/3 are routers. Especially when toggling a bunch of lights at the same time, I feel like it has fewer "hiccups".
My environment:
I am using a CC1352P2 launchpad with zigpy/zha/home assistant. The firmware in use was https://github.com/Koenkk/Z-Stack-firmware/blob/develop/coordinator/Z-Stack_3.x.0/bin/CC1352P2_CC2652P_launchpad_coordinator_20220724.zip
Error message 1:
Error message 2: