SkyConnect with new "ember" driver: Network extremely slow. OOM error in logs #22249
Comments
The config that fails to set is not a problem, it just uses the in-firmware value instead. Did you let it run for a few hours before retrying after the switch? It could just be the network that needs to rebuild itself for some reason. |
I tried it several times in the last two weeks and it was always the same: The network was extremely slow after the update, switching lights took minutes. Window sensors, however, reacted at once in Z2M. When switching back to ezsp after several hours, everything was fine at once. |
@Nerivec EDIT: Just for the fun of it, I checked how long it takes from switching the light on to reaction: 3min 30 secs! I did not even know that stuff can take so long without a timeout. |
I noticed that when starting with the ember driver, the Z2M container often fails with the following error at the first attempt, however works after the second (automatic) restart:
|
I think something is not going right during startup sequence. The send queue is way too high (117 in your first debug line, 211 by the end). Seems something is sending far too many commands, and the driver can't catch up fast enough. Since you mentioned using 7.4.2, did you use one from here to avoid hardware flow control issues? Not strictly related (or possibly entirely related 😄), but looks like you have these devices in a non-zigbee type group (group that simply sends X commands for X number of devices in it). I'd advise to use a Zigbee group, that will send only 1 command, should lighten the load quite a bit. (You can do the same for any other groups you have like that one.)
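The difference between the two group types described above can be sketched as a toy count of commands per request. This is only an illustration of the fan-out argument; the function name and the member count are hypothetical and nothing here is taken from Zigbee2MQTT's code.

```python
def commands_sent(member_count: int, zigbee_group: bool) -> int:
    """Return how many commands one 'turn on' request puts on the send queue.

    Toy model (assumption): a non-Zigbee group sends one command per member,
    while a Zigbee group sends a single group-addressed command.
    """
    if member_count < 0:
        raise ValueError("member_count must not be negative")
    if member_count == 0:
        return 0
    return 1 if zigbee_group else member_count


# Hypothetical group of 8 lights.
per_device = commands_sent(8, zigbee_group=False)
single = commands_sent(8, zigbee_group=True)
```

Under this model the queue load of a non-Zigbee group grows linearly with the number of members, while a Zigbee group stays at one command regardless of size.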
|
I don't know what could go wrong here. I use Z2M in a Podman container on x86-64 hardware, no fancy stuff, system load <10% most of the time, USB 2.0 port and a long cable to keep the stick away from interference, etc. As soon as I replace ember with ezsp in the config and restart, everything is fine again.
Yes, I use the file ncp-uart-hw-v7.4.2.0-skyconnect-115200.gbl from this location. I assume that this is with "hw" flow control enabled; however, disabling hardware rts/cts in Z2M didn't seem to help in my case (tried both variants). EDIT: Clicking on the link you gave redirects to the "normal" firmware folder, so the folder path seems to be different from the one given in the text?!
How did you find out about that group? I never intentionally created a non-Zigbee group, but it might be some vendor-specific stuff. Can you pinpoint which device is the culprit, or tell me which log entry to watch for? Maybe this could explain some of the problems. Thanks for your support and your work on a better SkyConnect driver! |
I don't have that redirect. The gbl file you linked is the right one though. Make sure to set Search for this line in your logs: Device 61960 ( |
I just did one more try with the new driver and rtscts disabled:
Doesn't seem to help, first launch of Z2M failed nevertheless, second one was OK. Slow reaction again...
|
Having the same issue. Most of my devices are working as expected (battery or not), but one remote (icasa) was only sending linkquality and lastseen. I manually bound the endpoints, and now I get a lot more (including action), BUT I can't get action_group, so basically I have the action but I don't know which button I pressed 😓 |
Same here. As @Ra72xx writes "As soon as I replace ember with ezsp in the config and restart, everything is fine again" When I do that everything works. |
I routinely try to switch to the new driver when new Z2M versions appear, and it is always the same. It also seems not to get better when giving the network a few hours to "calm down" and/or unplugging the USB dongle after the change for some time. |
Seems like other comments may not be related to this issue at all. @Ra72xx I'm not sure what's going on in your scenario, I don't have anyone else reporting these delays. Did you figure out what triggers those massive amounts of messages sent on start? I'm pretty sure fixing that, will fix the rest. |
As you guessed, it was probably a Home Assistant automation which pushed temperature values to the respective thermostats, for every room and every thermostat, on every sensor change. I've modified the automation since (and it's no longer active currently). |
To get to over 100 in-flight messages right from the start, something is definitely wrong. Although, on its own, |
Ember at home, skyconnect with raspberry pi works beautifully. Ember at work, skyconnect with an Intel NUC, completely unusable and constant NCP restarts and timeouts. Not sure if the different platforms are significant here but it’s weird that they behave so differently. Both networks with over 100 nodes. |
@lawrencedudley Interesting. One of my setups is on Intel NUC (Dongle-E though), no issue at all. NUCs usually are far more stable than PIs too. |
I’ll give it another go on the next release, I don’t think external factors are significant here as reverting to the EZSP driver on the same Skyconnect firmware makes everything happy again. On Ember, doing an OTA update was a way of repeatedly causing restarts on the driver. Happy to jump on a quick screen share if it helps you with tracking down what’s causing it or I can share logs. |
Looking into the logs, when using the old ezsp driver, I get the message "warning: zh:ezsp: Deprecated driver 'ezsp' currently in use...." at least every hour or so. Is this a sign of a restart, too? |
@lawrencedudley Weirder still, OTA should be far more stable on @Ra72xx No, Koenkk added a warning with last release that logs every hour. |
I’ll grab a debug log when I get home and stick it in here! Like I said, at home, works great. OTA seems much happier running multiple OTA updates at the same time (3/4) where EZSP would have a bad time of it with more than one running concurrently. |
@Nerivec what's the easiest way for me to retain logs from Z2M in Home Assistant? It doesn't seem to respect the max_entries setting from system log so I'm only getting 50 entries and that's like 3 seconds worth. |
You can grab the file directly, it's inside the z2m config folder (use the file explorer add-on, or studio code server). |
Cool, got it. Not sure how I missed that! I've attached a few logs. log1: There's a lot of `Delivery of MULTICAST failed for "65533"`, which feels like the ember driver might have some kind of retry logic on it, which may then be making the network so busy that it falls over? log2:
log3: This is also a bit worrying:
Switching it back to EZSP works like a dream though. |
Try the new release of Z2M - it's been way more stable for me and this issue has pretty much disappeared unless you OTA more than one device at once. |
You were testing with the latest dev (not release) I believe @lawrencedudley That has a lot more refactoring that came along with the v8 support. It will go under test in dev for a month before it's in release though, as it does change a lot of stuff 😉 |
Yes, the latest stable release does not help me with my problem, unfortunately. |
Here is my log file with the current stable version and the "ember" driver. |
It seems to sometimes take a surprisingly long time for the web UI to come back alive. Have you tried edge @Ra72xx? |
I'm just experimenting with the latest-dev docker image and there is no real difference, everything is delayed. |
Hate to say it but I'm now seeing similar! |
@Ra72xx are you using a Skyconnect USB stick by any chance? Just trying to correlate these issues. |
@Ra72xx this might be a dumb one but I've just disabled availability and that's made it boot straight up. I think ember may be trying to ping every device on the network before it declares the UI ready? |
Sadly I'm going to have to switch back to EZSP as well. Lots of angry "the lights aren't working" going on in my house. |
I'll have to look into it. Disabling "availability" definitely changes something with the "ezsp" driver, as it let me display the network map for the first time in months, so it might help with "ember", too. |
I would agree about being very careful with the EZSP driver, as it's likely to leave some users high and dry otherwise. @Nerivec has been doing a great job on the ember driver, it's just painful work as everyone's setup is obviously slightly different and remote debugging is tricky. With the latest update I've gone from ember working great at home and not working in the office to it working fine in the office but not at home. The real risk with ember is that for stuff like lighting, stability is everything. As soon as a light switch or a motion sensor doesn't work or is delayed etc., it causes real pain for users. Tl;dr: just because the bugs in ember are really hard to replicate, I wouldn't presume they don't need fixing! |
@Ra72xx You still have that issue with spamming in the request queue. Seems to be caused by these that don't support reporting, so Z2M reads the state every 5 seconds instead.
Do you have a log from @lawrencedudley Latest release contains no change in |
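The effect of reading state every 5 seconds for devices without reporting can be sketched with a toy queue model: if polled requests arrive faster than they are served, the backlog keeps growing. This is a minimal sketch under stated assumptions; the device count and service rate are hypothetical values, not measurements from this thread, and the functions are not part of Z2M.

```python
def polling_rate(device_count: int, poll_interval_s: float = 5.0) -> float:
    """Requests per second produced by polling each device once per interval."""
    return device_count / poll_interval_s


def queue_depth_over_time(arrivals_per_s: float, served_per_s: float,
                          seconds: int, start_depth: float = 0.0) -> list[float]:
    """Simulate the send-queue depth second by second (toy model).

    Each second, new requests arrive and up to `served_per_s` are completed;
    the depth never drops below zero.
    """
    depth = float(start_depth)
    history = []
    for _ in range(seconds):
        depth = max(0.0, depth + arrivals_per_s - served_per_s)
        history.append(depth)
    return history


# Hypothetical numbers: 20 polled devices, driver completes 3 requests/s.
rate = polling_rate(20)                      # 4 requests per second
backlog = queue_depth_over_time(rate, 3.0, seconds=10)
```

In this model the queue only drains when the service rate exceeds the arrival rate, which is why removing the source of the polling matters more than waiting for the network to settle.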
It seems to be really unstable on release and on edge |
Thanks for your work on the ember driver. Log from a restart with ezsp: ezsp.zip Yes, I also think that those Busch Jäger devices https://www.zigbee2mqtt.io/devices/6735_6736_6737.html put some stress on the network. However, both Z2M/ezsp and before that Deconz/Phoscon were able to handle that somehow. |
PS: Might be worth asking for a revamp for that Busch Jager converter in here as a feature request. I'm sure someone can make a few changes, spruce it up a little. Make sure to describe behaviors and what parameters would be helpful, as I doubt a lot of people have encountered those devices. Off-topic: |
Thanks for looking into the issue.
That is a standard Ikea device like this, https://www.zigbee2mqtt.io/devices/ICPSHC24-30EU-IL-1_ICPSHC24-10EU-IL-2.html#ikea-icpshc24-30eu-il-1%252Ficpshc24-10eu-il-2. I doubt that I'm the only one running this kind of stuff ;-). |
When writing my bug report concerning the Busch Jaeger converter, I also noticed why this special ping is failing: There is no state to be reported! This panel is not directly connected to a relay/dimmer, but only to a power source (which is one of the three possible setups provided by the manufacturer - mount the panel on relay, dimmer, or only mains adaptor). So this is more a bug in the respective converter. |
Unfortunately I also have to switch back from ember to ezsp. Zigbee2mqtt was crashing twice an hour with ember. Totally useless. |
I'd be happy providing some hardware for @Nerivec to test with - I do think the work that's being done is the right work but I think given how wild the ecosystem is in general in terms of devices this isn't going to go away without a varied selection of devices on the same network. Anyone else up for buying some hardware to spruce up @Nerivec's testing lab? I think it's mostly relays at the moment which are one of the simpler devices out there in reality. |
@Ra72xx Can you give the latest dev/edge another try? I implemented concurrency in zigbee-herdsman 0.52.0, it should hopefully lessen the blow from the spam you are experiencing. |
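As a generic illustration of what limiting concurrency means for a burst of requests, here is a minimal sketch using a semaphore to cap the number of in-flight sends. It is not zigbee-herdsman's implementation (which is not shown in this thread); `send_all`, `fake_send`, and the limit of 3 are hypothetical.

```python
import asyncio


async def send_all(commands, send, max_in_flight=4):
    """Send every command, but keep at most `max_in_flight` running at once."""
    semaphore = asyncio.Semaphore(max_in_flight)

    async def send_one(command):
        async with semaphore:  # wait here while the limit is reached
            return await send(command)

    # gather preserves the order of the inputs in its results
    return await asyncio.gather(*(send_one(c) for c in commands))


async def demo():
    """Run 10 fake sends with a limit of 3 and record the peak concurrency."""
    in_flight = 0
    peak = 0

    async def fake_send(command):
        nonlocal in_flight, peak
        in_flight += 1
        peak = max(peak, in_flight)
        await asyncio.sleep(0.01)  # stand-in for the radio round trip
        in_flight -= 1
        return command

    results = await send_all(list(range(10)), fake_send, max_in_flight=3)
    return results, peak


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

A cap like this does not reduce the total number of requests; it only keeps a burst from overwhelming the adapter all at once.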
My network completely broke down with the "ezsp" driver overnight for no obvious reason, so I had nothing to lose and switched to latest-dev and tried "ember" once again. |
Great news! Thanks for the feedback. |
Delay when turning on/off with Zigbee. I understand that a delay in state feedback when operating manually is to be expected after the patch to the converter, but not the other way round. |
When you have a moment, can you grab a |
It is just that the lights (relay or dimmer) controlled directly by Busch-Jaeger switches sometimes (not always, but very often) now have a slight delay when operated via Zigbee (Home Assistant or Z2M web UI). All other devices in the network seem to react almost instantly to Zigbee commands. Operating the Busch-Jaeger switch manually shows no delay (both the uppermost row, naturally, as this row is wired directly to the relay/dimmer, and the other, Zigbee-only rows). We're talking about a few seconds here, and only about a few devices, so unlike the general minute-long lag this bug report was about. Maybe it is rather a side effect of patch https://github.com/Koenkk/zigbee-herdsman-converters/commit/6f5707bff5f79bc8c6c7dc1b78c9b4d8a4d0f607 and not ember-specific. The log below shows a single keypress which took about 3 seconds. It does not really show anything special IMHO.
|
Moving the conversation about that Busch-Jaeger in the previous PR. |
I'll close this bug report, as the problem seems to be fixed. I'm still victim of random Z2M crashes when restarting, but there is another bug report for that one. |
What happened?
I am trying to switch my SkyConnect from the standard 'ezsp' driver to the 'ember' driver. For that, I updated to the current 7.4.1.0 firmware as recommended in the respective thread. Everything works as before as long as I use the 'ezsp' driver, but as soon as I switch to 'ember', the network is extremely slow. E.g. a light takes about 1 to 2 minutes to react after being switched on.
The startup log shows an "OUT OF MEMORY" error.
The "homeassistant/binary_sensor" error also happens with the 'ezsp' driver (don't know why).
What did you expect to happen?
No response
How to reproduce it (minimal and precise)
No response
Zigbee2MQTT version
1.36.1 commit: ffc2ff1
Adapter firmware version
7.4.1.0 build 0
Adapter
Skyconnect EZSP v13
Setup
Z2M in Podman container
Debug log
No response