Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Network-map causing lag-spikes #2118

Closed
insensitiveclod opened this issue Oct 11, 2019 · 14 comments
Closed

Network-map causing lag-spikes #2118

insensitiveclod opened this issue Oct 11, 2019 · 14 comments
Labels
stale Stale issues

Comments

@insensitiveclod
Copy link

insensitiveclod commented Oct 11, 2019

Bug report:

Intro:
This is possibly a dupe of #1269 but since that issue seems to have multiple commenters reporting all kinds of things, I decided to make this a new report; also because I think I have found out WHY the OP of issue #1269 is having these issues.
I do realize that this might well not really be a bug, but expected behaviour; but I expect other people might run into this behaviour and wonder what is going on.
Since it's related to the network-map interval of the Node-red admin-panel plugin, it might also just be seen as a bug in THAT project, nog zigbee2mqtt. I am not sure.

Setup:
I have a setup of approx 30 devices; about 60% routers, 40% end-devices; all Ikea-tradfri stuff.
I run hassio on RPI4+ 4gb with the hassio-zigbee2mqtt addon, which pulls in 1.6.0 zigbee2mqtt version.
On the usb-stick, running either the default firmware or the latest 'source-routing' on the 2531 stick (no difference).
I run the Node-red Zigbee2mqtt-adminpanel on the node-red instance on the Rpi4+

Problem:
I was running into huge latency spikes that'd pop up periodically and that would cause any command to be delayed at least a few seconds; sometimes up to 10.
Trying different wifi-channels and stick-firmware did not fix the issue. But I noticed that at the end of such spikes, the delayed commands would be executed quickly one-after-another as if a queue was being played out.
Looking at the MQTT and debug-log, I was able to correlate the latency-spikes to the interval between the appearance of the 'graphviz' command being issued to "zigbee2mqtt/bridge/networkmap" from my Node-Red-zigbee2mqtt-adminpanel , and the corresponding reply from zigbee2mqtt on "zigbee2mqtt/bridge/networkmap/graphviz"

With log-level 'debug' , it became clear that a lot of it had to do with 10000ms timeouts for LQI scans for missing/offline/powered-off devices. You can see that there's a 20-second period between requesting the map and responding with a map. During all this time, any other commands are greatly delayed:

10/11/2019, 1:37:46 PM - debug: Received MQTT message on 'zigbee2mqtt/bridge/networkmap' with data 'graphviz'
< ...SNIP.... >
10/11/2019, 1:38:00 PM - warn: Failed network lqi scan for device: '0x14b457fffe71cba7' with error: 'Error: Timed out after 10000 ms'
10/11/2019, 1:38:01 PM - warn: Failed network lqi scan for device: '0x000d6ffffe7a6fc5' with error: 'Error: Timed out after 10000 ms'
10/11/2019, 1:38:05 PM - warn: Network scan timeout, skipping outstanding lqi scans for '0x000d6ffffe4c1c2b 0x14b457fffe71cba7 0x000d6ffffe7a6fc5'
10/11/2019, 1:38:05 PM - warn: Network scan timeout, skipping outstanding rtg scans for ''
10/11/2019, 1:38:05 PM - info: MQTT publish: topic 'zigbee2mqtt/bridge/networkmap/graphviz', payload 'digraph G { ....

I do not know if this is a solvable/preventable issue; but the potential impact on performance of any domotica-system based on zigbee2mqtt is quite significant; hence it might warrant a warning in the documentation about this so that people who obliviously install admin-panel plugins know just where to look when they experience network-lag.

Unplugging a remote-switch module, turning (ac_power off) a lamp, or losing a remote are all things I expect will end up having these network-scans become slow; and thus have any command to a lamp/switch be delayed for periods of ~10-~20 seconds (in my case).

If building a network-map is inevitably going to always cause a delay (and there is no way around it), then it'd be good to contact the authors of the Node-red admin-panel to enable/disable the network-map function in a nice simple way and/or have the documentation or node-red flow also provide a comment about the possible impact network-map functionality might have on device responsiveness.

Regards.

Koenkk added a commit to Koenkk/zigbee2mqtt.io that referenced this issue Oct 12, 2019
@Koenkk
Copy link
Owner

Koenkk commented Oct 12, 2019

The networkscan is indeed a very heavy operation. Commands executed after the scan will end up in a queue waiting for the networkscan to be completed. It gets even worse when devices fail to respond as we have to wait for a timeout (delaying commands even more).

Therefore using an addon which does the networkscan on an interval basis is really tricky. I would recommend all the addon creators to only do these scans when they are manually requested by the user.

Tagging addon creators here so they can check and fix if necessary: @carldebilly @ben423423n32j14e @azuwis @rgruebel (let me know if I missed somebody)

Also added a warning to the documentation: https://www.zigbee2mqtt.io/information/mqtt_topics_and_message_structure.html#zigbee2mqttbridgenetworkmap

@patrul
Copy link

patrul commented Oct 12, 2019

I have not activated any of these tools, but the behaviour is very similar. I can use Z2M with xiaomi wireless switch and Shelly relay or RGB control through MQTT instantly but when I use any integration with HA, it introduces a delay, sometimes answer between 0,5 and 1 second and other times it delays 3-5 seconds. Sometimes near 10 seconds. The good thing is 99% times it answers to what it is asked.

Tried with several devices and ways using services, ESPHome API, ... MQTT works like a charm

@ben423423n32j14e
Copy link

I'll get it disabled in the next couple of days, interstate right now without access to my laptop.

@azuwis
Copy link
Contributor

azuwis commented Oct 14, 2019

My HomeAssistant custom card https://github.com/azuwis/zigbee2mqtt-networkmap will only do scan when user manually click refresh.

Actually, when HomeAssistant restarted, and user opens the card, it will be automatically run, but only for one time.

@schurgan
Copy link

Hello, I also have a big problem with the network-lags. How can I disable the network scanning? I use zigbee2mqtt with domoticz plugin.

@schurgan
Copy link

I still do not understand how to disable scan. Can you please explain to me step by step? thanks

@Koenkk
Copy link
Owner

Koenkk commented Oct 16, 2019

@schurgan did you check if actual network map scans occur in your case? Search the logging for scan

@schurgan
Copy link

No I have not. But I'm almost sure it's because of that. The delays are very noticeable. The network sometimes does not react for a long time (up to one minute). Now I want to switch to dev. Let's see if it gets better.

@schurgan
Copy link

now after switching to dev the errors come

Oct 16 21:52:18 raspberrypi npm[874]: zigbee2mqtt:error 2019-10-16T19:52:18: Error while starting zigbee-herdsman
Oct 16 21:52:18 raspberrypi npm[874]: zigbee2mqtt:error 2019-10-16T19:52:18: Failed to start zigbee
Oct 16 21:52:18 raspberrypi npm[874]: zigbee2mqtt:error 2019-10-16T19:52:18: Exiting...
Oct 16 21:52:18 raspberrypi npm[874]: zigbee2mqtt:error 2019-10-16T19:52:18: Error: SRSP - SYS - version after 6000ms
Oct 16 21:52:18 raspberrypi npm[874]: at Timeout.object.timer.setTimeout [as _onTimeout] (/opt/zigbee2mqtt/node_modules/zigbee-herdsman/dist/utils/waitress.js:44:24)
Oct 16 21:52:18 raspberrypi npm[874]: at ontimeout (timers.js:436:11)
Oct 16 21:52:18 raspberrypi npm[874]: at tryOnTimeout (timers.js:300:5)
Oct 16 21:52:18 raspberrypi npm[874]: at listOnTimeout (timers.js:263:5)
Oct 16 21:52:18 raspberrypi npm[874]: at Timer.processTimers (timers.js:223:10)
Oct 16 21:52:18 raspberrypi npm[874]: npm ERR! code ELIFECYCLE
Oct 16 21:52:18 raspberrypi npm[874]: npm ERR! errno 1
Oct 16 21:52:18 raspberrypi npm[874]: npm ERR! zigbee2mqtt@1.6.0 start: node index.js
Oct 16 21:52:18 raspberrypi npm[874]: npm ERR! Exit status 1
Oct 16 21:52:18 raspberrypi npm[874]: npm ERR!
Oct 16 21:52:18 raspberrypi npm[874]: npm ERR! Failed at the zigbee2mqtt@1.6.0 start script.
Oct 16 21:52:18 raspberrypi npm[874]: npm ERR! This is probably not a problem with npm. There is likely additional logging output above.
Oct 16 21:52:18 raspberrypi systemd[1]: zigbee2mqtt.service: Main process exited, code=exited, status=1/FAILURE
Oct 16 21:52:18 raspberrypi systemd[1]: zigbee2mqtt.service: Unit entered failed state.
Oct 16 21:52:18 raspberrypi systemd[1]: zigbee2mqtt.service: Failed with result 'exit-code'.

@schurgan
Copy link

What did I do wrong???

@ben423423n32j14e
Copy link

ben423423n32j14e commented Oct 17, 2019

I've just posted a new version, graph generation happens only once after node-red has been running for 5 minutes, not every 30 seconds like the previous version.

@ben423423n32j14e
Copy link

I'm working on a longer term fix which will be visit x url then check the graph url 60 seconds later (on demand)

@carldebilly
Copy link
Contributor

carldebilly commented Oct 26, 2019

@Koenkk Fixed for Zigbee2MqttAssistant ! :-)

The main reason for regular scans is to get information not available otherwise.

The most important information only available there is the last_seen information. Is there any other way to get it? It will be available in next release anyway... #1910

@stale
Copy link

stale bot commented Dec 25, 2019

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale Stale issues label Dec 25, 2019
@stale stale bot closed this as completed Jan 1, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
stale Stale issues
Projects
None yet
Development

No branches or pull requests

7 participants