Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

"Adapter disconnected, stopping" (Skyconnect + Multiprotocol + EZSP) #21198

Closed
merlinpimpim opened this issue Feb 3, 2024 · 50 comments
Closed
Labels
problem Something isn't working stale Stale issues

Comments

@merlinpimpim
Copy link

What happened?

I regularly get this type of message with my adapter: "Adapter disconnected, stopping".
In this case, Z2M restarts.

(Continuation of the conversation started here: #21140 (comment))

What did you expect to happen?

Correct the problem by finding out why it happens

How to reproduce it (minimal and precise)

Nothing, just wait and read the logs

Zigbee2MQTT version

1.35.2

Adapter firmware version

7.3.1.0 build 0

Adapter

Skyconnect (Multiprotocol)

Setup

HA supervised, Khadas Vim1S (Docker)

Debug log

logs.txt

@merlinpimpim merlinpimpim added the problem Something isn't working label Feb 3, 2024
@merlinpimpim
Copy link
Author

To continue the conversation started here @Nerivec : #21140 (comment):

I have provided a herdsman debug log but it is very difficult to get it just before a crash which can happen at any time.
In this example the crash occurred 4 or 5 minutes after the logs were extracted (and then a Z2M restarts so a new Z2M container ID...).

I have no other error in the logs (in the Z2M interface) when this happens. See the following screenshot.
In this same screenshot we see examples of how often the problem occurs:
chrome_RFUV0PG4bL

It is therefore not certain that the problem is due to the suspension of the USB ports.

However, the cat /sys/module/usbcore/parameters/autosuspend command gives the value 2 (not -1).
And I haven't figured out how to change it permanently on my system (Khadas Vim1S, Fenix/Debian) including reading this:
https://unix.stackexchange.com/questions/91027/how-to-disable-usb-autosuspend-on-kernel-3-7-10-or-above
(I'm not a Linux system expert...)

In any case, the echo -1 >/sys/module/usbcore/parameters/autosuspend command is taken into account (until the next reboot), but it doesn't change the problem.

I hope I've given you enough information to help me.

@Nerivec
Copy link
Collaborator

Nerivec commented Feb 3, 2024

You are using silabs multiprotocol addon, right? If so, your firmware version shouldn't be 7.x.x, multiprotocol RCP firmware are currently all 4.x.x.

https://github.com/NabuCasa/silabs-firmware/tree/main/RCPMultiPAN
https://github.com/darkxst/silabs-firmware-builder/tree/main/firmware_builds/skyconnect

Also make sur the settings in silabs match whatever firmware you pick (baudrate). You'll probably want to go with a 460800 one (but older nabu casa ones are 115200).

@merlinpimpim
Copy link
Author

You're right because I did some manipulations in this direction just before your message and I think there's a clue, but I don't understand it any more:

I've just used https://skyconnect.home-assistant.io/firmware-update/ to flash this firmware:
https://github.com/darkxst/silabs-firmware-builder/blob/main/firmware_builds/skyconnect/rcp-uart-802154-v4.4.0-skyconnect-460800.gbl
(probably the same as https://github.com/NabuCasa/silabs-firmware/blob/main/RCPMultiPAN/beta/NabuCasa_SkyConnect_RCP_v4.4.0_rcp-uart-hw-802154_460800.gbl) ?

Next:
I manage to start SLM in 460800 baud but in this case Z2M does not start any more including by changing my current parameters:

port: tcp://core-silabs-multiprotocol:9999
adapter: ezsp
baudrate: 115200

to:

port: tcp://core-silabs-multiprotocol:9999
adapter: ezsp
baudrate: 460800

So I put 115200 back into SLM and Z2M and everything starts fine, but in Z2M/settings/About I read "Coordinator revision":7.3.1.0 build 0

And in this configuration I always get "Adapter disconnected, stopping" error messages (obviously...)

How can I see a version 7.3.1.0 in Z2M after flashing an 4.4.0 RCP version?
I'm lost....

@Nerivec
Copy link
Collaborator

Nerivec commented Feb 3, 2024

Did you make sure to disable anything that could be using the adapter before starting the flashing procedure? (ZHA, you shouldn't have at all in your case), silabs multiprotocol addon and Z2M.
I assume you didn't get any particular error from the flasher? (I've never used it since I only have a Sonoff Dongle-E).

PS: If you are not using the multiprotocol (other radios than Zigbee), I'd recommend you use an NCP firmware, with just Z2M. SLM/Zigbeed adds a layer of complexity and no major benefit if you don't have a huge network of devices.

@merlinpimpim
Copy link
Author

merlinpimpim commented Feb 3, 2024

I flashed the adapter from my PC, using the website mentioned above. So it wasn't connected to my Khadas with Home Assistant.
And I didn't get any errors during the operation.
When I inserted the key a second time, the flasher confirmed that I had the expected version.

Yes, I was thinking of going back to the NCP firmware to test, but we agree that if I do that, I'll have to re-pair my 25 devices one by one (bearing in mind that some of my Zigbee relays are in wall switch boxes, in an outdoor electric gate post, etc...) ? :((

Could there be another explanation for the fact that Z2M "sees" a 7.3.1 version when I'm on RCP 4.4.0 (an issue with a file in the /zigbee2mqtt directory of the Z2M addon in HA, or something else...)?

@Nerivec
Copy link
Collaborator

Nerivec commented Feb 3, 2024

If the flasher is telling you 4.4.0 after flashing, then you're good.
As for Z2M, the version would actually be reported by the silabs addon, that itself gets it from Zigbeed, not the adapter (since with multiprotocol the Zigbee stack is offloaded to the host).
I seemed to remember the SDK version being reported when I tested the addon a while back, but maybe I was wrong and it was the EmberZNet version instead, or maybe they changed that internally (bit of a mess really).

I looked, apparently Home Assistant reverted the version of the multiprotocol addon SDK back to 4.3.1 (again), which would report 7.3.1 EmberZNet (what you're seeing). And might explain a lot of you troubles. They created "special firmware" too... https://github.com/home-assistant/addons/tree/master/silabs-multiprotocol/rootfs/root
See home-assistant/addons#3429

On the "re-pair" question, when I tested the addon a few months back, I had to re-pair everything when I switched from NCP to RCP (which also upgraded the EZSP version), but not when I switched back to NCP at the same version level as the RCP (the equivalent, as described in first link...), Z2M kept all my devices talking without issue... But I heard mixed reports on people who also tried, so... 😢

@merlinpimpim
Copy link
Author

I installed the firmware you were talking about (4.3.1), I managed to start SLM (2.4.4) at 460800 baud and Z2M at 115200 baud 😊 but unfortunately I still get the famous error message 😞
image

For info on Z2M:
image

@Nerivec
Copy link
Collaborator

Nerivec commented Feb 4, 2024

Can you try to catch the logs like you did before? There was a lot of "noise" in your previous one because of decryption failures (I think?). Hopefully these will now be gone.

PS: The Z2M baudrate won't matter in your current setup, only the SLM one will be used (since Z2M is connecting through socket to SLM).

@merlinpimpim
Copy link
Author

Here's an updated file:
logs9.txt
A crash occurred 4 minutes after this extraction.

About the Z2M baudrate, I believe you but if I don't put these parameters in the Z2M add-on configuration, it won't start:

adapting: ezsp
baudrate: 115200

@romarysonrier
Copy link

romarysonrier commented Feb 5, 2024

I am facing a similar issue, more details here : Koenkk/zigbee-herdsman#910

@romarysonrier
Copy link

Here is some log that lead to shutdown of zigbee2mqtt : https://github.com/Koenkk/zigbee-herdsman/files/14170047/lixee-2.log

@romarysonrier
Copy link

@Nerivec feel free to ask more logs.

@merlinpimpim
Copy link
Author

Ah, finally a possible lead :)
And I get the impression that this phenomenon is a very French issue ;)
Thank you @romarysonrier

@romarysonrier
Copy link

romarysonrier commented Feb 5, 2024

@merlinpimpim indeed , a tiny lead. is Lixee the cause or the consequence? that is question.
It seems that it is related to our specific Lixee device using EZSP, which does a lot a polling on many values.

@Nerivec looking at my logs, we see a kind of pattern, having the same sequence :

first , we do have this log line:
zlinky config: standard, xxxx_phase, false, yyyyy
from src/devices/lixee.ts , generate by "getCurrentConfig(device, options)" line 1044.

then, we have
endpoint Read 0x00158d0005d25e8c/1 liXeePrivate(["currentDate","currentIndexTarif","currentPrice","currentTarif"] .....

then , few millesecondes after we have some error log :
Unexpected packet sequence X | X+1

followed by
Unhandled frame childJoinHandler

hope its helps.

@jhbruhn
Copy link

jhbruhn commented Feb 6, 2024

I can confirm this behaviour with my HA Yellow, Multiprotocol firmware and Z2M 1.35.2, with 1.35.1 it works. Not Lixee device in my network. But I would have to get more precise log messages to find out where exactly the error happens, the only error I have for now is "Adapter disconnected"

@romarysonrier
Copy link

romarysonrier commented Feb 6, 2024

@jhbruhn could your HW and exact FW revision? and share some log with zigbee-herdsman debugging enable ? by running
DEBUG=zigbee-herdsman* npm start.
Could you also list the router/endpoint device running in your zigbee network ? I higly suspect that intensive polling of non Reportable attributes is part of the root cause.

@romarysonrier
Copy link

@merlinpimpim Could you also list the router/endpoint devices running in your zigbee network ? I higly suspect that intensive polling of non Reportable attributes is part of the root cause.

@romarysonrier
Copy link

romarysonrier commented Feb 6, 2024

@Nerivec even shorter log :
lixee-3.log

Also, among the seveal test case, most "Unexpected packet sequence" occurence seen are related to ACK

@romarysonrier
Copy link

romarysonrier commented Feb 6, 2024

Disabling polling on Linky_TIC / Lixee by setting an empty string on param tic_command_whitelist is a working dirty workarround that stop unexpected shutdown of Z2M.

tic_command_whitelist: ~

But by doing so we loose the non Reportable attribute values, as described here :
https://github.com/fairecasoimeme/Zlinky_TIC

EDIT : empty tic_command_whitelist disable also Reportable attributes, so it is not a evidence for any rootcause.

@romarysonrier
Copy link

Looking at this sequence :
image

At first, a data is sent :
image
and was expecting a ACK : -?- waiting (2)

But Z2M never received this ACK (2)
and later while receiving uart <-- ACK (1):
we get a Unexpected packet sequence 1 | 2 +2ms

@Koenkk @Nerivec is this expected ?

Full log here :
lixee-4.log

@jhbruhn
Copy link

jhbruhn commented Feb 6, 2024

The Coordinator Type is

EZSP v12

to be specific a HA Yellow running the most recent (Gecko SDK 4.3.1)
Multiprotocol firmware.

Coordinator revision is

7.3.1.0 build 0

Logs are attached here:
logs.txt

(Unfortunately it seems like HA truncated the logs at the start)

The devices in my network are:

info  2024-02-06 19:45:47: Starting Zigbee2MQTT version 1.35.2 (commit #unknown)
info  2024-02-06 19:45:47: Starting zigbee-herdsman (0.33.5)
info  2024-02-06 19:45:53: zigbee-herdsman started (resumed)
info  2024-02-06 19:45:53: Coordinator firmware version: '{"meta":{"maintrel":"1 ","majorrel":"7","minorrel":"3","product":12,"revision":"7.3.1.0 build 0"},"type":"EZSP v12"}'
info  2024-02-06 19:45:53: Currently 69 devices are joined:
info  2024-02-06 19:45:53: Wohnzimmer Regallampe (0xcc86ecfffe19d137): LED1733G7 - IKEA TRADFRI LED bulb E14 600 lumen, dimmable, white spectrum, opal white (Router)
info  2024-02-06 19:45:53: Büroja Deckenlampe (0xf0d1b8000014287b): 4058075168572 - LEDVANCE SMART+ lamp E27 tuneable white (Router)
info  2024-02-06 19:45:53: Büroja Schreibtischlampe (0x001788010b22487a): 9290024684 - Philips Hue white ambiance E27 1100lm with Bluetooth (Router)
info  2024-02-06 19:45:53: Büroja Heizung (0xcc86ecfffec1f63f): SEA801-Zigbee/SEA802-Zigbee - Saswell Thermostatic radiator valve (EndDevice)
info  2024-02-06 19:45:53: Büroja Fenster links (0x00124b002521b10d): SNZB-04 - SONOFF Contact sensor (EndDevice)
info  2024-02-06 19:45:53: Büroja Fenster rechts (0x00124b00254e0b3a): SNZB-04 - SONOFF Contact sensor (EndDevice)
info  2024-02-06 19:45:53: Büroja Kaffeeknopf (0x04cd15fffe4a6863): E1812 - IKEA TRADFRI shortcut button (EndDevice)
info  2024-02-06 19:45:53: Wohnzimmer Stehlampe (0x001788010cae07a2): 8719514328242 - Philips Hue white ambiance E27 800lm with Bluetooth (Router)
info  2024-02-06 19:45:53: Wohnzimmer Deckenlampe (0xf0d1b8000013cb3f): 4058075168572 - LEDVANCE SMART+ lamp E27 tuneable white (Router)
info  2024-02-06 19:45:53: Wohnzimmer Kugellampe (0x680ae2fffe557254): LED1937T5_E27 - IKEA TRADFRI LED bulb E27 470 lumen, dimmable, opal white (Router)
info  2024-02-06 19:45:53: Wohnzimmer STYRBAR (0x50325ffffe2ac628): E2001/E2002 - IKEA STYRBAR remote control (EndDevice)
info  2024-02-06 19:45:53: Sterni (0x50325ffffef5205f): LED2003G10 - IKEA TRADFRI LED bulb E26/27 1100/1055/1160 lumen, dimmable, white spectrum, opal white (Router)
info  2024-02-06 19:45:53: Wohnzimmer Heizung (0xcc86ecfffec87edd): SEA801-Zigbee/SEA802-Zigbee - Saswell Thermostatic radiator valve (EndDevice)
info  2024-02-06 19:45:53: Wohnzimmer Fenster (0x00124b00234e98ef): SNZB-04 - SONOFF Contact sensor (EndDevice)
info  2024-02-06 19:45:53: Wohnzimmer SYMFONISK (0x943469fffe65989c): E1744 - IKEA SYMFONISK sound controller (EndDevice)
info  2024-02-06 19:45:53: Lichterkette (0xcc86ecfffe89ee37): E1603/E1702/E1708 - IKEA TRADFRI control outlet (Router)
info  2024-02-06 19:45:53: Flur Deckenlampe (0xf0d1b8000013ad20): 4058075168572 - LEDVANCE SMART+ lamp E27 tuneable white (Router)
info  2024-02-06 19:45:53: Bewegungsmelder (0x00124b002502f639): SNZB-03 - SONOFF Motion sensor (EndDevice)
info  2024-02-06 19:45:53: Bürojo Heizung (0xcc86ecfffec39bb9): SEA801-Zigbee/SEA802-Zigbee - Saswell Thermostatic radiator valve (EndDevice)
info  2024-02-06 19:45:53: Bürojo Lichtknopf (0x003c84fffe259547): E1812 - IKEA TRADFRI shortcut button (EndDevice)
info  2024-02-06 19:45:53: Bürojo Fenster (0x00124b002518a8cc): SNZB-04 - SONOFF Contact sensor (EndDevice)
info  2024-02-06 19:45:53: Schlafzimmer Lichtleiste (0xb4e3f9fffea55396): ICPSHC24-10EU-IL-1 - IKEA TRADFRI driver for wireless control (10 watt) (Router)
info  2024-02-06 19:45:53: Schlafzimmer Lichtschalter (0x04cd15fffe0f9437): E1743 - IKEA TRADFRI ON/OFF switch (EndDevice)
info  2024-02-06 19:45:53: Schlafzimmer Vorhangschalter (0x94deb8fffe0f3a98): E1743 - IKEA TRADFRI ON/OFF switch (EndDevice)
info  2024-02-06 19:45:53: Schlafzimmer Heizung links (0xcc86ecfffef4d241): SEA801-Zigbee/SEA802-Zigbee - Saswell Thermostatic radiator valve (EndDevice)
info  2024-02-06 19:45:53: Schlafzimmer Fenster links (0x00124b002521b14d): SNZB-04 - SONOFF Contact sensor (EndDevice)
info  2024-02-06 19:45:53: Schlafzimmer Fenster rechts (0x00124b00251d357c): SNZB-04 - SONOFF Contact sensor (EndDevice)
info  2024-02-06 19:45:53: Schlafzimmer Heizung rechts (0x847127fffe0a5cb4): SEA801-Zigbee/SEA802-Zigbee - Saswell Thermostatic radiator valve (EndDevice)
info  2024-02-06 19:45:53: Badezimmer Heizung (0xcc86ecfffec20a8a): SEA801-Zigbee/SEA802-Zigbee - Saswell Thermostatic radiator valve (EndDevice)
info  2024-02-06 19:45:53: Küche Deckenlampe (0xf0d1b8000013cb94): 4058075168572 - LEDVANCE SMART+ lamp E27 tuneable white (Router)
info  2024-02-06 19:45:53: Küche Esstisch Hängelampe 1 (0x001788010ba70b7e): 9290024684 - Philips Hue white ambiance E27 1100lm with Bluetooth (Router)
info  2024-02-06 19:45:53: Küche Esstisch Hängelampe 2 (0x001788010b1f7e51): 9290024684 - Philips Hue white ambiance E27 1100lm with Bluetooth (Router)
info  2024-02-06 19:45:53: Küche Lichtleiste (0x001788010b4783f2): 8718699703424 - Philips Hue white and color ambiance LightStrip plus (Router)
info  2024-02-06 19:45:53: Küche Wandlampe (0xf0d1b8000013cc06): 4058075168572 - LEDVANCE SMART+ lamp E27 tuneable white (Router)
info  2024-02-06 19:45:53: Küche Heizung (0xcc86ecfffef4d631): SEA801-Zigbee/SEA802-Zigbee - Saswell Thermostatic radiator valve (EndDevice)
info  2024-02-06 19:45:53: Küche STYRBAR (0x50325ffffe2ac673): E2001/E2002 - IKEA STYRBAR remote control (EndDevice)
info  2024-02-06 19:45:53: Küche SYMFONISK (0x943469fffe5fbe4a): E1744 - IKEA SYMFONISK sound controller (EndDevice)
info  2024-02-06 19:45:53: Küche Lichtknopf (0x2c1165fffecfb8c1): E1812 - IKEA TRADFRI shortcut button (EndDevice)
info  2024-02-06 19:45:53: Küche Esstischschalter (0x0017880109a9ee05): 929002398602 - Philips Hue dimmer switch (EndDevice)
info  2024-02-06 19:45:53: Schlafzimmer Deckenlampe (0xf0d1b8000014c5f1): 4058075168572 - LEDVANCE SMART+ lamp E27 tuneable white (Router)
info  2024-02-06 19:45:53: Sudokuknopf (0x70ac08fffedcf453): E1812 - IKEA TRADFRI shortcut button (EndDevice)
info  2024-02-06 19:45:53: Bürojo Arbeitslampe Schein 1 (0x001788010cac1700): 8719514328242 - Philips Hue white ambiance E27 800lm with Bluetooth (Router)
info  2024-02-06 19:45:53: Bürojo Arbeitslampe Schein 2 (0x001788010cad2b03): 8719514328242 - Philips Hue white ambiance E27 800lm with Bluetooth (Router)
info  2024-02-06 19:45:53: Bürojo Arbeitslampe Schein 3 (0x001788010cae0767): 8719514328242 - Philips Hue white ambiance E27 800lm with Bluetooth (Router)
info  2024-02-06 19:45:53: Bürojo Arbeitslampe Direkt (0x2c1165fffe85ef45): LED2003G10 - IKEA TRADFRI LED bulb E26/27 1100/1055/1160 lumen, dimmable, white spectrum, opal white (Router)
info  2024-02-06 19:45:53: Büroja Schreibtisch (0xa4c1387eb0418c52): TS011F_plug_1 - TuYa Smart plug (with power monitoring) (Router)
info  2024-02-06 19:45:53: Wohnzimmer Subwoofer (0xa4c1382197f8815d): TS011F_plug_1 - TuYa Smart plug (with power monitoring) (Router)
info  2024-02-06 19:45:53: Schlafzimmer Lichterschalter (0x142d41fffe3b900e): E1743 - IKEA TRADFRI ON/OFF switch (EndDevice)
info  2024-02-06 19:45:53: Flur Trockner (0xa4c138a4bc29642d): TS011F_plug_1 - TuYa Smart plug (with power monitoring) (Router)
info  2024-02-06 19:45:53: Bürojo Schreibtisch (0xa4c1385540f20e2b): TS011F_plug_1 - TuYa Smart plug (with power monitoring) (Router)
info  2024-02-06 19:45:53: Flur Waschmaschine (0xa4c138d87a63c833): TS011F_plug_1 - TuYa Smart plug (with power monitoring) (Router)
info  2024-02-06 19:45:53: Wohnzimmer Kuglibus (0x001788010bfbac59): 8719514301481 - Philips Hue Filament Globe Ambiance E27 (Router)
info  2024-02-06 19:45:53: Bürojo Bastelleuchte Schein 1 (0x001788010cac19ed): 8719514328242 - Philips Hue white ambiance E27 800lm with Bluetooth (Router)
info  2024-02-06 19:45:53: Bürojo Bastelleuchte Schein 2 (0x001788010cad16c7): 8719514328242 - Philips Hue white ambiance E27 800lm with Bluetooth (Router)
info  2024-02-06 19:45:53: Bürojo Bastelleuchte Schein 3 (0x001788010b9aa2c6): 8719514328242 - Philips Hue white ambiance E27 800lm with Bluetooth (Router)
info  2024-02-06 19:45:53: Bürojo Bastelleuchte Direkt (0x50325ffffee76dff): LED2004G8 - IKEA TRADFRI LED bulb E26/E27 800/806 lumen, dimmable, white spectrum, clear (Router)
info  2024-02-06 19:45:53: Bürojo Schalter (0x6c5cb1fffe2c29ac): E2201 - IKEA RODRET wireless dimmer/power switch (EndDevice)
info  2024-02-06 19:45:53: Büroja Thermometer (0xa4c13884a645bced): LYWSD03MMC - Custom devices (DiY) Xiaomi temperature & humidity sensor with custom firmware (EndDevice)
info  2024-02-06 19:45:53: Küche Thermometer (0xa4c1381a2f1411d1): LYWSD03MMC - Custom devices (DiY) Xiaomi temperature & humidity sensor with custom firmware (EndDevice)
info  2024-02-06 19:45:53: Wohnzimmer Thermometer (0xa4c1387ac22fdbea): LYWSD03MMC - Custom devices (DiY) Xiaomi temperature & humidity sensor with custom firmware (EndDevice)
info  2024-02-06 19:45:53: Bürojo Thermometer (0xa4c1380cb5161457): LYWSD03MMC - Custom devices (DiY) Xiaomi temperature & humidity sensor with custom firmware (EndDevice)
info  2024-02-06 19:45:53: Filament Thermometer (0xa4c13829ce9c4039): LYWSD03MMC - Custom devices (DiY) Xiaomi temperature & humidity sensor with custom firmware (EndDevice)
info  2024-02-06 19:45:53: Flur Thermometer (0xa4c138f649eca602): LYWSD03MMC - Custom devices (DiY) Xiaomi temperature & humidity sensor with custom firmware (EndDevice)
info  2024-02-06 19:45:53: Schlafzimmer Thermometer (0xa4c1389bd33d2b4d): LYWSD03MMC - Custom devices (DiY) Xiaomi temperature & humidity sensor with custom firmware (EndDevice)
info  2024-02-06 19:45:53: Draußen Thermometer (0xa4c138742dff2382): LYWSD03MMC - Custom devices (DiY) Xiaomi temperature & humidity sensor with custom firmware (EndDevice)
info  2024-02-06 19:45:53: Badezimmer Thermometer (0xa4c138bd9299f1b6): LYWSD03MMC - Custom devices (DiY) Xiaomi temperature & humidity sensor with custom firmware (EndDevice)
info  2024-02-06 19:45:53: Wohnzimmer Lichtleiste (0xf0d1b800001bd287): 4058075208339 - LEDVANCE Flex 3P multicolor (Router)
info  2024-02-06 19:45:53: Serverschrank Thermometer (0xa4c13866cfe09f7c): LYWSD03MMC - Custom devices (DiY) Xiaomi temperature & humidity sensor with custom firmware (EndDevice)
info  2024-02-06 19:45:53: Flur Bewegungsmelder (0xd87a3bfffe015855): E2134 - IKEA VALLHORN wireless motion sensor (EndDevice)

@merlinpimpim
Copy link
Author

@merlinpimpim Could you also list the router/endpoint devices running in your zigbee network ? I higly suspect that intensive polling of non Reportable attributes is part of the root cause.

Hello,

Here are (all) my devices:

Aqara	MCCGQ11LM
Aqara	DJT11LM
LiXee	ZLinky_TIC
Neo	NAS-AB06B2
Neo	NAS-AB02B2
NodOn	SIN-4-1-20
Nous	E2
Nous	A1Z
Nous	A1Z
Nous	A1Z
OSRAM	AB3257001NJ
SONOFF	ZBMINIL2
SONOFF	SNZB-04
SONOFF	SNZB-02D
SONOFF	SNZB-02D
SONOFF	SNZB-02D
SONOFF	SNZB-02D
TuYa	TS0601_soil
TuYa	TS0207_repeater
TuYa	TS0207_repeater
Xiaomi	WXKG01LM
Xiaomi	WSDCGQ01LM
Xiaomi	GZCGQ01LM

If it helps...

@merlinpimpim
Copy link
Author

After checking, the common link between my devices and those of @jhbruhn (excluding Lixee) are SonOff SNZB-xxx devices, including SNZB-04s in particular.

@Nerivec
Copy link
Collaborator

Nerivec commented Feb 6, 2024

I'm still looking into this, but my list of tasks is growing faster than my two eyes can follow 😅
I doubt Sonoff devices are the cause, I've got about 20 of them, I'd have caught the bug too!

I've fixed a couple of problems raised from your original log @merlinpimpim, but not particularly linked to LiXee...

@jhbruhn By any chance do you have any older logs from 1.35.1 or before? Older versions were doing internal restarts (without reaching Z2M level "Adapter disconnected"), that you might not have noticed, but they could still have been happening.

@jhbruhn
Copy link

jhbruhn commented Feb 6, 2024

Absolutely, I am running 1.35.1 right now and also noticed that devices sometimes do not react in that version. I will try to collect some logs about that behaviour the next time I notice it.

But the problems with 1.35.2 occur right after starting Z2M, making it unusable, that does not happen with 1.35.1. Maybe the problem is different than that?

@Nerivec
Copy link
Collaborator

Nerivec commented Feb 6, 2024

Then likely different yes, and (without seen the error can't say for sure) there's a good chance that is fixed already. I'm just looking for a couple of other buggers before I create a PR.

@merlinpimpim
Copy link
Author

Disabling polling on Linky_TIC / Lixee by setting an empty string on param tic_command_whitelist is a working dirty workarround that stop unexpected shutdown of Z2M.

tic_command_whitelist: ~

But by doing so we loose the non Reportable attribute values, as described here : https://github.com/fairecasoimeme/Zlinky_TIC

EDIT : empty tic_command_whitelist disable also Reportable attributes, so it is not a evidence for any rootcause.

@Nerivec : I think @romarysonrier is on to something because since I did what he said, Z2M hasn't rebooted for 2 and a half hours, which is a record since the 1.35.2 update (Z2M currently reboots between 1 and 8 times an hour for me...).

Parameter tic_command_whitelist changed from "all" to empty:
image

The issue with this workaround is that no more attributes are exposed. So it's as if the device doesn't exist and/or serves no purpose:
image

However, this shows that the issue is indeed with Lixee for our cases.

I'm going to leave it until tomorrow morning to confirm that it didn't restart overnight.
To be continued.

@Nerivec
Copy link
Collaborator

Nerivec commented Feb 7, 2024

Glancing at the github for that device, tomorrow (if leaving it like that worked) you might want to try enabling only the values that interest you in that (long) list.
I believe that's it there in the first column, and see if that improves stability.

@romarysonrier
Copy link

romarysonrier commented Feb 7, 2024

On my side , I tried yesterday :

  • disabling all polling by patching node_modules/zigbee-herdsman-converters/devices/lixee.js
    if (targ.length) {
    into
    if (false && targ.length) {
    this way is Reportable Attributes from tic_command_whitelist are still being updated , but not polling is done at all.

And then i did those testing cases :

  • A./ tic_command_whitelist=~ (empty set) for only 1 hour => did not fall into fatal error
  • B./ tic_command_whitelist='EAST,EASF01,EASF02,EASF03,EASF04,EASF05,EASF06' (minimal set) for only 1 hour => did not fall into fatal error
  • C./ tic_command_whitelist='EAST,EASF01,EASF02,EASF03,EASF04,EASF05,EASF06,SMAXN,SINSTS,UMOY1,IRMS1,URMS1,CCASN' (medium set) : did fall into shutdown after 2.5 hour of uptime
    see log here
    lixee-6.log, still having Unexpected packet sequence messages

Thus , I will retry test case B ( all attributes belonging to the same cluster = 0x702) for a longer time frame, to gather more log and clues. But is the lixee device still send updated to all Reportable attribute from all cluster (haElectricalMeasurement+haMeterIdentification+lXeePrivate+seMetering).

@Nerivec : So it so mean that the abnormal behaviour is also trigger when using only Reportable attribute update

@merlinpimpim
Copy link
Author

My return today:

I updated Z2M to version 1.35.3 last night, leaving tic_command_whitelist empty: no crashes in 15 hours! 😀
So the culprit is clearly identified.

Now I've added some Reportable attributes (I'm in three-phase), I haven't restarted Z2M, but I've done a Reconfigure Device as shown in the screenshot:
image

  • The attributes are now exposed
    image
  • No reboot for 1 hour (I'll wait a few more hours to be sure)
  • But I did get this in the logs when I reconfigured the device, why do you think that is?
    Error 2024-02-07 16:23:04Failed to configure 'GARAGE - Compteur électrique - LiXee', attempt 1 (Error: ConfigureReporting 0x00158d0005d2a627/1 haElectricalMeasurement([{"attribute":"totalApparentPower","minimumReportInterval":60,"maximumReportInterval":900,"reportableChange":1}], {"timeout":10000,"disableResponse":false,"disableRecovery":false,"disableDefaultResponse":true,"direction":0,"srcEndpoint":null,"reservedBits":0,"manufacturerCode":null,"transactionSequenceNumber":null,"writeUndiv":false}) failed (Timeout - 10236 - 1 - 104 - 2820 - 7 after 10000ms) at Object.start (/app/node_modules/zigbee-herdsman/src/utils/waitress.ts:63:23) at EZSPAdapter.sendZclFrameToEndpointInternal (/app/node_modules/zigbee-herdsman/src/adapter/ezsp/adapter/ezspAdapter.ts:486:47) at processTicksAndRejections (node:internal/process/task_queues:95:5) at Queue.execute (/app/node_modules/zigbee-herdsman/src/utils/queue.ts:35:20) at Request.send (/app/node_modules/zigbee-herdsman/src/controller/helpers/request.ts:79:20) at Endpoint.configureReporting (/app/node_modules/zigbee-herdsman/src/controller/model/endpoint.ts:669:28) at async Promise.allSettled (index 15) at Object.configure (/app/node_modules/zigbee-herdsman-converters/src/devices/lixee.ts:1020:14) at Configure.configure (/app/lib/extension/configure.ts:120:13) at Configure.onMQTTMessage (/app/lib/extension/configure.ts:55:21))

@merlinpimpim
Copy link
Author

  • No reboot for 1 hour (I'll wait a few more hours to be sure)

no reboot for 3 hours :)

@romarysonrier
Copy link

romarysonrier commented Feb 7, 2024

@Nerivec i found out that zigbee-herdsman ezrp uart driver doesn't support NAK message properly.

Here is a proposal for handleDATA()

diff --git a/src/adapter/ezsp/driver/uart.ts b/src/adapter/ezsp/driver/uart.ts
index 1755a0e..f07b26d 100644
--- a/src/adapter/ezsp/driver/uart.ts
+++ b/src/adapter/ezsp/driver/uart.ts
@@ -44,6 +44,7 @@ export class SerialDriver extends EventEmitter {
     private sendSeq = 0; // next frame number to send
     private recvSeq = 0; // next frame number to receive
     private ackSeq = 0;  // next number after the last accepted frame
+    private rx_reject = false;
     private waitress: Waitress<EZSPPacket, EZSPPacketMatcher>;
     private queue: Queue;
 
@@ -210,6 +211,24 @@ export class SerialDriver extends EventEmitter {
 
         debug(`<-- DATA (${frmNum},${frame.control & 0x07},${reTx}): ${frame}`);
 
+        if(this.recvSeq != frmNum){
+            if(reTx || this.rx_reject){
+                debug(`Unexpected DATA packet sequence ${frmNum} | ${this.sendSeq} : rejecting packet`);
+            }
+            else {
+              debug(`Unexpected DATA packet sequence ${frmNum} | ${this.sendSeq} : Sending NAK & & rejecting packet`);
+              this.writer.sendNAK(this.recvSeq);
+              this.rx_reject=true;
+            }
+            return;
+        }
+        else{
+            if (!reTx && this.rx_reject){
+                 debug(`DATA packet sequence ${frmNum} : clear reject`);
+                 this.rx_reject=false;
+            }
+        }
+
         this.recvSeq = (frmNum + 1) & 7; // next
 
         debug(`--> ACK  (${this.recvSeq})`);
@@ -272,6 +291,7 @@ export class SerialDriver extends EventEmitter {
         let code;
         this.sendSeq = 0;
         this.recvSeq = 0;
+        this.rx_reject = false;^M
 
         debug(`<-- RSTACK ${frame}`);
 

ps : inspired from https://www.silabs.com/documents/public/user-guides/ug101-uart-gateway-protocol-reference.pdf

But zigbee-herdsman doesn't support NAK itsef : handleACK doesn't do retransmission. don't know yet if really necessary.

@merlinpimpim Regarding the root cause : My best assumption is that on my slow CPU (Rasberry pi zero) + all the MQTT updates required by the huge number of attributes sent or polled by the lixee device is so long to handle/execute by NodeJS that the EZSP device trigger retransmission, that current implementation doesn't handle properly.

@romarysonrier
Copy link

@merlinpimpim what is your CPU running nodejs ? any source of latencies between Nodejs and the ESZP device ?

@merlinpimpim
Copy link
Author

I'm using HA + Z2M + Skyconnect on a Khadas Vim1S (https://www.khadas.com/vim1s).

@Nerivec
Copy link
Collaborator

Nerivec commented Feb 7, 2024

@romarysonrier I'm looking into a more permanent solution to fix these sequence & related issues (that's not the only one). We'll see what I can come up with, needs some automated tests and all... 😉

@romarysonrier
Copy link

romarysonrier commented Feb 8, 2024

@Nerivec i made this PR proposal https://github.com/Koenkk/zigbee-herdsman/pull/911/files , which is far from perfect , but still improving the situation with NAK messages on rx side.

@romarysonrier
Copy link

romarysonrier commented Feb 8, 2024

On my case (Z2M running an a single core raspbery pi zero) , i think the root cause is now clear , at least for me:
1./ When lixee join the Zigbee network, it will generate on Z2M :

  • many Reportable attribute updates
  • AND many polling attribute updates

2./ all those updates trigger a huge amount of MQTT updates (quiet big dataset because lots of attributes count) + log writes ( if log_level=info), which generate a peak of CPU usage.
3./ then zigbee-herdsman EZSP UART driver is reading too late message from EZSP firmware (not sending ACK in the required timeframe)
4./ then , at some point, EZSP firmware is lost with seq number and refuses any new TX request with a fatal error.
5./ then zigbee-herdsman EZSP report a faltal error and Z2M shutdown.

So @merlinpimpim if you doing this :

  • disable polling, by patching : node_modules/zigbee-herdsman-converters/devices/lixee.js
    if (targ.length) {
    into
    if (false && targ.length) {
  • minimize logging effort : in configuration.yaml
log_output:
    - file
log_level: warn

You should see some improvement , as you lower the amount of CPU required to handle lixee device .

For me , It is running with those settings (+ disabling frontend) since 8 hours without a single Z2M shutdown.
The main drawback of this workarround, is lost of non Reportable attributes.

@merlinpimpim
Copy link
Author

@romarysonrier : I've made the changes in the Lixee.js file and it seems to work after resetting the Reportable attributes to "All". No crashes for 2 hours 😁but I get this type of message in the logs from time to time:

Failed to read zigbee attributes: Error: Read 0x00158d0005d2a627/1 liXeePrivate(["message2"], {"timeout":10000,"disableResponse":false,"disableRecovery":false,"disableDefaultResponse":true,"direction":0,"srcEndpoint":null,"reservedBits":0,"manufacturerCode":null,"transactionSequenceNumber":null,"writeUndiv":false}) failed (Timeout - 10236 - 1 - 164 - 65382 - 1 after 10000ms)
Failed to read zigbee attributes: Error: Read 0x00158d0005d2a627/1 haElectricalMeasurement(["averageRmsVoltageMeasurePeriodPhB"], {"timeout":10000,"disableResponse":false,"disableRecovery":false,"disableDefaultResponse":true,"direction":0,"srcEndpoint":null,"reservedBits":0,"manufacturerCode":null,"transactionSequenceNumber":null,"writeUndiv":false}) failed (Timeout - 10236 - 1 - 176 - 2820 - 1 after 10000ms)
Failed to read zigbee attributes: Error: Read 0x00158d0005d2a627/1 haElectricalMeasurement(["rmsCurrentPhB"], {"timeout":10000,"disableResponse":false,"disableRecovery":false,"disableDefaultResponse":true,"direction":0,"srcEndpoint":null,"reservedBits":0,"manufacturerCode":null,"transactionSequenceNumber":null,"writeUndiv":false}) failed (Timeout - 10236 - 1 - 177 - 2820 - 1 after 10000ms)
Failed to read zigbee attributes: Error: Read 0x00158d0005d2a627/1 haElectricalMeasurement(["rmsVoltagePhB"], {"timeout":10000,"disableResponse":false,"disableRecovery":false,"disableDefaultResponse":true,"direction":0,"srcEndpoint":null,"reservedBits":0,"manufacturerCode":null,"transactionSequenceNumber":null,"writeUndiv":false}) failed (Timeout - 10236 - 1 - 179 - 2820 - 1 after 10000ms)
Failed to read zigbee attributes: Error: Read 0x00158d0005d2a627/1 haElectricalMeasurement(["rmsVoltagePhC"], {"timeout":10000,"disableResponse":false,"disableRecovery":false,"disableDefaultResponse":true,"direction":0,"srcEndpoint":null,"reservedBits":0,"manufacturerCode":null,"transactionSequenceNumber":null,"writeUndiv":false}) failed (Timeout - 10236 - 1 - 180 - 2820 - 1 after 10000ms)
Failed to read zigbee attributes: Error: Read 0x00158d0005d2a627/1 seMetering(["meterSerialNumber"], {"timeout":10000,"disableResponse":false,"disableRecovery":false,"disableDefaultResponse":true,"direction":0,"srcEndpoint":null,"reservedBits":0,"manufacturerCode":null,"transactionSequenceNumber":null,"writeUndiv":false}) failed (Timeout - 10236 - 1 - 181 - 1794 - 1 after 10000ms)
Failed to read zigbee attributes: Error: Read 0x00158d0005d2a627/1 seMetering(["siteId"], {"timeout":10000,"disableResponse":false,"disableRecovery":false,"disableDefaultResponse":true,"direction":0,"srcEndpoint":null,"reservedBits":0,"manufacturerCode":null,"transactionSequenceNumber":null,"writeUndiv":false}) failed (Timeout - 10236 - 1 - 182 - 1794 - 1 after 10000ms)
Failed to read zigbee attributes: Error: Read 0x00158d0005d2a627/1 haMeterIdentification(["availablePower"], {"timeout":10000,"disableResponse":false,"disableRecovery":false,"disableDefaultResponse":true,"direction":0,"srcEndpoint":null,"reservedBits":0,"manufacturerCode":null,"transactionSequenceNumber":null,"writeUndiv":false}) failed (Timeout - 10236 - 1 - 183 - 2817 - 1 after 10000ms)
Failed to read zigbee attributes: Error: Read 0x00158d0005d2a627/1 haMeterIdentification(["powerThreshold"], {"timeout":10000,"disableResponse":false,"disableRecovery":false,"disableDefaultResponse":true,"direction":0,"srcEndpoint":null,"reservedBits":0,"manufacturerCode":null,"transactionSequenceNumber":null,"writeUndiv":false}) failed (Timeout - 10236 - 1 - 184 - 2817 - 1 after 10000ms)
Failed to read zigbee attributes: Error: Read 0x00158d0005d2a627/1 haMeterIdentification(["softwareRevision"], {"timeout":10000,"disableResponse":false,"disableRecovery":false,"disableDefaultResponse":true,"direction":0,"srcEndpoint":null,"reservedBits":0,"manufacturerCode":null,"transactionSequenceNumber":null,"writeUndiv":false}) failed (Timeout - 10236 - 1 - 185 - 2817 - 1 after 10000ms)

@romarysonrier
Copy link

romarysonrier commented Feb 9, 2024

@merlinpimpim "Failed to read zigbee attributes" are only generated by polling / read from Z2M in lixee.ts.
Could you double check if you really have disabled polling ?

BTW, my setup is still up and running after 21 hours

@remidebette
Copy link

remidebette commented Feb 11, 2024

Hi guys,
I am having the same error, I have a Lixee in historic mode (polling heavy) and the sonoff dongle Plus with the 7.3.1 firmware.

Since the 1.35.3-1 update of z2m, it would crash only a few seconds after start.

I want to know if switching the Linky from Historic to Standard mode would change anything? (Normally it should just be a call to the power provider)
I have just started to build a smart home (prior to moving house in a few days) and I will definitely ask for it there.

Edit: in the meantime I simply unplugged the Lixee and z2m held for 24h max. But then failed again with the same issue. It is not only the Lixee then?
I have only a few zigbee devices otherwise so far (Phillips hue bulbs and switch, a heiman Smart plug)

@romarysonrier
Copy link

@remidebette : switching the Linky from Historic to Standard mode=> more attributes => more mqtt update => more cpu load => more issue
What is the cpu running Z2M ? serial over usb or network ?

@remidebette
Copy link

remidebette commented Feb 11, 2024

Not sure to understand the question, I have a Sonoff USB dongle Plus EFR32MG21 plugged on the raspberry pi 3+ that hosts my home assistant
So it should be serial over USB?

I did not configure the dongle firmware to do multipan RCP (more precisely I had tested it at some point but I then rolled back the firmware to NCP 7.3.1)

@romarysonrier
Copy link

yes your dongle does serial over USB communication.
Your setup based on a low end CPU (raspberry pi 3+) , , that may explain why you also encounter the issue.

Could you list you zigbee devices ?
Could you try that fix :

  • disable polling, by patching : node_modules/zigbee-herdsman-converters/devices/lixee.js
    if (targ.length) {
    into
    if (false && targ.length) {
  • minimize logging effort (to reduce CPU load) : in configuration.yaml
log_output:
    - file
log_level: warn

This setup is running since 4 days on my side ( but i lost few non reportable attributes, the ones taged as read only in https://github.com/fairecasoimeme/Zlinky_TIC/tree/master?tab=readme-ov-file#subscription-optarif-values)

@merlinpimpim
Copy link
Author

@merlinpimpim "Failed to read zigbee attributes" are only generated by polling / read from Z2M in lixee.ts. Could you double check if you really have disabled polling ?

BTW, my setup is still up and running after 21 hours

It's not that simple because to do it I have to connect to the Z2M container and make the change to the lixee.js file.
This is a non-permanent modification because every time Z2M restarts the modification is lost (Z2M can restart without any link to Lixee, for example Home Assistant, HACS, Z2M version upgrades, etc.).

However, I've already had Z2M restart because of Lixee even though I think I've made the change in the current Z2M container...
Maybe I'll wait for the PRs to be deployed in production now.

Anyway, by limiting the reportable attributes to the 6 I'm interested in and setting the top parameter to 2, I haven't had a single reboot in 24 hours.

image

@romarysonrier
Copy link

it seems that @mildis is the most active contributor lixee.ts, would be great to have some help...

@Koenkk : I would be great to implement a new feature to lower CPU usage + merge Koenkk/zigbee-herdsman#911 :

  • split tic_whitelist_command into 2 parts :
    • tic_whitelist_command_reportable dedicated to Reportable attributes ( and don't do polling on thoses attributes)
    • tic_whitelist_command_readonly dedicated to Read only attributes, and do polling only on those attributes.

the main benefits would be to decrease the amount of attributes updated by polling ( Reportable attributes doesn't need polling), and then to reduce the number of MQTT updates sent to HA.

@mildis
Copy link

mildis commented Feb 12, 2024

Hi @romarysonrier,
Proud you think of me for this but unfortunately I don't get that much about Z2M internals.
I've just poke around Lixee as a end-user that could help making it more complete.
Maybe @fairecasoimeme has a better understanding of the whole picture…

@merlinpimpim
Copy link
Author

Just in case, without any changes to the converter or other files, if I delete the Lixee from Z2M 1.35.3 and start again from scratch, I get this error message (4th attempt, but same message every time):

Error 2024-02-18 22:01:05Failed to configure '0x00158d0005d2a627', attempt 4 (Error: ConfigureReporting 0x00158d0005d2a627/1 haElectricalMeasurement([{"attribute":"totalApparentPower","minimumReportInterval":60,"maximumReportInterval":900,"reportableChange":1}], {"timeout":10000,"disableResponse":false,"disableRecovery":false,"disableDefaultResponse":true,"direction":0,"srcEndpoint":null,"reservedBits":0,"manufacturerCode":null,"transactionSequenceNumber":null,"writeUndiv":false}) failed (Timeout - 30499 - 1 - 135 - 2820 - 7 after 10000ms) at Object.start (/app/node_modules/zigbee-herdsman/src/utils/waitress.ts:63:23) at EZSPAdapter.sendZclFrameToEndpointInternal (/app/node_modules/zigbee-herdsman/src/adapter/ezsp/adapter/ezspAdapter.ts:486:47) at processTicksAndRejections (node:internal/process/task_queues:95:5) at Queue.execute (/app/node_modules/zigbee-herdsman/src/utils/queue.ts:35:20) at Request.send (/app/node_modules/zigbee-herdsman/src/controller/helpers/request.ts:79:20) at Endpoint.configureReporting (/app/node_modules/zigbee-herdsman/src/controller/model/endpoint.ts:669:28) at async Promise.allSettled (index 15) at Object.configure (/app/node_modules/zigbee-herdsman-converters/src/devices/lixee.ts:1020:14) at Configure.configure (/app/lib/extension/configure.ts:120:13) at Configure.onMQTTMessage (/app/lib/extension/configure.ts:55:21))

@Nerivec
Copy link
Collaborator

Nerivec commented Feb 20, 2024

Hey @merlinpimpim, sorry I've been a little buried in code lately 😉 Did you try to force pairing to a specific device (instead of just "all" when you "permit join")? Try the coordinator first, if it's in range, otherwise the nearby routers one by one. You may have a router that's picking it up and making troubles... Since it's a timeout, seems a message is getting lost somewhere (or not being sent to the right place...).

@romarysonrier In case you have a test environment where you could put the LiXee up against ember, that'd be great (though don't go breaking your house 😄). This device clearly is a big spammer, which is one of the items missing from my "TODO tests" list!

@merlinpimpim
Copy link
Author

@Nerivec:
Sorry, I was busy myself 😅
By the way, before Z2M 1.36.0, it wasn't possible to try out your proposal because of this bug: Koenkk/zigbee-herdsman#938
However, with the latest version of Z2M and after deleting LiXee from Z2M and starting a new pairing from scratch by forcing pairing with the nearest router (the coordinator is too far from the electricity meter), the good news is that it seems to have worked without getting the fatal message ("Adapter disconnected, stopping") for 48 hours! 😁
(@romarysonrier: I've left all the default settings, without touching anything, in the "Settings (specific)" tab).

Copy link
Contributor

github-actions bot commented Sep 1, 2024

This issue is stale because it has been open 180 days with no activity. Remove stale label or comment or this will be closed in 30 days

@github-actions github-actions bot added the stale Stale issues label Sep 1, 2024
@github-actions github-actions bot closed this as not planned Won't fix, can't repro, duplicate, stale Oct 1, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
problem Something isn't working stale Stale issues
Projects
None yet
Development

No branches or pull requests

6 participants