Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

zigbee2mqtt crash after a while running using tcp slzb-06m, sometimes even with restart it does not start. #23151

Open
diesel437 opened this issue Jun 24, 2024 · 34 comments
Labels
ember Issues related to ember driver problem Something isn't working

Comments

@diesel437
Copy link

What happened?

Even with a full reboot, sometimes it does not start correctly. It keeps crashing randomly.

What did you expect to happen?

Works correctly, reboots when needed!

How to reproduce it (minimal and precise)

Using SLZB-06M with darks ncp firmware: https://github.com/darkxst/silabs-firmware-builder/blob/main/firmware_builds/slzb-06m/ncp-uart-hw-v7.4.2.0-slzb-06m-115200.gbl

Zigbee2MQTT version

1.38

Adapter firmware version

7.4.2

Adapter

SMLIGHT SLZB-06M

Setup

Z2M LXC, MQTT LXC, HAOS VM

Debug log

(STARTED USING NPM START)

root@zigbee2mqtt:/opt/zigbee2mqtt/data/log/2024-06-24.03-37-22# cat log.log
[2024-06-24 03:37:22] info: z2m: Logging to console, file (filename: log.log)
[2024-06-24 03:37:22] info: z2m: Starting Zigbee2MQTT version 1.38.0 (commit #f1847301)
[2024-06-24 03:37:22] info: z2m: Starting zigbee-herdsman (0.49.2)
[2024-06-24 03:37:22] info: zh:ember: Using default stack config.
[2024-06-24 03:37:22] info: zh:ember: ======== Ember Adapter Starting ========
[2024-06-24 03:37:22] info: zh:ember:ezsp: ======== EZSP starting ========
[2024-06-24 03:37:22] info: zh:ember:uart:ash: ======== ASH NCP reset ========
[2024-06-24 03:37:22] info: zh:ember:uart:ash: Socket ready
[2024-06-24 03:37:22] info: zh:ember:uart:ash: ======== ASH starting ========
[2024-06-24 03:37:22] error: zh:ember:uart:ash: Received ERROR from NCP while connecting, with code=ERROR_EXCEEDED_MAXIMUM_ACK_TIMEOUT_COUNT.
[2024-06-24 03:37:22] error: zh:ember:uart:ash: ASH disconnected | NCP status: ASH_NCP_FATAL_ERROR
[2024-06-24 03:37:22] error: zh:ember:uart:ash: Error while parsing received frame, status=ASH_NCP_FATAL_ERROR.
[2024-06-24 03:37:22] info: zh:ember:uart:ash: ======== ASH NCP reset ========
[2024-06-24 03:37:22] info: zh:ember:uart:ash: ======== ASH starting ========
[2024-06-24 03:37:23] info: zh:ember:uart:ash: ======== ASH connected ========
[2024-06-24 03:37:23] info: zh:ember:uart:ash: ======== ASH started ========
[2024-06-24 03:37:23] info: zh:ember:ezsp: ======== EZSP started ========
[2024-06-24 03:37:24] warning: zh:ember:uart:ash: Frame(s) in progress cancelled in [1ac1020b0a527e]
[2024-06-24 03:37:24] error: zh:ember:uart:ash: Received unexpected reset from NCP, with reason=RESET_SOFTWARE.
[2024-06-24 03:37:24] error: zh:ember:uart:ash: ASH disconnected: ASH_ERROR_NCP_RESET | NCP status: ASH_NCP_FATAL_ERROR
[2024-06-24 03:37:24] error: zh:ember:uart:ash: Error while parsing received frame, status=HOST_FATAL_ERROR.
[2024-06-24 03:37:24] error: zh:ember: !!! NCP FATAL ERROR reason=HOST_FATAL_ERROR. ATTEMPTING RESET... !!!
[2024-06-24 03:37:24] info: zh:ember:queue: Request dispatching stopped; queue=0 priorityQueue=0
[2024-06-24 03:37:24] info: zh:ember:uart:ash: ASH COUNTERS since last clear:
[2024-06-24 03:37:24] info: zh:ember:uart:ash: Total frames: RX=2, TX=3
[2024-06-24 03:37:24] info: zh:ember:uart:ash: Cancelled : RX=1, TX=0
[2024-06-24 03:37:24] info: zh:ember:uart:ash: DATA frames : RX=0, TX=1
[2024-06-24 03:37:24] info: zh:ember:uart:ash: DATA bytes : RX=0, TX=4
[2024-06-24 03:37:24] info: zh:ember:uart:ash: Retry frames: RX=0, TX=0
[2024-06-24 03:37:24] info: zh:ember:uart:ash: ACK frames : RX=0, TX=1
[2024-06-24 03:37:24] info: zh:ember:uart:ash: NAK frames : RX=0, TX=0
[2024-06-24 03:37:24] info: zh:ember:uart:ash: nRdy frames : RX=0, TX=0
[2024-06-24 03:37:24] info: zh:ember:uart:ash: CRC errors : RX=0
[2024-06-24 03:37:24] info: zh:ember:uart:ash: Comm errors : RX=0
[2024-06-24 03:37:24] info: zh:ember:uart:ash: Length < minimum: RX=0
[2024-06-24 03:37:24] info: zh:ember:uart:ash: Length > maximum: RX=0
[2024-06-24 03:37:24] info: zh:ember:uart:ash: Bad controls : RX=0
[2024-06-24 03:37:24] info: zh:ember:uart:ash: Bad lengths : RX=0
[2024-06-24 03:37:24] info: zh:ember:uart:ash: Bad ACK numbers : RX=0
[2024-06-24 03:37:24] info: zh:ember:uart:ash: Out of buffers : RX=0
[2024-06-24 03:37:24] info: zh:ember:uart:ash: Retry dupes : RX=0
[2024-06-24 03:37:24] info: zh:ember:uart:ash: Out of sequence : RX=0
[2024-06-24 03:37:24] info: zh:ember:uart:ash: ACK timeouts : RX=0
[2024-06-24 03:37:24] info: zh:ember:uart:ash: ======== ASH stopped ========
[2024-06-24 03:37:24] info: zh:ember:ezsp: ======== EZSP stopped ========
[2024-06-24 03:37:24] info: zh:ember: ======== Ember Adapter Stopped ========
[2024-06-24 03:37:25] info: zh:ember: ======== Ember Adapter Starting ========
[2024-06-24 03:37:25] info: zh:ember:ezsp: ======== EZSP starting ========
[2024-06-24 03:37:25] info: zh:ember:uart:ash: ======== ASH NCP reset ========
[2024-06-24 03:37:25] info: zh:ember:uart:ash: Socket ready
[2024-06-24 03:37:25] info: zh:ember:uart:ash: ======== ASH starting ========
[2024-06-24 03:37:26] info: zh:ember:uart:ash: ======== ASH connected ========
[2024-06-24 03:37:26] info: zh:ember:uart:ash: ======== ASH started ========
[2024-06-24 03:37:26] info: zh:ember:ezsp: ======== EZSP started ========
[2024-06-24 03:37:26] warning: zh:ember: [EzspConfigId] Failed to SET "APS_UNICAST_MESSAGE_COUNT" TO "32" with status=ERROR_OUT_OF_MEMORY. Firmware value will be used instead.
[2024-06-24 03:37:26] info: zh:ember: [STACK STATUS] Network up.
[2024-06-24 03:37:26] info: zh:ember: [INIT TC] NCP network matches config.
[2024-06-24 03:37:26] info: zh:ember: [CONCENTRATOR] Started source route discovery. 1246ms until next broadcast.
[2024-06-24 03:37:26] info: zh:ember:queue: Request dispatching started.
[2024-06-24 03:37:26] info: zh:ember:ezsp: Received network/route error ROUTE_ERROR_MANY_TO_ONE_ROUTE_FAILURE for "52765".
[2024-06-24 03:38:05] error: z2m:mqtt: Not connected to MQTT server!
[2024-06-24 03:38:05] error: z2m:mqtt: Cannot send message: topic: 'zigbee2mqtt/bridge/state', payload: '{"state":"offline"}
[2024-06-24 03:38:05] info: z2m:mqtt: Disconnecting from MQTT server
[2024-06-24 03:38:05] info: z2m: Stopping zigbee-herdsman...
[2024-06-24 03:38:05] info: z2m: Stopped zigbee-herdsman
[2024-06-24 03:38:05] info: z2m: Stopped Zigbee2MQTT
root@zigbee2mqtt:/opt/zigbee2mqtt/data/log/2024-06-24.03-37-22#

(FAILED TO START)

root@zigbee2mqtt:/opt/zigbee2mqtt# npm start

zigbee2mqtt@1.38.0 start
node index.js

[2024-06-24 03:49:20] info: z2m: Logging to console, file (filename: log.log)
[2024-06-24 03:49:20] info: z2m: Starting Zigbee2MQTT version 1.38.0 (commit #6c7d52a3)
[2024-06-24 03:49:20] info: z2m: Starting zigbee-herdsman (0.49.2)
[2024-06-24 03:49:20] info: zh:ember: Using default stack config.
[2024-06-24 03:49:20] info: zh:ember: ======== Ember Adapter Starting ========
[2024-06-24 03:49:20] info: zh:ember:ezsp: ======== EZSP starting ========
[2024-06-24 03:49:20] info: zh:ember:uart:ash: ======== ASH NCP reset ========
[2024-06-24 03:49:20] info: zh:ember:uart:ash: Socket ready
[2024-06-24 03:49:20] info: zh:ember:uart:ash: ======== ASH starting ========
[2024-06-24 03:49:20] error: zh:ember:uart:ash: Received ERROR from NCP while connecting, with code=ERROR_EXCEEDED_MAXIMUM_ACK_TIMEOUT_COUNT.
[2024-06-24 03:49:20] error: zh:ember:uart:ash: ASH disconnected | NCP status: ASH_NCP_FATAL_ERROR
[2024-06-24 03:49:20] error: zh:ember:uart:ash: Error while parsing received frame, status=ASH_NCP_FATAL_ERROR.
[2024-06-24 03:49:20] error: zh:ember:uart:ash: Error while parsing received frame, status=ASH_NCP_FATAL_ERROR.
[2024-06-24 03:49:20] error: zh:ember:uart:ash: Error while parsing received frame, status=ASH_NCP_FATAL_ERROR.
[2024-06-24 03:49:20] error: zh:ember:uart:ash: Error while parsing received frame, status=ASH_NCP_FATAL_ERROR.
[2024-06-24 03:49:20] info: zh:ember:uart:ash: ======== ASH NCP reset ========
[2024-06-24 03:49:20] info: zh:ember:uart:ash: ======== ASH starting ========
[2024-06-24 03:49:21] info: zh:ember:uart:ash: ======== ASH connected ========
[2024-06-24 03:49:21] info: zh:ember:uart:ash: ======== ASH started ========
[2024-06-24 03:49:21] info: zh:ember:ezsp: ======== EZSP started ========
[2024-06-24 03:49:22] warning: zh:ember:uart:ash: Frame(s) in progress cancelled in [1ac1020b0a527e]
[2024-06-24 03:49:22] error: zh:ember:uart:ash: Received unexpected reset from NCP, with reason=RESET_SOFTWARE.
[2024-06-24 03:49:22] error: zh:ember:uart:ash: ASH disconnected: ASH_ERROR_NCP_RESET | NCP status: ASH_NCP_FATAL_ERROR
[2024-06-24 03:49:22] error: zh:ember:uart:ash: Error while parsing received frame, status=HOST_FATAL_ERROR.
[2024-06-24 03:49:22] error: zh:ember: !!! NCP FATAL ERROR reason=HOST_FATAL_ERROR. ATTEMPTING RESET... !!!
[2024-06-24 03:49:22] info: zh:ember:queue: Request dispatching stopped; queue=0 priorityQueue=0
[2024-06-24 03:49:22] info: zh:ember:uart:ash: ASH COUNTERS since last clear:
[2024-06-24 03:49:22] info: zh:ember:uart:ash: Total frames: RX=2, TX=3
[2024-06-24 03:49:22] info: zh:ember:uart:ash: Cancelled : RX=1, TX=0
[2024-06-24 03:49:22] info: zh:ember:uart:ash: DATA frames : RX=0, TX=1
[2024-06-24 03:49:22] info: zh:ember:uart:ash: DATA bytes : RX=0, TX=4
[2024-06-24 03:49:22] info: zh:ember:uart:ash: Retry frames: RX=0, TX=0
[2024-06-24 03:49:22] info: zh:ember:uart:ash: ACK frames : RX=0, TX=1
[2024-06-24 03:49:22] info: zh:ember:uart:ash: NAK frames : RX=0, TX=0
[2024-06-24 03:49:22] info: zh:ember:uart:ash: nRdy frames : RX=0, TX=0
[2024-06-24 03:49:22] info: zh:ember:uart:ash: CRC errors : RX=0
[2024-06-24 03:49:22] info: zh:ember:uart:ash: Comm errors : RX=0
[2024-06-24 03:49:22] info: zh:ember:uart:ash: Length < minimum: RX=0
[2024-06-24 03:49:22] info: zh:ember:uart:ash: Length > maximum: RX=0
[2024-06-24 03:49:22] info: zh:ember:uart:ash: Bad controls : RX=0
[2024-06-24 03:49:22] info: zh:ember:uart:ash: Bad lengths : RX=0
[2024-06-24 03:49:22] info: zh:ember:uart:ash: Bad ACK numbers : RX=0
[2024-06-24 03:49:22] info: zh:ember:uart:ash: Out of buffers : RX=0
[2024-06-24 03:49:22] info: zh:ember:uart:ash: Retry dupes : RX=0
[2024-06-24 03:49:22] info: zh:ember:uart:ash: Out of sequence : RX=0
[2024-06-24 03:49:22] info: zh:ember:uart:ash: ACK timeouts : RX=0
[2024-06-24 03:49:22] info: zh:ember:uart:ash: ======== ASH stopped ========
[2024-06-24 03:49:22] info: zh:ember:ezsp: ======== EZSP stopped ========
[2024-06-24 03:49:22] info: zh:ember: ======== Ember Adapter Stopped ========
[2024-06-24 03:49:23] info: zh:ember: ======== Ember Adapter Starting ========
[2024-06-24 03:49:23] info: zh:ember:ezsp: ======== EZSP starting ========
[2024-06-24 03:49:23] info: zh:ember:uart:ash: ======== ASH NCP reset ========
[2024-06-24 03:49:23] info: zh:ember:uart:ash: Socket ready
[2024-06-24 03:49:23] info: zh:ember:uart:ash: ======== ASH starting ========
[2024-06-24 03:49:24] info: zh:ember:uart:ash: ======== ASH connected ========
[2024-06-24 03:49:24] info: zh:ember:uart:ash: ======== ASH started ========
[2024-06-24 03:49:24] info: zh:ember:ezsp: ======== EZSP started ========
[2024-06-24 03:49:24] warning: zh:ember: [EzspConfigId] Failed to SET "APS_UNICAST_MESSAGE_COUNT" TO "32" with status=ERROR_OUT_OF_MEMORY. Firmware value will be used instead.
[2024-06-24 03:49:24] info: zh:ember: [STACK STATUS] Network up.
[2024-06-24 03:49:24] info: zh:ember: [INIT TC] NCP network matches config.
[2024-06-24 03:49:24] info: zh:ember: [CONCENTRATOR] Started source route discovery. 1248ms until next broadcast.
[2024-06-24 03:49:24] info: zh:ember:queue: Request dispatching started.
[2024-06-24 03:49:29] info: zh:ember:ezsp: <=== [ZDO clusterId=32824 sender=52815] Support not implemented upstream.

(FAILED TO START)

@diesel437 diesel437 added the problem Something isn't working label Jun 24, 2024
@hellcry37
Copy link

I am using slzb-06 and mine is more unstable also

@mmokrejs
Copy link

mmokrejs commented Jun 25, 2024

Add rtscts: false to your config for the ember driver (requires core and radio firmware updates prior to that) and beware, that at least for me the configuration.yaml is somehow cached by HomeAssistantGreen and is out of sync with the configuration.yaml I edited via ssh. I deleted the database/json contents of the /config/zigbee2mgtt_"$somenum"/ and rebooted. That is probably why people complain things are messy, because the contents are somehow cached or once parsed and converted to the database format the GUI reachable via HA -> Settings -> AddOns -> Zigbee2MQTT -> Configuration is out of sync.

Core Firmware update - SLZB-06M Zigbee Ethernet POE USB Adapter

Firmware update - SLZB-06M Zigbee Ethernet POE USB Adapter

@mirkochip88
Copy link

Slzb-06m with ember and rtscts on false, same here. If I've switch back to ezsp, work like a charm.

@Koenkk Koenkk added the ember Issues related to ember driver label Jul 6, 2024
@maciekdnd
Copy link

I have two SlZB-06M, each connected to separate LXC container with Z2M. First works every time, after every reboot, second crash every time when I restart container. Randomly connects after some time, or not at all. FW: 7.4.3.

@Nerivec
Copy link
Collaborator

Nerivec commented Aug 20, 2024

@maciekdnd Can you confirm the behavior is still occurring with latest dev? Since you have the two scenario, it would be great if you could share clean startup logs for both, and one when it crashes (both Z2M logs and logs from smlight ESP interface).

@mirkochip88
Copy link

@Nerivec i've two istance with SlZB-06M. I can do some testing tomorrow, are you referring to version 1.39.2?

@Nerivec
Copy link
Collaborator

Nerivec commented Aug 23, 2024

@mirkochip88 Latest dev branch. I was mostly interested in the fact that maciekdnd had one that worked, and one that didn't with seemingly same setup. Do you have that same behavior?

@maciekdnd
Copy link

maciekdnd commented Aug 23, 2024

Yes, I have two instances, exactly same setups, but unfortunately I'm unable to update it to dev as this is production environment 460km from my place at this moment. In couple of days I will visit that site, and maybe I'll try do something with this.
But I managed to start those controllers every time after some trial and error and bit of research. @Nerivec have a look at #23355 I was able to start both of them, but before that, I got random behavior, like no start at all, start just one controller, and suddenly next after that, or both run for 10-20s and disconnected. Since that time I have 3 controllers I'm able to start and run with no major errors. Only when I delete controller_backup.json I'm able to start Z2M by restarting systemd Z2M service. All that on 1.38.0-1.

@mirkochip88
Copy link

I have two instances both with SLZMB-06M with ethernet. The main one at home that I have never migrated to ember that runs with the official component. The secondary one is in the garage, so few devices and possibly not too much trouble to pair if necessary, and it runs on docker in a VM lxc on proxmox. New firmwares have also come out since the last time I tried to use ember, tomorrow I will migrate again and update you.

@maciekdnd
Copy link

@mirkochip88 thanks, that will hopefully help.

@Nerivec
Copy link
Collaborator

Nerivec commented Aug 23, 2024

@maciekdnd Same fixes currently in dev will be available in September 1st release, so, pretty soon, if that's easier to update in your case. 😉
@mirkochip88 SMLight indeed has released a lot of updates recently (core firmware).

@maciekdnd
Copy link

@Nerivec, that's bummer, I will be back day before 1st, so I have to move that update to my next visit. At least I can update core for my controllers, but great news anyway, thanks!

@mirkochip88

This comment was marked as outdated.

@mirkochip88
Copy link

Here we go again:

[2024-08-26 13:26:08] error: zh:ember:uart:ash: Received ERROR from adapter while connecting, with code=ERROR_EXCEEDED_MAXIMUM_ACK_TIMEOUT_COUNT.
[2024-08-26 13:26:08] error: zh:ember:uart:ash: Received ERROR from adapter while connecting, with code=ERROR_EXCEEDED_MAXIMUM_ACK_TIMEOUT_COUNT.
[2024-08-26 13:26:08] error: zh:ember:uart:ash: Received ERROR from adapter while connecting, with code=ERROR_EXCEEDED_MAXIMUM_ACK_TIMEOUT_COUNT.
[2024-08-26 15:13:52] error: zh:ember:ezsp: ERROR Transaction failure; status=ASH_ERROR_TIMEOUTS. Last Frame: [FRAME: ID=52:"SEND_UNICAST" Seq=254 Len=7].
[2024-08-26 15:13:52] error: zh:ember: Adapter fatal error: ASH_ERROR_TIMEOUTS
[2024-08-26 15:13:52] error: z2m: Adapter disconnected, stopping
WATCHDOG: Waiting 1min before next start try.
Starting Zigbee2MQTT with watchdog (60000,300000,900000,1800000,3600000).
[2024-08-26 17:48:21] error: zh:ember:uart:ash: Received ERROR from adapter, with code=ERROR_EXCEEDED_MAXIMUM_ACK_TIMEOUT_COUNT.
[2024-08-26 17:48:21] error: zh:ember:uart:ash: ASH disconnected | Adapter status: ASH_NCP_FATAL_ERROR
[2024-08-26 17:48:21] error: zh:ember:uart:ash: Error while parsing received frame, status=ASH_NCP_FATAL_ERROR.
[2024-08-26 17:48:21] error: zh:ember: Adapter fatal error: HOST_FATAL_ERROR
[2024-08-26 17:48:21] error: zh:ember:uart:ash: Error while parsing received frame, status=ASH_NCP_FATAL_ERROR.
[2024-08-26 17:48:21] error: z2m: Adapter disconnected, stopping
WATCHDOG: Waiting 1min before next start try.
Starting Zigbee2MQTT with watchdog (60000,300000,900000,1800000,3600000).
[2024-08-26 17:59:03] error: zh:ember:uart:ash: Received ERROR from adapter, with code=ERROR_EXCEEDED_MAXIMUM_ACK_TIMEOUT_COUNT.
[2024-08-26 17:59:03] error: zh:ember:uart:ash: ASH disconnected | Adapter status: ASH_NCP_FATAL_ERROR
[2024-08-26 17:59:03] error: zh:ember:uart:ash: Error while parsing received frame, status=ASH_NCP_FATAL_ERROR.
[2024-08-26 17:59:03] error: zh:ember: Adapter fatal error: HOST_FATAL_ERROR
[2024-08-26 17:59:03] error: z2m: Adapter disconnected, stopping
WATCHDOG: Waiting 1min before next start try.
Starting Zigbee2MQTT with watchdog (60000,300000,900000,1800000,3600000).
[2024-08-26 18:59:27] error: zh:ember:uart:ash: Received ERROR from adapter, with code=ERROR_EXCEEDED_MAXIMUM_ACK_TIMEOUT_COUNT.
[2024-08-26 18:59:27] error: zh:ember:uart:ash: ASH disconnected | Adapter status: ASH_NCP_FATAL_ERROR
[2024-08-26 18:59:27] error: zh:ember:uart:ash: Error while parsing received frame, status=ASH_NCP_FATAL_ERROR.
[2024-08-26 18:59:27] error: zh:ember: Adapter fatal error: HOST_FATAL_ERROR
[2024-08-26 18:59:27] error: zh:ember:uart:ash: Error while parsing received frame, status=ASH_NCP_FATAL_ERROR.
[2024-08-26 18:59:27] error: zh:ember:uart:ash: Error while parsing received frame, status=ASH_NCP_FATAL_ERROR.
[2024-08-26 18:59:27] error: zh:ember:uart:ash: Error while parsing received frame, status=ASH_NCP_FATAL_ERROR.
[2024-08-26 18:59:27] error: zh:ember:uart:ash: Error while parsing received frame, status=ASH_NCP_FATAL_ERROR.
[2024-08-26 18:59:27] error: z2m: Adapter disconnected, stopping
WATCHDOG: Waiting 1min before next start try.
Starting Zigbee2MQTT with watchdog (60000,300000,900000,1800000,3600000).
[2024-08-26 19:00:27] error: zh:ember:uart:ash: Received ERROR from adapter while connecting, with code=ERROR_EXCEEDED_MAXIMUM_ACK_TIMEOUT_COUNT.
[2024-08-26 19:59:52] error: zh:ember:uart:ash: Received ERROR from adapter, with code=ERROR_EXCEEDED_MAXIMUM_ACK_TIMEOUT_COUNT.
[2024-08-26 19:59:52] error: zh:ember:uart:ash: ASH disconnected | Adapter status: ASH_NCP_FATAL_ERROR
[2024-08-26 19:59:52] error: zh:ember:uart:ash: Error while parsing received frame, status=ASH_NCP_FATAL_ERROR.
[2024-08-26 19:59:52] error: zh:ember: Adapter fatal error: HOST_FATAL_ERROR
[2024-08-26 19:59:52] error: z2m: Adapter disconnected, stopping
WATCHDOG: Waiting 1min before next start try.

I've enabled the watchdog for now, and seem to restart correctly (The error seems to occur exactly every hour, very strange). Any suggestions?

@Nerivec
Copy link
Collaborator

Nerivec commented Aug 26, 2024

Can you set log level to debug so I can see what's going on right before it fails?

@mirkochip88
Copy link

mirkochip88 commented Aug 27, 2024

[2024-08-27 06:16:39] debug: 	zh:ember:uart:ash: <--- [FRAME type=DATA]
[2024-08-27 06:16:39] debug: 	zh:ember:uart:ash: <--- [FRAME type=DATA ackNum=1](ackRx=1 frmTx=1)
[2024-08-27 06:16:39] debug: 	zh:ember:uart:ash: <--- [FRAME type=DATA ackNum=1 frmNum=5](frmRx=5) Added to rxQueue
[2024-08-27 06:16:39] debug: 	zh:ember:uart:ash: ---> [FRAME type=ACK frmRx=6](ackRx=1)
[2024-08-27 06:16:39] debug: 	zh:ember:ezsp: <=== [CBFRAME: ID=69:"INCOMING_MESSAGE_HANDLER" Seq=80 Len=59]
[2024-08-27 06:16:39] debug: 	zh:ember:ezsp: ezspIncomingMessageHandler(): callback called with: [type=UNICAST], [apsFrame={"profileId":260,"clusterId":64704,"sourceEndpoint":1,"destinationEndpoint":1,"options":320,"groupId":0,"sequence":162}], [packetInfo:{"senderShortId":34614,"senderLongId":"0xFFFFFFFFFFFFFFFF","bindingIndex":255,"addressIndex":255,"lastHopLqi":220,"lastHopRssi":-45,"lastHopTimestamp":0}], [messageContents=1c5f11080af2ff4119aa75138446040741000f0c23010308030a2308000c640b2003]
[2024-08-27 06:16:39] debug: 	zh:controller: Received payload: clusterID=64704, address=34614, groupID=0, endpoint=1, destinationEndpoint=1, wasBroadcast=false, linkQuality=220, frame={"header":{"frameControl":{"frameType":0,"manufacturerSpecific":true,"direction":1,"disableDefaultResponse":true,"reservedBits":0},"manufacturerCode":4447,"transactionSequenceNumber":8,"commandIdentifier":10},"payload":[{"attrId":65522,"dataType":65,"attrData":{"type":"Buffer","data":[170,117,19,132,70,4,7,65,0,15,12,35,1,3,8,3,10,35,8,0,12,100,11,32,3]}}],"command":{"ID":10,"name":"report","parameters":[{"name":"attrId","type":33},{"name":"dataType","type":32},{"name":"attrData","type":1000}]}}
[2024-08-27 06:16:39] debug: 	z2m: Received Zigbee message from 'Telecomando Garage', type 'attributeReport', cluster 'manuSpecificLumi', data '{"65522":{"data":[170,117,19,132,70,4,7,65,0,15,12,35,1,3,8,3,10,35,8,0,12,100,11,32,3],"type":"Buffer"}}' from endpoint 1 with groupID 0
[2024-08-27 06:16:39] debug: 	zhc:lumi: LLKZMK12LM: unknown key 65522 with value  u� F� A �#����
#� d
 �
[2024-08-27 06:16:39] debug: 	zhc:lumi: LLKZMK12LM: Processed data into payload {}
[2024-08-27 06:16:39] info: 	z2m:mqtt: MQTT publish: topic 'zigbee2docker/Telecomando Garage', payload '{"action":null,"consumption":0,"current":0,"device_temperature":38,"energy":0,"interlock":"ON","last_seen":"2024-08-27T04:16:39.254Z","led_disabled_night":false,"linkquality":220,"mode":"dry","operation_mode_l1":"control_relay","operation_mode_l2":"control_relay","power":0,"power_on_behavior":"off","power_outage_count":91,"pulse_length":200,"state":"OFF","state_l1":"OFF","state_l2":"OFF","switch_type":"momentary","update":{"installed_version":30,"latest_version":30,"state":"idle"},"update_available":null,"voltage":213.98}'
[2024-08-27 06:16:47] debug: 	zh:controller:endpoint: ZCL command 0xa49e69fffe01544b/1 haElectricalMeasurement.read(["rmsVoltage","rmsCurrent","activePower"], {"timeout":10000,"disableResponse":false,"disableRecovery":false,"disableDefaultResponse":true,"direction":0,"reservedBits":0,"writeUndiv":false})
[2024-08-27 06:16:47] debug: 	zh:ember: ~~~> [ZCL to=62467 apsFrame={"profileId":260,"clusterId":2820,"sourceEndpoint":1,"destinationEndpoint":1,"options":4416,"groupId":0,"sequence":0} header={"frameControl":{"reservedBits":0,"frameType":0,"direction":0,"disableDefaultResponse":true,"manufacturerSpecific":false},"transactionSequenceNumber":28,"commandIdentifier":0}]
[2024-08-27 06:16:47] debug: 	zh:ember:ezsp: ===> [FRAME: ID=52:"SEND_UNICAST" Seq=81 Len=30]
[2024-08-27 06:16:47] debug: 	zh:ember:uart:ash: ---> [FRAME type=DATA frmTx=1 frmRx=6](ackRx=1)
[2024-08-27 06:17:01] debug: 	zh:ember:ezsp: =x=> [FRAME: ID=52:"SEND_UNICAST" Seq=81 Len=30] Error: ASH_ERROR_TIMEOUTS
[2024-08-27 06:17:01] error: 	zh:ember:ezsp: ERROR Transaction failure; status=ASH_ERROR_TIMEOUTS. Last Frame: [FRAME: ID=52:"SEND_UNICAST" Seq=80 Len=7].
[2024-08-27 06:17:01] error: 	zh:ember: Adapter fatal error: ASH_ERROR_TIMEOUTS
[2024-08-27 06:17:01] debug: 	zh:controller: Adapter disconnected
[2024-08-27 06:17:01] info: 	zh:ember:uart:ash: ASH COUNTERS since last clear:
[2024-08-27 06:17:01] info: 	zh:ember:uart:ash:   Total frames: RX=1152, TX=1409
[2024-08-27 06:17:01] info: 	zh:ember:uart:ash:   Cancelled   : RX=0, TX=0
[2024-08-27 06:17:01] info: 	zh:ember:uart:ash:   DATA frames : RX=1152, TX=258
[2024-08-27 06:17:01] info: 	zh:ember:uart:ash:   DATA bytes  : RX=33019, TX=7623
[2024-08-27 06:17:01] info: 	zh:ember:uart:ash:   Retry frames: RX=0, TX=0
[2024-08-27 06:17:01] info: 	zh:ember:uart:ash:   ACK frames  : RX=0, TX=1151
[2024-08-27 06:17:01] info: 	zh:ember:uart:ash:   NAK frames  : RX=0, TX=0
[2024-08-27 06:17:01] info: 	zh:ember:uart:ash:   nRdy frames : RX=0, TX=0
[2024-08-27 06:17:01] info: 	zh:ember:uart:ash:   CRC errors      : RX=0
[2024-08-27 06:17:01] info: 	zh:ember:uart:ash:   Comm errors     : RX=0
[2024-08-27 06:17:01] info: 	zh:ember:uart:ash:   Length < minimum: RX=0
[2024-08-27 06:17:01] info: 	zh:ember:uart:ash:   Length > maximum: RX=0
[2024-08-27 06:17:01] info: 	zh:ember:uart:ash:   Bad controls    : RX=0
[2024-08-27 06:17:01] info: 	zh:ember:uart:ash:   Bad lengths     : RX=0
[2024-08-27 06:17:01] info: 	zh:ember:uart:ash:   Bad ACK numbers : RX=0
[2024-08-27 06:17:01] info: 	zh:ember:uart:ash:   Out of buffers  : RX=0
[2024-08-27 06:17:01] info: 	zh:ember:uart:ash:   Retry dupes     : RX=0
[2024-08-27 06:17:01] info: 	zh:ember:uart:ash:   Out of sequence : RX=0
[2024-08-27 06:17:01] info: 	zh:ember:uart:ash:   ACK timeouts    : RX=0
[2024-08-27 06:17:01] info: 	zh:ember:uart:ash: ======== ASH stopped ========
[2024-08-27 06:17:01] info: 	zh:ember:ezsp: ======== EZSP stopped ========
[2024-08-27 06:17:01] info: 	zh:ember: ======== Ember Adapter Stopped ========
[2024-08-27 06:17:01] error: 	z2m: Adapter disconnected, stopping
[2024-08-27 06:17:01] debug: 	zh:controller:endpoint: Error: ZCL command 0xa49e69fffe01544b/1 haElectricalMeasurement.read(["rmsVoltage","rmsCurrent","activePower"], {"timeout":10000,"disableResponse":false,"disableRecovery":false,"disableDefaultResponse":true,"direction":0,"reservedBits":0,"writeUndiv":false}) failed (~x~> [ZCL to=62467] Failed to send request with status=FAIL.)
    at /app/node_modules/zigbee-herdsman/src/adapter/ember/adapter/emberAdapter.ts:2515:27
    at Queue.execute (/app/node_modules/zigbee-herdsman/src/utils/queue.ts:36:20)
    at Request.send (/app/node_modules/zigbee-herdsman/src/controller/helpers/request.ts:82:20)
    at Endpoint.zclCommand (/app/node_modules/zigbee-herdsman/src/controller/model/endpoint.ts:911:28)
    at Endpoint.read (/app/node_modules/zigbee-herdsman/src/controller/model/endpoint.ts:443:29)
    at poll (/app/node_modules/zigbee-herdsman-converters/src/lib/tuya.ts:175:13)
    at Timeout._onTimeout (/app/node_modules/zigbee-herdsman-converters/src/lib/utils.ts:53:25)
[2024-08-27 06:17:01] debug: 	z2m: Saving state to file /app/data/state.json
[2024-08-27 06:17:01] info: 	z2m:mqtt: MQTT publish: topic 'zigbee2docker/bridge/state', payload '{"state":"offline"}'
[2024-08-27 06:17:01] info: 	z2m: Disconnecting from MQTT server
[2024-08-27 06:17:01] info: 	z2m: Stopping zigbee-herdsman...
[2024-08-27 06:17:01] debug: 	zh:controller:database: Writing database to '/app/data/database.db'
[2024-08-27 06:17:01] info: 	z2m: Stopped zigbee-herdsman
[2024-08-27 06:17:01] info: 	z2m: Stopped Zigbee2MQTT
WATCHDOG: Waiting 1min before next start try.
Starting Zigbee2MQTT with watchdog (60000,300000,900000,1800000,3600000).

zigbee2mqtt_logs.txt

@Nerivec
Copy link
Collaborator

Nerivec commented Aug 27, 2024

Can you see what the logs in the ESP interface (SMLight) say around the time of the crash? It's acting like the adapter stopped responding to Z2M (but without failing the TCP connection).

@mirkochip88
Copy link

[02:31:15] taskZB | Client disconnected, id: 0
[02:31:30] taskZB | New client: 192.168.0.143 id: 0
[02:36:07] taskZB | Client disconnected, id: 0
[02:36:08] taskZB | New client: 192.168.0.143 id: 0
[02:45:10] taskZB | Client disconnected, id: 0
[02:45:13] taskZB | New client: 192.168.0.143 id: 0
[02:48:50] taskZB | Client disconnected, id: 0
[02:48:52] taskZB | New client: 192.168.0.143 id: 0
[02:52:28] taskZB | Client disconnected, id: 0
[02:53:19] taskZB | New client: 192.168.0.143 id: 0
[02:56:39] taskZB | Client disconnected, id: 0
[02:57:29] taskZB | New client: 192.168.0.143 id: 0
[03:00:09] taskZB | Client disconnected, id: 0
[03:01:06] taskZB | New client: 192.168.0.143 id: 0
[03:38:08] taskZB | Client disconnected, id: 0
[03:39:09] taskZB | New client: 192.168.0.143 id: 0
[06:17:05] taskZB | Client disconnected, id: 0
[06:18:01] taskZB | New client: 192.168.0.143 id: 0

@Nerivec
Copy link
Collaborator

Nerivec commented Aug 27, 2024

@darkxst Can you take a look at this issue?
From the latest logs of mirkochip88, the counters in-between crashes are a bit too close to be a coincidence (~RX=1150, TX=1400 for both occurrences), it's like something is happening at an interval and making the EmberZNet stack unresponsive? The TCP doesn't appear to be lost however (or it's not reported lost...), so Z2M times out instead.

@Tarik2142
Copy link

[02:31:15] taskZB | Client disconnected, id: 0
[02:31:30] taskZB | New client: 192.168.0.143 id: 0
[02:36:07] taskZB | Client disconnected, id: 0
[02:36:08] taskZB | New client: 192.168.0.143 id: 0
[02:45:10] taskZB | Client disconnected, id: 0
[02:45:13] taskZB | New client: 192.168.0.143 id: 0
[02:48:50] taskZB | Client disconnected, id: 0
[02:48:52] taskZB | New client: 192.168.0.143 id: 0
[02:52:28] taskZB | Client disconnected, id: 0
[02:53:19] taskZB | New client: 192.168.0.143 id: 0
[02:56:39] taskZB | Client disconnected, id: 0
[02:57:29] taskZB | New client: 192.168.0.143 id: 0
[03:00:09] taskZB | Client disconnected, id: 0
[03:01:06] taskZB | New client: 192.168.0.143 id: 0
[03:38:08] taskZB | Client disconnected, id: 0
[03:39:09] taskZB | New client: 192.168.0.143 id: 0
[06:17:05] taskZB | Client disconnected, id: 0
[06:18:01] taskZB | New client: 192.168.0.143 id: 0

Do you have automatic zigbee updates enabled?

@mirkochip88
Copy link

No

@Tarik2142
Copy link

No

What is the IP lease time on your DHCP server?

@mirkochip88
Copy link

It is set to 10 days, but obviously I have the IP reserved for ZigBee routers. With EZSP I have no problems in the same network configuration.

@Tarik2142
Copy link

It is set to 10 days, but obviously I have the IP reserved for ZigBee routers. With EZSP I have no problems in the same network configuration.

@Nerivec

EZSP/EMBER only affects zigbee2mqtt, nothing changes in coordinator operation, so I think it's an EMBER bug and not coordinator firmware

@Nerivec
Copy link
Collaborator

Nerivec commented Sep 5, 2024

Since the same model adapter works in one setup, and not in another (per maciekdnd feedback), even though both are roughly the same, seems like we're missing a parameter in the equation.
@maciekdnd If you had them with ezsp before, can you confirm if both setups worked then? I'm assuming both stayed roughly the same? Can you also confirm parameters in the SMLight UI, and z2m configuration.yaml are the same?

@mirkochip88
Copy link

mirkochip88 commented Sep 5, 2024

The only 3 differences are the ZigBee Firmware Router version(I'm not sure the latest version is compatible with ezsp), the RTSC set on False and obliviously Ember

@maciekdnd
Copy link

maciekdnd commented Sep 5, 2024

Both controllers were configured with ember right from the beginning. The first thing I did was to update firmware to the latest version (z2m and core) before configuring z2m. I have 3 controllers right now, all of them have same exact settings, except static IP for every controller. All of them are in the same vlan. All settings are exactly the same for each LXC container, all of them were configured using configuration.yaml file before starting z2m.
All of theme behave the same. When I restart controller or container, or connection is interrupted in any way, z2m is unable to connect to the controller any more. Only deleting controler-backup.json and restarting systemd z2m service works.
Also, in controller UI there is info that z2m is connected, even when crashing, no meaningful logs in controller UI. All controllers with same z2m firmware and core.
One thing I observed after controller is up again (due to restart or other faults) there are 2 socket connections. Looks like socket or port connection is still active, so new connection cannot be established, and then z2m is crashing. I have to delete file and restart z2m service.

Containers were configured using this script: https://proxmox-helper-scripts.vercel.app/scripts?id=Zigbee2MQTT
Default (Debian) image, as I couldn't make z2m to work on Alpine distro. I cannot recall doing something odd with those configs.
All of my controller are connected by Ethernet and powered by POE form Unifi 24 POE switch.

I remember that at the beginning, when I didn't know how to restart the controller, I waited and blindly changed various things (for example sudo systemctl start zigbee2mqtt without deleting file), then sometimes one controller would restart, but also the second one would automatically start, where I didn't change anything . After that I wasn't able to run them at all until I found a way to remove them.

I don't know if it's related, but even when I have permit join set to false, some devices still can join network. Devices removed (force remove) and new ones (like IKEA light bulbs).

@Koenkk
Copy link
Owner

Koenkk commented Sep 5, 2024

@maciekdnd

Only deleting controler-backup.json and restarting systemd z2m service works.

Can you check if this reproducible (does it always fail when there is a coordinator-backup.json and always start when there is no coordinator-backup.json). If yes, can you provide the debug logs of both with and without coordinator-backup.json ?

See this on how to enable debug logging.

@maciekdnd
Copy link

maciekdnd commented Sep 5, 2024

Yes, for me every single time in environment I mentioned previously. With this file present in the data folder, z2m (1.38) will fail to start.
As I mentioned before, there are 2 socket connections (on controller UI socket section) when z2m is trying to connect after controller restart/power cycle/ z2m restart /crash. Even when I restart controller again z2m is unable to connect, there is no other way as deleting file and restarting service.

Unfortunately this site is nearly 500 km from my place right now and I should not restart or update anything there (especially remotely) as if something goes wrong, I won't be able to go there to fix it (this is production environment). I can do it as soon as I visit this place, but not before end of the month. Maybe, if something goes wrong, or restart will be needed I can do it then.

@mirkochip88
Copy link

After months I finally managed to get to grips with it. The problem was on my proxmox node, which uses two nics, one of which had both home assistant and the vm with docker and z2m. By moving the VMs to the other nic I no longer had any problems. I noticed this because I also had problems on other VMs. It seems as if the nic, especially with peak CPU loads, stalls.
All resolved for now, I also migrated the main instance to Ember and it runs perfectly.

@Koenkk
Copy link
Owner

Koenkk commented Nov 25, 2024

@mirkochip88 that's useful info! Would you mind making a PR for this page?

@Ricc68
Copy link
Contributor

Ricc68 commented Dec 30, 2024

@mirkochip88 I'm trying to understand the core of your problem with two NICs.
I see you had lot of load on one NIC which was kind of "overloaded" with VMs but in general, the only problem I would see in a solid NIC (for example an onboard Intel NIC) would be gitter and delays due to various network streams competing, not stalls and disconnects.
What kind of NIC was problematic in your Proxmox? Was the problematic NIC an USB NIC for example?

@oerix
Copy link

oerix commented Jan 6, 2025

Any update on this please?

@mirkochip88
Copy link

@mirkochip88 I'm trying to understand the core of your problem with two NICs. I see you had lot of load on one NIC which was kind of "overloaded" with VMs but in general, the only problem I would see in a solid NIC (for example an onboard Intel NIC) would be gitter and delays due to various network streams competing, not stalls and disconnects. What kind of NIC was problematic in your Proxmox? Was the problematic NIC an USB NIC for example?

It looks like a hardware problem with that network card. Now I have moved all the VMs to the other NIC and everything works great.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ember Issues related to ember driver problem Something isn't working
Projects
None yet
Development

No branches or pull requests

10 participants