MQTT client timeout #140
Comments
I gather that across the hundreds of these messages, the number is changing each time? If so, yes, I have an idea: we are not yielding to allow the actual network processing to happen. If you can confirm the above, the fix should be simple. |
Yes, exactly. The numbers are changing for every row. |
Same issue here. It also happens sometimes on startup of the qt-ozw container. It seems like when the MQTT disconnect happens, ozwdaemon is stuck using 100% CPU. I thought it would exit (and restart the container?) |
I am also experiencing this issue when the ozw container is starting up, resulting in a client timeout and disconnection. In the previous zw1.4, ozwlog shows I have a very chatty network, so I assume this is causing too many messages, and ozw is delayed in sending a keepalive to MQTT. |
As with OP, the only way for the ozwd to remain connected to mqtt (not getting timeout) is to trash the ozwcache file. |
Any progress on this one? I have a large network as well and can't get past the initialization step without this timeout and the eventual shutdown of ozwdaemon.
|
@jlengq Check your OZW container volume location, it should be right there. |
Thanks, I found it! Unfortunately it does not solve my problem. I have 100+ nodes, and the interview process launched at startup seems to choke the MQTT network somehow, resulting in the timeout. Deleting the ozwcache only seems to start the whole process over again? |
@jlengq yes, trashing the file starts the whole discovery process again (no loss of any node data); but sometimes that's what is needed to get it fully operational. |
I've now moved from qt-openzwave to zwave2mqtt. I noticed that during some operations while the controller is waiting for replies and they take a long time to come (and timeout, usually), z2m also shows as "disconnected", but after a while it reconnects. |
Has anyone tried increasing the MQTT client timeout in ozwd to see if that works around the problem? Maybe increasing the timeout would allow ozwd to finish whatever it's doing before disconnecting (assuming it's not in an infinite loop). I think it should be as simple as adding a call to qt-openzwave/qt-ozwdaemon/mqttpublisher.cpp Lines 54 to 63 in 89cc0d8
Not sure if there's any downside to increasing the time besides not reacting as quickly for real timeouts. |
So far with limited testing, I can confirm with @kpine's suggestion, ozw is able to start up with a pre-existing ozwcache file, whereas in the past, I would 100% of the time get timeout disconnects until I deleted the ozwcache file. |
This is a workaround for OpenZWave#140 to ensure the daemon is able to start up while we are waiting for a proper fix
Just added a PR for that. I still think there are better fixes to be made, but at least it will make the daemon start up. |
I too have encountered this issue. I have rebuilt my entire HA setup around Docker so that I can test whether turning off logging and whatnot improves the situation (which it did, but it still fails to talk to MQTT sometimes). I also took a look at how the ping/pong works, and from what I can tell it's all built into qtmqtt using a timer. The only reason I can imagine that the timer wouldn't fire is if the event loop was blocked. |
I am having the same issue, 100+ nodes ozw daemon won't stay up. |
I have just added additional devices to my zwave network. I am now having the same issue. ozw daemon goes offline during startup. |
So I spent an ungodly amount of time figuring out how to get a local debug build of ozwdaemon running on my MBP. I can't exactly reproduce the issue there (presumably because it's too fast to trigger it), but what I do see is that the MQTT timer for pings is being invoked as expected (I also set up a 1s timer, which triggers precisely on time). I also confirmed that blocking the event loop for a period of time definitely causes the MQTT timer not to be invoked. After digging into the code a bit, it looks like OZWNotification schedules its events to be processed by the main thread using Qt::QueuedConnection. My suspicion is that OpenZWave is generating events so quickly that the Pi cannot keep up on the main thread. The queue of queued signals then becomes saturated with events from OZW, so the timer never gets a chance to fire. What makes this seem likely is that I did some profiling, and a huge amount of time is spent serializing events for MQTT, which is probably what makes each event take so long to process on the main thread. Assuming this is an accurate assessment, I see a couple of paths forward.
What do you think @Fishwaldo ? |
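For anyone trying to picture the starvation theory above, here is a toy model of it. This is only a sketch on a virtual clock, not how Qt's event dispatcher actually works: `simulate_loop`, `RunResult` and the `batch_limit` knob are all made-up names for illustration, and the assumption is simply that the ping timer only gets looked at when the loop comes back from handling a batch of queued events.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Outcome of one simulated run of the loop.
struct RunResult {
    std::uint64_t worst_ping_gap_ms;  // longest time between two pings (or start and first ping)
    std::uint64_t pings_sent;         // number of pings that fired while the backlog was drained
};

// Toy single-threaded event loop on a virtual clock (hypothetical model).
// `pending_events` queued notifications each cost `event_cost_ms` to handle.
// The ping timer is due every `ping_interval_ms`, but it is only checked
// after a batch of at most `batch_limit` events has been handled.
RunResult simulate_loop(std::uint64_t pending_events,
                        std::uint64_t event_cost_ms,
                        std::uint64_t ping_interval_ms,
                        std::uint64_t batch_limit) {
    assert(batch_limit > 0);  // a zero-sized batch would never drain the queue
    std::uint64_t now_ms = 0;
    std::uint64_t last_ping_ms = 0;
    std::uint64_t next_due_ms = ping_interval_ms;
    RunResult result{0, 0};
    while (pending_events > 0) {
        // Handle one batch of events; the timer cannot fire during this time.
        const std::uint64_t batch = std::min(pending_events, batch_limit);
        now_ms += batch * event_cost_ms;
        pending_events -= batch;
        // Back in the loop: fire the ping if it is overdue.
        if (now_ms >= next_due_ms) {
            result.worst_ping_gap_ms =
                std::max(result.worst_ping_gap_ms, now_ms - last_ping_ms);
            ++result.pings_sent;
            last_ping_ms = now_ms;
            next_due_ms = now_ms + ping_interval_ms;
        }
    }
    return result;
}
```

With made-up numbers (10000 queued events at 50 ms each, a 30 s ping interval), `simulate_loop(10000, 50, 30000, 10000)` gets its first ping out only after 500 s, while `simulate_loop(10000, 50, 30000, 100)` keeps the worst gap at 30 s. Bounding how much work is done between visits to the timer is one way to read the "yield to the network processing" idea, but whether that maps cleanly onto ozwdaemon is an open question.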
I totally agree that #185 is just a workaround, and for me, your option 3 seems like the best one. Unfortunately I don't have enough experience with C++ and Qt to help. Regarding option 2, there has been some discussion in another issue (I could not find it just now), and a problem is apparently that it is hard to keep track of the states (and what messages in either direction might be lost) if you just do an MQTT reconnect without restarting OZW at the same time. I really hope fishwaldo is well, and will be back from his involuntary break soon... |
This is where I am at right now :)
So the classic CS problem of thread starvation? Man I am rusty with C++, but having non-working lights is making me want to pick it back up. |
Until there's an actual fix for this issue, is there any way to take CPU resources from the core-openzwave docker container to artificially slow it down so that it doesn't overwhelm mqtt? Perhaps Portainer has something useful. I'll check this idea out tomorrow, because otherwise I'm just dead in the water. |
What I did on my end as a workaround was to install MQTT locally on my Pi and then bridge it to my main MQTT server. It seems to fix the issue, at least for me. I also tried recompiling ozwd to extend the timeout, but that didn't quite solve the issue completely. Try this docker compose:
version: '3'
services:
  mqtt:
    image: eclipse-mosquitto
    container_name: "mqtt-bridge"
    volumes:
      - ./mqtt:/mosquitto
      - ./mqtt/data:/mosquitto/data
      - ./mqtt/log:/mosquitto/log
    ports:
      - "1883:1883"
      - "9001:9001"
    restart: always
  ozwd:
    image: openzwave/ozwdaemon:latest
    container_name: "ozwd"
    depends_on:
      - "mqtt"
    security_opt:
      - seccomp:unconfined
    devices:
      - "/dev/serial/by-id/usb-xxx"
    volumes:
      - ./ozw:/opt/ozw/config
    ports:
      - "1983:1983"
      - "5901:5901"
      - "7800:7800"
    environment:
      MQTT_SERVER: "pi.local.net"
      MQTT_USERNAME: "[redacted]"
      MQTT_PASSWORD: "[redacted]"
      USB_PATH: "/dev/serial/by-id/usb-xxx"
      OZW_INSTANCE: "1"
      OZW_NETWORK_KEY: "[redacted]"
    restart: always
Here is the mqtt config:
persistence true
persistence_location /mosquitto/data/
log_dest file /mosquitto/log/mosquitto.log
password_file /mosquitto/config/passwd
allow_anonymous false
# External MQTT Broker
connection zpie01
address hassio.local.net
# 1 is the id of one of many of my instances, update as needed
topic OpenZWave/1/# both
remote_username [redacted]
remote_password [redacted]
|
Depending on the MQTT server you have, the performance of your device and
the size of the network, you could improve things with those kinds of
changes. With my network of 58 nodes, even with a local MQTT and Pi4, it
couldn't complete quickly enough.
I'll try to remember to push my docker image with Olen's workaround (which
is perfectly reasonable to use in "production") tomorrow morning.
Cheers, Brett
|
It looks like adding " --cpu-shares 512 " might be a good start. Would any of you folks here know how I can get that into the command line that HA is using to start up the addon? |
I have 4 Pis with 50-100 nodes or so, and nothing would work. Even on the test ozwd instance where there were no nodes (what gives!?), ozwd would just disconnect, with or without the keepalive modification in the code, if and only if the MQTT server was external to hassio (eclipse/mosquitto on docker on a beefy VM). I now have a feeling that there is something going on with the network connection between the Pi and the MQTT server. Things worked better when the Pi connected to the internal MQTT addon on the hass server. If you need to make sure startup completes, you can wipe out the ozwd cache on your Pi, which might help; just don't reset your Z-Wave stick and all nodes will come back. My network has been rock solid as of this morning with the setup I mentioned in my previous comment. |
By network connection, I meant there is something going on with how ozwd handles traffic and/or congestion. Even with the keepalive increased, the network wouldn't be stable. At least in my case. |
I'm running a supervised install on Ubuntu 20.04 on an RPi4, with the OpenZWave addon and the MQTT addon. I've been working to get everything back to normal after switching from the old all-in-one Z-Wave integration, and I thought I had finally got things right when this issue cropped up. |
A friend did this build yesterday when we were trying to troubleshoot this issue; you can try his docker image. Keep in mind it kinda worked for me, but I would still experience intermittent disconnects every hour or so, or when the network got busy. I am now back to the original docker image though, with MQTT running on the same host and bridging MQTT to MQTT with hassio on a separate VM. So far so good. You can also wipe out the ozwd cache file of your ozwdaemon and see if you can get your network back up. |
Adding several more devices to my network caused this problem for me last weekend. I'm around 60 devices now with more to add. I'm running using VirtualBox on an Intel NUC so it doesn't appear easy to make any temporary changes to work around this issue. Last night I switched to Zwave2MQTT and have all my devices connected this morning. I'll have to spend some time renaming devices/entities but at least this will get me going again. |
I really need a fix for this. The system fails too frequently as I am trying to add some door sensors and deadbolts. Deleting the cache file will let it restart, but many nodes lose their names, and some disappear. And it takes forever, anyway. I tried Zwave2MQTT last week, and I switched when I ran into some issue (can't remember what it was now, lol). I may switch back and try it again. I think one issue was that it was clear from what I read that it was an orphaned project, and that the OpenZwave add-on with the OpenZwave integration was the road forward. Is there any way to get the developers' attention quickly, or should I assume that this state of affairs will persist for a while? |
Hey @psgcooldog, |
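Hey @brett19
QByteArray mqtt_keep_alive = qgetenv("MQTT_KEEP_ALIVE");
bool ok = false;
// setKeepAlive() takes the interval in seconds as a quint16, so convert the text first
quint16 seconds = mqtt_keep_alive.toUShort(&ok);
if (ok) {
    this->m_client->setKeepAlive(seconds);
}
I am having a hard time setting up a cross-compilation toolchain. How did you get it to work? |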
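In case it helps anyone patching this in themselves: the part that is easy to get wrong is turning the environment string into a valid keep-alive value. Below is a rough, Qt-free sketch of that validation. `parse_keep_alive` and `keep_alive_or_default` are hypothetical helper names, not anything from qt-openzwave, and the only assumptions are that the value is in seconds and has to fit in 16 bits (which is what the MQTT keep-alive field allows).

```cpp
#include <cassert>
#include <cctype>
#include <cerrno>
#include <cstdint>
#include <cstdlib>

// Parse a keep-alive value in seconds from text such as an environment variable.
// Returns false for null, empty, non-numeric, signed, or out-of-range input
// and leaves `seconds_out` untouched in that case.
bool parse_keep_alive(const char* text, std::uint16_t& seconds_out) {
    // Require a leading digit so that "", " 60", "+60" and "-1" are all rejected.
    if (text == nullptr || !std::isdigit(static_cast<unsigned char>(*text))) {
        return false;
    }
    errno = 0;
    char* end = nullptr;
    const unsigned long value = std::strtoul(text, &end, 10);
    // Reject overflow, trailing junk such as "60s", and values above 16 bits.
    if (errno != 0 || *end != '\0' || value > 65535UL) {
        return false;
    }
    seconds_out = static_cast<std::uint16_t>(value);
    return true;
}

// Use the parsed value when it is valid, otherwise fall back to a default.
std::uint16_t keep_alive_or_default(const char* text, std::uint16_t fallback_s) {
    assert(fallback_s != 0);  // a keep-alive of 0 switches the MQTT ping mechanism off
    std::uint16_t seconds = 0;
    return parse_keep_alive(text, seconds) ? seconds : fallback_s;
}
```

Something like `keep_alive_or_default(std::getenv("MQTT_KEEP_ALIVE"), 60)` would then give 360 for `MQTT_KEEP_ALIVE=360` and 60 when the variable is unset or malformed. Note that a literal "0" is accepted here and would disable keep-alive, which may or may not be what you want.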
Sorry for the delay, forgot to push this the other day. Here are some armhf (32-bit ARM) images including fix-185. |
I decided to punt, and converted everything over to ZWave2MQTT. It was quite the time-consuming process, but this particular problem is no longer an issue for me. |
@psgcooldog I also jumped ship for z2m, but ozw is far superior when it comes to handling scene events from switches. It actually comes into HA as a scene event and not a regular state change. And z2m sends 4 state changes per button press, so you need to go deep into the event to figure out if it really is a new scene event. tl;dr: would prefer to use ozw but had to switch to z2m because ozw can't handle more than ~20-25 devices before it breaks down... |
@karl-gustav I was just trying to figure scenes out with z2m. I have Inovelli red dimmers that allow multi-tapping to create scenes. Had it working fine with ozw, but no luck with z2m. I'll probably give up for now and make the switch back to ozw when this bug is fixed. At least all my regular light automations are working again. |
Hey everyone, can you confirm whether you still have issues with the image I posted above containing fix 185? If you can upload logs, that would help track down your specific issue beyond what we've already discovered. Cheers, Brett |
FWIW, I run ozw with 61 devices, and it has been running solid for 7 weeks. But I have only added a few new devices during that time, not removed any, and not done any network refreshes or other tricks. |
@brett19 I'm not that familiar with Docker. Is it possible to use Portainer to replace my current ozwdaemon with the one you created? |
You would need to shut down the container and spin up a new one with the same configuration but different image (as far as I know). I personally use docker-compose to make it easier to do that. |
Maybe I’ll have some time over the holidays to figure that out. For now z2m is running well. The only thing I can’t figure out is scenes from my dimmers. Is @Fishwaldo the only person that can release a new version of the addon? Given that the HA roadmap seemed to be heading down the ozw path, it seems pretty risky if there is only one person that can release bug fixes/workarounds. |
A friend compiled "this" fix and added my fix, which adds an MQTT_KEEP_ALIVE environment variable to change the timeout as needed. Before this fix my setup would still restart if I did "Refresh node".
MQTT_KEEP_ALIVE: "360"
My config is here (keep in mind I am using a local MQTT on the Pi that bridges to a main MQTT server, to make sure ozw doesn't restart if the HASS instance is restarted):
version: '3'
services:
  mqtt:
    image: eclipse-mosquitto
    container_name: "mqtt-bridge"
    volumes:
      - ./mqtt:/mosquitto
      - ./mqtt/data:/mosquitto/data
      - ./mqtt/log:/mosquitto/log
    ports:
      - "1883:1883"
      - "9001:9001"
    restart: always
  ozwd:
    #image: openzwave/ozwdaemon:allinone-latest
    image: firstof9/qt-ozwdaemon:latest
    container_name: "ozwd"
    depends_on:
      - "mqtt"
    security_opt:
      - seccomp:unconfined
    devices:
      - "/dev/serial/by-id/usb-0658_0200-if00"
    volumes:
      - ./ozw:/opt/ozw/config
    ports:
      - "1983:1983"
      - "5901:5901"
      - "7800:7800"
    environment:
      MQTT_SERVER: "localhost.mydomain.net"
      MQTT_USERNAME: "[redacted]"
      MQTT_PASSWORD: "[redacted]"
      MQTT_KEEP_ALIVE: "360"  # <-- add keep alive like so
      USB_PATH: "/dev/serial/by-id/usb-0658_0200-if00"
      OZW_INSTANCE: "3"
      OZW_NETWORK_KEY: "[redacted]"
    restart: always
|
Hi, I have the same issues. Are there any solutions in sight? Or at least a temporary workaround for people like me, who run the official image on a Raspberry Pi? I've tried switching to zwave2mqtt, but I couldn't figure out how to add the devices and entities to Home Assistant. Auto discovery didn't work either, and I really don't want to have to do everything manually. Is it correct that everyone who has a sufficiently large network using the OpenZWave plugin is experiencing this issue? Also, I'm quite new to all of this, so I'm JUST learning how to access certain files in the docker containers, get OS SSH access, etc. This is how I've managed to delete the ozwcache file at least, so I don't have to install the whole thing from scratch. But every time I "reset", start up the OpenZWave plugin and let my network run for a bit, it seems to randomly miss some entities. Is there a way to restore the ozwcache from an old file without running into the timeout issue, maybe? I'm desperate at this point, so I'm thankful for any help :) |
@genome-prime I was unable to figure out how to get the workaround installed on my setup. I ended up switching to z2m and was able to get auto discover to work. It did require that I re-enter all the entity names, which was a bit time consuming. So far it has been very stable. I have about 80 zwave devices. I had trouble getting the central scene figured out with z2m. I'm now looking at Node-Red with an MQTT node to grab the central scene info from there. |
I'm giving up in frustration... I gave z2m another shot. This time my devices were auto detected and showed up in Home Assistant. Unfortunately
I know there are customizations but I couldn't figure out where the config files are being stored and how to apply them and honestly I don't wanna have to customize anything, since OZW was detecting everything fine on its own (except my Fibaro Button which I can live without for now) I know this might not be the right or best place for it but I need to get this off my chest: I guess I should at least say some positive things too: zwave2mqtt:
OpenZWave:
|
Your docker image seemed to fix the issue for me! |
I have a new and fresh installation of Home Assistant on Raspberry Pi per 25th December 2020 and have the same problem: Should this bug even show up on a totally clean install? |
Seems like it is the Aeotec Z-Wave Gen5 stick that doesn't work on Raspberry Pi4. At least 3 (of 4) hardware versions of the stick: |
Did you try this solution? |
No, but I will get a cheap unpowered USB 2.0 hub. That will do the trick. Also it makes the stick further away from the Raspberry which is supposed to reduce interference. |
@m3ki - thank you. That docker image seems to have done the trick for getting my network back online. It was super frustrating - I could see through the admin GUI that everything was up and running, could see it posting into MQTT, but for some reason HA kept saying that everything was "unavailable"... |
I am running docker images ozwdaemon:latest and eclipse-mosquitto:latest as of yesterday on a Raspberry Pi together with Home Assistant 0.113.1. I have about 90 devices and over 500 entities.
As long as I don't really do anything, it's running pretty stable. However, using OZWAdmin-0.1.74, if I push the Z-Wave network a bit too far, such as healing and/or refreshing too many nodes at the same time, basically clogging the network with messages, the ozwdaemon MQTT client doesn't seem to be able to keep up. All entities in Home Assistant become unavailable,
and I see the following in the mosquitto logs:
So I restart the ozwdaemon docker instance, and I see the following in the Mosquitto logs:
Then it works for just about under two minutes before it gets disconnected again:
From what I understand (please, correct me if I'm wrong) that "k60"-part in the Mosquitto logs means
"keepalive = 60", i.e. the MQTT client tells the broker when connecting that it will stay in touch
with a ping message at least once every minute, and if that doesn't happen, the client will be disconnected.
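As a side note on that reading of "k60": the MQTT 3.1.1 specification lets the broker wait one and a half times the keep-alive interval before dropping a silent client, so with 60 seconds the cut-off is around 90 seconds, which would be roughly consistent with the "just under two minutes" above. A minimal sketch of that arithmetic follows; the function names are made up for illustration, and how promptly a given broker actually enforces the limit is an assumption left out of the model.

```cpp
#include <cassert>
#include <cstdint>

// Longest silence the broker must tolerate for a given keep-alive, in milliseconds:
// one and a half keep-alive intervals (MQTT 3.1.1, Keep Alive section).
std::uint32_t broker_deadline_ms(std::uint16_t keep_alive_s) {
    return keep_alive_s * 1500u;  // 1.5 * 1000 ms per second
}

// True if a client that has sent nothing for `silence_ms` is past the deadline.
bool broker_would_disconnect(std::uint16_t keep_alive_s, std::uint32_t silence_ms) {
    if (keep_alive_s == 0) {
        return false;  // keep-alive of 0 means the mechanism is turned off
    }
    return silence_ms > broker_deadline_ms(keep_alive_s);
}
```

So `broker_would_disconnect(60, 120000)` is true, while the same two minutes of silence with a keep-alive of 360 (`broker_would_disconnect(360, 120000)`) is still within the limit. The flip side is that a larger keep-alive also delays detection of a genuinely dead connection by the same factor.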
I increased logging in mosquitto (by setting "log_type all" in mosquitto.conf) and also started ozwdaemon with
-e QT_LOGGING_RULES="*.debug=false;ozw.mqtt.publisher.debug=true"
and I can see in the mosquitto logs
while ozwdaemon continues to print out hundreds of rows such as
for several minutes until it realizes that the connection is gone:
The only way I can get it to stay up is to remove/rename the ozwcache_0xf7b52c8f.xml file and restart, but it doesn't feel like a good solution.
Any ideas on what's going on?