ISM module stops sending data although ism7mqtt is still alive #115
Comments
Why did you remove "When restarting the ism module via browser and local IP address"? |
As you may have already observed, I tend to write far too much when describing something. ;-)) I just thought the sentence might be easier to read without the information on how I restarted the ism module. But yes, restarting automatically by browser emulation might be the last resort if nothing else helps. I already changed both step 1 and step 2 yesterday evening, so my ism7mqtt now runs in a Docker container (:master image) and the access point is disabled. My hope is that point 1 solves the issue. I will report... |
It happened again tonight. :-\ This time 16 hours after start. I have no idea what else to test other than disabling the Smartset portal, which I do not want to be without at present. At least I found out how a reboot of the ism7 can be initiated automatically. You have to send an HTTP POST call to the following URL:
with payload
The URL is protected by .htaccess authentication (user: admin, password: your ism module password). So I will maybe implement a check whether a few of the chattiest sensors have not been updated for a longer time and then initiate the reboot POST call via P.S. |
Perhaps you should deactivate ism7mqtt to find out the reason and only run Wolf Smartset on the PC with a similar scope when retrieving data. If it stops there, Wolf will have to take over. If not, the ism7mqtt queries are the cause, as there are various differences in the XML... |
Letting the Smartset PC app run for multiple days is not something I'm really in the mood for, especially as I cannot use ism7mqtt during that time. But maybe I have to go that way. Beforehand I will run another test. I observed that if you do not explicitly set an interval parameter, the |
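The reboot call described above can be sketched in Python as follows. This is only an illustration: the real URL and payload are not reproduced in this thread, so `http://ism7.example/REBOOT_PATH` and `b"PAYLOAD"` are placeholders, and the function name is made up. It also assumes that the .htaccess protection is plain HTTP Basic authentication, which nobody here has confirmed. The request is only built, not sent.

```python
import base64
import urllib.request


def build_reboot_request(
    url: str, password: str, payload: bytes, user: str = "admin"
) -> urllib.request.Request:
    """Build (but do not send) an HTTP POST carrying a Basic auth header."""
    # Assumption: the module accepts HTTP Basic authentication.
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    request = urllib.request.Request(url, data=payload, method="POST")
    request.add_header("Authorization", f"Basic {token}")
    return request


# Placeholder values only; substitute the real URL, payload and password.
req = build_reboot_request("http://ism7.example/REBOOT_PATH", "secret", b"PAYLOAD")
```

Sending it would be a matter of passing `req` to `urllib.request.urlopen` with a short timeout.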
Yes, the first step, of course, should be to create XML that is as identical as possible. |
I had a Wolf technician at home today for a regular update of my CHA07. I reported the problem and he confirmed that Wolf's developers have been investigating it for several months now. So I guess we have no chance until there is an update of the ism. The best way is to restart the ism automatically if communication errors occur. For me nothing worked. I disabled the Wolf portal and reduced the observed parameters a bit (not by much, only the very frequently updated and useless ones, like the time of the BM2). The technician also confirmed that only the mqtt api is involved. The ism is reachable via web interface and Wolf Smartset Portal at all times. These are my errors:
|
Last try: I'm going to connect my ism7 via Ethernet (LAN), not WLAN anymore. I've heard that this could be a problem.
|
I don't think Wolf cares about an mqtt api.. Did you mean that only the ism7<->Wolf Smartset App (not Portal!) part of the ism7 is involved? |
You seem to have a different problem than me (or at least a different behavior of the same problem). When my ism module stops sending, I do not have any error in the log, ism7mqtt is up and running and even receives the keep-alive messages. The Smartset portal cloud is not receiving any data anymore either. It feels a bit like the ism is sending the data to a connection that no longer exists. P.S. |
But there are keep-alive messages, aren't there? So the connection is fine.. |
For me this could be the solution. After I changed the setup to LAN I noticed that my server already has a static IP, but the Home Assistant VM in VirtualBox doesn't. So I set a static one in the Home Assistant web GUI. Since yesterday I have had no problems with the data delivery. (I still have exceptions in the log like:
But the watchdog is able to handle that now.. |
Yes, the keep-alive messages still arrive. Today, after two days, I had the problem again. :-/ In the log file nothing special happens beforehand; only the keep-alive messages arrive. The only thing I observe is that it mostly seems to happen after one of the "normal" network exceptions leads to a restart of ism7mqtt. Not directly after that, but some 2 to 10 hours later. The next thing I will test is disabling the portal connection. But I have to wait for that until mid or even end of September for some reason. Nothing more I can do at this point. :-/ Maybe I will restart the module once a day via curl when I find time. |
My ism7mqtt has been running with the debug flag enabled for a few days. It might be that the "stop sending" problem always happens a few hours after the "typical" IOException problem (not sure yet). Therefore I tried to analyze the exceptions in the log file a bit. The Exception always occurs in my log when the data point I even do not know what the attribute is for. It is not shown on the BM-2 module itself. Only the "normal" I do not have a lot of hope that this will change anything. But who knows... |
@krusta4711 can you share the debug log? |
Sure. I attached the whole file with 6 days of logs. I tried to anonymize it. I hope I did not miss anything. The last time the ism module stopped sending while keep-alive messages were still logged was Aug 23 12:26:48. Regarding the IOException I found one occurrence where the edit: |
After looking at your log, the proxy log of the latest smartset app and its behaviour and #57 (comment) I'm pretty sure that the portal connection and the number of parameters are the root cause. I've pushed some updates to automatically remove duplicate telegrams and empty parameters, but that may not be enough. It looks like your parameter.json contains everything generated by ism7config. Can you try removing unused parameters like If you still get timeouts and errors with the latest version, I'd like to see how the official app behaves - so you'll need to run ism7proxy and connect the app to the proxy instead of the ism.
Maybe it's a better idea to look into your root cause. Is #83 the reason, or do you have a different problem? |
Thanks for the work, zivillian. :-)
I already removed a bunch of parameters (e.g. the whole ism module). But I'm currently still at 249 properties. So yes, I will do another - and this time more drastic - cut. For what it is worth: I do not have any problems writing data, which was the main issue in #57.
I will do. My plans are:
#83 does not seem to have anything in common with my problem. I saw the times when I started with ism7mqtt (so they were sent) but removed them from the properties because I do not need them. All parameters I want to see are there and writing works too. So my root cause is simply that the ism module stops sending data completely. |
I've closed #112.
Do you still have the cloud connection on? If yes, I'd suggest turning it off as it is known to cause problems.
Great - this means we have another issue.
|
Thank you again, @zivillian, for the time you are investing. :-) I do not take it for granted!!
Yes, it is still on. I cannot turn it off until late September (meanwhile maybe even October) for some external reasons which would take too long to explain here. It is what it is and I take it as an "opportunity" to test different scenarios with the cloud connection enabled. Older FW/HW of the ism module works fine with the cloud connection enabled (besides the IOExceptions, which do not really do any harm). Maybe we will find the reason why the new FW/HW behaves differently. If we do not find any way to resolve the "total stop of sending data" by October, turning off the cloud connection will be something I'm very happy to test.
Yes, I also mentioned before the test that I doubt that it will help. But as it had already run fine for more than 24 hours, I wanted to give it a try instead of stopping the test. Every piece of information gathered might be helpful. Spoiler: we are both right. It did not help ;-P. For more information read below the first line.
Yes, you are right again. I did a quick check yesterday and most of the values of the CHA are important. I guess I can cut down a lot of the Controls (Home Assistant wording) for writing data. But the Sensors are definitely needed long term. On the other hand, if it worked well with fewer parameters, missing some information would of course be a better situation than what I have now.
That would be a tough path. But yes, maybe that is something to be done in the end. I would still do the other tests beforehand as they are easier. The testing is ugly anyhow. I need to wait days before I know the outcome of any change. That said... here is what happened lately: My ism module stopped sending data again (current test scenario: removal of parameter That is a new insight, as until now restarting ism7mqtt manually did not help, even when restarting 2 or 3 hours after the problem occurred. The data was still not sent. So I always had to restart the ism module itself. It seems - when waiting long enough - there is at least some kind of self-healing. Ok, ok, 12 hours is a lot. But nevertheless... better than no self-healing at all, as I thought was the case. But this was not the end. The ism module stopped sending data again at 15:00 today (10 hours after the automatic restart in the morning). At 15:35 an IOException occurred and after the automatic restart everything was fine. So this time it self-healed after 35 minutes. I will let this test run for at least another day. I am too curious to see what happens from now on. How frequently will the issue occur, given that the first two occurrences behaved totally differently time-wise? Will it stop self-healing? Or will the problem maybe even disappear altogether after these hiccups? Very unlikely... but who knows. ;-P @zivillian At the beginning it looked like this:
...and later like this:
I guess it is normal behavior, with the short value counting up, and maybe even TCP standard. But maybe the new ism module FW has problems with some white-space characters? Edit: the last keep-alive messages before a problem seem to be different each time. So this might not be something to investigate after all. |
Or maybe two or three...
What you see is the "printable" part of a binary message - I was too lazy to make it look useful so what you see is "something" to be sure that keepalive is still working |
Intermediate result of my current test: Life is strange. :-P Since the last period of missing data, everything works like a charm. For nearly 54 hours I have had neither any IOExceptions nor the "total stop of sending data" issue. I never had such a long period without any issue before. So maybe the solution is just waiting and sitting it out :-D Today I even started clicking around in the Wolf Smartset mobile phone app, just to force the cloud connection to connect and to provoke problems. The timeline of my current - still ongoing - test looks as follows:
Just to emphasize it: when I restarted ism7mqtt manually after the "total stop of sending data" problem occurred, it did not help. I had to reboot the ism module. So just waiting for an automatic restart of ism7mqtt seems to have a self-healing effect. I will have a deeper look into the log files at the weekend to see whether there are differences in the communication before and after the self-healing. I will let this test run a few more days as long as no issue occurs. After that I will try the newest version of ism7mqtt on master and after that a reduced set of properties. So I split point 2) of my original test plan (see #115 (comment)) into 2a and 2b. That is more effort but I think it is the cleaner way to find the root cause (not mixing up two changes in one test). P.S. |
Newest information: On 30th August at 13:30 (ca. 70 hours after the last issue) I had an Exception in the log and an automatic restart of ism7mqtt. Everything worked fine before and after the restart, though. The Exception type was a new one to me. I have never seen it before in my log file. Instead of the "typical" IOException, it was this one:
I checked it in my log file and it was also the very first time I have an |
I do not see big differences between the first start (15 hours of not sending data) and the third start (70 hours without any issue). I just observed slightly changed
According to the code I will send you the log file via e-mail. |
I also had another quick look into my old log file of the Smartset PC app. There it looks like the pull and push requests are handled sequentially. So the PC app sends a request for one single device, waits for the response and only after the response sends the next request. ism7mqtt seems to send at least the pull requests in parallel. But as the ism module is responding to all messages I do not think that it is an issue. |
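To make the difference between the two request patterns concrete, here is a small Python sketch. It does not reproduce the ism7 protocol or the actual code of ism7mqtt or the PC app; `fake_device_request` is a made-up stand-in that only records the order of requests and responses.

```python
import asyncio


async def fake_device_request(device_id: str, log: list) -> str:
    """Hypothetical stand-in for one pull request to one device."""
    log.append(f"request {device_id}")
    await asyncio.sleep(0)  # yield control, standing in for network latency
    log.append(f"response {device_id}")
    return device_id


async def pull_sequential(device_ids: list, log: list) -> list:
    # Send a request, wait for its response, only then send the next one.
    return [await fake_device_request(d, log) for d in device_ids]


async def pull_parallel(device_ids: list, log: list) -> list:
    # Send all requests first, then collect the responses.
    tasks = (fake_device_request(d, log) for d in device_ids)
    return list(await asyncio.gather(*tasks))


sequential_log: list = []
parallel_log: list = []
asyncio.run(pull_sequential(["A", "B"], sequential_log))
asyncio.run(pull_parallel(["A", "B"], parallel_log))
```

In the sequential log every request is directly followed by its response, while in the parallel log both requests are issued before the first response is recorded.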
Today I stopped my current test run. No other issue occurred. So the ism module healed itself after the first two TSSD and ran 122 hours without any further stop of sending data. I hope that the self-healing was no coincidence but happens every time. ;-) A few minutes ago I started test 2a) of my test plan: I did not reduce the number of parameters yet. So let's wait and see what happens. :-) |
Since I changed the ism connection to LAN and set up a static IP for Home Assistant running in VirtualBox, plus Wolf updated my CHA (outdoor unit, no update to BM2 or ISM7), I have had only one failure in transmitting the data. By the way, just out of interest, maybe it has nothing to do with this behavior: |
My router is showing everything about the ism module (IP, MAC and even client name ("Espressif Inc")). My ism module is configured to use DHCP but with a static address configured in the router. |
Intermediate result of my current test (I hope I do not get on your nerves ;-)): Since I started my test "2a" (= newest main-branch version from zivillian with some optimizations) on 1st of September, I have not had any "total stop of sending data (TSSD)" problem. I have not even had any of the IOExceptions again, of which I really had a lot before (which - again - do not bother me because after an automatic restart of ism7mqtt everything works fine again). But since 1st of September I have had two other Exceptions leading to automatic ism7mqtt restarts instead: An "InvalidDataException" I already had before (but rarely):
And a SocketException I had never observed before:
To emphasize it again and again: The problem that made me create this issue was the "total stop of sending data" of the ism7 module, which could only be healed by rebooting the module. This has not happened anymore since the first self-healing of the problem. I'm not sure whether any of the changes I tried are responsible for the now smoothly running ism7 module... or whether the "self-healing" of the ism7 module changed anything in the module itself to let it run relatively smoothly now (e.g. omitting a problematic data point). Regarding my next test "2b" (cutting down attributes) I had another look: I do not find many attributes I will not need in the long term. So as long as @zivillian is not too curious to see what happens with a drastically reduced set of parameters, I would let the current test run until the TSSD happens again. |
P.S. He is not even using ism7mqtt at all and has older ism7 module hardware than I do. So, as I already assumed in the first paragraph of this ticket, the problem seems to have nothing to do with ism7mqtt at all. And another user, who also has the problem, posted too. So at least I do not feel alone ;-): |
These two exceptions are pretty common. You can reproduce them by running an ism7mqtt session and then restarting the ism7 module. So maybe your module itself or something else is restarting the ism7, which also prevents the TSSD error. @zivillian @b3nn0 would it be possible to build a new version of the experimental Home Assistant addon? |
So most likely the problem will come back to me too. :-| Did you try to "sit it out" and wait for 12 hours or more? Since I did not intervene manually during the last two TSSD, I did not have the problem again. The last time it occurred on 27th August. But most likely that is only luck and it will occur again. As I have written above: |
I can't try that, because my module stays down for hours until I reboot it myself. I'm trying the new version of ism7mqtt now. |
Mine did self-heal after over 12 hours. So maybe it is worth a try waiting longer. |
I tested it out with the newest version of ism7mqtt and also with a reduced parameter set. |
I had two TSSD myself this week. The first one I had to solve by manual restart because I needed contact to the heat pump for some configuration (I had no time to wait for self-healing). The second one self-healed after 15 hours. With the information from CEXC that a reduced set of attributes does not work either, and with the information that there are even users not using a local connection at all who nevertheless get TSSDs, I'm pretty sure we cannot do anything against the root cause. :-\ It is most likely just bad ism module HW or SW. So the next step for me: this weekend I will try out whether it is possible to restart the ism module via HTTP call (it should be). If yes, I will write an automation to restart the ism module in case a chatty sensor has not changed for a longer time. Not ideal, but it is what it is. |
I have to say: I've never had this problem so far, but my ISM7 has been running via LAN for only about 10 days; the portal connection is also active. Firmware version 4.40.1. Running as a service on 64 bit Debian with At the beginning I had other problems: no LAN connection at startup (no DHCP address received) or loss of LAN connection during operation. Of course, this also resulted in errors for all connections. Now, with a static IP and a 15m cable on an old Fritzbox and on another LAN port of the Fritzbox, 1GBit instead of 100MBit, this no longer happens. It has been running for 3 days in a row without any problems since the last start of ism7mqtt. In the days before I never got this issue either. Yesterday the Wolf Cloud was no longer updated; old temperatures etc. were visible. But commands via the app arrived at the IDU/BM2 and were processed. About 6 hours after the cloud issue I got a connection failure I'm using a self-compiled version of ism7mqtt from a GitHub checkout; the code is equal to release v0.0.17. I only have the CHA7 for heating with one heating circuit and a BM2. Wolf config 11. But if this were to occur, I would use the parameter 220032/Uhrzeit to reboot automatically if the mqtt topic were not updated for some minutes. Not the nicest thing, but sufficient, I think.
|
It is, yes. You could curl the URL with the needed authentication (admin / ism7 password). |
I managed to restart the ism module from Home Assistant. :-) My HA runs on a Raspberry Pi, so I used the curl command for executing HTTP POST requests. I created an automation that checks every hour whether one of the chatty CHA sensors has not changed for more than 90 minutes. If that is the case, the ism module is rebooted automatically. When the ism module reboots, my ism7mqtt also restarts automatically and works again out of the box. So I do not need to stop or start ism7mqtt in the HA automation. I'm pretty sure that this is also true when a TSSD happens (at least for my system). If not, I would additionally have to stop my ism7mqtt beforehand. My shell_command action to execute the curl command to reboot the ism module:
My automation to check whether the ism module is in TSSD state and to trigger the reboot:
I'm curious to see what happens during the next TSSD. ;-) |
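For readers not using Home Assistant, the staleness check described above can be sketched in plain Python. This is not the automation from this comment, just a minimal illustration under assumptions: the sensor names are hypothetical, the timestamps are made up, and the `reboot` callable is a stand-in for whatever actually sends the reboot request.

```python
from datetime import datetime, timedelta
from typing import Callable, Dict


def is_stale(last_changed: datetime, now: datetime, max_age: timedelta) -> bool:
    """Return True if the sensor value has not changed within max_age."""
    return now - last_changed > max_age


def check_and_reboot(
    last_changed_by_sensor: Dict[str, datetime],
    now: datetime,
    max_age: timedelta,
    reboot: Callable[[], None],
) -> bool:
    """Call reboot() once if any watched sensor is stale; report whether it was called."""
    if any(is_stale(ts, now, max_age) for ts in last_changed_by_sensor.values()):
        reboot()
        return True
    return False


# Example run with made-up timestamps and a stand-in reboot action.
reboot_calls = []
now = datetime(2023, 9, 1, 12, 0)
sensors = {
    "supply_air_temperature": now - timedelta(minutes=5),    # hypothetical name
    "exhaust_air_temperature": now - timedelta(minutes=95),  # hypothetical name
}
triggered = check_and_reboot(
    sensors, now, timedelta(minutes=90), lambda: reboot_calls.append(now)
)
```

The sketch reboots as soon as any one watched sensor is stale. Requiring all of them to be stale (`all` instead of `any`) would be more conservative against a single quiet sensor, at the cost of reacting later.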
The Uhrzeit/time would of course be the most predictable attribute. But I do not want to have this attribute in my parameters. The "Zuluft" and "Abluft" temperature parameters are chatty enough for me at present. Normally they change every minute. Only at night are they sometimes stable for a maximum of 30 minutes. That is enough for me at present. But if I want to react faster in the future, the Uhrzeit/time is a good idea. 👍 |
Since my last TSSD, which was very disappointing, I have been running with the same settings. I've added an automation: if Uhrzeit/Time is not updated for 5 minutes, then reboot the ISM7. Thank you @krusta4711 for the example of the shell_command, it works great. I also use the very shortened parameter.json of @allcoolusernamesaregone, plus 3 additional values I need. Until now.. no TSSD at all. |
For those who are interested: here is my purely manual (without auto discovery) Home Assistant config for the parameters mentioned above..
|
This morning I had the first TSSD with my new reboot automation. It worked well! :-) The curl command - and so the automation - ends in an error message because the dumb ism module does not respond to the reboot HTTP request but shuts down directly. But that is no problem at all. As my warm water circulation did not start this morning because of the TSSD, I might switch to the approach of taking the Uhrzeit/time as the trigger to be able to react faster. ;-) |
... I had my very first one yesterday as well. For whatever reason, it was not even 1 hour after an occasionally occurring connection reset and automatic ism7mqtt service restart.
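The behavior mentioned above (the module shuts down without answering the reboot request) can be treated as an expected outcome rather than a failure. The following Python sketch shows one way to do that; the function names are made up, and the assumption that a dropped connection means "the module is rebooting" is taken from the observation in this comment, not from any documentation.

```python
import urllib.error
from typing import Callable


def reboot_tolerating_drop(send_request: Callable[[], object]) -> bool:
    """Run send_request(); return True if a response arrived and False if the
    connection dropped, which this thread describes as the normal outcome."""
    try:
        send_request()
    except urllib.error.HTTPError:
        # An actual HTTP status error (for example a rejected password) is a
        # real problem and should stay visible.
        raise
    except (ConnectionError, TimeoutError, urllib.error.URLError):
        # No answer: assumed to mean the module is already shutting down.
        return False
    return True


def _dropped_connection() -> None:
    """Stand-in for a request that is cut off, used only for this example."""
    raise ConnectionResetError("connection closed before a response arrived")


got_response = reboot_tolerating_drop(_dropped_connection)
```

In a real setup, `send_request` would wrap the actual HTTP POST with a short timeout, so that the automation does not hang while the module goes down.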
|
We infected you. ;-p |
My workaround of automatically restarting the ism module in case of a TSSD works perfectly. I had around 5 TSSDs last week and all were detected and healed by a reboot. Today I changed the detection from parameter Ablufttemperatur to parameter 220032/Uhrzeit to be able to react faster. I'm done with this topic for now. I think the problem is not fixable in ism7mqtt and this issue can be closed for now. If someone has new insights we can reopen it. |
I already had this problem during the solution phase for issue #112 (see #112 (comment)). Now it occurred again. I think it has nothing to do at all with ism7mqtt. But maybe someone can help to figure out the root cause.
My problem:
My ism module (Wolf Link Home) stops propagating data to ism7mqtt after some time (the first time it was a few hours after start, this time 2 days after start). The output of ism7mqtt does not report any issue. It is up and running. Restarting ism7mqtt does not solve the issue. When restarting with the debug flag I can see that none of the data requests is answered (not even the initial “pull” request). But the keep-alive works and logs every 60 seconds. So at least the ism module connection is not dead.
When restarting the ism module and restarting ism7mqtt after that, everything works fine again.
I have had this issue two or three times in the two weeks since my heat pump was installed. As the connection to the Wolf portal also causes other issues with the local connection to ism7mqtt (see comment #112 (comment)), it might be the best candidate for the root cause. Unfortunately I need the Smartset cloud app at present for setting my time programs. So I do not want to turn off the cloud connection at present.
If someone has ever had the problem and solved it, or if someone has an idea what I can test (apart from cutting the cloud connection ;-)), any input is welcome.
Testing will be ugly as I have to wait at least a week after every change. But it is what it is. My current plans:
Cheers
Volker