[homematic] No connection after CCU to Openhab connection interruption #8808
Comments
I have the same CCU3 configuration, and my openHAB also runs on a Raspi, not in a container. I can't remember ever having a problem like this when I had to reboot the CCU. Generally, the binding contains the necessary code to reconnect automatically. Maybe the logs can help to figure out the reason:
|
Many thanks for looking into it. I wanted a real case, which is why it took some days until it happened again. Today I had an issue with the network (I played around with some cables on the switch), which interrupted the connection between the CCU and openHAB (they run on different servers), and again the connection was not re-established afterwards (I did not notice the problem until my wife told me that AGAIN the internet is not working ;-) ). I will paste a log as you requested right after... but here is what I found in the shortened log... At around 12h the network problem started (trace was not activated)
... then the connection was down for some hours... Then I fixed the network connection... but then I got this kind of log entry...
At 22:41 I then started trace on the Homematic bridge, as you proposed
Here is the shortened log file .... !!! REMARK |
@MHerbst :-) Here we are ... I did the following...
To make the bridge work again I have to... I would appreciate, if you could have a look what is wrong. My setup is....
How do I check whether the connection is not working.... |
I will have a closer look into the logs in the next few days. I had a quick look into the second log. Here I can see after line 678 that the data points are added again. In line 1175 (time 02:07:45.099) I can see that the binding receives values from the CCU. Was this after your manual restart? In the second log I am missing the exceptions regarding the re-connect problems that I can see in the first log.
I have two Raspberry Pi 3. One for the CCU3 and one for openHAB. That works really well. Yesterday I installed the latest Raspberrymatic software, including a reboot of the CCU. OH reconnected automatically after the CCU was online again. But maybe it is necessary to disconnect the CCU from the network for a longer time to get the same problem. I will give it a try. If you want to run the CCU and openHAB on the same Raspi, I think you should at least use a Raspi 4.
|
Many thanks for the support. Yes, in the second log at line 1175 you can see that the CCU comes up again after the reboot, which I initiated from the GUI. Interestingly, the things are shown as connected in Paper UI, but the temperatures in HABPanel and Paper UI are either "0" or the old values (they do not react if you put the sensors outside). I would even have a Raspi 4 here, but as my VM is still faster than the Raspi 4 and backup is so easy with snapshots, I would like to avoid that. |
What I can see from your screenshot is that the binding successfully detects the CCU again and the state is "ONLINE" again. But it seems that the binding does not receive messages from the CCU if a value changes. I will have to check how this could happen.
Interesting idea, could work. I will also check whether I can extend the binding to provide a timestamp of the last time an event from the CCU was received. |
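The idea of exposing a timestamp of the last event received from the CCU can be illustrated with a small staleness check. This is only a sketch under assumptions: the class `LastEventWatchdog` and its methods are hypothetical and not part of the Homematic binding, and the 10-minute threshold is an arbitrary example value.

```java
import java.time.Duration;
import java.time.Instant;

/** Hypothetical helper; not part of the Homematic binding. */
class LastEventWatchdog {
    private final Duration maxSilence;
    private volatile Instant lastEvent;

    LastEventWatchdog(Duration maxSilence, Instant start) {
        this.maxSilence = maxSilence;
        this.lastEvent = start;
    }

    /** Call this whenever an event from the gateway arrives. */
    void eventReceived(Instant when) {
        lastEvent = when;
    }

    /** Timestamp of the most recent event (or the start time if none arrived yet). */
    Instant lastEvent() {
        return lastEvent;
    }

    /** True if no event has arrived for longer than maxSilence. */
    boolean isStale(Instant now) {
        return Duration.between(lastEvent, now).compareTo(maxSilence) > 0;
    }

    public static void main(String[] args) {
        Instant t0 = Instant.now();
        LastEventWatchdog watchdog = new LastEventWatchdog(Duration.ofMinutes(10), t0);
        // Five minutes of silence is still within the threshold.
        System.out.println(watchdog.isStale(t0.plus(Duration.ofMinutes(5))));
        // Fifteen minutes of silence exceeds the threshold.
        System.out.println(watchdog.isStale(t0.plus(Duration.ofMinutes(15))));
    }
}
```

A rule or script could poll such a check and restart the bridge when the connection shows "ONLINE" but no events have arrived for a while. A sensible threshold depends on how often the devices report.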
While I can reproduce the problem easily (disconnect the router which is between OH2 and CCU3), restarting the bridge does not fix the problem every time. While restarting a bridge once in a while is a feasible and acceptable workaround, restarting OH2 is a bad idea, because it makes OH2 unavailable for several minutes. Does this error: [ERROR] [ommunicator.AbstractHomematicGateway] - java.util.concurrent.TimeoutException: Total ti |
Unfortunately I did not have the time to test it in my environment. In my opinion the binding should work as follows:
|
@lobocobra I have the feeling that this problem only occurs for HmIP devices. Do you only have HmIP devices or also some "older" devices? If so, what happens to them? |
I have both types of devices all mixed. I also saw that....
To exclude configuration errors in my software, I have now moved OH2 and CCU3 to a Pi 4 with the same configuration as on my productive setup. If I now restart the CCU3 on the Pi, it reconnects to OH2. So it seems that the problem occurs when the CCU3 is silently disconnected and then reconnected again. |
I have mostly "old" HM devices, and their values are automatically updated after a temporary disconnect from the CCU. But for the HmIP device (a dimmer) there is even a problem showing the correct value after a complete restart. Only after changing the level is it updated. I need to investigate it further. |
Many thanks for looking into this! I can now confirm that since I have openHAB 2 on the same Pi 4 as the virtual CCU3, the issue has not appeared. In the future I would of course want to go back to my old solution, as the NAS runs 24/7 anyhow and backup / restore is a dream with VMM. |
Today I had to cut the power to my house. After the restart everything worked, except that none of the temperature sensors worked. Only after restarting the bridge with...
Did you find out anything new on your side, or is there anything I can do? I want to program my heating with data from the CCU, but with this current issue I fear that the connection is far too unreliable to do such things. The only other solution I can think of is to replace the bridge with an MQTT solution, or can it be repaired? Many thanks |
I am currently investigating this problem and another one where the device is not updated reliably. I think, I also had a similar problem last week after I had some problems with my router.
If the CCU3 and openHAB were started at the same time, it is possible that the CCU was not ready when the binding tried to connect to it, and this then caused the problem.
I am quite sure that it can be repaired as soon as I find the cause of the problem. Unfortunately some things in the binding are a bit complicated because the "old" Homematic devices, HmIP and CuxD are using different methods for the communication ... |
I think I found a way to reproduce it. The behavior is a bit strange. After the re-connect I get a timeout each time I try to send a command to a HmIP device. As soon as I send a command to an older Homematic device the HmIP device starts to work again. |
Ahh cool! I also saw timeout messages with 15000 ms. Just an update... today (after the power cut yesterday), the temperatures are still frozen at yesterday's values. So it seems to me too that once a certain condition occurs (maybe an interruption of the connection), the timeout blocks successful communication. Unfortunately a restart of the bridge does not always work (it depends on how the exception condition arose). |
@lobocobra I think I have found a solution. At least the first tests in my environment looked promising. The code needs some cleaning, but then I will create a PR. Are you on OH 3, or would you also need a fix for 2.5? |
Hey that sounds great! I am currently on version 2.5 but as the future is v3, I am ready to upgrade (as I plan this anyhow). |
@lobocobra My current development environment is based on OH 3 and, if possible, I would like to avoid setting up an additional environment for 2.5. In the meantime I have uploaded a test version for OH 3 (and above) to https://github.com/MHerbst/openhab-addons-test |
@MHerbst Thank you for the fix. I have exactly the same problem, but I'm still running on OH 2.5. Is there any chance you backport to 2.5 as well? Thanks in advance. |
The message regarding "Unresolved requirement" should not appear after openhab-transport-upnp has been installed successfully.
You can check that the binding is correctly loaded with a Karaf console command:
You should then see one entry for the Homematic binding with status "Active" and a version starting with 3.1.0 |
@Thousand81 I will have to test whether I can set up a dev. environment for 2.5 in parallel. But first of all I would like to wait for the merge into the current version. |
thanks for the hint, at least the commands show it is active
and upnp is also active
|
@MHerbst Since tonight at about 4:50h I am again facing my issue that no updates are sent from the CCU to openHAB, although the connection shows online. |
I waited for more than 30 minutes. The HmIP devices did not reconnect during this time. |
Next test, logging set to DEBUG: The IP devices do not reconnect. The log entries: |
After 2 hours the IP devices are still in error: |
One more hint for testing: After cleaning the cache via
one has to re-install the UPNP bundle
via karaf console. |
I think I have found the reason why the patch does not work in your environment. A CCU2 takes much longer than a CCU3 to be ready again after a reboot. I will have to modify the retry logic and probably make the number of attempts and the wait time between two attempts configurable. |
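A retry loop with a configurable number of attempts and a configurable wait time between attempts, as described above, might look roughly like the following. This is a minimal sketch and not the binding's actual code: the class `BoundedRetry`, the method `run`, and the parameter names are assumptions made for illustration, and the lambda in `main` merely stands in for the real callback registration.

```java
import java.util.function.BooleanSupplier;

/** Hypothetical sketch; not the binding's actual code. */
class BoundedRetry {
    /**
     * Runs the action until it reports success or maxAttempts is reached.
     * Returns the 1-based number of the successful attempt, or -1 if all
     * attempts failed or the thread was interrupted while waiting.
     */
    static int run(BooleanSupplier action, int maxAttempts, long waitMillis) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (action.getAsBoolean()) {
                return attempt;
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(waitMillis); // give the gateway time to come up
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // preserve the interrupt flag
                    return -1;
                }
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // Stand-in for "register the event callback": succeeds on the third call.
        int result = run(() -> ++calls[0] >= 3, 20, 100);
        System.out.println("succeeded on attempt " + result);
    }
}
```

Returning the attempt number makes it easy to log which attempt succeeded, which helps when choosing values for a slow gateway such as a CCU2.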
@MHerbst |
@Joerg-Dr I have planned it for this weekend. |
@Joerg-Dr I have uploaded a new test version (https://github.com/MHerbst/openhab-addons-test). You can now configure the number of retries and the wait time as advanced options in the Bridge configuration: |
Thank you very much for the new version, this solution works fine.
After restarting the Homematic CCU2, connection attempt 17/40 was successful. I have two suggestions for the binding:
|
@Joerg-Dr
Because this is a different problem and this issue is already closed, I would like to ask you to create a new issue. |
@eikowagenknecht This really looks like the same problem. The problem is that the HmIP service on the CCU needs quite a long time after a CCU restart until it accepts event registration, which may take about a minute. The release or milestone version of the add-on in OH 3 tries the registration immediately after the CCU is reachable again, but there are no retries if the event registration fails. Can you try the latest version of my test add-on? |
Trying it now. I noticed that the description and the actual code differ regarding the default value. You might want to synchronize that:
and maybe also add the default (20 times) here:
|
Seems to work just fine with the default settings (20x 3s). I notice that the log shows 37x the following entry (this snippet is the last one, the first was 2021-11-02 11:37:50.232):
Probably this is during the reconnection phase? I think it is way too verbose for something as normal as a reconnection. It would be cleaner if there was a warning message like "Trying to reconnect, try x of y", and an error message only if all of those fail. That would also give a hint whether the number of retries is set correctly or whether the timing could be spaced out. This is also the way e.g. the modbus binding handles reconnection attempts, if I remember correctly. This is minor nitpicking though, I'm really glad it works and I can finally restart the CCU without causing too many problems :-) |
Not sure if this is related: I'm now seeing some of those
after the reconnection where everything is "Online" for a while. Some are even more verbose:
So:
Here is the log filtered to the Timeout events.
Right now (~10min later) only one device is still in Error:Comm state and doesn't look like it's changing anytime soon. In the CCU3 itself all devices are online, no service alerts. |
Thanks for the hint. I also forgot to sync it in the PR that I have created some days ago ..
This XML-RPC message appears because the CCU returns HTML code containing a message that it is not ready. It would be much better if it returned a correct error state ... I can try to improve the error handling here. It will be better to create a new issue for this problem.
The timeout exception is a bit strange. Maybe the binding sends too many requests to the CCU if there are quite a lot of HmIP devices. Regarding the one remaining device: it would be interesting to know whether it is always the same device type or a different one if you test it again. You could also try to increase the timeout value from 15s to a higher value. |
The device stayed in Error:Comm state. It was of type homematic:HmIP-SWSD, so 1 of my 6 identical smoke alarms stayed in this state. All right, next try:
And exactly the same device has problems again. Mysterious indeed. The other 5 have no problems. Also, it's neither the first nor the last to go offline. But this time there are no timeout errors in the logs, instead:
I've never seen these messages before, so I suspect the mostly-fixed CCU3 reboot behaviour has something to do with it. If you want, I can of course open a new issue. Unfortunately the current state, where devices flake in and out after a CCU3 reboot, is somewhat worse than before because I can no longer reliably recognize this behaviour with my usual offline detection scripts. |
Is there something different in the configuration of this "Rauchmelder"? Maybe a debug or trace log could help.
I agree with you that these messages have to do with the restart. Maybe it is because of some unknown state value.
Maybe this can be improved but this would probably mean some greater changes. Maybe you can bypass it if you wait for some minutes after a discovered disconnect before you check again. This issue is already closed, so it would be better to create new issues for the remaining problems like the XMLRPC messages and the strange behavior of the one Rauchmelder. |
* Replace deprecated constructors
* Removed no longer existing settings from the documentation. They were already marked as deprecated since several versions.
* Refactored communication with the HM gateway
  - simplified coding for the communication with the gateway
  - buffer size for communication is now configurable to avoid problems with too small buffers
  - Previous solution for openhab#6963 was not sufficient. Should be finally done with these changes
* Retrieving the duty cycle is sufficient to check connection
  - ping requests could therefore be safely removed problems with the automatic reconnection were solved.
* Changed to explicit list of Exception

Fixes openhab#8808
Signed-off-by: Martin Herbst <develop@mherbst.de>
It's configured exactly like all the other "Rauchmelder". I'll try to get a debug/trace log and open a new issue then. |
* Use globally unique id for registration of callback to allow ... the connection of multiple OH installations with one CCU. The bridge id is not sufficient for this purpose because it is same in all OH installations.
  Signed-off-by: Martin Herbst <develop@mherbst.de>
* Retry callback re-registration after connection is resumed
  Some services on the CCU need longer to start and are not available immediately after the connection to the CCU has been resumed.
  Improves the solution for #8808
  Fixes #10439
  Signed-off-by: Martin Herbst <develop@mherbst.de>
* Description was missing.
  Signed-off-by: Martin Herbst <develop@mherbst.de>
* Changed setting name and description to avoid confusion
  Signed-off-by: Martin Herbst <develop@mherbst.de>
* Added a troubleshooting tip to solve a communication problem
  Signed-off-by: Martin Herbst <develop@mherbst.de>
* Shortened the label name to follow the guide lines
  Signed-off-by: Martin Herbst <develop@mherbst.de>
* Print more information about the reason for the failure
  Signed-off-by: Martin Herbst <develop@mherbst.de>
* Using scheduler thread pool and simplified configuration
  Instead of configuring separate values for retry delays and number of retries only the maximum time for retries can be configured. The init method uses fixed delays.
  Signed-off-by: Martin Herbst <develop@mherbst.de>
* Don't retry to send if gateway does not answer at all
  Signed-off-by: Martin Herbst <develop@mherbst.de>
* Improved reconnect handling
  - unregister callback not necessary if connection is lost
  - wait 30s until clients and servers are restarted to give the gateway some time to recover
  Signed-off-by: Martin Herbst <develop@mherbst.de>
* Spotless
  Signed-off-by: Martin Herbst <develop@mherbst.de>
* Cancel an active future if the binding is stopped
  Signed-off-by: Martin Herbst <develop@mherbst.de>
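The commit message above mentions retrying on a scheduler thread pool with fixed delays, bounded by a maximum retry time, and cancelling an active future when the binding is stopped. The following is a rough sketch of that general pattern using the standard `ScheduledExecutorService`. It is not the code from the pull request: the class `ScheduledReRegistration`, its methods, and the delay values are hypothetical.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.BooleanSupplier;

/** Hypothetical sketch of the pattern; not the code from the pull request. */
class ScheduledReRegistration {
    private final ScheduledExecutorService scheduler;
    private ScheduledFuture<?> future;

    ScheduledReRegistration(ScheduledExecutorService scheduler) {
        this.scheduler = scheduler;
    }

    /** Tries the action every delayMillis until it succeeds or maxMillis have elapsed. */
    synchronized void start(BooleanSupplier action, long delayMillis, long maxMillis) {
        stop(); // never keep two retry tasks alive at once
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(maxMillis);
        future = scheduler.scheduleWithFixedDelay(() -> {
            boolean succeeded = action.getAsBoolean();
            boolean timedOut = System.nanoTime() - deadline >= 0;
            if (succeeded || timedOut) {
                stop(); // blocks until start() has stored the future, then cancels it
            }
        }, 0, delayMillis, TimeUnit.MILLISECONDS);
    }

    /** Cancels a pending retry task, e.g. when the binding is stopped. */
    synchronized void stop() {
        if (future != null) {
            future.cancel(false); // let a running attempt finish, schedule no more
            future = null;
        }
    }

    synchronized boolean isRunning() {
        return future != null && !future.isDone();
    }

    public static void main(String[] args) throws InterruptedException {
        ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor();
        ScheduledReRegistration retry = new ScheduledReRegistration(executor);
        AtomicInteger attempts = new AtomicInteger();
        // Stand-in for the callback registration: succeeds on the third attempt.
        retry.start(() -> attempts.incrementAndGet() >= 3, 100, 5000);
        Thread.sleep(1000);
        retry.stop();
        executor.shutdown();
        System.out.println("attempts: " + attempts.get());
    }
}
```

Because the task runs on the scheduler, the calling thread is never blocked while waiting between attempts, and `stop()` gives a single place to cancel outstanding work during shutdown. Note that `scheduleWithFixedDelay` suppresses further runs if the task throws, so a real implementation would need to catch exceptions inside the task.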
Expected Behavior
After a reboot of the CCU3, I would like OH2 to receive data again without having to restart it.
Current Behavior
Whenever the CCU is rebooted while Openhab is running, data transmission from CCU to Openhab stops working (for example temperature changes are not submitted).
=> I have to restart Openhab (while CCU is running) and then it works again.
=> I did not find anything in the log file
In PaperUI it shows the things still online.
Possible Solution
Either reboot the binding automatically or even better ensure that it reconnects after a CCU restart
Steps to Reproduce (for Bugs)
=> Binding works
Context
The problem is that after a power outage or an issue with the CCU, I must manually reboot openHAB. Any power outage will lead to a non-working home automation, as the variables from the CCU are outdated.
Your Environment