Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[homematic] No connection after CCU to Openhab connection interruption #8808

Closed
lobocobra opened this issue Oct 20, 2020 · 66 comments · Fixed by #9692
Closed

[homematic] No connection after CCU to Openhab connection interruption #8808

lobocobra opened this issue Oct 20, 2020 · 66 comments · Fixed by #9692
Labels
bug An unexpected problem or unintended behavior of an add-on

Comments

@lobocobra
Copy link

Expected Behavior

After a reboot of the CCU3, I would like to have OH2 receiving again data without reboot.

Current Behavior

Whenever the CCU is rebooted while Openhab is running, data transmission from CCU to Openhab stops working (for example temperature changes are not submitted).
=> I have to restart Openhab (while CCU is running) and then it works again.
=> I did not find anything in the log file
In PaperUI it shows the things still online.

Possible Solution

Either reboot the binding automatically or even better ensure that it reconnects after a CCU restart

Steps to Reproduce (for Bugs)

  1. Start CCU
  2. Start Openhab
    => Binding works
  3. Restart CCU
  4. Binding does not submitt any changes of variables like temperature. In PaperUI it shows the things still online.

Context

The problem is, that after a power outage or an issue with the CCU, I must manually reboot Openhab. Any power outage will lead to a non working Home Automation as the variables from CCU are outdated.

Your Environment

  • CCU3 on Raspberry Pi. Same issue was also on original CCU2.
  • Openhab 2.5 running on Synology NAS Linux container
@lobocobra lobocobra added the bug An unexpected problem or unintended behavior of an add-on label Oct 20, 2020
@MHerbst
Copy link
Contributor

MHerbst commented Oct 20, 2020

I also have got the same CCU3 configuration and my openHAB is running also on a Raspi and not in a container. I can't remember that I ever had a problem like this if I had to reboot the CCU.

Generally the binding contains the necessary coding to automatically reconnect.

Maybe the logs can help to figure out the reason:

  • please enable TRACE log mode for the openHAB binding
  • restart the ccu
  • wait about 5 min.
  • post the content of openhab.log here.

@wborn wborn changed the title [HomeMatic Binding] // no connection after CCU reboot [homematic] No connection after CCU reboot Oct 24, 2020
@lobocobra
Copy link
Author

lobocobra commented Oct 25, 2020

Many thanks for looking into it. I wanted a real case, this is why it took some days until it happened again. Today I had an issue with the network (I played around with some cables on the switch), which caused an interuption of the connection between ccu and openhab (different server) and again the connection was not re-established afterwards (I did not recognize the problem until my wife told me that AGAIN internet is not working ;-) .

I will paste a log as you have requested right after.... but here is what I found in the shortened log....

At around 12h the NW problem started (Trace was not activated)

`2020-10-25 13:10:24.104 [ERROR] [org.openhab.io.net.http.HttpUtil    ] - Fatal transport error: java.net.NoRouteToHostException: No route to host (Host unreachable)`

... then I had some hours the D/C.....

Here I then fixed the Network connection.... but then I get this kind of log entry....

`2020-10-25 13:19:06.781 [INFO ] [ommunicator.AbstractHomematicGateway] - Connection resumed on gateway '3014F711A061A7D5
699DAA26'`

At 22:41 I started then trace on the homematic bridge... as you proposed

2020-10-25 22:41:09.140 [TRACE] [nal.communicator.client.XmlRpcClient] - Client XmlRpcRequest (port 2001):
<?xml version="1.0" encoding="ISO-8859-1"?>
<methodCall><methodName>ping</methodName>
<params><param><value>3014F711A061A7D5699DAA26</value></param></params></methodCall>
2020-10-25 22:41:09.146 [TRACE] [nal.communicator.client.XmlRpcClient] - Client XmlRpcResponse (port 2001):
<?xml version="1.0" encoding="iso-8859-1"?>^M
<methodResponse><params><param>^M
        <value><array><data><value><boolean>1</boolean></value></data></array></value>^M
</param></params></methodResponse>^M

Here is the shortened log file .... !!!
openhab.log

REMARK
Only the devices connected with the bridge were not updated. All oter connections like http and mqtt, re-connected but not the bridge :-(

@lobocobra
Copy link
Author

lobocobra commented Oct 26, 2020

@MHerbst :-)

Here we are ... I did the following...

  1. I ensured OH2 and Homematic are connected
  2. started TRACE on the binding in karaf
  3. rebooted the ccu from the web interface
  4. I waited some time
  5. then I checked if CCU is up after 5 min... all good
  6. I checked if OH2 is up and running.... yes but not connected to CCU
  7. copied log file to here...

openhab.log

  • on line 528 the CCU stopped
  • on line678 the CCU came up again

To make the bridge work again I have to...
ssh openhab@localhost -p8101 'bundle:restart org.openhab.binding.homematic'
or
restart OH2

I would appreciate, if you could have a look what is wrong.

My setup is....

  • latest openhab on synology NAS VM (not container), with MQTT etc...
  • CCU3 on raspi in a container (works well newest version)
    => I wonder if I should install ccu3 on a VM and then have the Pi only to transmitt. I assume the problem is seen not so often, as most have the OH2 and CCU3 on a PI together ... means no NW interruption

How do I control, if the connection is not working....
=> In hab-panel the temperature of homematic devices either reopor an outdated value or 0°.
=> If I refreshe the open hab panel or restart web - broser it is the same.

@MHerbst
Copy link
Contributor

MHerbst commented Oct 26, 2020

I will have a closer look into the logs the next days. I had a quick look into the second log. Here I can see after line 678 that the data points are added again. In line 1175 (time 02:07:45.099) I can see that the binding receives values from the CCU. Was this after your manual restart?

In the second log I am missing the exceptions regarding the the re-connect problems that I can see in the first log.

=> I wonder if I should install ccu3 on a VM and then have the Pi only to transmitt. I assume the problem is seen not so often, as most have the OH2 and CCU3 on a PI together ... means no NW interruption

I have got two Raspberry 3. One for the CCU3 and one for openHAB. That works really good. Yesterday I installed the latest Raspberrymatic software including a reboot of the CCU. OH reconnected automatically after the CCU was online again.

But maybe it necessary to disconnect the CCU from the network for a longer time to get the same problem. Will give it a try.

If you want to run the CCU and openHAB on the same Raspi I think you should at least use a Raspi 4.

How do I control, if the connection is not working....
I think it is possible to query the thing status (of the bridge) in a rule and react on status changes.

@lobocobra
Copy link
Author

lobocobra commented Oct 26, 2020

Many thanks for the support

Yes, on the second log on line 1175 you can see that the CCU comes up again after the reboot, which I initiated from the GUI.

Interestingly the things are seen as connected in the paper UI, but the temperatures in HabPanel and PaperUI are either "0" or are the old ones (they do not react if you put them outside)..
=> If you check the CCU, it shows that the sensor works, but the data is not received by OH2.
=> on the same time MQTT messages are received and updated.

I would even have a raspi 4 here but as my VM is still faster than the raspi 4 and backup is so easy with snapshot, I would like to prevent that.

@lobocobra
Copy link
Author

lobocobra commented Oct 27, 2020

Today my son cut the power of a router between openhab2 and the CCU3.
=> Good news are that this is reproducable... as a result OH2 does not receive new data.
=> Bad news are that I turned off yesterday the trace mode

I believe the problem occurs when the connection is temporarily cut, without that OH2 or CCU had a problem. So can it be that a request just times out?
=> if you unplug a rj45, the system of OH2/CCU will be notified, but not if the router in the middle is off for a while

Below you see an example of what I mean....
=> The temperature is now 24.5° but openhab2 still shows 25.2°.

Paper-UI
image

Hab Panel
image

CCU3
image

PaperUI Things
image

As soon I restart the brige with...

ssh openhab@localhost -p8101 'bundle:restart org.openhab.binding.homematic'
The temperature is updated again....
image

Here is the log entry that shows the problem?

2020-10-27 18:25:14.901 [ERROR] [ommunicator.AbstractHomematicGateway] - java.util.concurrent.TimeoutException: Total ti
meout 15000 ms elapsed
java.io.IOException: java.util.concurrent.TimeoutException: Total timeout 15000 ms elapsed
        at org.openhab.binding.homematic.internal.communicator.client.XmlRpcClient.send(XmlRpcClient.java:147) ~[bundleF
ile:?]
        at org.openhab.binding.homematic.internal.communicator.client.XmlRpcClient.sendMessage(XmlRpcClient.java:81) ~[b
undleFile:?]
        at org.openhab.binding.homematic.internal.communicator.client.RpcClient.listDevices(RpcClient.java:169) ~[bundle
File:?]
        at org.openhab.binding.homematic.internal.communicator.AbstractHomematicGateway.getDeviceDescriptions(AbstractHo
mematicGateway.java:448) ~[bundleFile:?]
        at org.openhab.binding.homematic.internal.communicator.AbstractHomematicGateway.newDevices(AbstractHomematicGate
way.java:761) [bundleFile:?]
        at org.openhab.binding.homematic.internal.communicator.server.RpcResponseHandler.handleNewDevice(RpcResponseHand
ler.java:108) [bundleFile:?]
        at org.openhab.binding.homematic.internal.communicator.server.RpcResponseHandler.handleMethodCall(RpcResponseHan
dler.java:57) [bundleFile:?]
        at org.openhab.binding.homematic.internal.communicator.server.XmlRpcServer$ResponseHandler.handle(XmlRpcServer.j
ava:125) [bundleFile:?]
        at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127) [bundleFile:9.4.20.v20190813]
:

How could a work around function....

Maybe if I create a virtual device on the CCU (timer) and send the current time to OH2
=> if OH2 sees more than 1 min difference with real time, then the homematic bridge is restarted?

@lobocobra lobocobra changed the title [homematic] No connection after CCU reboot [homematic] No connection after CCU to Openhab connection interruption Oct 28, 2020
@MHerbst
Copy link
Contributor

MHerbst commented Oct 29, 2020

What I can see from your screenshot is that the binding successfully detects the CCU again and the state is "ONLINE" again. But it seems that the binding does not receive messages from the CCU if a value changes. I will have to check how this could happen.

Maybe if I create a virtual device on the CCU (timer) and send the current time to OH2
=> if OH2 sees more than 1 min difference with real time, then the homematic bridge is restarted?

Interesting idea, could work. I will also check whether I can extend the binding to provide a timestamp of the last time an event from the CCU was received.

@lobocobra
Copy link
Author

lobocobra commented Nov 5, 2020

While I can reproduce the problem easily (d/c the router which is between OH2 and CCU3), the restart of the bridge does not fix the problem all the time.
=> then only a restart of OH2 helps.

While restarting a bridge once a while is a feasible and acceptable work-around, the restart of OH2 is a bad idea, because it makes OH2 unavailable for several minutes.
=> I will have to find a better work-around, means also my idea of the trigger, will not solve the issue

Does this error: [ERROR] [ommunicator.AbstractHomematicGateway] - java.util.concurrent.TimeoutException: Total ti
meout 15000 ms elapsed
, not mean that the bridge waits on something for eternity?

@MHerbst
Copy link
Contributor

MHerbst commented Nov 5, 2020

Unfortunately I did not have the time to test it in my environment. In my opinion the binding should work as follows:

  • after the connection to the CCU was lost, the bridge should try to reconnect (this happens)
  • if the binding detects a running CCU again, it should automatically reconnect (I thinks this also happens).
  • after the reconnect the bridge should make sure that events from the CCU can be received and passed to the connected items (seems not to work correctly)
  • after the reconnect the bridge should request an update of all values from the CCU (similar to what happens after a complete restart of OH)

@MHerbst
Copy link
Contributor

MHerbst commented Nov 15, 2020

@lobocobra I have the feeling that this problem only occurs for HmIP devices.
After a reconnect the binding issues a request to the CCU to receive the current values. This seems to work for non HmIP devices but there may be a problem with the HmIP.

Do you only have HmIP devices or also some "older" devices? If yes, what happens to them.

@lobocobra
Copy link
Author

lobocobra commented Nov 15, 2020

I have both types of devices all mixed.
=> Yesterday I saw the problem again and then I realized that from my 6 temp sensors the 4 HMIP did not show any temperature in the HabPanel, while the "old" devices continued to transmitt the temperature.

I also saw that....

  1. if only HMIP sensor do not transmit temperature
    => I can restart the bridge and then they work
  2. if all sensors do not transmit temperature
    => I need to restart Openhab as the restart of the bridge, does not help

To exclude configuration errors on my SW, I moved not OH2 and CCU3 to a PI4 with the same configuration as on my productive setup.
=> this runs now for 3 days and I did not see any problem so far, which is not surprizing, as it is not possible to interrupt the connection if both elements are on the same PI.

If I now restart the CCU3 on the PI, then it reconnects to OH2. So it seems that the problem occurs, when the CC3 is silently d/c and then reconnected again.
=> I would guess that the bridge is timing out a request and freezes and the bridge does not recognize situations where CCU3 and OH2 are still running but the connection between gets d/c for a while

@MHerbst
Copy link
Contributor

MHerbst commented Nov 15, 2020

I have mostly "old" HM devices and their values are automatically updated after a temporary disconnect to the CCU. But for the HmIP device (a dimmer) there is even a problem to show the correct after a complete restart. Only after changing the level it is updated. Need to investigate it further.

@lobocobra
Copy link
Author

Many thanks for looking into this!

I can confirm now, that since I have Openhab2 on the same pi4 as the VCCU3, the issue did not appear.
=> so at least this is a work - around that at least helps for the moment. Because without this I can not do any reliable script (I want to optimize my heating depending the temperatures in the house).

For the future I would of course want to go back to my old solution as the NAS runs 7/24 anyhow and backup / restore is a dream with VMM.

@lobocobra
Copy link
Author

lobocobra commented Dec 8, 2020

Today I had to take to cut the power line of my house. After restart everything worked except all temp sensors did not work.
=> CCU3 and Openhab are on the same raspi, so it seems that the problem is not resolved by having both on the same HW.

Only after restarting the bridge with...

  • ssh openhab@localhost -p8101 'bundle:restart org.openhab.binding.homematic'
    Everything worked.

Did you find out anything new on your side or is there anything I can do? I want to program my heating with data from the CCU but with this current issue, I fear that the connection is far to unreliable to do such things.

The only other solution I can think off is to replace the bridge with a MQTT solution or can that be repaired?

Many thanks

@MHerbst
Copy link
Contributor

MHerbst commented Dec 8, 2020

I am currently investigating this problem and another one where the device is not updated reliably. I think, I also had a similar problem last week after I had some problems with my router.

Today I had to take to cut the power line of my house. After restart everything worked except all temp sensors did not work.
=> CCU3 and Openhab are on the same raspi, so it seems that the problem is not resolved by having both on the same HW.

If the CCU3 and openHAB where started at the same time it is possible that the CCU was not ready when the binding tried to connect to the CCU and then this caused the problem.

The only other solution I can think off is to replace the bridge with a MQTT solution or can that be repaired?

I am quite sure that it can be repaired as soon as I find the cause of the problem. Unfortunately some things in the binding are a bit complicated because the "old" Homematic devices, HmIP and CuxD are using different methods for the communication ...

@MHerbst
Copy link
Contributor

MHerbst commented Dec 8, 2020

I think I found a way to reproduce it. The behavior is a bit strange. After the re-connect I get a timeout each time I try to send a command to a HmIP device. As soon as I send a command to an older Homematic device the HmIP device starts to work again.

@lobocobra
Copy link
Author

Ahh cool! I saw also time out messages with 15000 ms.
java.io.IOException: java.util.concurrent.TimeoutException: Total timeout 15000 ms elapsed

Just an update... today (after the power cut yesterday), the temperatures are still frozen at a value of yesterday. So it seems to me too, that once a certain condition takes place (maybe interruption of connection) the timeout blocks successfull communication.

Unfortunatly a restart of the bridge does not always work (depends on the way of the exception condition).
=> Could this mean that there is also an issue related to openhab? Maybe openhab does not "trigger" the bridge correctly?

@MHerbst
Copy link
Contributor

MHerbst commented Dec 26, 2020

@lobocobra I think, I have found a solution. At least the first testw in my environment looked promising. The code needs some cleaning but then I will create a PR.

Are you on OH 3 or would you need a fix also for 2.5?

@lobocobra
Copy link
Author

@MHerbst

Hey that sounds great! I am currently on version 2.5 but as the future is v3, I am ready to upgrade (as I plan this anyhow).
=> I will adapt to what is easier for your.

@MHerbst
Copy link
Contributor

MHerbst commented Jan 4, 2021

@lobocobra My current development environment is based OH 3 and if possible I would like to avoid to set-up an additional environment for 2.5.

In the meantime I have uploaded a test version for OH 3 (and above) to https://github.com/MHerbst/openhab-addons-test

@Thousand81
Copy link

@MHerbst Thank you for the fix. I have exactly the same problem, but I'm still running on OH 2.5. Is there any chance you backport to 2.5 as well? Thanks in advance.

@BobSilent
Copy link

BobSilent commented Jan 8, 2021

Since more than one year I am trying to find the root cause of this issue: why my connection to CCU gets lost from time to time, i searched the issue in my setup but was not really successful.
At the beginning I was struggling with my RPI4 which had a wrong power adapter with to less Ampere, which leads to the fact that the PI sometimes disabled WiFi and after enabling the WiFi i had to restart openhab in order to get the ccu connection.
After changing the power supply, this reduced the error for a while. But nevertheless I did not understand why the connection could be re-established.

Now (in 2021 :)) I looked around the community and found this thread, it really looks similar to my problem.
With newer OH releases 2.5.9++ the connection loss occurs more frequent up to know where it was sometimes twice a day.
So i hope this PR will finally help me to get a stable OH CCU connection.
I Updated my system now to OH3 in order to test the binding.
In OH web I uninstalled the original binding and I added the binding jar into the addons directory, cleaned cache and also executed the

openhab> feature:install openhab-transport-upnp
openhab> feature:install openhab-transport-serial

but after OH restart I see in openhab.log this error

Error while starting bundle: file:/usr/share/openhab/addons/org.openhab.binding.homematic-3.1.0-SNAPSHOT.jar
org.osgi.framework.BundleException: Could not resolve module: org.openhab.binding.homematic [128]
  Unresolved requirement: Import-Package: org.openhab.core.config.discovery.upnp

The web does not show me the binding
image
but somehow my homematic things are shown online, so is it okay what i did?
image

@MHerbst
Copy link
Contributor

MHerbst commented Jan 8, 2021

The message regarding "Unresolved requirment" should not appear after the installation of openhab-transport-upnp was successful.

The web does not show me the binding
That's an expected behavior. The UI shows only bindings that have been installed via the UI. Bindings loaded from the add-ons folder are not shown.

You can check that the binding is correctly loaded with a Karaf console command:

bundle:list | grep Homematic

You should then see one entry for the Homematic binding with status "Active" and a version starting with 3.1.0

@MHerbst
Copy link
Contributor

MHerbst commented Jan 8, 2021

@MHerbst Thank you for the fix. I have exactly the same problem, but I'm still running on OH 2.5. Is there any chance you backport to 2.5 as well? Thanks in advance.

@Thousand81 I will have to test whether I can set up a dev. environment for 2.5 in parallel. But first of all I would like to wait for the merge into the current version.

@BobSilent
Copy link

You can check that the binding is correctly loaded with a Karaf console command:

bundle:list | grep Homematic

You should then see one entry for the Homematic binding with status "Active" and a version starting with 3.1.0

thanks for the hint, at least the commands show it is active

openhab> bundle:list | grep Homematic
128 │ Active │  80 │ 3.1.0.202101041658      │ openHAB Add-ons :: Bundles :: Homematic Binding

and upnp is also active

openhab> bundle:list | grep -i upnp
216 │ Active │  80 │ 2.5.2                   │ JUPnP Library
235 │ Active │  80 │ 3.0.0                   │ openHAB Core :: Bundles :: Configuration UPnP Discovery
242 │ Active │  80 │ 3.0.0                   │ openHAB Core :: Bundles :: UPnP Transport

@BobSilent
Copy link

@MHerbst since tonight ~4:50h I still face my issue that no update is send to openhab from ccu. Although the connection shows online.

@Joerg-Dr
Copy link

@MHerbst

I had been waiting for more than an 30 minutes. The HmIP devices did not reconnect during this time.
I will enable DEBUG log and post it here.

@Joerg-Dr
Copy link

@MHerbst

Next test, logging set to DEBUG:
Rebooting Homematic CCU2, then waiting 5 minutes.

The IP devices do not reconnect:
Homematic Things

The log entries:
OpenHab Homematic.log

@Joerg-Dr
Copy link

@MHerbst

After 2 Hours IP Devices are still on error:
Pausing and re-initializing HomeMatic Bridge

Re-Initializing Homematic Bridge.log

@Joerg-Dr
Copy link

On more hint for testing:

After cleaning of the cache via

  • openhab-cli stop
  • openhab-cli clean-cache
  • openhab-cli start

one has to re-install the UPNP bundle

  • feature:install openhab-transport-upnp

via karaf console.

@MHerbst
Copy link
Contributor

MHerbst commented Sep 26, 2021

I think, I have found the reason why the patch does not work in your environment. It takes much longer than a CCU3 until the CCU2 is ready again after a reboot. I will have to modify the retry logic and probably make the number of attempts and the wait time between two attempts configurable.

@Joerg-Dr
Copy link

@MHerbst
Do you plan to implement the necessary changes to support the CCU2 with Homematic IP devices in the future?
Is there anything more I can help with?

@MHerbst
Copy link
Contributor

MHerbst commented Oct 14, 2021

@Joerg-Dr I have planned it for this weekend.

@MHerbst
Copy link
Contributor

MHerbst commented Oct 16, 2021

@Joerg-Dr I have uploaded a new test version (https://github.com/MHerbst/openhab-addons-test). You can now configure the number of retries and the wait time as advanced options in the Bridge configuration:
image

@Joerg-Dr
Copy link

@MHerbst

Thank you very much for the new version, this solution works fine.
I tried with the following configuration:

  • number of connection retries: 40
  • Retry wait time: 10

After restarting the Homematic CCU2, connection attempt 17/40 had been successful.
You will find the log file attached.

I have two suggestions for the binding:

  1. You could add a binding version number to the configuration page, so one can verify that the latest version is running.
  2. If possible, retry the initialization of communication with homematic devices still on error every xx minutes. I had seen this happening anyway when you change a setting on the configuration page and then press "Save (Ctrl-S)". An automatic retry would help re-enabling homematic devices that are still on error, even after a longer time.

Homematic 19-10-2021.log

@MHerbst
Copy link
Contributor

MHerbst commented Oct 20, 2021

@Joerg-Dr
Thanks for your feedback. Got to hear that the last fix was successful. Regarding you suggestion:

  1. good idea
  2. that is different mechanism, and I am a bit surprised that this problem occurs at all. I thought that they would change their state if an event is received or some action is executed.

Because this is a different problem and this issue is already closed, I would like to ask you to create a new issue.

@eikowagenknecht
Copy link
Contributor

I have a quite similar problem using openHAB 3.2.0.M3. After a reboot of the CCU3, all HmIP devices are offline and stay offline:
image

Logs:
image

No matter how long I wait after that, only a bundle restart or openhab restart brings the devices back online.

Non-Ip devices are online just fine, the bridge also registers as "online".

Do you think this is connected or should I open a new issue?

@MHerbst
Copy link
Contributor

MHerbst commented Oct 23, 2021

@eikowagenknecht This really looks like the same problem. The problem is that the HmIP service on the CCU needs quite a long time after a CCU restart until it accepts event registration. The release or milestone verstion of the add-on version in the OH 3 tries it immediately after the CCU is reachable again but there were no retries if the event registration failed which may take about a minite.

Can you try it the latest version of my test add-in?

@eikowagenknecht
Copy link
Contributor

eikowagenknecht commented Nov 2, 2021

Trying it now. I noticed that the description and the actual code differ regarding the default value. You might want to synchronize that:

<parameter name="retryWaitTime" type="integer" min="1" unit="s">
				<label>Retry Wait Time</label>
				<description>Time in seconds between two connection retries.</description>
				<advanced>true</advanced>
				<default>3</default>
			</parameter>
- **retryWaitTime**
Time in seconds between two connection retries (default = 30s)

and maybe also add the default (20 times) here:

- **retryConnects**
Maximum number of retries to connect to the Homematic Gateway and register a callback.

@eikowagenknecht
Copy link
Contributor

Seems to work just fine with the default settings (20x 3s).

I notice that the log shows 37x the following entry (this snippet is the last one, the first was 2021-11-02 11:37:50.232):

2021-11-02 11:38:18.530 [ERROR] [ommunicator.AbstractHomematicGateway] - org.xml.sax.SAXException: Unknown XML-RPC tag: title

java.io.IOException: org.xml.sax.SAXException: Unknown XML-RPC tag: title

	at org.openhab.binding.homematic.internal.communicator.client.XmlRpcClient.sendMessage(XmlRpcClient.java:87) ~[?:?]

	at org.openhab.binding.homematic.internal.communicator.client.RpcClient.listDevices(RpcClient.java:192) ~[?:?]

	at org.openhab.binding.homematic.internal.communicator.AbstractHomematicGateway.getDeviceDescriptions(AbstractHomematicGateway.java:451) ~[?:?]

	at org.openhab.binding.homematic.internal.communicator.AbstractHomematicGateway.newDevices(AbstractHomematicGateway.java:761) ~[?:?]

	at org.openhab.binding.homematic.internal.communicator.server.RpcResponseHandler.handleNewDevice(RpcResponseHandler.java:108) ~[?:?]

	at org.openhab.binding.homematic.internal.communicator.server.RpcResponseHandler.handleMethodCall(RpcResponseHandler.java:57) ~[?:?]

	at org.openhab.binding.homematic.internal.communicator.server.XmlRpcServer$ResponseHandler.handle(XmlRpcServer.java:125) ~[?:?]

	at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127) ~[bundleFile:9.4.43.v20210629]

	at org.eclipse.jetty.server.Server.handle(Server.java:516) ~[bundleFile:9.4.43.v20210629]

	at org.eclipse.jetty.server.HttpChannel.lambda$handle$1(HttpChannel.java:388) ~[bundleFile:9.4.43.v20210629]

	at org.eclipse.jetty.server.HttpChannel.dispatch(HttpChannel.java:633) [bundleFile:9.4.43.v20210629]

	at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:380) [bundleFile:9.4.43.v20210629]

	at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:277) [bundleFile:9.4.43.v20210629]

	at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:311) [bundleFile:9.4.43.v20210629]

	at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:105) [bundleFile:9.4.43.v20210629]

	at org.eclipse.jetty.io.ChannelEndPoint$1.run(ChannelEndPoint.java:104) [bundleFile:9.4.43.v20210629]

	at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:338) [bundleFile:9.4.43.v20210629]

	at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:315) [bundleFile:9.4.43.v20210629]

	at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:173) [bundleFile:9.4.43.v20210629]

	at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:131) [bundleFile:9.4.43.v20210629]

	at org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:386) [bundleFile:9.4.43.v20210629]

	at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:883) [bundleFile:9.4.43.v20210629]

	at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:1034) [bundleFile:9.4.43.v20210629]

	at java.lang.Thread.run(Thread.java:829) [?:?]

Caused by: org.xml.sax.SAXException: Unknown XML-RPC tag: title

	at org.openhab.binding.homematic.internal.communicator.message.XmlRpcResponse$XmlRpcHandler.endElement(XmlRpcResponse.java:168) ~[?:?]

	at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.endElement(AbstractSAXParser.java:610) ~[?:?]

	at com.sun.org.apache.xerces.internal.impl.dtd.XMLDTDValidator.endNamespaceScope(XMLDTDValidator.java:1985) ~[?:?]

	at com.sun.org.apache.xerces.internal.impl.dtd.XMLDTDValidator.handleEndElement(XMLDTDValidator.java:1934) ~[?:?]

	at com.sun.org.apache.xerces.internal.impl.dtd.XMLDTDValidator.endElement(XMLDTDValidator.java:865) ~[?:?]

	at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanEndElement(XMLDocumentFragmentScannerImpl.java:1718) ~[?:?]

	at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl$FragmentContentDriver.next(XMLDocumentFragmentScannerImpl.java:2883) ~[?:?]

	at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(XMLDocumentScannerImpl.java:605) ~[?:?]

	at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(XMLDocumentFragmentScannerImpl.java:534) ~[?:?]

	at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:888) ~[?:?]

	at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:824) ~[?:?]

	at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(XMLParser.java:141) ~[?:?]

	at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1216) ~[?:?]

	at com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser.parse(SAXParserImpl.java:635) ~[?:?]

	at com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl.parse(SAXParserImpl.java:324) ~[?:?]

	at org.openhab.binding.homematic.internal.communicator.message.XmlRpcResponse.<init>(XmlRpcResponse.java:55) ~[?:?]

	at org.openhab.binding.homematic.internal.communicator.client.XmlRpcClient.sendMessage(XmlRpcClient.java:81) ~[?:?]

	... 23 more

Probably this is during the reconnection phase?

I think it is way too verbose for something as normal as a reconnection occurring. It would be cleaner if there was a warning message like "Trying to reconnect, try x of y" and only if all those fail an error message. That would also give a hint if the amount of retries is set correctly or maybe the timing could be spaced out. Also this is the way e.g. the modbus binding handles reconnection attempts if I remember correctly.

This is minor nitpicking though, I'm really glad it works and I can finally restart the CCU without causing to much problems :-)

@eikowagenknecht
Copy link
Contributor

eikowagenknecht commented Nov 2, 2021

Not sure if this is related: I'm now seeing some of those

2021-11-02 11:42:46.721 [INFO ] [ab.event.ThingStatusInfoChangedEvent] - Thing 'homematic:HmIP-SWDO-I:ccu3:00109A49A2759F' changed from ONLINE to OFFLINE (COMMUNICATION_ERROR): java.util.concurrent.TimeoutException

after the reconnection where everything is "Online" for a while.

Some are even more verbose:

2021-11-02 11:43:31.856 [ERROR] [ommunicator.AbstractHomematicGateway] - java.util.concurrent.TimeoutException

java.io.IOException: java.util.concurrent.TimeoutException

	at org.openhab.binding.homematic.internal.communicator.client.XmlRpcClient.send(XmlRpcClient.java:122) ~[?:?]

	at org.openhab.binding.homematic.internal.communicator.client.XmlRpcClient.sendMessage(XmlRpcClient.java:77) ~[?:?]

	at org.openhab.binding.homematic.internal.communicator.client.RpcClient.listDevices(RpcClient.java:192) ~[?:?]

	at org.openhab.binding.homematic.internal.communicator.AbstractHomematicGateway.getDeviceDescriptions(AbstractHomematicGateway.java:451) ~[?:?]

	at org.openhab.binding.homematic.internal.communicator.AbstractHomematicGateway.newDevices(AbstractHomematicGateway.java:761) ~[?:?]

	at org.openhab.binding.homematic.internal.communicator.server.RpcResponseHandler.handleNewDevice(RpcResponseHandler.java:108) ~[?:?]

	at org.openhab.binding.homematic.internal.communicator.server.RpcResponseHandler.handleMethodCall(RpcResponseHandler.java:57) ~[?:?]

	at org.openhab.binding.homematic.internal.communicator.server.XmlRpcServer$ResponseHandler.handle(XmlRpcServer.java:125) ~[?:?]

	at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127) ~[bundleFile:9.4.43.v20210629]

	at org.eclipse.jetty.server.Server.handle(Server.java:516) ~[bundleFile:9.4.43.v20210629]

	at org.eclipse.jetty.server.HttpChannel.lambda$handle$1(HttpChannel.java:388) ~[bundleFile:9.4.43.v20210629]

	at org.eclipse.jetty.server.HttpChannel.dispatch(HttpChannel.java:633) [bundleFile:9.4.43.v20210629]

	at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:380) [bundleFile:9.4.43.v20210629]

	at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:277) [bundleFile:9.4.43.v20210629]

	at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:311) [bundleFile:9.4.43.v20210629]

	at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:105) [bundleFile:9.4.43.v20210629]

	at org.eclipse.jetty.io.ChannelEndPoint$1.run(ChannelEndPoint.java:104) [bundleFile:9.4.43.v20210629]

	at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:338) [bundleFile:9.4.43.v20210629]

	at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:315) [bundleFile:9.4.43.v20210629]

	at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:173) [bundleFile:9.4.43.v20210629]

	at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:131) [bundleFile:9.4.43.v20210629]

	at org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:386) [bundleFile:9.4.43.v20210629]

	at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:883) [bundleFile:9.4.43.v20210629]

	at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:1034) [bundleFile:9.4.43.v20210629]

	at java.lang.Thread.run(Thread.java:829) [?:?]

Caused by: java.util.concurrent.TimeoutException

	at org.eclipse.jetty.client.util.FutureResponseListener.get(FutureResponseListener.java:109) ~[?:?]

	at org.openhab.binding.homematic.internal.communicator.client.XmlRpcClient.send(XmlRpcClient.java:112) ~[?:?]

	... 24 more

So:

  1. All running - "Online" for all HmIP devices
  2. CCU reboot - Bridge goes offline, devices as well
  3. CCU back on - All HmIP devices to "Online"
  4. ~2 minutes later: A third of my devices changes to "Error: Comm" state, not all at once, but one after the other
  5. Most of them come back online, one after the other, over a period of some minutes

Here is the log filtered to the Timeout events.

2021-11-02 11:39:16.216 [INFO ] [ab.event.ThingStatusInfoChangedEvent] - Thing 'homematic:HmIP-SWSD:ccu3:000A58A9AC2E8F' changed from ONLINE to OFFLINE (COMMUNICATION_ERROR): java.util.concurrent.TimeoutException

2021-11-02 11:40:16.349 [INFO ] [ab.event.ThingStatusInfoChangedEvent] - Thing 'homematic:HmIP-SWSD:ccu3:000A58A9AC3E08' changed from ONLINE to OFFLINE (COMMUNICATION_ERROR): java.util.concurrent.TimeoutException

2021-11-02 11:41:16.681 [WARN ] [ternal.handler.HomematicThingHandler] - Could not reinitialize the device '0026DBE998FDC9': java.util.concurrent.TimeoutException

java.io.IOException: java.util.concurrent.TimeoutException

Caused by: java.util.concurrent.TimeoutException

2021-11-02 11:42:01.719 [INFO ] [ab.event.ThingStatusInfoChangedEvent] - Thing 'homematic:HmIP-DSD-PCB:ccu3:0026DBE998FDC9' changed from ONLINE to OFFLINE (COMMUNICATION_ERROR): java.util.concurrent.TimeoutException

2021-11-02 11:42:46.721 [INFO ] [ab.event.ThingStatusInfoChangedEvent] - Thing 'homematic:HmIP-SWDO-I:ccu3:00109A49A2759F' changed from ONLINE to OFFLINE (COMMUNICATION_ERROR): java.util.concurrent.TimeoutException

2021-11-02 11:43:31.856 [ERROR] [ommunicator.AbstractHomematicGateway] - java.util.concurrent.TimeoutException

java.io.IOException: java.util.concurrent.TimeoutException

Caused by: java.util.concurrent.TimeoutException

2021-11-02 11:44:17.525 [INFO ] [ab.event.ThingStatusInfoChangedEvent] - Thing 'homematic:HmIP-SWDO-I:ccu3:00109A49A2765C' changed from ONLINE to OFFLINE (COMMUNICATION_ERROR): java.util.concurrent.TimeoutException

2021-11-02 11:45:18.175 [INFO ] [ab.event.ThingStatusInfoChangedEvent] - Thing 'homematic:HmIP-eTRV-2:ccu3:000A1D89A59EC0' changed from ONLINE to OFFLINE (COMMUNICATION_ERROR): java.util.concurrent.TimeoutException

2021-11-02 11:46:03.288 [INFO ] [ab.event.ThingStatusInfoChangedEvent] - Thing 'homematic:HmIP-eTRV-2:ccu3:000A1A49A1072B' changed from ONLINE to OFFLINE (COMMUNICATION_ERROR): java.util.concurrent.TimeoutException

2021-11-02 11:47:18.670 [INFO ] [ab.event.ThingStatusInfoChangedEvent] - Thing 'homematic:HmIP-SWDO-I:ccu3:00109A49A27660' changed from ONLINE to OFFLINE (COMMUNICATION_ERROR): java.util.concurrent.TimeoutException

2021-11-02 11:48:21.906 [INFO ] [ab.event.ThingStatusInfoChangedEvent] - Thing 'homematic:HmIP-DSD-PCB:ccu3:0026DBE998FDC9' changed from OFFLINE (COMMUNICATION_ERROR): java.util.concurrent.TimeoutException to ONLINE

2021-11-02 11:48:34.339 [INFO ] [ab.event.ThingStatusInfoChangedEvent] - Thing 'homematic:HmIP-SWDO-I:ccu3:00109A49A2765C' changed from OFFLINE (COMMUNICATION_ERROR): java.util.concurrent.TimeoutException to ONLINE

2021-11-02 11:48:40.509 [INFO ] [ab.event.ThingStatusInfoChangedEvent] - Thing 'homematic:HmIP-SWDO-I:ccu3:00109A49A2759F' changed from OFFLINE (COMMUNICATION_ERROR): java.util.concurrent.TimeoutException to ONLINE

2021-11-02 11:48:53.774 [INFO ] [ab.event.ThingStatusInfoChangedEvent] - Thing 'homematic:HmIP-SWSD:ccu3:000A58A9AC2E8F' changed from OFFLINE (COMMUNICATION_ERROR): java.util.concurrent.TimeoutException to ONLINE

2021-11-02 11:49:02.644 [INFO ] [ab.event.ThingStatusInfoChangedEvent] - Thing 'homematic:HmIP-eTRV-2:ccu3:000A1D89A59EC0' changed from OFFLINE (COMMUNICATION_ERROR): java.util.concurrent.TimeoutException to ONLINE

2021-11-02 11:49:05.175 [INFO ] [ab.event.ThingStatusInfoChangedEvent] - Thing 'homematic:HmIP-eTRV-2:ccu3:000A1A49A1072B' changed from OFFLINE (COMMUNICATION_ERROR): java.util.concurrent.TimeoutException to ONLINE

2021-11-02 11:49:06.902 [INFO ] [ab.event.ThingStatusInfoChangedEvent] - Thing 'homematic:HmIP-SWDO-I:ccu3:00109A49A27660' changed from OFFLINE (COMMUNICATION_ERROR): java.util.concurrent.TimeoutException to ONLINE

Right now (~10min later) only one device is still in Error:Comm state and doesn't look like it's changing anytime soon.

In the CCU3 itself all devices are online, no service alerts.

@MHerbst
Copy link
Contributor

MHerbst commented Nov 2, 2021

You might want to synchronize that:

Thanks for the hint. I also forgot to sync it in the PR that I have created some days ago ..

Probably this is during the reconnection phase?

This XML-RPC message appears because the CCU returns HTML code containing a message that it is not ready. Would be much better if it would return a correct error state ...

I can try to improve the error handling here. I will be better to create a new issue for this problem.

Right now (~10min later) only one device is still in Error:Comm state and doesn't look like it's changing anytime soon.

The timeout exception a bit strange. Maybe the binding sends too many requests to the CCU if there are quite a lot of HmIP devices.

Regarding the one remaining device: it would be interesting whether it is always the same device type or whether it is a different one if you test it again.

You could also try to increase the timeout value from 15s to a higher value.

@eikowagenknecht
Copy link
Contributor

eikowagenknecht commented Nov 2, 2021

The device stayed in Error:Comm state. It was of type homematic:HmIP-SWSD, so 1 of my 6 identical smoke alarms stayed in this state.

Allright, next try:

  1. Changed timeout to 30s
  2. Reboot of openHAB to make sure no sideeffects occurr
  3. Reboot of CCU3
  4. Same as before :-(

Aand exactly the same device has problems again. Mysterious indeed. The other 5 have no problems. Also it's neither the first nor last to go offline.

image

But this time no timeout errors in the logs, instead:

2021-11-02 14:26:09.748 [INFO ] [ab.event.ThingStatusInfoChangedEvent] - Thing 'homematic:HmIP-SWDO-I:ccu3:00109A49A2759F' changed from ONLINE to OFFLINE (COMMUNICATION_ERROR): -1 Generic error (sending <?xml version="1.0" encoding="ISO-8859-1"?>
<methodCall><methodName>getParamset</methodName>
<params><param><value>00109A49A2759F:0</value></param><param><value>MASTER</value></param></params></methodCall>)
2021-11-02 14:27:11.209 [INFO ] [ab.event.ThingStatusInfoChangedEvent] - Thing 'homematic:HmIP-eTRV-2:ccu3:000A1D89A59EC0' changed from ONLINE to OFFLINE (COMMUNICATION_ERROR): -1 Generic error (sending <?xml version="1.0" encoding="ISO-8859-1"?>
<methodCall><methodName>getParamset</methodName>
<params><param><value>000A1D89A59EC0:0</value></param><param><value>MASTER</value></param></params></methodCall>)
2021-11-02 14:29:11.374 [INFO ] [ab.event.ThingStatusInfoChangedEvent] - Thing 'homematic:HmIP-SWSD:ccu3:000A58A9AC2E8F' changed from ONLINE to OFFLINE (COMMUNICATION_ERROR): -1 Generic error (sending <?xml version="1.0" encoding="ISO-8859-1"?>
<methodCall><methodName>getParamset</methodName>
<params><param><value>000A58A9AC2E8F:0</value></param><param><value>MASTER</value></param></params></methodCall>)
2021-11-02 14:30:11.501 [INFO ] [ab.event.ThingStatusInfoChangedEvent] - Thing 'homematic:HmIP-SWSD:ccu3:000A58A9AC3E08' changed from ONLINE to OFFLINE (COMMUNICATION_ERROR): -1 Generic error (sending <?xml version="1.0" encoding="ISO-8859-1"?>
<methodCall><methodName>getParamset</methodName>
<params><param><value>000A58A9AC3E08:0</value></param><param><value>MASTER</value></param></params></methodCall>)
2021-11-02 14:31:11.578 [INFO ] [ab.event.ThingStatusInfoChangedEvent] - Thing 'homematic:HmIP-SWDO-I:ccu3:00109A49A27660' changed from ONLINE to OFFLINE (COMMUNICATION_ERROR): -1 Generic error (sending <?xml version="1.0" encoding="ISO-8859-1"?>
<methodCall><methodName>getParamset</methodName>
<params><param><value>00109A49A27660:0</value></param><param><value>MASTER</value></param></params></methodCall>)
2021-11-02 14:33:11.729 [INFO ] [ab.event.ThingStatusInfoChangedEvent] - Thing 'homematic:HmIP-eTRV-2:ccu3:000A1A49A1072B' changed from ONLINE to OFFLINE (COMMUNICATION_ERROR): -1 Generic error (sending <?xml version="1.0" encoding="ISO-8859-1"?>
<methodCall><methodName>getParamset</methodName>
<params><param><value>000A1A49A1072B:0</value></param><param><value>MASTER</value></param></params></methodCall>)
2021-11-02 14:34:11.801 [INFO ] [ab.event.ThingStatusInfoChangedEvent] - Thing 'homematic:HmIP-eTRV-2:ccu3:000A1A49A10937' changed from ONLINE to OFFLINE (COMMUNICATION_ERROR): -1 Generic error (sending <?xml version="1.0" encoding="ISO-8859-1"?>
<methodCall><methodName>getParamset</methodName>
<params><param><value>000A1A49A10937:0</value></param><param><value>MASTER</value></param></params></methodCall>)
2021-11-02 14:35:11.119 [INFO ] [ab.event.ThingStatusInfoChangedEvent] - Thing 'homematic:HmIP-SWDO-I:ccu3:00109A49A2759F' changed from OFFLINE (COMMUNICATION_ERROR): -1 Generic error (sending <?xml version="1.0" encoding="ISO-8859-1"?>
<methodCall><methodName>getParamset</methodName>
<params><param><value>00109A49A2759F:0</value></param><param><value>MASTER</value></param></params></methodCall>) to ONLINE
2021-11-02 14:35:24.420 [INFO ] [ab.event.ThingStatusInfoChangedEvent] - Thing 'homematic:HmIP-SWSD:ccu3:000A58A9AC2E8F' changed from OFFLINE (COMMUNICATION_ERROR): -1 Generic error (sending <?xml version="1.0" encoding="ISO-8859-1"?>
<methodCall><methodName>getParamset</methodName>
<params><param><value>000A58A9AC2E8F:0</value></param><param><value>MASTER</value></param></params></methodCall>) to ONLINE
2021-11-02 14:35:34.952 [INFO ] [ab.event.ThingStatusInfoChangedEvent] - Thing 'homematic:HmIP-eTRV-2:ccu3:000A1D89A59EC0' changed from OFFLINE (COMMUNICATION_ERROR): -1 Generic error (sending <?xml version="1.0" encoding="ISO-8859-1"?>
<methodCall><methodName>getParamset</methodName>
<params><param><value>000A1D89A59EC0:0</value></param><param><value>MASTER</value></param></params></methodCall>) to ONLINE
2021-11-02 14:35:38.596 [INFO ] [ab.event.ThingStatusInfoChangedEvent] - Thing 'homematic:HmIP-eTRV-2:ccu3:000A1A49A1072B' changed from OFFLINE (COMMUNICATION_ERROR): -1 Generic error (sending <?xml version="1.0" encoding="ISO-8859-1"?>
<methodCall><methodName>getParamset</methodName>
<params><param><value>000A1A49A1072B:0</value></param><param><value>MASTER</value></param></params></methodCall>) to ONLINE
2021-11-02 14:35:42.083 [INFO ] [ab.event.ThingStatusInfoChangedEvent] - Thing 'homematic:HmIP-SWDO-I:ccu3:00109A49A27660' changed from OFFLINE (COMMUNICATION_ERROR): -1 Generic error (sending <?xml version="1.0" encoding="ISO-8859-1"?>
<methodCall><methodName>getParamset</methodName>
<params><param><value>00109A49A27660:0</value></param><param><value>MASTER</value></param></params></methodCall>) to ONLINE
2021-11-02 14:35:46.976 [INFO ] [ab.event.ThingStatusInfoChangedEvent] - Thing 'homematic:HmIP-eTRV-2:ccu3:000A1A49A10937' changed from OFFLINE (COMMUNICATION_ERROR): -1 Generic error (sending <?xml version="1.0" encoding="ISO-8859-1"?>
<methodCall><methodName>getParamset</methodName>
<params><param><value>000A1A49A10937:0</value></param><param><value>MASTER</value></param></params></methodCall>) to ONLINE

I've never seen these messages before so I suspect the mostly-fixed ccu3 reboot bahaviour has something to do with it. If you want I can open a new issue of course.

Unfortunately the current state where devices flake in and out after a ccu3 reboot is somewhat worse than before bevause I can not reliably recognize this behaviour with my usual offline detection scripts any more.

@MHerbst
Copy link
Contributor

MHerbst commented Nov 2, 2021

Aand exactly the same device has problems again. Mysterious indeed.

Is there something different in the configuration of this "Rauchmelder"? Maybe a debug or trace log could help.

I've never seen these messages before so I suspect the mostly-fixed ccu3 reboot bahaviour has something to do with it. If you want I can open a new issue of course.

I agree with you that these messages have to do with the restart. Maybe it is because of some unknown state value. getParamset is AFAIK the request that fails while the binding tries to re-connect. I think this behavior is acceptable.

Unfortunately the current state where devices flake in and out after a ccu3 reboot is somewhat worse than before bevause I can not reliably recognize this behaviour with my usual offline detection scripts any more.

Maybe this can be improved but this would probably mean some greater changes. Maybe you can bypass it if you wait for some minutes after a discovered disconnect before you check again.

This issue is already closed, so it would be better to create new issues for the remaining problems like the XMLRPC messages and the strange behavior of the one Rauchmelder.

thinkingstone pushed a commit to thinkingstone/openhab-addons that referenced this issue Nov 7, 2021
* Replace deprecated constructors
* Removed no longer existing settings from the documentation. They were already marked as deprecated since several versions.
* Refactored communication with the HM gateway
- simplified coding for the communication with the gateway
- buffer size for communication is now configurable to avoid problems
with too small buffers
- Previous solution for openhab#6963 was not sufficient. Should be finally done
with these changes
* Retrieving the duty cycle is sufficient to check connection
- ping requests could therefore be safely removed problems with the automatic reconnection were solved.
* Changed to explicit list of Exception

Fixes openhab#8808

Signed-off-by: Martin Herbst <develop@mherbst.de>
@eikowagenknecht
Copy link
Contributor

It's configured exactly like all the other "Rauchmelder". I'll try to get a debug/trace log and open a new issue then.

fwolter pushed a commit that referenced this issue Dec 11, 2021
* Use globally unique id for registration of callback to allow ...

the connection of multiple OH installations with one CCU.

The bridge id is not sufficient for this purpose because it is same in
all OH installations.


Signed-off-by: Martin Herbst <develop@mherbst.de>

* Retry callback re-registration after connection is resumed

Some services on the CCU need longer to start and are not available
immediately after the connection to the CCU has been resumed.

Improves the solution for #8808

Fixes #10439

Signed-off-by: Martin Herbst <develop@mherbst.de>

* Description was missing.

Signed-off-by: Martin Herbst <develop@mherbst.de>

* Changed setting name and description to avoid confusion


Signed-off-by: Martin Herbst <develop@mherbst.de>

* Added a troubleshooting tip to solve a communication problem


Signed-off-by: Martin Herbst <develop@mherbst.de>

* Shortened the label name to follow the guide lines

Signed-off-by: Martin Herbst <develop@mherbst.de>

* Print more information about the reason for the failure

Signed-off-by: Martin Herbst <develop@mherbst.de>

* Using scheduler thread pool and simplified configuration

Instead of configuring separate values for retry delays and number of
retries only the maximum time for retries can be configured.
The init method uses fixed delays.

Signed-off-by: Martin Herbst <develop@mherbst.de>

* Don't retry to send if gateway does not answer at all

Signed-off-by: Martin Herbst <develop@mherbst.de>

* Improved reconnect handling

- unregister callback not necessary if connection is lost
- wait 30s until clients and servers are restarted to give the gateway
some time to recover

Signed-off-by: Martin Herbst <develop@mherbst.de>

* Spotless

Signed-off-by: Martin Herbst <develop@mherbst.de>

* Cancel an active future if the binding is stopped

Signed-off-by: Martin Herbst <develop@mherbst.de>
NickWaterton pushed a commit to NickWaterton/openhab-addons that referenced this issue Dec 30, 2021
…ab#11429)

* Use globally unique id for registration of callback to allow ...

the connection of multiple OH installations with one CCU.

The bridge id is not sufficient for this purpose because it is same in
all OH installations.


Signed-off-by: Martin Herbst <develop@mherbst.de>

* Retry callback re-registration after connection is resumed

Some services on the CCU need longer to start and are not available
immediately after the connection to the CCU has been resumed.

Improves the solution for openhab#8808

Fixes openhab#10439

Signed-off-by: Martin Herbst <develop@mherbst.de>

* Description was missing.

Signed-off-by: Martin Herbst <develop@mherbst.de>

* Changed setting name and description to avoid confusion


Signed-off-by: Martin Herbst <develop@mherbst.de>

* Added a troubleshooting tip to solve a communication problem


Signed-off-by: Martin Herbst <develop@mherbst.de>

* Shortened the label name to follow the guide lines

Signed-off-by: Martin Herbst <develop@mherbst.de>

* Print more information about the reason for the failure

Signed-off-by: Martin Herbst <develop@mherbst.de>

* Using scheduler thread pool and simplified configuration

Instead of configuring separate values for retry delays and number of
retries only the maximum time for retries can be configured.
The init method uses fixed delays.

Signed-off-by: Martin Herbst <develop@mherbst.de>

* Don't retry to send if gateway does not answer at all

Signed-off-by: Martin Herbst <develop@mherbst.de>

* Improved reconnect handling

- unregister callback not necessary if connection is lost
- wait 30s until clients and servers are restarted to give the gateway
some time to recover

Signed-off-by: Martin Herbst <develop@mherbst.de>

* Spotless

Signed-off-by: Martin Herbst <develop@mherbst.de>

* Cancel an active future if the binding is stopped

Signed-off-by: Martin Herbst <develop@mherbst.de>
Signed-off-by: Nick Waterton <n.waterton@outlook.com>
mischmidt83 pushed a commit to mischmidt83/openhab-addons that referenced this issue Jan 9, 2022
…ab#11429)

* Use globally unique id for registration of callback to allow ...

the connection of multiple OH installations with one CCU.

The bridge id is not sufficient for this purpose because it is same in
all OH installations.


Signed-off-by: Martin Herbst <develop@mherbst.de>

* Retry callback re-registration after connection is resumed

Some services on the CCU need longer to start and are not available
immediately after the connection to the CCU has been resumed.

Improves the solution for openhab#8808

Fixes openhab#10439

Signed-off-by: Martin Herbst <develop@mherbst.de>

* Description was missing.

Signed-off-by: Martin Herbst <develop@mherbst.de>

* Changed setting name and description to avoid confusion


Signed-off-by: Martin Herbst <develop@mherbst.de>

* Added a troubleshooting tip to solve a communication problem


Signed-off-by: Martin Herbst <develop@mherbst.de>

* Shortened the label name to follow the guide lines

Signed-off-by: Martin Herbst <develop@mherbst.de>

* Print more information about the reason for the failure

Signed-off-by: Martin Herbst <develop@mherbst.de>

* Using scheduler thread pool and simplified configuration

Instead of configuring separate values for retry delays and number of
retries only the maximum time for retries can be configured.
The init method uses fixed delays.

Signed-off-by: Martin Herbst <develop@mherbst.de>

* Don't retry to send if gateway does not answer at all

Signed-off-by: Martin Herbst <develop@mherbst.de>

* Improved reconnect handling

- unregister callback not necessary if connection is lost
- wait 30s until clients and servers are restarted to give the gateway
some time to recover

Signed-off-by: Martin Herbst <develop@mherbst.de>

* Spotless

Signed-off-by: Martin Herbst <develop@mherbst.de>

* Cancel an active future if the binding is stopped

Signed-off-by: Martin Herbst <develop@mherbst.de>
Signed-off-by: Michael Schmidt <mi.schmidt.83@gmail.com>
nemerdaud pushed a commit to nemerdaud/openhab-addons that referenced this issue Jan 28, 2022
…ab#11429)

* Use globally unique id for registration of callback to allow ...

the connection of multiple OH installations with one CCU.

The bridge id is not sufficient for this purpose because it is same in
all OH installations.


Signed-off-by: Martin Herbst <develop@mherbst.de>

* Retry callback re-registration after connection is resumed

Some services on the CCU need longer to start and are not available
immediately after the connection to the CCU has been resumed.

Improves the solution for openhab#8808

Fixes openhab#10439

Signed-off-by: Martin Herbst <develop@mherbst.de>

* Description was missing.

Signed-off-by: Martin Herbst <develop@mherbst.de>

* Changed setting name and description to avoid confusion


Signed-off-by: Martin Herbst <develop@mherbst.de>

* Added a troubleshooting tip to solve a communication problem


Signed-off-by: Martin Herbst <develop@mherbst.de>

* Shortened the label name to follow the guide lines

Signed-off-by: Martin Herbst <develop@mherbst.de>

* Print more information about the reason for the failure

Signed-off-by: Martin Herbst <develop@mherbst.de>

* Using scheduler thread pool and simplified configuration

Instead of configuring separate values for retry delays and number of
retries only the maximum time for retries can be configured.
The init method uses fixed delays.

Signed-off-by: Martin Herbst <develop@mherbst.de>

* Don't retry to send if gateway does not answer at all

Signed-off-by: Martin Herbst <develop@mherbst.de>

* Improved reconnect handling

- unregister callback not necessary if connection is lost
- wait 30s until clients and servers are restarted to give the gateway
some time to recover

Signed-off-by: Martin Herbst <develop@mherbst.de>

* Spotless

Signed-off-by: Martin Herbst <develop@mherbst.de>

* Cancel an active future if the binding is stopped

Signed-off-by: Martin Herbst <develop@mherbst.de>
marcfischerboschio pushed a commit to bosch-io/openhab-addons that referenced this issue May 5, 2022
* Replace deprecated constructors
* Removed no longer existing settings from the documentation. They were already marked as deprecated since several versions.
* Refactored communication with the HM gateway
- simplified coding for the communication with the gateway
- buffer size for communication is now configurable to avoid problems
with too small buffers
- Previous solution for openhab#6963 was not sufficient. Should be finally done
with these changes
* Retrieving the duty cycle is sufficient to check connection
- ping requests could therefore be safely removed problems with the automatic reconnection were solved.
* Changed to explicit list of Exception

Fixes openhab#8808

Signed-off-by: Martin Herbst <develop@mherbst.de>
marcfischerboschio pushed a commit to bosch-io/openhab-addons that referenced this issue May 5, 2022
…ab#11429)

* Use globally unique id for registration of callback to allow ...

the connection of multiple OH installations with one CCU.

The bridge id is not sufficient for this purpose because it is same in
all OH installations.


Signed-off-by: Martin Herbst <develop@mherbst.de>

* Retry callback re-registration after connection is resumed

Some services on the CCU need longer to start and are not available
immediately after the connection to the CCU has been resumed.

Improves the solution for openhab#8808

Fixes openhab#10439

Signed-off-by: Martin Herbst <develop@mherbst.de>

* Description was missing.

Signed-off-by: Martin Herbst <develop@mherbst.de>

* Changed setting name and description to avoid confusion


Signed-off-by: Martin Herbst <develop@mherbst.de>

* Added a troubleshooting tip to solve a communication problem


Signed-off-by: Martin Herbst <develop@mherbst.de>

* Shortened the label name to follow the guide lines

Signed-off-by: Martin Herbst <develop@mherbst.de>

* Print more information about the reason for the failure

Signed-off-by: Martin Herbst <develop@mherbst.de>

* Using scheduler thread pool and simplified configuration

Instead of configuring separate values for retry delays and number of
retries only the maximum time for retries can be configured.
The init method uses fixed delays.

Signed-off-by: Martin Herbst <develop@mherbst.de>

* Don't retry to send if gateway does not answer at all

Signed-off-by: Martin Herbst <develop@mherbst.de>

* Improved reconnect handling

- unregister callback not necessary if connection is lost
- wait 30s until clients and servers are restarted to give the gateway
some time to recover

Signed-off-by: Martin Herbst <develop@mherbst.de>

* Spotless

Signed-off-by: Martin Herbst <develop@mherbst.de>

* Cancel an active future if the binding is stopped

Signed-off-by: Martin Herbst <develop@mherbst.de>
andan67 pushed a commit to andan67/openhab-addons that referenced this issue Nov 6, 2022
…ab#11429)

* Use globally unique id for registration of callback to allow ...

the connection of multiple OH installations with one CCU.

The bridge id is not sufficient for this purpose because it is same in
all OH installations.


Signed-off-by: Martin Herbst <develop@mherbst.de>

* Retry callback re-registration after connection is resumed

Some services on the CCU need longer to start and are not available
immediately after the connection to the CCU has been resumed.

Improves the solution for openhab#8808

Fixes openhab#10439

Signed-off-by: Martin Herbst <develop@mherbst.de>

* Description was missing.

Signed-off-by: Martin Herbst <develop@mherbst.de>

* Changed setting name and description to avoid confusion


Signed-off-by: Martin Herbst <develop@mherbst.de>

* Added a troubleshooting tip to solve a communication problem


Signed-off-by: Martin Herbst <develop@mherbst.de>

* Shortened the label name to follow the guide lines

Signed-off-by: Martin Herbst <develop@mherbst.de>

* Print more information about the reason for the failure

Signed-off-by: Martin Herbst <develop@mherbst.de>

* Using scheduler thread pool and simplified configuration

Instead of configuring separate values for retry delays and number of
retries only the maximum time for retries can be configured.
The init method uses fixed delays.

Signed-off-by: Martin Herbst <develop@mherbst.de>

* Don't retry to send if gateway does not answer at all

Signed-off-by: Martin Herbst <develop@mherbst.de>

* Improved reconnect handling

- unregister callback not necessary if connection is lost
- wait 30s until clients and servers are restarted to give the gateway
some time to recover

Signed-off-by: Martin Herbst <develop@mherbst.de>

* Spotless

Signed-off-by: Martin Herbst <develop@mherbst.de>

* Cancel an active future if the binding is stopped

Signed-off-by: Martin Herbst <develop@mherbst.de>
andrasU pushed a commit to andrasU/openhab-addons that referenced this issue Nov 12, 2022
…ab#11429)

* Use globally unique id for registration of callback to allow ...

the connection of multiple OH installations with one CCU.

The bridge id is not sufficient for this purpose because it is same in
all OH installations.


Signed-off-by: Martin Herbst <develop@mherbst.de>

* Retry callback re-registration after connection is resumed

Some services on the CCU need longer to start and are not available
immediately after the connection to the CCU has been resumed.

Improves the solution for openhab#8808

Fixes openhab#10439

Signed-off-by: Martin Herbst <develop@mherbst.de>

* Description was missing.

Signed-off-by: Martin Herbst <develop@mherbst.de>

* Changed setting name and description to avoid confusion


Signed-off-by: Martin Herbst <develop@mherbst.de>

* Added a troubleshooting tip to solve a communication problem


Signed-off-by: Martin Herbst <develop@mherbst.de>

* Shortened the label name to follow the guide lines

Signed-off-by: Martin Herbst <develop@mherbst.de>

* Print more information about the reason for the failure

Signed-off-by: Martin Herbst <develop@mherbst.de>

* Using scheduler thread pool and simplified configuration

Instead of configuring separate values for retry delays and number of
retries only the maximum time for retries can be configured.
The init method uses fixed delays.

Signed-off-by: Martin Herbst <develop@mherbst.de>

* Don't retry to send if gateway does not answer at all

Signed-off-by: Martin Herbst <develop@mherbst.de>

* Improved reconnect handling

- unregister callback not necessary if connection is lost
- wait 30s until clients and servers are restarted to give the gateway
some time to recover

Signed-off-by: Martin Herbst <develop@mherbst.de>

* Spotless

Signed-off-by: Martin Herbst <develop@mherbst.de>

* Cancel an active future if the binding is stopped

Signed-off-by: Martin Herbst <develop@mherbst.de>
Signed-off-by: Andras Uhrin <andras.uhrin@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug An unexpected problem or unintended behavior of an add-on
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants