Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[openhabcloud] Reconnect on connection errors #11153

Merged
merged 7 commits into from
Nov 12, 2021

Conversation

ssalonen
Copy link
Contributor

@ssalonen ssalonen commented Aug 24, 2021

Requires openhab/openhab-osgiify#28

Aiming to resolve issues like openhab/openhab-cloud#134 . Common workaround seems to be to restart the bundle, see https://community.openhab.org/t/myopenhab-service-not-working/121711/5 and https://community.openhab.org/t/connection-to-myopenhab-broken/73808/80.

According to documentation (albeit for 2.x Socket IO version) [1], reconnection is responsibility of the user on connect_error events. Indeed, default implementation (1.0.0 and 1.0.1) only tries to reconnect if it is the first time we are connecting... **EDIT: ** please see #11153 (comment) for more discussion how this behaves in socket io 1.x java client.

[1] Lifecycle diagram in https://socketio.github.io/socket.io-client-java/socket_instance.html

Signed-off-by: Sami Salonen ssalonen@gmail.com

According to documentation (albeit for 2.x Socket IO version) [1],
reconnection is responsibility of the user on connect_error events.

[1] Lifecycle diagram in
  https://socketio.github.io/socket.io-client-java/socket_instance.html

Signed-off-by: Sami Salonen <ssalonen@gmail.com>
Signed-off-by: Sami Salonen <ssalonen@gmail.com>
@ssalonen ssalonen requested a review from kaikreuzer as a code owner August 24, 2021 08:02
@lolodomo
Copy link
Contributor

Take care, the version is also present in the feature.xml file.
And as a side note, the same library is also used by the ambientweather binding. So it should probably be updated too.

@digitaldan
Copy link
Contributor

Thanks for pointing out the lifecycle documentation and addressing this issue!

Signed-off-by: Sami Salonen <ssalonen@gmail.com>
Signed-off-by: Sami Salonen <ssalonen@gmail.com>
@ssalonen
Copy link
Contributor Author

ssalonen commented Aug 25, 2021

On closer look, Socket.EVENT_CONNECT_ERROR was named Socket.EVENT_ERROR in 1.x, see https://socketio.github.io/socket.io-client-java/migrating_from_1_x.html. Reading the 1.x codebase carefully [1], one can see that with the default retry-parameters, the Socket/Manager connect_error events are in fact retried automatically by socket.io library. I have tested to the extent possibly by simulating tcp delays and packet loss.

However, Socket.EVENT_ERROR event in 1.0.1 can be also transmitted even though connection is well established, see e.g. https://github.com/socketio/socket.io-client-java/blob/socket.io-client-1.0.1/src/main/java/io/socket/client/Socket.java#L318-L320 -- so it is not a one-to-one replacement of 2.x Socket.EVENT_CONNECT_ERROR.

We can see that in 2.0.1, Socket.EVENT_CONNECT_ERROR is emitted when Manager emits error event and Socket is unconnected:

https://github.com/socketio/socket.io-client-java/blob/socket.io-client-2.0.1/src/main/java/io/socket/client/Socket.java#L91-L98

One way to confirm changed behaviour is to introduce invalid UUID, which is rejected by the server ("not authorized" error). With the new code, client would automatically retry. Old implementation stopped alltogether without retriying.


[1] Manager.open() is first called with null OpenCallback. Error events are captured during connection phase by subscribing to engine socket events. Note how this error handler is removed when connection has been successfully subscribed (onopen calls cleanup()).

We can see how connect_error event is emitted in the error handler. Since OpenCallback == null, maybeReconnectOnOpen will be called. The conditional passes and reconnect method is called.

In reconnect method, Manager.open is called again, but now with non-null OpenCallback (!), that is, there is a custom error handler for errors occurring during connect. The custom error handler simply re-connects (with no upper bound by default), and thus side-steps the guards set in maybeReconnectOnOpen.

One can easily test this by blocking traffic to myopenhab.org, and observing how connection is retried, even though connect_error events are emitted.


I found it handy to simulate network issues while testing the changes (a bit glitchy as loadbalancer might hand out another ip suddenly):

Introducing 5 sec delay to myopenhab.org

IF=enp0s31f6
myohip=$(nslookup myopenhab.org|grep Address|tail -n1|cut -d ' ' -f2)
sudo tc qdisc del dev  $IF root \
 ; sudo tc filter del dev $IF \
 ; sudo tc qdisc add dev $IF root handle 1: prio priomap 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 \
 && sudo tc qdisc add dev $IF parent 1:2 handle 20: netem delay 5000ms \
 && sudo tc filter add dev $IF parent 1:0 protocol ip u32 match ip dst $myohip/32 match ip dport 443 0xffff flowid 1:2

Packet loss

IF=enp0s31f6
myohip=$(nslookup myopenhab.org|grep Address|tail -n1|cut -d ' ' -f2)
sudo tc qdisc del dev  $IF root \
 ; sudo tc filter del dev $IF \
 ; sudo tc qdisc add dev $IF root handle 1: prio priomap 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 \
 && sudo tc qdisc add dev $IF parent 1:2 handle 20: netem loss 100% \
 && sudo tc filter add dev $IF parent 1:0 protocol ip u32 match ip dst $myohip/32 match ip dport 443 0xffff flowid 1:2

Reseting simulations

IF=enp0s31f6
sudo tc qdisc del dev  $IF root \
; sudo tc filter del dev $IF

Signed-off-by: Sami Salonen <ssalonen@gmail.com>
Signed-off-by: Sami Salonen <ssalonen@gmail.com>
Signed-off-by: Sami Salonen <ssalonen@gmail.com>
@hmerk hmerk added the bug An unexpected problem or unintended behavior of an add-on label Sep 2, 2021
@digitaldan
Copy link
Contributor

Hi @ssalonen is this ready to review and try out ?

@ssalonen
Copy link
Contributor Author

ssalonen commented Sep 6, 2021

@digitaldan yes, please. No further changes planned at this stage.

@Nils-github
Copy link

Maybe the review is picked up faster without WIP in the title?

@ssalonen
Copy link
Contributor Author

ssalonen commented Sep 22, 2021

This should not be merged yet since it requires the other PR. Therefore I am keeping the WIP for safety.

@Nils-github
Copy link

@ssalonen:
Ok, sorry for my misunderstanding.
Maybe also a "awaiting other PR" label then?

@ssalonen ssalonen added the awaiting other PR Depends on another PR label Sep 23, 2021
@ssalonen ssalonen changed the title WIP: [openhabcloud] reconnect on connection errors [openhabcloud] reconnect on connection errors Sep 23, 2021
@lolodomo
Copy link
Contributor

lolodomo commented Oct 31, 2021

The awaiting PR is now merged, isn't it?
@digitaldan @kaikreuzer it is probably time to review

@wborn wborn added rebuild Triggers Jenkins PR build and removed awaiting other PR Depends on another PR rebuild Triggers Jenkins PR build labels Oct 31, 2021
@digitaldan
Copy link
Contributor

thanks @lolodomo for the reminder, i'll take a look.

Copy link
Contributor

@digitaldan digitaldan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! I have been running this for the last week without issues, thanks for taking the time to dig into this, i also think the additional debug logging will be helpful for troubleshooting.

Copy link
Contributor

@lolodomo lolodomo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@lolodomo lolodomo merged commit dcca9c0 into openhab:main Nov 12, 2021
@lolodomo lolodomo added this to the 3.2 milestone Nov 12, 2021
@ssalonen
Copy link
Contributor Author

i also think the additional debug logging will be helpful for troubleshooting.

Exactly. Not sure if this PR will fix all the issues but hopefully points us to right direction with any open issues.

Btw, I noticed that the socket io client is somewhat old version... I am unsure of these are maintained anymore and if update would be preferable. Unfortunately, this might also imply updating server side...

@openhab-bot
Copy link
Collaborator

This pull request has been mentioned on openHAB Community. There might be relevant details there:

https://community.openhab.org/t/is-home-myopenhab-org-down/130204/7

@wborn wborn changed the title [openhabcloud] reconnect on connection errors [openhabcloud] Reconnect on connection errors Dec 18, 2021
@openhab-bot
Copy link
Collaborator

This pull request has been mentioned on openHAB Community. There might be relevant details there:

https://community.openhab.org/t/myopenhab-service-not-working/121711/15

NickWaterton pushed a commit to NickWaterton/openhab-addons that referenced this pull request Dec 30, 2021
* [openhabcloud] reconnect on connection errors

According to documentation (albeit for 2.x Socket IO version) [1],
reconnection is responsibility of the user on connect_error events.

[1] Lifecycle diagram in
  https://socketio.github.io/socket.io-client-java/socket_instance.html

Signed-off-by: Sami Salonen <ssalonen@gmail.com>

* [openhabcloud] Update Socket IO dependency to 1.0.1

Signed-off-by: Sami Salonen <ssalonen@gmail.com>

* [openhabcloud] feature.xml updated also with socket io 1.0.1

Signed-off-by: Sami Salonen <ssalonen@gmail.com>

* [openhabcloud] Re-connect manually on error events when not connected

Signed-off-by: Sami Salonen <ssalonen@gmail.com>

* [openhabcloud] less loud logging on retries

Signed-off-by: Sami Salonen <ssalonen@gmail.com>

* [openhabcloud] removing unnecessary conditional in logging

Signed-off-by: Sami Salonen <ssalonen@gmail.com>

* [openhabcloud] javadoc corrections and clarifications

Signed-off-by: Sami Salonen <ssalonen@gmail.com>
Signed-off-by: Nick Waterton <n.waterton@outlook.com>
mischmidt83 pushed a commit to mischmidt83/openhab-addons that referenced this pull request Jan 9, 2022
* [openhabcloud] reconnect on connection errors

According to documentation (albeit for 2.x Socket IO version) [1],
reconnection is responsibility of the user on connect_error events.

[1] Lifecycle diagram in
  https://socketio.github.io/socket.io-client-java/socket_instance.html

Signed-off-by: Sami Salonen <ssalonen@gmail.com>

* [openhabcloud] Update Socket IO dependency to 1.0.1

Signed-off-by: Sami Salonen <ssalonen@gmail.com>

* [openhabcloud] feature.xml updated also with socket io 1.0.1

Signed-off-by: Sami Salonen <ssalonen@gmail.com>

* [openhabcloud] Re-connect manually on error events when not connected

Signed-off-by: Sami Salonen <ssalonen@gmail.com>

* [openhabcloud] less loud logging on retries

Signed-off-by: Sami Salonen <ssalonen@gmail.com>

* [openhabcloud] removing unnecessary conditional in logging

Signed-off-by: Sami Salonen <ssalonen@gmail.com>

* [openhabcloud] javadoc corrections and clarifications

Signed-off-by: Sami Salonen <ssalonen@gmail.com>
Signed-off-by: Michael Schmidt <mi.schmidt.83@gmail.com>
nemerdaud pushed a commit to nemerdaud/openhab-addons that referenced this pull request Jan 28, 2022
* [openhabcloud] reconnect on connection errors

According to documentation (albeit for 2.x Socket IO version) [1],
reconnection is responsibility of the user on connect_error events.

[1] Lifecycle diagram in
  https://socketio.github.io/socket.io-client-java/socket_instance.html

Signed-off-by: Sami Salonen <ssalonen@gmail.com>

* [openhabcloud] Update Socket IO dependency to 1.0.1

Signed-off-by: Sami Salonen <ssalonen@gmail.com>

* [openhabcloud] feature.xml updated also with socket io 1.0.1

Signed-off-by: Sami Salonen <ssalonen@gmail.com>

* [openhabcloud] Re-connect manually on error events when not connected

Signed-off-by: Sami Salonen <ssalonen@gmail.com>

* [openhabcloud] less loud logging on retries

Signed-off-by: Sami Salonen <ssalonen@gmail.com>

* [openhabcloud] removing unnecessary conditional in logging

Signed-off-by: Sami Salonen <ssalonen@gmail.com>

* [openhabcloud] javadoc corrections and clarifications

Signed-off-by: Sami Salonen <ssalonen@gmail.com>
@openhab-bot
Copy link
Collaborator

This pull request has been mentioned on openHAB Community. There might be relevant details there:

https://community.openhab.org/t/openhab-cloud-connection-stops-working-does-not-try-to-reconnect/133608/17

ssalonen added a commit to ssalonen/openhab2-addons that referenced this pull request Mar 6, 2022
Discussed in openhab#11153 (comment)

Signed-off-by: Sami Salonen <ssalonen@gmail.com>
lolodomo pushed a commit that referenced this pull request Mar 6, 2022
…12429)

Discussed in #11153 (comment)

Signed-off-by: Sami Salonen <ssalonen@gmail.com>
@openhab-bot
Copy link
Collaborator

This pull request has been mentioned on openHAB Community. There might be relevant details there:

https://community.openhab.org/t/openhab-cloud-connection-stops-working-does-not-try-to-reconnect/133608/22

NickWaterton pushed a commit to NickWaterton/openhab-addons that referenced this pull request Apr 27, 2022
…penhab#12429)

Discussed in openhab#11153 (comment)

Signed-off-by: Sami Salonen <ssalonen@gmail.com>
Signed-off-by: Nick Waterton <n.waterton@outlook.com>
marcfischerboschio pushed a commit to bosch-io/openhab-addons that referenced this pull request May 5, 2022
* [openhabcloud] reconnect on connection errors

According to documentation (albeit for 2.x Socket IO version) [1],
reconnection is responsibility of the user on connect_error events.

[1] Lifecycle diagram in
  https://socketio.github.io/socket.io-client-java/socket_instance.html

Signed-off-by: Sami Salonen <ssalonen@gmail.com>

* [openhabcloud] Update Socket IO dependency to 1.0.1

Signed-off-by: Sami Salonen <ssalonen@gmail.com>

* [openhabcloud] feature.xml updated also with socket io 1.0.1

Signed-off-by: Sami Salonen <ssalonen@gmail.com>

* [openhabcloud] Re-connect manually on error events when not connected

Signed-off-by: Sami Salonen <ssalonen@gmail.com>

* [openhabcloud] less loud logging on retries

Signed-off-by: Sami Salonen <ssalonen@gmail.com>

* [openhabcloud] removing unnecessary conditional in logging

Signed-off-by: Sami Salonen <ssalonen@gmail.com>

* [openhabcloud] javadoc corrections and clarifications

Signed-off-by: Sami Salonen <ssalonen@gmail.com>
andan67 pushed a commit to andan67/openhab-addons that referenced this pull request Nov 6, 2022
* [openhabcloud] reconnect on connection errors

According to documentation (albeit for 2.x Socket IO version) [1],
reconnection is responsibility of the user on connect_error events.

[1] Lifecycle diagram in
  https://socketio.github.io/socket.io-client-java/socket_instance.html

Signed-off-by: Sami Salonen <ssalonen@gmail.com>

* [openhabcloud] Update Socket IO dependency to 1.0.1

Signed-off-by: Sami Salonen <ssalonen@gmail.com>

* [openhabcloud] feature.xml updated also with socket io 1.0.1

Signed-off-by: Sami Salonen <ssalonen@gmail.com>

* [openhabcloud] Re-connect manually on error events when not connected

Signed-off-by: Sami Salonen <ssalonen@gmail.com>

* [openhabcloud] less loud logging on retries

Signed-off-by: Sami Salonen <ssalonen@gmail.com>

* [openhabcloud] removing unnecessary conditional in logging

Signed-off-by: Sami Salonen <ssalonen@gmail.com>

* [openhabcloud] javadoc corrections and clarifications

Signed-off-by: Sami Salonen <ssalonen@gmail.com>
andan67 pushed a commit to andan67/openhab-addons that referenced this pull request Nov 6, 2022
andrasU pushed a commit to andrasU/openhab-addons that referenced this pull request Nov 12, 2022
…penhab#12429)

Discussed in openhab#11153 (comment)

Signed-off-by: Sami Salonen <ssalonen@gmail.com>
Signed-off-by: Andras Uhrin <andras.uhrin@gmail.com>
psmedley pushed a commit to psmedley/openhab-addons that referenced this pull request Feb 23, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug An unexpected problem or unintended behavior of an add-on
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants