[remoteopenhab] Reliable SSE reconnection mechanism #9680
Comments
Couldn't you poll some state that is available in all OH instances?
This is already done, but it is not reliable enough. I have implemented my option 2 but have not yet created a PR. It is not fully finished and I am not fully satisfied with my code.
OK, great that you have a solution! Maybe my thoughts were too simple. I meant that you trigger something on the remote OH so that an event is sent over the SSE link, and if it's not received in time, the connection is re-established.
The JAX-RS SSE API allows registering an onComplete callback. And you know what? It is triggered when the remote server is stopped. So I have a reliable way to detect that the remote server was shut down, and so to open a new SSE connection when the remote server is alive again. I still have to check whether this covers the case where the server or the client is just disconnected from the network.
Great that I could help by accident :)
In fact you did not help ;) but I told myself that I should test this feature! More good news: if the server and the client cannot reach each other for a while because I disconnected my network cable, then when I reconnect the cable, the client receives events again without any additional code.
Regarding the network cut, unfortunately it depends on the duration. If I cut the network for around 45 s, no problem: events come back after reconnection. If the network is cut for 3 minutes, events never come back when I reconnect the cable, and nothing tells me that this SSE connection is a phantom connection.
When the network is cut for a certain amount of time (around 2 minutes in my case), it looks like data sending fails on the OH server and the sink is closed.
So a timeout is reached on the server side because the client is not reading data, and as a consequence the OH server closes the SSE sink. But as the network is still cut, the client is never informed of this closure. As you can see in the server stack trace, this is the Jetty idle timeout. At least I now know that short network cuts are handled transparently and correctly.
when the remote server is alive again Related to openhab#9680 Signed-off-by: Laurent Garnier <lg.hc@free.fr>
#10060 works for me if the remote is shut down gracefully. "idleTimeout" sounds like client and server are exchanging data constantly. So, theoretically, the client should be able to detect the silence too. I'm wondering why the client doesn't detect that the TCP connection is closed. I'd assume that the client's OS should close the TCP connection after some time when the remote is gone.
For me this is logical. In case of a long network failure (because I unplug the network cable for several minutes), the server is blocked and can no longer send data to the client; it detects this and decides to close the connection (an OH server decision). As the client is not reachable, it is not informed. When I plug the cable in again, the client is still waiting for events but will never receive any, because the server has already closed this connection. I worked this morning on adding an advanced setting to restart the SSE connection when there is no activity on the SSE link. During my tests, I discovered a new exception, not in SSE but when I run the first new HTTP request to the server after plugging my cable in again: EOFException! The next request is then OK. Maybe a retry mechanism is required.
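The retry idea mentioned above could look roughly like the following. This is only a sketch under assumptions: `EofRetry` and `IoRequest` are hypothetical names and not part of the binding, and it assumes the EOFException comes from a stale connection, so that simply repeating the request is safe (which is only true for idempotent requests such as GET).

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.UncheckedIOException;

/** Hypothetical functional interface for a request that may fail with an I/O error. */
@FunctionalInterface
interface IoRequest<T> {
    T execute() throws IOException;
}

/** Hypothetical helper: repeats a request when it fails with EOFException. */
final class EofRetry {
    private EofRetry() {
    }

    static <T> T call(IoRequest<T> request, int maxAttempts) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        EOFException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return request.execute();
            } catch (EOFException e) {
                // Assumed cause: the first request after a network cut hit a dead connection.
                last = e;
            } catch (IOException e) {
                // Any other I/O error is not retried in this sketch.
                throw new UncheckedIOException(e);
            }
        }
        // All attempts ended with EOFException: report the last one.
        throw new UncheckedIOException(last);
    }
}
```

With `maxAttempts` set to 2, the behaviour described above (first request fails, next one is OK) would be hidden from the caller.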
But the most important thing was the proper detection of a remote server shutdown or restart, and this case is now fixed ;)
As I mentioned yesterday, if your network failure is short, around 30 s for example, everything is OK.
Are you going to generate some dummy traffic from the client, so that the connection is not closed when there is simply no data to transmit? I think you mentioned this case in an older post.
In fact, if the client sends periodic keepalives you should be able to rely on the client's TCP stack to close the connection. I'd assume that
SSE is one-directional, from server to client. The client cannot send to the server.
…ivity Fix openhab#9680 Signed-off-by: Laurent Garnier <lg.hc@free.fr>
New setting implemented. It is disabled by default. I recommend enabling it if your remote server generates regular events.
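For readers who want to see the idea behind such a setting, here is a minimal sketch of an inactivity watchdog. It is not the code from #10063: the class name `SseInactivityWatchdog`, its methods and the injected clock are assumptions made for illustration. The SSE event handler calls `onEvent()`, and a periodic job calls `check()`, which runs the restart action once no event has been seen for the configured idle time.

```java
import java.time.Duration;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongSupplier;

/** Hypothetical watchdog: triggers a restart when no SSE event arrived for too long. */
final class SseInactivityWatchdog {
    private final long maxIdleNanos;
    private final LongSupplier nanoClock; // e.g. System::nanoTime, injectable for tests
    private final Runnable restartAction; // e.g. close and reopen the SSE connection
    private final AtomicLong lastActivityNanos;

    SseInactivityWatchdog(Duration maxIdle, LongSupplier nanoClock, Runnable restartAction) {
        this.maxIdleNanos = maxIdle.toNanos();
        this.nanoClock = nanoClock;
        this.restartAction = restartAction;
        this.lastActivityNanos = new AtomicLong(nanoClock.getAsLong());
    }

    /** To be called from the SSE event handler for every received event. */
    void onEvent() {
        lastActivityNanos.set(nanoClock.getAsLong());
    }

    /** To be called periodically; returns true if a restart was triggered. */
    boolean check() {
        long now = nanoClock.getAsLong();
        if (now - lastActivityNanos.get() < maxIdleNanos) {
            return false;
        }
        // Reset the timer so the new connection gets a full idle period.
        lastActivityNanos.set(now);
        restartAction.run();
        return true;
    }
}
```

In a real client, `check()` could be scheduled with `ScheduledExecutorService.scheduleWithFixedDelay`. The limitation is the one stated above: on a remote server that rarely generates events, this approach restarts a healthy connection for no reason.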
What a shame that the client can't send anything... I'm wondering if you could utilize the TCP stack's keepalive ability by setting SO_KEEPALIVE. There's a great article explaining this: https://blog.cloudflare.com/when-tcp-sockets-refuse-to-die/ under "Idle ESTAB is forever". I peeked into the source and saw the flag: https://github.com/apache/cxf/search?q=keepalive
I tested your code in #10063 successfully. Nevertheless, it would be nice if the OS could handle a dead TCP connection, but I couldn't figure out how to tell Apache CXF to use SO_KEEPALIVE.
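How to pass this option through Apache CXF stays open in this thread. At the plain JDK level, SO_KEEPALIVE is enabled with `java.net.Socket.setKeepAlive`, as in the sketch below (`KeepAliveSockets` is a hypothetical helper name). Note that the probe timing comes from the operating system; on Linux the first probe is sent after 7200 s of idle time by default (`tcp_keepalive_time`), so without OS tuning this would detect a dead connection only after about two hours.

```java
import java.net.Socket;
import java.net.SocketException;

/** Hypothetical helper showing SO_KEEPALIVE on a plain JDK socket (not CXF). */
final class KeepAliveSockets {
    private KeepAliveSockets() {
    }

    /** Creates an unconnected socket that has TCP keepalive probes enabled. */
    static Socket newKeepAliveSocket() throws SocketException {
        Socket socket = new Socket();
        // Ask the OS to probe the peer when the connection has been idle for a long time.
        socket.setKeepAlive(true);
        return socket;
    }
}
```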
openhab#10060) when the remote server is alive again Related to openhab#9680 Signed-off-by: Laurent Garnier <lg.hc@free.fr> Signed-off-by: John Marshall <john.marshall.au@gmail.com>
openhab#10063) * [remoteopenhab] New setting to restart the SSE connection after inactivity Fix openhab#9680 Signed-off-by: Laurent Garnier <lg.hc@free.fr> * Review comments: doc Signed-off-by: Laurent Garnier <lg.hc@free.fr> Signed-off-by: John Marshall <john.marshall.au@gmail.com>
openhab#10060) when the remote server is alive again Related to openhab#9680 Signed-off-by: Laurent Garnier <lg.hc@free.fr>
openhab#10063) * [remoteopenhab] New setting to restart the SSE connection after inactivity Fix openhab#9680 Signed-off-by: Laurent Garnier <lg.hc@free.fr> * Review comments: doc Signed-off-by: Laurent Garnier <lg.hc@free.fr>
A mechanism is already implemented to detect remote servers that are no longer reachable and to automatically reconnect when the remote server is reachable again.
Unfortunately, this mechanism is not yet 100% reliable, and the binding can sometimes fail to notice that the remote server was down for a while, in particular if the remote server was restarted on a fast machine. In that case the SSE connection does not fail with an error, but it no longer receives events once the remote server is alive again.
One option could be to frequently restart the SSE connection, but I think there is a risk of losing some events, and it seems excessive just to cover a potential remote server restart (which almost never happens in normal use).
Another option I just thought about would be to frequently open a new SSE connection and, once it is open, close the old one. That way we are sure to reestablish the SSE link without losing events in a normal situation. We would just have to avoid handling some events twice (once for each live SSE link). I will try to implement this solution.
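The "avoid handling some events twice" part of this second option could be sketched as below. This is purely illustrative and rests on a strong assumption: that each event can be given a key that is unique per occurrence (for example an event id). If the key were only topic plus payload, two legitimately identical events would be treated as duplicates. `EventDeduplicator` and `firstSeen` are hypothetical names.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical filter: remembers the most recent event keys to drop duplicates. */
final class EventDeduplicator {
    private final Map<String, Boolean> seen;

    EventDeduplicator(final int capacity) {
        // Insertion-ordered map that forgets the oldest key once capacity is exceeded.
        this.seen = new LinkedHashMap<String, Boolean>(16, 0.75f, false) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, Boolean> eldest) {
                return size() > capacity;
            }
        };
    }

    /** Returns true the first time a key is seen, false for a remembered duplicate. */
    synchronized boolean firstSeen(String eventKey) {
        return seen.putIfAbsent(eventKey, Boolean.TRUE) == null;
    }
}
```

Both SSE links would pass their events through the same instance while they overlap, and only events for which `firstSeen` returns true would be handled.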
If someone has a better way to detect server disconnection on an SSE link, they are welcome to propose it.
PS: Ideally, the best solution would be to have the openHAB server send an ALIVE message to SSE clients. That way, any client could detect a dead server. This could be implemented in OH3, but of course it would be missing in OH2.x servers.