Issues detecting when connection is lost to server #203

Open
Swedish-Coder opened this issue Jun 28, 2017 · 6 comments

Comments

@Swedish-Coder

I'm having trouble determining whether the connection to a server is still present (using SSL).
When the connection is lost on the server side, no event is triggered on the client.

If the connection is simply terminated on our end (pulling the WAN cable), the client state from ClientContext keeps reporting state 4 (ESTABLISHED).

We have a check implemented that determines whether we have a connection to the server. This check works well before the connection to the server has been established.
We send using sendTXT, which returns true even when no WAN is connected.

We are using the ESP8266 core for Arduino 2.4.0 rc1 alongside arduinoWebSockets.
Is there a check we missed that we can use before we try to send data to our webserver?

@Links2004
Owner

What you are looking for is the ping/pong messages of the WebSocket protocol;
they exist to check whether the other side is still alive.

These may help:
#130
and
d271957

Note:
The ClientContext state only updates when you try to send data; the ESP TCP stack does not implement any keep-alive at the TCP level.
By sending a ping periodically, a lost connection can be detected very easily.
The library will automatically try to reconnect on the next loop run if a ping fails.
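
For illustration, the periodic ping check described above could be tracked with a small state machine like the one below. This is only a sketch under assumptions: `HeartbeatMonitor` and its methods are hypothetical names, not part of arduinoWebSockets, and it assumes the pong timeout is shorter than the ping interval. It only decides what to do; the caller still sends the ping and closes the connection.

```cpp
#include <cstdint>

// Hypothetical helper (not part of arduinoWebSockets): decides when a ping is
// due and when the peer should be treated as gone.
class HeartbeatMonitor {
public:
    enum class Action { None, SendPing, Disconnect };

    HeartbeatMonitor(uint32_t pingIntervalMs, uint32_t pongTimeoutMs, uint8_t maxMissed)
        : pingIntervalMs_(pingIntervalMs),
          pongTimeoutMs_(pongTimeoutMs),
          maxMissed_(maxMissed) {}

    // Call when the connection is (re)established.
    void reset(uint32_t nowMs) {
        lastPingMs_ = nowMs;
        awaitingPong_ = false;
        missed_ = 0;
    }

    // Call whenever a pong frame arrives.
    void onPong() {
        awaitingPong_ = false;
        missed_ = 0;
    }

    // Call once per loop() iteration with the current millis() value.
    Action poll(uint32_t nowMs) {
        const uint32_t elapsed = nowMs - lastPingMs_;  // unsigned, so rollover-safe
        if (awaitingPong_ && elapsed >= pongTimeoutMs_) {
            awaitingPong_ = false;  // this ping went unanswered
            if (++missed_ >= maxMissed_) return Action::Disconnect;
        }
        if (!awaitingPong_ && elapsed >= pingIntervalMs_) {
            lastPingMs_ = nowMs;
            awaitingPong_ = true;
            return Action::SendPing;
        }
        return Action::None;
    }

private:
    uint32_t pingIntervalMs_;
    uint32_t pongTimeoutMs_;
    uint8_t maxMissed_;
    uint32_t lastPingMs_ = 0;
    bool awaitingPong_ = false;
    uint8_t missed_ = 0;
};
```

In a sketch, `poll(millis())` would presumably run right after the client's `loop()` call: on `SendPing` call the client's `sendPing()`, and on `Disconnect` close the connection so that the library's reconnect logic can take over.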

@odelot
Contributor

odelot commented Aug 14, 2017

@Swedish-Coder, have you found a workaround?

I am facing the same problem. I cannot use the ping message because we are using the websocket as a tunnel to MQTT.

The websocket client writes to the secure TCP connection, but it seems that this does not update the connection status.

@Swedish-Coder
Author

@odelot We implemented an async ping method with a pong timer. We are not pinging our destination but rather a Google DNS server (8.8.8.8). If the pong timer exceeds the previous ping time plus the timeout, we close the connection. It's not an elegant solution, but it gets the job done.
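
A minimal sketch of the "previous ping + timeout" comparison follows. The helper name `hasTimedOut` is hypothetical, and it assumes the timestamps come from a 32-bit millisecond counter such as Arduino's `millis()`. Subtracting the timestamps, rather than adding the timeout to the start time, keeps the check correct when the counter rolls over after roughly 49.7 days.

```cpp
#include <cstdint>

// Hypothetical helper: true once at least timeoutMs have elapsed since startMs.
// Unsigned subtraction wraps modulo 2^32, so the result stays correct across a
// rollover of the millisecond counter; comparing nowMs against
// (startMs + timeoutMs) would not.
inline bool hasTimedOut(uint32_t nowMs, uint32_t startMs, uint32_t timeoutMs) {
    return static_cast<uint32_t>(nowMs - startMs) >= timeoutMs;
}
```

With this, the check in the main loop could look like `if (hasTimedOut(millis(), lastPingMs, timeoutMs)) { /* close the connection */ }`, where `lastPingMs` and `timeoutMs` are whatever the application tracks.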

@odelot
Contributor

odelot commented Aug 16, 2017

@Swedish-Coder Thank you

I am investigating this, and it seems that WiFiClientSecure.write returns success even though the device has no internet connection. The websocket code is OK.

As we are using TCP, this does not seem right. Shouldn't the write function return success only once it has received the ACK packets from the remote server? And isn't write a blocking function? Also, the websocket library calls WiFiClientSecure.setNoDelay(true) to turn off Nagle's algorithm, which combines small TCP packets into one.

I don't know whether this is a bug or a characteristic of the simplified secure TCP implementation done by the community. I opened an issue for the Arduino ESP team: esp8266/Arduino#3517

I would love to know whether it is a bug or just the way things are on the ESP before implementing a workaround such as the one you suggested (thank you very much, btw).

@simap
Contributor

simap commented Mar 24, 2018

I had a similar problem: I'm broadcasting some data, and if a client drops off WiFi before closing the connection, the broadcast call hangs repeatedly for 2s at a time. It seems that the timeout does not cause a disconnect, so it takes a long time for the library to notice the dropped client. As a workaround, I force a disconnect if the broadcast takes nearly 2s.

Would it make sense for the library to kill any connection that times out?
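
The workaround of forcing a disconnect when a send takes close to the timeout could be sketched roughly as follows. `timedSend` and `SendResult` are hypothetical names; the send operation and the clock are passed in as callables, so the same logic works with `millis()` on the device or with a fake clock in a test.

```cpp
#include <cstdint>
#include <functional>

// Hypothetical result type: whether the send reported success, whether it took
// suspiciously long, and how long it took.
struct SendResult {
    bool ok;
    bool tooSlow;
    uint32_t elapsedMs;
};

// Runs `send`, measures its duration with `nowMs`, and flags it as too slow
// when it took at least slowThresholdMs.
inline SendResult timedSend(const std::function<bool()>& send,
                            const std::function<uint32_t()>& nowMs,
                            uint32_t slowThresholdMs) {
    const uint32_t start = nowMs();
    const bool ok = send();
    const uint32_t elapsed = nowMs() - start;  // unsigned, so rollover-safe
    return SendResult{ok, elapsed >= slowThresholdMs, elapsed};
}
```

On the device, the caller would wrap each send in `timedSend` with a threshold a little under the 2s timeout and disconnect that client when the result is `!ok` or `tooSlow`. Sending per client instead of broadcasting is an assumption here; it is what makes it possible to tell which client stalled.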

@sovcik
Contributor

sovcik commented Oct 17, 2018

Having the same issue. In addition, ping freezes the ESP for about 600s after 2-3 pings, until (my assumption) the TCP connection times out.
After that, ping started working properly again.

code:

if (wssConnected != NIQ_WSS_NOT_CONNECTED) {
    ws->loop();  // service the WebSocket client
    if (pingTimer->timeout()) {
        DEBUG_PRINT("[%s] Sending WS ping\n", module);
        // sendPing() is the call that blocked for ~630s in the output below
        if (!ws->sendPing()) {
            log->log(INFO, module, "Ping failed");
            DEBUG_PRINT("[%s] Ping FAILED\n", module);
            wssConnected = NIQ_WSS_NOT_CONNECTED;
        }
        DEBUG_PRINT("[%s] Restarting Ping timer\n", module);
        pingTimer->restart();
    }
} else {
    // initiate server reconnect
}

Output (WAN cable was disconnected after the first WS Ping):

[NiQWSC:loop] Sending WS ping
[WS][0][sendFrame] ------- send message frame -------
[WS][0][sendFrame] fin: 1 opCode: 9 mask: 1 length: 0 headerToPayload: 0
[write] n: 6 t: 449182
[WS][0][sendFrame] sending Frame Done (29837us).
[NiQWSC:loop] Restarting Ping timer
[WS][0][handleWebsocketWaitFor] size: 2 cWsRXsize: 0
[readCb] n: 2 t: 449282
[WS][0][handleWebsocketWaitFor][readCb] size: 2 ok: 1
[WS][0][handleWebsocket] ------- read massage frame -------
[WS][0][handleWebsocket] fin: 1 rsv1: 0 rsv2: 0 rsv3 0  opCode: 10
[WS][0][handleWebsocket] mask: 0 payloadLen: 0
[WS][0][handleWebsocket] get pong ()
[NiQWSC:loop] Sending WS ping
[WS][0][sendFrame] ------- send message frame -------
[WS][0][sendFrame] fin: 1 opCode: 9 mask: 1 length: 0 headerToPayload: 0
[write] n: 6 t: 454219
[WS][0][sendFrame] sending Frame Done (21605us).
[NiQWSC:loop] Restarting Ping timer
[StatPnl:loop] Alive. Wifi=1 Cloud=1
[main:loop] Alive. millis=458928, ip=172.16.126.110, free heap=6008, u-stack=1776, free-stack=3984
[NiQWSC:loop] Sending WS ping
[WS][0][sendFrame] ------- send message frame -------
[WS][0][sendFrame] fin: 1 opCode: 9 mask: 1 length: 0 headerToPayload: 0
[write] n: 6 t: 459245
[WS][0][sendFrame] sending Frame Done (30478us).
[NiQWSC:loop] Restarting Ping timer
[NiQWSC:loop] Alive
[NiQWSC:loop] Sending WS ping
[WS][0][sendFrame] ------- send message frame -------
[WS][0][sendFrame] fin: 1 opCode: 9 mask: 1 length: 0 headerToPayload: 0
[write] n: 6 t: 464282
[WS][0][sendFrame] sending Frame Done (21785us).
[NiQWSC:loop] Restarting Ping timer
[NiQWSC:loop] Sending WS ping
[WS][0][sendFrame] ------- send message frame -------
[WS][0][sendFrame] fin: 1 opCode: 9 mask: 1 length: 0 headerToPayload: 0
[write] n: 6 t: 469309
[WS][0][sendFrame] sending Frame Done (24500us).
[NiQWSC:loop] Restarting Ping timer
[NiQWSC:loop] Sending WS ping
[WS][0][sendFrame] ------- send message frame -------
[WS][0][sendFrame] fin: 1 opCode: 9 mask: 1 length: 0 headerToPayload: 0
[write] n: 6 t: 474340

After this it got stuck for some 630 seconds

[write] not connected!
[WS][0][sendFrame] sending Frame Done (629534817us).
[NiQWSC:wLE] log entry={ "cmd":"device/log-entry","dateTime":"2018-10-17T09:56:27Z","level":"info","data":{"module":"NiQWSC:loop","text":"Ping failed"}}
[WS-Client] connection lost.
[WS-Client] client disconnected.
[NiQWSC:wsEvent-start] Free heap=22128, u-stack=1776, free-stack=3232
[NiQWSC:wsEvent] [niqstat:setCloudStat] status=0
[StatPnl:cldset] status=0 new status=0
WS Disconnected
[NiQWSC:wsEvent-end] Free heap=22128, u-stack=1712, free-stack=3232
[NiQWSC:loop] Ping FAILED
[NiQWSC:loop] Restarting Ping timer
[niqstat:setCloudStat] status=3
[StatPnl:cldset] status=0 new status=3
[StatPnl:loop] Alive. Wifi=1 Cloud=3
[QD32x16:loop] Alive.
[main:loop] Alive. millis=1104153, ip=172.16.126.110, free heap=22320, u-stack=1712, free-stack=3984
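
For readers decoding the output above: in the WebSocket protocol (RFC 6455), opCode 9 is a ping and opCode 10 is a pong. A client-to-server frame must be masked, so an empty ping is a 2-byte header plus a 4-byte masking key, which matches the `[write] n: 6` lines. The unmasked pong from the server is just the 2-byte header (`size: 2`). The sketch below builds such a frame; `buildClientControlFrame` is a hypothetical helper for illustration, not the library's own `sendFrame`.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr uint8_t kOpcodePing = 0x9;
constexpr uint8_t kOpcodePong = 0xA;

// Hypothetical helper: builds a masked, unfragmented control frame as a client
// would send it. Returns an empty vector if the payload exceeds the 125-byte
// limit for control frames.
inline std::vector<uint8_t> buildClientControlFrame(uint8_t opcode,
                                                    const uint8_t maskKey[4],
                                                    const uint8_t* payload,
                                                    size_t len) {
    std::vector<uint8_t> frame;
    if (len > 125) return frame;
    frame.push_back(static_cast<uint8_t>(0x80 | (opcode & 0x0F)));  // FIN=1, opcode
    frame.push_back(static_cast<uint8_t>(0x80 | len));              // MASK=1, length
    frame.insert(frame.end(), maskKey, maskKey + 4);                // masking key
    for (size_t i = 0; i < len; ++i) {
        // Payload bytes are XORed with the masking key, cycling every 4 bytes.
        frame.push_back(static_cast<uint8_t>(payload[i] ^ maskKey[i % 4]));
    }
    return frame;
}
```

Calling it with `kOpcodePing`, any 4-byte key, and an empty payload yields a 6-byte frame whose first two bytes are 0x89 and 0x80.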
