-
-
Notifications
You must be signed in to change notification settings - Fork 192
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Sente not detecting non-terminating client disconnects? (e.g. sudden power loss / airplane mode) #230
Comments
Hi there, thanks for the report. What Android device, browser, and HTTP Is this over Ajax, websockets, or both? On Mon, May 2, 2016, 21:56 altV notifications@github.com wrote:
|
Hi, Samsung Galaxy S6 / Default browser
It is over websocket - because |
Is this using the sample application? Turning the logging level up and providing the server and client logs would probably be quite a good start for tracking this down. |
sorry, new to clojure -- how do I switch on logs on sente? |
https://github.com/ptaoussanis/sente/blob/master/example-project/src/example/client.cljs#L18 and https://github.com/ptaoussanis/sente/blob/master/example-project/src/example/server.clj#L29. I recommend running from the example project so we can help debug from a known base. |
Just to add to Daniel's good advice - the most likely causes of Sente not reflecting a disconnected client would be (in order):
Debugging this with the example project, I'd start by making sure you've ruled out (1) for sure, then I'd suggest ruling out (2) by trying an alternative http server. Hope that helps, cheers! :-) |
Next steps:
|
Hmm, looks like there's a problem with the example project, it didn't work for me either. |
@altV Thanks for the info, will follow up in a moment. @danielcompton Hi Daniel, this is the example project currently on master? Seems to be working for me. Please note that it's started with I do |
@altV Responses inline...
Please note that the example project is started with
Could you please rephrase this, I don't follow? Figwheel isn't included in the reference example; are you including it (or any other plugins) manually? Please let me know of any changes you might be making to the reference example project.
This is in the example project? Are you sure you're using the latest example project (on master). The current example project already includes the latest+recommended version of http-kit ("2.2.0-alpha1").
My recommendation:
Then let's go from there. Thanks! |
Thank you, I'll try to do that. I've just switched to immutant/web (undertow?), and you won't believe this! It is still the same behavior - flight mode on the phone, and the client is still in the (My :user-id-fn is
just in case using vectors and long strings can affect something.) |
Peter, I ran example-project with lein start.
Web browser cannot request main.js - it's not served (404).
(my comment
means "I've read through the source code of sente, and found one mention of closing web-sockets, that looks like it's coming from the web-server. From it I see why you said that it is a possibility for bug coming from webserver") |
I've copied main.js from target/ to resources/public/ Now figuring out how to debug disconnects. Enabling more logging as described above - without it information is not enough from the log. |
Thanks for catching this! The project was supposed to contain a symlink; hand't noticed that it was being excluded by |
Sorry, can you clarify what you mean by "client dissapears in 5 seconds"? You connect your phone to a Sente channel socket. The uid is automatically added under the :ws key. You turn on your phone's flight mode to simulate a dropped connection. Then the uid is removed from the :ws key after 5 seconds? If I've understood that correctly, then that's exactly the correct+expected behaviour. Am I misunderstanding something? Are you suggesting that everything works correctly with Immutant, but you see different (broken) behaviour with http-kit? Please try be explicit about what behaviour you're expecting to see vs what behaviour you're actually seeing that seems broken. And if you are seeing behaviour that seems to be broken somehow, I'd ask that we please confirm that it's happening also with the reference example. Thanks a lot for all your help debugging! Cheers :-) |
In case this is the point of confusion- it's normal for a disconnect to reflect in the Is that maybe the behaviour you were identifying as broken? |
It stays forever with both immutant and http-kit. Scenario 1 - good one. tested with http-kit 2.1.19, http-kit 2.2.0-alpha, and immutant 2.1.4.
Scenario 2 - problematic one. tested with http-kit 2.1.19, http-kit 2.2.0-alpha, and immutant 2.1.4.
I'm in hotel now so I can't experiment until tomorrow, but I guess I need to 1. test it with example project with much more logging, and then after probably 2. build minimal websocket server with http-kit and see how it interacts with websockets and which events get sent. Scenario 2 works fine when client is iPad / Mobile Safari. |
umm. well, now that I'm reading the Internet and the design of TCP protocol - is it a feature, not a bug? Websockets are handled over TCP, and they say TCP does not handle all the disconnects, only proper explicit ones, so does it mean that in order to detect timeouts within some interval ("disconnects") I should enable some application level keep-alives with this desired interval? |
You should still see the uid eventually get auto-removed from your connected uids set after Sente's next WebSocket keepalive attempt fails though. Sente's default WebSocket keepalive is 25 seconds. When you switch off your phone / switch on airplane mode - is the uid still around in 26 seconds? |
Hmm, scratch that; obviously won't work if you've switched your phone off (the keepalive is currently initiated by the client). This may well be an edge case I hadn't considered that should be addressed Sente side. Let me think about this some + get back to you. Any other thoughts of course welcome in the meantime. |
What OS are you running your Sente server on? Any idea what your kernel TCP keepalive params are? |
@altV Have just pushed Just added a simple GC mechanism to catch dead WebSocket connections. Default settings will perform GC after 60 seconds of inactivity. I.e. after you enable airplane mode, you should now see the uid correctly disconnect after 60 seconds. |
If a WebSocket connection is closed without normal termination (e.g. as caused by sudden loss of power, airplane mode, etc.) - it can hang around in conns_ for an extended period of time until the underlying TCP connection is identified as dead. This simple mod allows Sente's server to perform WebSocket connection GC for clients that haven't communicated with the server w/in a defined amount of time (must be > client :ws-kalive-ms). Big thanks to @altV for helping to catch + diagnose this issue.
If a WebSocket connection is closed without normal termination (e.g. as caused by sudden loss of power, airplane mode, etc.) - it can hang around in conns_ for an extended period of time until the underlying TCP connection is identified as dead. This simple mod allows Sente's server to perform WebSocket connection GC for clients that haven't communicated with the server w/in a defined amount of time (must be > client :ws-kalive-ms). Big thanks to @altV for helping to catch + diagnose this issue.
@altV I'm curious, how long had you waited? In theory they should be disconnected after a long time, 2 hours and 12 minutes for a Linux server with the default config: http://tldp.org/HOWTO/TCP-Keepalive-HOWTO/usingkeepalive.html
Edit: The socket would need to be opened with keepalive support by the server for this to take affect, I am unsure if either server does this. Edit 2: I figured out why i haven't encountered this. I use nginx as a proxy which appanrelty closes connections that have been idle for 60 seconds: https://blog.martinfjordvald.com/2013/02/websockets-in-nginx/
@ptaoussanis Have you considered reversing the ping? Often a server would ping a client, and if the client doesn't respond in a specified time it disconnects them. IRC comes to mind: https://tools.ietf.org/html/rfc1459#section-4.6.2 I don't see a downside to your approach though. |
@theasp Hi Andrew,
I did (was my first inclination too), but it'd require more busywork on the server side. A client->server ping is simpler to implement, and won't affect server performance (the clients bear administrative costs like tracking when pings might/not be necessary due to other messages recently being sent, etc.). I.e. the current implementation is simpler + more efficient than a server->client ping would be to achieve the ~same result. Does that make sense? |
Yes, perfect sense. Thank you. |
I've been thinking about this, and I think this explains why we were seeing Heroku warning us that some connections were timing out. They were successfully killed, but I hadn't really understood why it was happening (sporadically). |
thank you, @ptaoussanis ! it is very nice.
Just to be 100% sure, am I correct that now clients send application-level heartbeats every 60 seconds? |
@altV, @danielcompton Could someone possibly post an example Heroku warning here? @altV Replies inline:
Please note that the values are already configurable with the
Yeah, that sounds like code reloading.
Sente clients have always automatically sent PINGs as a form of keepalive. These default to a 25 second interval, but may be skipped if other data was sent w/in the same window. The thing that's changed is that the server will now keep track of how long it's been since it has seen a message from each client. If no message has been received in |
The Heroku warning is an H15:
Reading that page, Heroku enforces a 30 second timeout for the first response back to the client, but web sockets only need to send keep alives every 55 seconds. |
… Heroku warnings With the old values, non-terminating connections (e.g. sudden airplane mode) could cause conns to be forcefully terminated by Heroku with an "idle connection" warning message[1]. With the new values, our gc should normally kick in before Heroku's. Waiting on confirmation that this commit actually helps resolve these warnings. [1] Ref. https://devcenter.heroku.com/articles/error-codes#h15-idle-connection
@danielcompton Thanks! |
… Heroku warnings With the old values, non-terminating connections (e.g. sudden airplane mode) could cause conns to be forcefully terminated by Heroku with an "idle connection" warning message[1]. With the new values, our gc should normally kick in before Heroku's. Waiting on confirmation that this commit actually helps resolve these warnings. [1] Ref. https://devcenter.heroku.com/articles/error-codes#h15-idle-connection
Umm. Unfortunately, I cannot provide more testing until the end of next week (pessimistically). We can close the issue and reopen it if there will be issues. I'll reply back by then. |
No problem, please take your time :-) |
This should in principle be resolved so going to close for now. Please feel free to reopen if you still happen to run into any issues again later. Cheers :-) |
looks like it works, but I will do even more testing. Many thanks for your fix! |
v1.9.0 is a significant, non-breaking upgrade focused on cleaning up Sente's implementation to make it more robust, and to help address a couple of issues that were recently identified (#201 and #230). TODO - [ ] Test new pings, timeouts, etc. (http-kit) [ ] Test new pings, timeouts, etc. (Immutant) [ ] Final round of housekeeping [ ] Cut an alpha1 release This is a squashed commit that includes: - Significant code refactor. - Extended ref example to incl. a pair of buttons to excercise async push features. - Drop (experimental) flexi packer. - [#161] Clojure-side Transit optimizations Specifically: 1. Now cache read-handlers and write-handlers. Can't tell if this actually realistically helps perf since I've got no handler maps to test against; feedback welcome. 2. Now cache (thread-local) writer (allows baos reuse). Doesn't seem to actually realistically help perf from what I can tell. Might nix this later since this does add some complexity to the impl. - [#201] Add support for more flexible conn-type upgrade/downgrade Initial downgrade strategy here is simple, may or may not turn out to be useful; still need to check. Point was to get a more flexible base that we can build on in future. - [#230] Server-side ping to help GC non-terminating WebSocket conns. If a WebSocket connection is closed without normal termination (e.g. as caused by sudden loss of power, airplane mode, etc.) - it could hang around in conns_ for an extended period of time until the underlying TCP connection was identified as dead. This mods Sente's keep-alive ping from client->server to server->client, allowing the server to identify (and auto-gc) dead but abnormally terminated WebSocket connections. Big thanks to @altV for helping to catch + diagnose this issue. - [#230] Experimental: tweak timeout defaults to help suppress possible Heroku warnings. With the old values, abnormally terminated conns (e.g. sudden airplane mode) could cause conns to be forcefully terminated by Heroku with an "idle connection" warning message[1]. With the new values, our gc should normally kick in before Heroku's. Waiting on confirmation that this commit actually helps resolve these warnings. [1] Ref. https://devcenter.heroku.com/articles/error-codes#h15-idle-connection - [#150 #159] Allow server to gc lp conns Using the infrastructure in place for #230, have decided to now initiate Ajax timeouts from the server side instead of the client side. As with #230, this'll help the underlying http server gc any abnormal connections. In particular, this should (?) resolve #159 (for Immutant) w/o the need for the earlier workaround that was introduced.
v1.9.0 is a significant, non-breaking upgrade focused on addressing a couple minor issues that were recently identified (#201 and #230). While in the code, also took the opportunity to refactor some implementation details. - @ptaoussanis This is a squashed commit that includes: - Extend ref example to incl. a pair of buttons to excercise async push features. - Drop (experimental) flexi packer. - [#161] Clojure-side Transit optimizations Specifically: 1. Now cache read-handlers and write-handlers. Can't tell if this actually realistically helps perf since I've got no handler maps to test against; feedback welcome. 2. Now cache (thread-local) writer (allows baos reuse). Doesn't seem to actually realistically help perf from what I can tell. Might nix this later since this does add some complexity to the impl. - [#201] Add support for more flexible conn-type upgrade/downgrade Initial downgrade strategy here is simple, may or may not turn out to be useful; still need to check. Point was to get a more flexible base that we can build on in future. - [#230] Server-side ping to help GC non-terminating WebSocket conns. If a WebSocket connection is closed without normal termination (e.g. as caused by sudden loss of power, airplane mode, etc.) - it could hang around in conns_ for an extended period of time until the underlying TCP connection was identified as dead. This mods Sente's keep-alive ping from client->server to server->client, allowing the server to identify (and auto-gc) dead but abnormally terminated WebSocket connections. Big thanks to @altV for helping to catch + diagnose this issue. This change should also help to suppress possible "idle connection" Heroku warnings, Ref. https://goo.gl/zLR0Gk - [#150 #159] Allow server to gc lp conns Using the infrastructure in place for #230, have decided to now initiate Ajax timeouts from the server side instead of the client side. As with #230, this'll help the underlying http server gc any abnormal connections. In particular, this should (?) resolve #159 (for Immutant) w/o the need for the earlier Immutant-side workaround introduced for that issue.
v1.9.0 is a significant, non-breaking upgrade focused on addressing a couple minor issues that were recently identified (#201 and #230). While in the code, also took the opportunity to refactor some implementation details. - @ptaoussanis This is a squashed commit that includes: - Extend ref example to incl. a pair of buttons to excercise async push features. - Drop (experimental) flexi packer. - [#161] Clojure-side Transit optimizations Specifically: 1. Now cache read-handlers and write-handlers. Can't tell if this actually realistically helps perf since I've got no handler maps to test against; feedback welcome. 2. Now cache (thread-local) writer (allows baos reuse). Doesn't seem to actually realistically help perf from what I can tell. Might nix this later since this does add some complexity to the impl. - [#201] Add support for more flexible conn-type upgrade/downgrade Initial downgrade strategy here is simple, may or may not turn out to be useful; still need to check. Point was to get a more flexible base that we can build on in future. - [#230] Server-side ping to help GC non-terminating WebSocket conns. If a WebSocket connection is closed without normal termination (e.g. as caused by sudden loss of power, airplane mode, etc.) - it could hang around in conns_ for an extended period of time until the underlying TCP connection was identified as dead. This mods Sente's keep-alive ping from client->server to server->client, allowing the server to identify (and auto-gc) dead but abnormally terminated WebSocket connections. Big thanks to @altV for helping to catch + diagnose this issue. This change should also help to suppress possible "idle connection" Heroku warnings, Ref. https://goo.gl/zLR0Gk - [#150 #159] Allow server to gc lp conns Using the infrastructure in place for #230, have decided to now initiate Ajax timeouts from the server side instead of the client side. As with #230, this'll help the underlying http server gc any abnormal connections. In particular, this should (?) resolve #159 (for Immutant) w/o the need for the earlier Immutant-side workaround introduced for that issue.
v1.9.0 is a significant, non-breaking upgrade focused on addressing a couple minor issues that were recently identified (#201 and #230). While in the code, also took the opportunity to refactor some implementation details. - @ptaoussanis This is a squashed commit that includes: - Extend ref example to incl. a pair of buttons to excercise async push features. - Drop (experimental) flexi packer. - [#161] Clojure-side Transit optimizations Specifically: 1. Now cache read-handlers and write-handlers. Can't tell if this actually realistically helps perf since I've got no handler maps to test against; feedback welcome. 2. Now cache (thread-local) writer (allows baos reuse). Doesn't seem to actually realistically help perf from what I can tell. Might nix this later since this does add some complexity to the impl. - [#201] Add support for more flexible conn-type upgrade/downgrade Initial downgrade strategy here is simple, may or may not turn out to be useful; still need to check. Point was to get a more flexible base that we can build on in future. - [#230] Server-side ping to help GC non-terminating WebSocket conns. If a WebSocket connection is closed without normal termination (e.g. as caused by sudden loss of power, airplane mode, etc.) - it could hang around in conns_ for an extended period of time until the underlying TCP connection was identified as dead. This mods Sente's keep-alive ping from client->server to server->client, allowing the server to identify (and auto-gc) dead but abnormally terminated WebSocket connections. Big thanks to @altV for helping to catch + diagnose this issue. This change should also help to suppress possible "idle connection" Heroku warnings, Ref. https://goo.gl/zLR0Gk - [#150 #159] Allow server to gc lp conns Using the infrastructure in place for #230, have decided to now initiate Ajax timeouts from the server side instead of the client side. As with #230, this'll help the underlying http server gc any abnormal connections. In particular, this should (?) resolve #159 (for Immutant) w/o the need for the earlier Immutant-side workaround introduced for that issue.
Quick sketch of a possible implementation. Didn't test, or think about this much. Situation before #230: Clients maintain keep-alive, sending pings to server. On sudden conn break (e.g. airplane mode), clients would know about the break but the server wouldn't Situation after #230: Server maintains keep-alive, sending pings to clients. On sudden conn break (e.g. airplane mode), server would know about the break but clients wouldn't. As of this commit: Server and clients each maintain a keep-alive[1]. I.e. each side will attempt to ping if it hasn't heard from the other side in a prescribed window. On sudden conn break, each side will be made aware of the break when its own scheduled window fires and fails to successfully send a ping. [1] Timeouts should be different. In particular, the client may like to choose a much more aggressive window (e.g. 5s) to provide a rapid indicator to users.
…ane mode) Situation before #230: Clients maintain keep-alive, sending pings to server. On sudden conn break (e.g. airplane mode), clients would know about the break but the server wouldn't Situation after #230: Server maintains keep-alive, sending pings to clients. On sudden conn break (e.g. airplane mode), server would know about the break but clients wouldn't. As of this commit: Server and clients each maintain a keep-alive[1]. I.e. each side will attempt to ping if it hasn't heard from the other side in a prescribed window. On sudden conn break, each side will be made aware of the break when its own scheduled window fires and fails to successfully send a ping. [1] Timeouts should be different. In particular, the client may like to choose a much more aggressive window (e.g. 5s) to provide a rapid indicator to users.
…ane mode) Situation before #230: Clients maintain keep-alive, sending pings to server. On sudden conn break (e.g. airplane mode), clients would know about the break but the server wouldn't Situation after #230: Server maintains keep-alive, sending pings to clients. On sudden conn break (e.g. airplane mode), server would know about the break but clients wouldn't. As of this commit: Server and clients each maintain a keep-alive[1]. I.e. each side will attempt to ping if it hasn't heard from the other side in a prescribed window. On sudden conn break, each side will be made aware of the break when its own scheduled window fires and fails to successfully send a ping. [1] Timeouts should be different. In particular, the client may like to choose a much more aggressive window (e.g. 5s) to provide a rapid indicator to users.
I'm using android default browser, and I see its id in
connected-ids
atom after turning on Flight mode or phone restarts.Closing android browser or navigating away from the page works fine, though.
The text was updated successfully, but these errors were encountered: