XHR Polling misbehaves under (very) poor network conditions #91

exFalso · 2016-02-05T13:55:00Z

A bit of context:
We started using a SockJS based protocol in production a couple of weeks ago. After a couple of days of monitoring we noticed some clients exhibiting weird behaviour in rare cases: it seemed as if newly connected clients started in the middle of the application-layer protocol rather than with a regular handshake.
After a bit of debugging we realized that the cause is that sometimes the XHR polling client sends a delayed (> 5 second delay) poll request, which is treated as a new connection. If in the meantime an /xhr_send also happens then on the server this is treated as the first message of a newly connected client.
The server side SockJS is our own implementation so I checked with the reference sockjs-node implementation and could reproduce the same issue.

Steps to reproduce:
I'll use the example echo service from the https://github.com/sockjs/sockjs-node README. The poor network condition is simulated with a port-forwarding proxy.

Start server on port 9999
Start port forward on localhost from 9998 to 9999
$ curl -X POST localhost:9998/echo/000/000/xhr open frame
$ curl -X POST localhost:9998/echo/000/000/xhr poll
$ curl localhost:9998/echo/000/000/xhr_send --data '["Hello"]' -H content-type:text/plain poll returns, server is waiting 5 seconds for next poll before close
Send SIGSTOP to the proxy. The port will stay open but data sent will be buffered by the OS
$ curl -X POST localhost:9998/echo/000/000/xhr poll, does not reach server
$ curl localhost:9998/echo/000/000/xhr_send --data '["World!"]' -H content-type:text/plain hangs
Wait 5 seconds so that server drops connection
Send SIGCONT to port forward. This will cause the /xhr and /xhr_send requests to be delivered (hopefully in this order). The /xhr request will open a new connection with the same session_id (return code 200) and the /xhr_send will send the message (return code 204), most probably causing havoc in the application layer protocol.

Solution proposal 1: Move the "open connection" functionality to a new endpoint, e.g. /xhr_open. If a rogue /xhr request arrives the server can simply disregard it. I think this is The Right Way to solve this, however it is obviously not backwards-compatible. Maybe keep the the original behaviour around as deprecated for a few versions?
Solution proposal 2: Keep around a set of recently-closed-session_ids on the server. New connections opened with such a session_id should be rejected. This has the benefit of being backwards compatible, however it adds additional complexity to the server and I reckon it also makes testing more cumbersome, as they may rely on being able to instantly open a new connection with the same session_id.

The text was updated successfully, but these errors were encountered:

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

XHR Polling misbehaves under (very) poor network conditions #91

XHR Polling misbehaves under (very) poor network conditions #91

exFalso commented Feb 5, 2016

XHR Polling misbehaves under (very) poor network conditions #91

XHR Polling misbehaves under (very) poor network conditions #91

Comments

exFalso commented Feb 5, 2016