Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Exhaustive state conditions for all realtime operations. #112

Merged
merged 24 commits into from
Jun 16, 2016

Conversation

tcard
Copy link
Contributor

@tcard tcard commented Apr 25, 2016

When calling Connection.close() before reaching the CONNECTED state,
the iOS library was trying to send a WebSocket message before the
handshake was even finished. That caused it to crash sometimes.

That situation wasn't contemplated on the spec. I've gone through it
filling a table and filling the gaps, so that the spec is exhaustive
about all possible combinations of state and operations.

(So, yes, zwopple/PocketSocket#48 wasn't fixing
the right problem.)

@tcard
Copy link
Contributor Author

tcard commented Apr 25, 2016

This is what it looks like, which may not be obvious from the source:

captura de pantalla 2016-04-25 a les 19 41 49

@mattheworiordan
Copy link
Member

Thanks @tcard, this is really good. I will double check all the links soon, merge & update the unified spec sheet

@tcard tcard mentioned this pull request Apr 26, 2016
tcard added a commit to ably/ably-cocoa that referenced this pull request Apr 27, 2016
tcard added a commit to ably/ably-cocoa that referenced this pull request Apr 27, 2016
* Add EventEmitter.timed method.

* Use EventEmitter instead of manual timers for Realtime timeouts.

This should simplify the logic there.

* Improve error reporting of Obj-C enum in tests.

* Conform to spec changes at ably/docs#112.

* Remove outdated test.

* Replace hardcoded realtime request timeout.
@@ -301,12 +302,16 @@ h3(#realtime-connection). Connection
** @(RTN11a)@ Explicitly connects to the Ably service if not already connected
** @(RTN11b)@ An error will be indicated if the state is @CLOSING@ as the connection must complete the close request before reconnecting
* @(RTN12)@ @Connection#close@ function:
** @(RTN12a)@ Sends a @CLOSE@ @ProtocolMessage@ to the server, sets the state to @CLOSING@ and waits for a @CLOSED@ @ProtocolMessage@ to be received
** @(RTN12f)@ If the connection state is @CONNECTING@, do the operation once the connection state is @CONNECTED@
Copy link
Member

@mattheworiordan mattheworiordan Apr 27, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why, why not simply close the connection and move the connection to CLOSED? So I propose If the connection state is @CONNECTING@, close all transports and immediately transition the state to @CLOSED@

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the client simply transitions to CLOSED, then the realtime system will never get a close event, and then the connection state will remain for 2 minutes. It is preferable that, once a connection has been initiated, it is allowed to connect and is then closed.

It would be acceptable if there was a shorter timeout waiting for the CONNECTED state (eg the client waits for 5s) because we would still catch the majority of cases. However, I'm not sure it is worth the complexity of that special case - so if CONNECTING, wait for either a CONNECTED (then doing an explicit CLOSE), or any other state, in which case the client transitions to CLOSED without further interaction.

@mattheworiordan
Copy link
Member

I have added a few comments, great bit of work as these were definitely areas not covered previously.

@paddybyers I would suggest you take a look to be sure.
@SimonWoolf if you can cast your eye too to spot any issues that would be good

* @(RTN12)@ @Connection#close@ function:
** @(RTN12a)@ Sends a @CLOSE@ @ProtocolMessage@ to the server, sets the state to @CLOSING@ and waits for a @CLOSED@ @ProtocolMessage@ to be received
** @(RTN12f)@ If the connection state is @CONNECTING@, do the operation once the connection state is @CONNECTED@
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why, why not simply close the connection and move the connection to CLOSED? So I propose If the connection state is @CONNECTING@, close all transports and immediately transition the state to @CLOSED@

If the client simply transitions to CLOSED, then the realtime system will never get a close event, and then the connection state will remain for 2 minutes. It is preferable that, once a connection has been initiated, it is allowed to connect and is then closed.

It would be acceptable if there was a shorter timeout waiting for the CONNECTED state (eg the client waits for 5s) because we would still catch the majority of cases. However, I'm not sure it is worth the complexity of that special case - so if CONNECTING, wait for either a CONNECTED (then doing an explicit CLOSE), or any other state, in which case the client transitions to CLOSED without further interaction.

I see the point, but the result is still quite weird. Presumably, a lot of the time if someone's trying to close while the client is CONNECTING, it's because the client has network issues and the client never had a chance of getting through to realtime at all. In which case the only effect of this would be to make close() appear to have no effect for the 10s it takes for realtimeRequestTimeout to expire, which is a bit rubbish.

(Or if the client's network connection fails after doing a /connect but before it receives the CONNECTED - in which case waiting for the realtimeRequestTimeout to expire still doesn't help anything, as realtime keeping the connection state for 2 minutes is inevitable).

So while this does improve behaviour in the case of a slow-but-reliable network (ie where there's a larger chance of the close() happening between the connect request but before the CONNECTED), it's at the expense of making it seem unresponsive when the network is failing/unreliable. :/

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Also, even if we do go with for waiting-for-connected, need wording for if the first state change from CONNECTING is not CONNECTED. eg on the first state change from @CONNECTING@: if @CONNECTED@ do the operation per RTN12a; else if @DISCONNECTED@ or @SUSPENDED@ set the state to @CLOSED@; else if @FAILED@ do nothing.
Or more elegantly, maybe If the connection state is @CONNECTING@, wait for the first state change, then follow the corresponding RTN12 spec for the new state)

Copy link
Member

@paddybyers paddybyers Apr 28, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So while this does improve behaviour in the case of a slow-but-reliable network (ie where there's a larger chance of the close() happening between the connect request but before the CONNECTED), it's at the expense of making it seem unresponsive when the network is failing/unreliable. :/

Well, yes, but if they don't care about closing connections cleanly then they would never need to call close(); they just exit. So if they are calling close() then they are making a good faith attempt to release the resources associated with the connection, so why not do that?

At the expense of further complexity we might say that the wait only occurs if it is a resume and not a new connection?

I did also suggest the possibility of a shorter timeout as another way to catch the majority case.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it's complex to state that calling #close, if in the connecting state, has a timer for 5s then forcibly moves to the closed state. However, would this then apply in the disconnected state as well?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it's complex to state that calling #close, if in the connecting state, has a timer for 5s then forcibly moves to the closed state. However, would this then apply in the disconnected state as well?

The timeout for close() could be 5s unconditionally.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should keep CONNECTING and CONNECTED as semantically close as possible. It simplifies everything. Assuming the caller means something different calling close while CONNECTING than while CONNECTED is reading too much, I think. We provide all this queuing before reaching CONNECTED so that operations (publishes, presence actions, etc.) called while INITIALIZED/CONNECTING transparently happen once CONNECTED and I don't think we should depart from that behavior here.

We should tell the developer that, if she's having connection problems and wants to handle it herself instead of letting Ably's library do its reconnection/recovering thing, just call close() to clean up on Ably's side if possible, forget about that instance and make a new one.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd be against adding another timer (we've had bugs in ably-js before from various timers interacting in unforeseen ways, and annoyingness-to-debug seems to be exponential with number of timers going on at once. Also 5s isn't that much less than 10s, which is the current realtimeRequestTimeout)

So, OK, one more suggestion: how about we follow @tcard's proposal, with the small modification that calling close() while CONNECTING will immediately put you in the CLOSING state (but will not touch the transports). Then, if a transport connects within the realtimeRequestTimeout, send a CLOSE on it, and await the CLOSED reply (and do not change state). If not, then go into the CLOSED state upon expiration of that timeout.

That allows realtime to release resources, but avoids the weirdness of staying in a CONNECTING state for a while after someone calls close(), and avoids having to go from CONNECTING to CLOSING via an instant of CONNECTED. And from the user's perspective, it makes CONNECTING behave the same as CONNECTED as calling close on each will immediately put you in CLOSING.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't really see that much of a problem with what you call "weirdness"; to me, it makes for a simpler state machine, which is easier to reason about. But I'm OK with what you say if you think that's more sensible.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 for @SimonWoolf's proposal

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done at 4c177e8. (Sorry for the delay, I was under the impression I had already done this.)

This was referenced Apr 28, 2016
@tcard
Copy link
Contributor Author

tcard commented May 4, 2016

Should we wrap this up? What do @mattheworiordan @paddybyers think of @SimonWoolf 's proposal?

@tcard tcard mentioned this pull request May 4, 2016
tcard and others added 24 commits June 16, 2016 10:50
When calling `Connection.close()` before reaching the CONNECTED state,
the iOS library was trying to send a WebSocket message before the
handshake was even finished. That caused it to crash sometimes.

That situation wasn't contemplated on the spec. I've gone through it
filling a table and filling the gaps, so that the spec is exhaustive
about all possible combinations of state and operations.

(So, yes, zwopple/PocketSocket#48 wasn't fixing
the right problem.)
@mattheworiordan mattheworiordan merged commit 74fff20 into source Jun 16, 2016
@mattheworiordan mattheworiordan deleted the exhaustive-states branch June 16, 2016 09:50
QuintinWillison pushed a commit to ably/specification that referenced this pull request Sep 20, 2022
QuintinWillison pushed a commit to ably/specification that referenced this pull request Sep 20, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

4 participants