Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix: connection resume does not fire channel detached #539

Merged
merged 9 commits into from
Nov 24, 2016
Merged

Conversation

ricardopereira
Copy link
Contributor

@ricardopereira ricardopereira commented Nov 23, 2016

@ricardopereira
Copy link
Contributor Author

@mattheworiordan PTAL and if it's good then I fix the client.

@ricardopereira ricardopereira changed the title Fix 538 Fix: connection resume does not fire channel detached Nov 23, 2016
guard let stateChange = stateChange else {
fail("ConnectionStateChange is nil"); done(); return
}
expect(client.connection.id).to(equal(disconnectedMessage.connectionId))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why would this be equal to the fabricated message that was received? Realtime will assign a new connectionId as that ID is now invalid


let disconnectedMessage = ARTProtocolMessage()
disconnectedMessage.action = .Disconnected
disconnectedMessage.connectionId = "xxxxxxxxx"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a bit unusual as a connectionId is never sent with Disconnected, only Connected. Could you not instead simple do one or both of the following:

  1. Send a new CONNECTED message with a new connectionId whilst the connection is CONNECTED. This should trigger channels to detach anyway, but that behaviour is only really expected in the 0.9 spec.
  2. Manually terminate the connection, change the connectionKey in the object programatically so that it can no longer resume, check that the CONNECTED message received when resuming has a different connectionId to the original, then check that the channels have become DETACHED with a suitable error message as well.

defer { client.dispose(); client.close() }
let channel = client.channels.get("test")

expect(client.connection.state).toEventually(equal(ARTRealtimeConnectionState.Connected), timeout: testTimeout)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not correct, this is not the behaviour we'd expect as a channel ATTACH that does not become ATTACHED would be resent or timeout.

The process I said we needed to test is:

  • Connect
  • Attach a channel (i.e wait for it to be attached)
  • Token expires and connection is disconnected by Ably
  • Resume happens automatically, wait until connected
  • Publish a message over REST on the channel
  • Ensure the message is received on the channel and the channel is still attached

@@ -1949,7 +1949,7 @@ class RealtimeClientConnection: QuickSpec {
context("System's response to a resume request") {

// RTN15c1
pending("CONNECTED ProtocolMessage with the same connectionId as the current client, and no error") {
it("CONNECTED ProtocolMessage with the same connectionId as the current client, and no error") {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you confirm which other tests are marked as pending still? I was not aware we had pending tests.

Copy link
Contributor Author

@ricardopereira ricardopereira Nov 23, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mattheworiordan Remaining pending tests:

Pending: when publishing a Message with an explicit clientId that is incompatible with the identified client’s clientId
Pending: should be used a PresenceMap to maintain a list of members
Pending: should result in an error if the channel is in the FAILED state
Pending: should result in an error if the channel moves to the FAILED state

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ricardopereira please add an issue for those pending tests as we should look at them ASAP in case there are other spec requirements that the lib does not meet

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done #541.

}
}

expect(channel.state).to(equal(ARTRealtimeChannelState.Detached))
expect(client.connection.id).toNot(equal(initialConnectionId))
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From the logs, when the token expires the connectionId always changes, so I decided to add that and additionally it checks if the channel gets detached.

@ricardopereira
Copy link
Contributor Author

@mattheworiordan PTAL at the last commit.

@ricardopereira
Copy link
Contributor Author

ricardopereira commented Nov 23, 2016

The problems I noticed while debugging:

  • ARTRealtimeConnectionState.Connected event is not emmited when the connection renews the token
  • The connection resume only detaches the channels when the connectionId is different from the last one and an error exists

@ricardopereira ricardopereira force-pushed the fix-538 branch 2 times, most recently from 0c627a1 to 61c4b8b Compare November 23, 2016 11:32
@mattheworiordan
Copy link
Member

@ricardopereira can you address whilst reviewing the verbose logs, I can unfortunately see that the client library does not output the URL it is connecting to for new WS connections. Can you please ensure all HTTP and WebSocket connections / requests output the complete URL with headers if appropriate when verbose logging is on? as well please? Or is that covered?

it("should resume the connection after an auth renewal and detach all channels if the connectionId changes") {
let options = AblyTests.commonAppSetup()
options.disconnectedRetryTimeout = 1.0
options.tokenDetails = getTestTokenDetails(ttl: 2.0)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks a bit brittle, 2s to connect & attach in CI may fail, can we not just give it a bit of buffer, say 6 seconds to make sure we don't get test failures because of delays in running code?

// Wait for token to expire
waitUntil(timeout: testTimeout) { done in
let partialDone = AblyTests.splitDone(2, done: done)
channel.once(.Detached) { error in
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is the channel becoming detached? If a connection is closed because of an auth failure, the channel does not become detached, it remains attached until the connection is closed or moves to the suspended state?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I answered in the other comment #539 (comment)

fail("ConnectionID is nil"); return
}
expect(client.connection.id).to(equal(receivedConnectionId))
expect(client.connection.id).toNot(equal(initialConnectionId))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't follow why the connectionId has changed? Why? In a normal operation it won't, and as far as I can tell, you are not doing anything that would force it to change?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mattheworiordan the connectionId changes and that's why the channel gets detached.

First CONNECTED:

2016-11-23 13:16:04.508 xctest[32837:4092858] VERBOSE: RS:0x7f955c623000 ARTJsonLikeEncoder<msgpack>: protocolMessageFromDictionary {
    action = 4;
    connectionDetails =     {
        connectionKey = "e91RTOWlwA0vIq!tCwrawkZGCTS118T-ca6e91RTOWlwA0vIq";
        connectionStateTtl = 120000;
        maxFrameSize = 262144;
        maxIdleInterval = 15000;
        maxInboundRate = 250;
        maxMessageSize = 16384;
        serverId = "frontend.64d2.1.eu-central-1-A.i-d35f816e";
    };
    connectionId = tCwrawkZGC;
    connectionKey = "e91RTOWlwA0vIq!tCwrawkZGCTS118T-ca6e91RTOWlwA0vIq";
    connectionSerial = "-1";
}

It gets the DISCONNECTED from Ably:

2016-11-23 13:16:05.998 xctest[32837:4092858] VERBOSE: RS:0x7f955c623000 ARTJsonLikeEncoder<msgpack>: protocolMessageFromDictionary {
    action = 6;
    error =     {
        code = 40142;
        message = "Key/token status changed (expire)";
        statusCode = 401;
    };
}

The second CONNECTED:

2016-11-23 13:16:06.614 xctest[32837:4092858] VERBOSE: RS:0x7f955c623000 ARTJsonLikeEncoder<msgpack>: protocolMessageFromDictionary {
    action = 4;
    connectionDetails =     {
        connectionKey = "e91NTgaIgA0vIq!guMApg59VswaKAc4-d86e91NTgaIgA0vIq";
        connectionStateTtl = 120000;
        maxFrameSize = 262144;
        maxIdleInterval = 15000;
        maxInboundRate = 250;
        maxMessageSize = 16384;
        serverId = "frontend.25af.1.eu-central-1-A.i-305dca8c";
    };
    connectionId = guMApg59Vs;
    connectionKey = "e91NTgaIgA0vIq!guMApg59VswaKAc4-d86e91NTgaIgA0vIq";
    connectionSerial = "-1";
}

Full log: https://gist.github.com/ricardopereira/15a80610b22c9216d5587f2375359921

@mattheworiordan
Copy link
Member

mattheworiordan commented Nov 23, 2016

This is not correct, please review.

Also, can you confirm that we are testing:

  • Even with the absence of an error, if the connection ID changes, then all channels should become detached (the error is only informational and not the reason channels are detached)
  • Auth renewal triggers a reconnect with resume and resumes connections
  • Auth renewal triggers a reconnect with (fabricated) faulty resume, resumes connection, emits an error, and detaches all channels.

@ricardopereira
Copy link
Contributor Author

ricardopereira commented Nov 23, 2016

@ricardopereira can you address whilst reviewing the verbose logs, I can unfortunately see that the client library does not output the URL it is connecting to for new WS connections. Can you please ensure all HTTP and WebSocket connections / requests output the complete URL with headers if appropriate when verbose logging is on? as well please? Or is that covered?

@mattheworiordan Sorry, can you point me to a log entry where it is missing the URL ouput?
On each WebSocket connect it prints the current URL.

UPDATE: Ok, I think HTTP requests are not logging everthing.

@mattheworiordan
Copy link
Member

@mattheworiordan Sorry, can you point me to a log entry where it is missing the URL ouput?
On each WebSocket connect it prints the current URL.

The log I sent you had no WebSocket URLs?

@ricardopereira
Copy link
Contributor Author

The log I sent you had no WebSocket URLs?

@mattheworiordan i.e. line 467 R:0x600000132b60 WS:0x6080000b2300 url /?accessToken=FretAw.EuwDunZs3QlP772pw4M6sWJs_kJ8lsmgyTVuReyckGYTv3rT19ecO810t8WhFtIQGlDfPXGseF_eSB0rO-zHXEMP_ZIPppXeFH-oNWRoQ1qV5cruvBwOXN7giWgLPbSEzgiZwYLbNK7lPWxO466ls1A&clientId=development.client/167420&echo=true&format=msgpack&v=0.8&lib=ios-(null) -- wss://realtime.ably.io:443

@mattheworiordan
Copy link
Member

Ah, my bad, mind including the host in there too though? I was scanning the logs for the host, my mistake, but I am sure it will happen to others.

expect(channel.errorReason!.code).to(equal(80008))
expect(channel.errorReason!.message).to(contain("Unable to recover connection"))
guard let channelError = channel.errorReason else {
fail("Channel error is nil"); return
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Btw. Don't change this, but in future we should have an error message that explains the problem rather than the boolean logic i.e. anyone can see from this code that channel error is nil, but what the reader cares about is why nil is a problem i.e. this should instead be fail("Connection resume failed and error should be propagated to the cannel"). This helps the reader of the tests in future understand why this is a problem as opposed to just seeing that it is expected to be nil.

Copy link
Member

@mattheworiordan mattheworiordan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, 👍

Can you confirm when we're ready for the user with the issue to test this now?

@ricardopereira
Copy link
Contributor Author

@mattheworiordan Yes, sure. If it passes then it's good to go. If not then I'll see what's wrong.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

2 participants