Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

reconnection problem when use same client id #173

Closed
werbenhu opened this issue Feb 21, 2023 · 18 comments
Closed

reconnection problem when use same client id #173

werbenhu opened this issue Feb 21, 2023 · 18 comments
Assignees
Labels
bug Something isn't working pending ok Released, awaiting confirmation of resolution

Comments

@werbenhu
Copy link
Member

BUG: If there is already a connection with a client ID, and another connection uses the same client ID to connect again, the new client cannot receive the subscription message, and the hook should call OnDisconnect() first when the new connection comes. , and then call OnConnect()

@thedevop
Copy link
Collaborator

Ensure the original client is not set to autoReconnect, otherwise both clients will take turn to connect and kick off the other client.

I just tried it with client not set to autoReconnect, and the new client with the same clientID is able to receive the subscription message.

@mochi-co , it is more logical to have OnDisconnect before OnConnect. But tricky given the current structure. BTW, in looking into this, I found cl.WritePacket is now no longer called by a single thread, so there can be multiple writes to the connection simultaneously.

@werbenhu
Copy link
Member Author

@thedevop Neither client has set to autoReconnect, but both of their Clean Session are set,you can try it again with Clean Session on.

@thedevop
Copy link
Collaborator

Thanks @werbenhu. I was able to reproduce the issue.

Below is a description of the issue:

  1. Each connection is handled by its own Go routine, and stays in attachClient
  2. When a new connection comes in with the same clientID, and let's assume for the rest of this discussion the previous connection (session) has cleanSession flag set, it will disconnect the previous connection (send out a disconn packet and potentially forcefully disconnect the connection) and unsubscribe all previous session topics, but it does not wait until the previous Go routine to exit
  3. It will then add the new session
  4. At a later time, the previous connection Go routine will detect the connection is lost, then it does a clean up
    https://github.com/mochi-co/mqtt/blob/1e8f92210202b61beb7f9fd104b057ec0a8386db/server.go#L367-L371
  5. This will remove the subscriptions of the previous session (normally not an issue as the new session probably didn't receive subscription yet), and delete the this clientID from Server.Clients (this is an issue, from the broker's perspective, this client doesn't exist any more)

In fact, if a 3rd session is created with the same clientID, it will now exhibit a different issue, both 2nd and 3rd session connections will exist on the broker, and broker state only thinks there is 3rd session.

@mochi-co , there are few potential solutions to this, for example, add an additional state, or move the clean up into cl.State.endOnce in cl.Stop, etc...

@werbenhu
Copy link
Member Author

werbenhu commented Feb 22, 2023

@thedevop i fixed it by the this rp, check client's stopCause if is ErrSessionTakenOver , if yes don't delete.

@thedevop
Copy link
Collaborator

Thanks @werbenhu , please see my comments there.

The more I think about the order or OnConnect (new connection) / OnDisconnect (previous session), the current state is accurate. The new connection (this is before establishing a session) does occur before disconnect of the previous session. There is another event OnSessionEstablished which occurs after OnDisconnect (previous session).

@werbenhu
Copy link
Member Author

@thedevop i tried OnSessionEstablished(), it is also called after OnDisconnect().

the old and new connections are in two goroutines.

new one call inheritClientSession() -> old one end cl.Read() and call OnDisconnect -> new one call OnSessionEstablished()

So OnDisconnect needs to be before cl.Read().

@torntrousers
Copy link
Collaborator

BUG: If there is already a connection with a client ID, and another connection uses the same client ID to connect again, the new client cannot receive the subscription message, and the hook should call OnDisconnect() first when the new connection comes. , and then call OnConnect()

I think we've seen the same issue too so can test it out when this is out in a release

@mochi-co mochi-co reopened this Feb 22, 2023
@mochi-co mochi-co added bug Something isn't working needs investigation Identify cause / reproduce issue labels Feb 22, 2023
@mochi-co
Copy link
Collaborator

I'm going to have a think about the simplest way to solve this issue

@werbenhu
Copy link
Member Author

werbenhu commented Feb 23, 2023

@mochi-co I just tested it on mosquitto, it will not send LWT in the case of reconnection by a same client id, maybe it is possible to consider not calling OnDisconnect(), just add an OnReconnect() interface?

@mochi-co mochi-co added pending ok Released, awaiting confirmation of resolution and removed needs investigation Identify cause / reproduce issue labels Feb 25, 2023
@mochi-co mochi-co self-assigned this Feb 25, 2023
@mochi-co
Copy link
Collaborator

I had quite a deep look into this over the last few days as this kind of behaviour is usually indicative of a more ambigious problem.

I can confirm that the issue is typically caused as @thedevop described. Simply:

  • A Client (1) connects with a specific configuration - either v3 clean session, or v5 with a 0 SessionExpiryInterval - which will later trigger the expire logic.
  • Subscriptions and Inflights are added to the Client (1) state.
  • A second Client (2) with no clean session flag connects, issues a disconnect to Client (1), and then inherits the subscriptions and inflights from Client (1).
  • The read loop for Client (1) breaks, and attachClient ends with the expire logic block being triggered and issuing unsubscribes for the client, removing the subscriptions from both the client itself (ok), and the server topics index by client id (not ok).

Additionally, the expire block then deletes the client by id, which likely causes some more unusual behaviour.

To correct this, I have implemented the following:

  1. A new test case was added which replicates this scenario (TestEstablishConnectionInheritExistingTrueTakeover).
  2. A new isTakenOver atomic bit switch has been added to the client. This is set to 1 when the client session is inherited by a new client and the client is orphaned (the pointer to the client has been, or shortly will be overwritten in the server client map).
  3. The s.unsubscribeClient logic has been adjusted to operate in two steps: first, the client subscriptions are moved from the client itself. Then, unsubscribe from the server topics index is only performed if the client has not been taken over (using cl.State.isTakenOver).
  4. When an actively connected client connection expires, it may only be deleted from the server clients map if the client has not been taken over (i.e. it is the active, operating client).
  5. All orphaned client states are now immediately cleaned up. This is to prevent sequential take-overs from potentially increasing memory usage by inflights + subs * client-id.

Finally, I want to thank you all very much for your work investigating and discussing this issue! It made my life much easier than it would have been otherwise, and I really appreciate you taking the time discuss it and help make this project better!

The PR for these changes is #180 and has been released as v2.2.4 - Please let me know if the latest release resolves the issue.

@thedevop
Copy link
Collaborator

@mochi-co , you may want make the entire expire block to check isTakenOver == 0
https://github.com/mochi-co/mqtt/blob/46babc89c82dd3ae4f45fbffb84ace8b75de1a7c/server.go#L365-L371
As both cl.ClearInflights and s.UnsubscribeClient are done in inheritClientSession already.

@mochi-co
Copy link
Collaborator

mochi-co commented Feb 25, 2023

@thedevop I did try this briefly, however note that expire must also occur on client sessions which have not been taken over - i.e. clean session clients or those with 0 expiry interval which are set to expire immediately:

expire := (cl.Properties.ProtocolVersion == 5 && cl.Properties.Props.SessionExpiryIntervalFlag && cl.Properties.Props.SessionExpiryInterval == 0) 
|| (cl.Properties.ProtocolVersion < 5 && cl.Properties.Clean)

I misread your point, you are absolutely correct. I'll try to remember to fix this, or anyone can issue a PR and I'll merge it in in for the next release 👍🏻

@torntrousers
Copy link
Collaborator

torntrousers commented Feb 27, 2023

Hello. I've updated to 2.2.4 and am still seeing a problem with reconnecting with the same client id. I test with a simple nodered flow with a single publisher and single subscriber on the same topic. The subscriber only receives messages every 2nd deploy of the nodered flow - every other deploy there is a disconnected from the broker and then the reconnection nolonger receives the messages for the subscription. Could i switch on verbose logging on Mochi or something to show you more info on what happens?

@thedevop
Copy link
Collaborator

atomic.StoreUint32(&existing.State.isTakenOver, 1) needs to be in the if block before return false as well.

https://github.com/mochi-co/mqtt/blob/46babc89c82dd3ae4f45fbffb84ace8b75de1a7c/server.go#L444-L450

thedevop added a commit to thedevop/mqtt that referenced this issue Feb 28, 2023
@werbenhu
Copy link
Member Author

werbenhu commented Feb 28, 2023

@thedevop @mochi-co thank everyone very much, this issue has been resolved.

@torntrousers
Copy link
Collaborator

could it be out in a release please so we can test it too?

mochi-co pushed a commit that referenced this issue Feb 28, 2023
* Skip expire cleanup for isTakenOver session

* Set prev connection to isTakenOver on CleanSession

#173.
@mochi-co
Copy link
Collaborator

Sorry for delay! Try https://github.com/mochi-co/mqtt/releases/tag/v2.2.5 and let me know! Thanks again @thedevop

@werbenhu werbenhu closed this as completed Mar 1, 2023
@torntrousers
Copy link
Collaborator

torntrousers commented Mar 1, 2023

Yes v2.2.5 works now for us with client reconnects. Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working pending ok Released, awaiting confirmation of resolution
Projects
None yet
Development

No branches or pull requests

4 participants