clientconn: Subconn state may transition from CONNECTING directly to IDLE #7503
Comments
Once Happy Eyeballs is implemented, the new PF will move on to the next subconn after some time, so this unexpected transition will not block the connection attempt.
I tried to repro this issue by adding delays in the code and realised that the code correctly handles the cases in which a GoAway (or any other error) is received before the subconn state changes to READY, because it locks ac.mu, assigns ac.transport to the new transport, puts the subconn in READY, and only then releases ac.mu. If onClose is called before ac.mu is locked, the subconn state is not set to IDLE because of this early return block in onClose.
The problem is in this block that has the comment:
This block is there to handle the exact scenario that I suspected was causing the issue, but it sets the state from CONNECTING directly to IDLE. There doesn't seem to be any test that covers that block, but I was able to call
I need to check if we should be transitioning to READY before IDLE or expect LB policies to handle a direct transition to IDLE.
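The ordering described in this comment can be illustrated with a small, self-contained model. This is only a sketch under stated assumptions: `addrConn`, `transport`, `install`, `onClose`, and the `closed` flag are simplified, hypothetical stand-ins written for this example, not the actual grpc-go code. The model shows how an `onClose` that arrives before the transport is installed takes the early return, and how the later handling block then moves the subconn from CONNECTING directly to IDLE.

```go
package main

import "sync"

// State is a simplified stand-in for a connectivity state (hypothetical).
type State int

const (
	Idle State = iota
	Connecting
	Ready
	TransientFailure
)

// transport is a hypothetical placeholder for an established connection.
type transport struct{ closed bool }

// addrConn is a simplified model of a subconn guarded by a mutex.
type addrConn struct {
	mu        sync.Mutex
	transport *transport
	state     State
	history   []State // every state reported so far, in order
}

// setState records a state change. Callers must hold ac.mu.
func (ac *addrConn) setState(s State) {
	ac.state = s
	ac.history = append(ac.history, s)
}

func (ac *addrConn) startConnecting() {
	ac.mu.Lock()
	defer ac.mu.Unlock()
	ac.setState(Connecting)
}

// onClose models the transport-closed callback (for example after a GOAWAY).
func (ac *addrConn) onClose(t *transport) {
	ac.mu.Lock()
	defer ac.mu.Unlock()
	t.closed = true
	if ac.transport != t {
		// Early return: this transport was never installed, so the
		// subconn state is not touched here.
		return
	}
	ac.transport = nil
	ac.setState(Idle)
}

// install models the end of a successful connection attempt.
func (ac *addrConn) install(t *transport) {
	ac.mu.Lock()
	defer ac.mu.Unlock()
	if t.closed {
		// onClose already ran for this transport: the model reports IDLE
		// straight from CONNECTING, which is the transition in question.
		ac.setState(Idle)
		return
	}
	ac.transport = t
	ac.setState(Ready)
}
```

In this model, calling `install` before `onClose` yields CONNECTING, READY, IDLE, while calling `onClose` first yields CONNECTING, IDLE with no intermediate state.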
From discussing with other maintainers, the suggestion was to block the
I tried to make
Reporting READY before IDLE should theoretically work. We have locked
We decided to handle the CONNECTING to IDLE transition in the new
In this test run, the subconn transitioned directly from CONNECTING to IDLE when the server was shut down. The current pickfirst implementation handles this by transitioning the channel to IDLE itself. However, the new pickfirst for Dual Stack gets stuck when this happens because it behaves differently based on the state after CONNECTING:
Based on the above, if the new PF sees IDLE directly, it doesn't know how to handle it. If the missing state between CONNECTING and IDLE was READY, then the channel would also need to be sent into IDLE. If the missing state between CONNECTING and IDLE was TRANSIENT_FAILURE (TF), then the channel state would remain unchanged and the subconn may be asked to reconnect.
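The ambiguity can be made concrete with a small sketch. The `State` and `Action` types and the `actionForIdle` function below are hypothetical names invented for illustration, not part of grpc-go; the mapping simply restates the two cases above and marks a direct CONNECTING to IDLE transition as undecidable.

```go
package main

// State is a simplified stand-in for a connectivity state (hypothetical).
type State int

const (
	Idle State = iota
	Connecting
	Ready
	TransientFailure
)

// Action describes what a pick-first style policy would do on seeing IDLE.
type Action string

const (
	ChannelToIdle Action = "move channel to IDLE"
	Reconnect     Action = "keep channel state; subconn may be asked to reconnect"
	Ambiguous     Action = "cannot decide: the intermediate state is unknown"
)

// actionForIdle returns the action for an IDLE update, given the state the
// subconn was in immediately before it reported IDLE.
func actionForIdle(prev State) Action {
	switch prev {
	case Ready:
		return ChannelToIdle
	case TransientFailure:
		return Reconnect
	default:
		// Includes Connecting: the policy cannot tell whether READY or
		// TRANSIENT_FAILURE was skipped.
		return Ambiguous
	}
}
```

With this mapping, `actionForIdle(Connecting)` returns `Ambiguous`, which corresponds to the case where the new PF gets stuck.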
To fix this, we should ensure that the transition to IDLE due to a server-sent GOAWAY is reported only once the subconn has exited CONNECTING.
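One way to picture the proposed behavior is a reporter that holds back the IDLE update while the subconn is still CONNECTING. This is a minimal sketch, assuming a hypothetical `stateReporter` type with `SetState` and `OnGoAway` methods. It is not the grpc-go implementation and may not match how the fix was eventually made.

```go
package main

// State is a simplified stand-in for a connectivity state (hypothetical).
type State int

const (
	Idle State = iota
	Connecting
	Ready
	TransientFailure
)

// stateReporter is a hypothetical wrapper that defers a GOAWAY-triggered
// IDLE update until the subconn has left CONNECTING.
type stateReporter struct {
	current     State
	pendingIdle bool
	reported    []State // states delivered to the LB policy, in order
}

// report delivers a state to the (modeled) LB policy.
func (r *stateReporter) report(s State) {
	r.current = s
	r.reported = append(r.reported, s)
}

// SetState reports a regular state change, then flushes a deferred IDLE
// if the subconn has just exited CONNECTING.
func (r *stateReporter) SetState(s State) {
	r.report(s)
	if r.pendingIdle && s != Connecting {
		r.pendingIdle = false
		r.report(Idle)
	}
}

// OnGoAway handles a server-sent GOAWAY. While CONNECTING, the IDLE update
// is only remembered; otherwise it is reported immediately.
func (r *stateReporter) OnGoAway() {
	if r.current == Connecting {
		r.pendingIdle = true
		return
	}
	r.report(Idle)
}
```

In this sketch, a GOAWAY that arrives during CONNECTING produces the sequence CONNECTING, READY, IDLE once the attempt completes, so the LB policy never observes CONNECTING followed directly by IDLE.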