client: fix race between connection error and subconn shutdown #6494

dfawley · 2023-08-01T21:14:41Z

This is something I noticed when adding the SubConn.Shutdown method in #6493, but never observed. If we fail to check ac.ctx while holding ac.mu, then the following race is technically possible:

1. In resetTransport, tryAllAddrs is called and returns an error. acCtx.Err() == nil (acCtx is AKA ac.ctx), as the addrConn is still in use.
2. From the LB policy, RemoveSubConn (or SubConn.Shutdown) is called. This sets the state to Shutdown and cancels ac.ctx.
3. Back in resetTransport, we grab ac.mu and update the state to TransientFailure.

The net result is the LB policy's state listener (or UpdateSubConnState) will receive Shutdown followed by TransientFailure. All LB policies should be ignoring updates after Shutdown anyway, but this is still an incorrect transition.

I don't believe this case is worth trying to simulate with a test. The race window is extremely tiny, and such bugs could have easily existed in several other places in our code. I confirmed via manual inspection that everything else either checks ac.ctx or ac.transport while holding ac.mu before calling ac.updateConnectivityState.
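
For readers unfamiliar with the pattern, here is a minimal sketch of "check the context while holding the mutex before publishing a state". It is not the actual grpc-go code: the `addrConn` type below is a hypothetical, stripped-down model, and `State`, `history`, `updateStateLocked`, `shutdown`, and `reportConnectError` are names invented for illustration. The only assumption carried over from the description above is that shutdown sets the state and cancels the context under the same lock.

```go
package main

import (
	"context"
	"sync"
)

// State is a simplified stand-in for a connectivity state (hypothetical).
type State int

const (
	Connecting State = iota
	TransientFailure
	Shutdown
)

// addrConn is a hypothetical, stripped-down model used only for this sketch.
type addrConn struct {
	mu      sync.Mutex
	ctx     context.Context
	cancel  context.CancelFunc
	state   State
	history []State // every state published to the (imaginary) listener
}

func newAddrConn() *addrConn {
	ctx, cancel := context.WithCancel(context.Background())
	return &addrConn{ctx: ctx, cancel: cancel, state: Connecting}
}

// updateStateLocked records a transition. The caller must hold ac.mu.
func (ac *addrConn) updateStateLocked(s State) {
	ac.state = s
	ac.history = append(ac.history, s)
}

// shutdown models step 2: publish Shutdown and cancel the context,
// both while holding the lock.
func (ac *addrConn) shutdown() {
	ac.mu.Lock()
	defer ac.mu.Unlock()
	if ac.state == Shutdown {
		return
	}
	ac.updateStateLocked(Shutdown)
	ac.cancel()
}

// reportConnectError models step 3. The context is re-checked after the
// lock is acquired, so a concurrent shutdown can no longer be followed
// by TransientFailure. It reports whether the state was published.
func (ac *addrConn) reportConnectError() bool {
	ac.mu.Lock()
	defer ac.mu.Unlock()
	if ac.ctx.Err() != nil {
		// Shut down while the connection attempt was failing; drop the update.
		return false
	}
	ac.updateStateLocked(TransientFailure)
	return true
}
```

In this model, calling `shutdown()` and then `reportConnectError()` leaves `history` containing only `Shutdown`, whereas checking the context before taking the lock would leave room for the Shutdown then TransientFailure ordering described above.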

RELEASE NOTES: none

client: fix race between conn err and subconn shutdown

0478f66

dfawley added the Type: Bug label Aug 1, 2023

dfawley added this to the 1.58 Release milestone Aug 1, 2023

dfawley requested a review from easwars August 1, 2023 21:14

dfawley assigned easwars Aug 1, 2023

easwars approved these changes Aug 3, 2023


easwars assigned dfawley and unassigned easwars Aug 3, 2023

dfawley merged commit b9356e3 into grpc:master Aug 3, 2023
1 check passed

dfawley deleted the updateStateRace branch August 3, 2023 18:04

github-actions bot locked as resolved and limited conversation to collaborators Jan 31, 2024
