Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

xds/ringhash: cache connectivity state of subchannels inside picker #6351

Merged
merged 3 commits into from
Jun 7, 2023

Conversation

dfawley
Copy link
Member

@dfawley dfawley commented Jun 6, 2023

I did not add a new test for this. The sequence of operations required here is fairly long and specific and requires races that are too difficult to reliably stimulate. Corresponding change in C++: grpc/grpc@98cdac3 (#32326). (gRPC-Java always kept a copy of the states in the picker.)

RELEASE NOTES:

  • xds/ringhash: cache connectivity state of subchannels inside picker to avoid rare races

@dfawley dfawley added this to the 1.57 Release milestone Jun 6, 2023
@dfawley dfawley requested a review from zasweq June 6, 2023 16:24
Copy link
Contributor

@zasweq zasweq left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

case connectivity.TransientFailure:
// Save error to be reported via picker.
b.connErr = state.ConnectionError
b.regeneratePicker()
case connectivity.Shutdown:
// When an address was removed by resolver, b called RemoveSubConn but
// kept the sc's state in scStates. Remove state for this sc here.
delete(b.scStates, sc)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Not a part of this PR, but this scStates field I feel like is wrongly named. This is a map from balancer.SubConn to the Ring Hashes SubConn with extra info. I feel like states is only a part of the reason this map exists.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah it probably was changed during development. 99% of why there exists a map is state tracking. It also does a queueConnect in rare cases.

case connectivity.Shutdown:
// When an address was removed by resolver, b called RemoveSubConn but
// kept the sc's state in scStates. Remove state for this sc here.
delete(b.scStates, sc)
}

if sendUpdate {
if oldSCState != newSCState {
// Because the picker caches the state of the subconns, we always regnerate
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

regenerate

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

if sendUpdate {
if oldSCState != newSCState {
// Because the picker caches the state of the subconns, we always regnerate
// and update the picker when the effective state changes.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: "of the SubConn." Optional addition: "This is so the Client Conn gets the most updated state as a former picker has a static cache of SubConn states at it's build time."

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

@zasweq zasweq assigned dfawley and unassigned zasweq Jun 6, 2023
@dfawley dfawley merged commit 761c084 into grpc:master Jun 7, 2023
1 check passed
@dfawley dfawley deleted the rh_copy_state branch June 7, 2023 16:23
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Dec 5, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants