-
Notifications
You must be signed in to change notification settings - Fork 4.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
leastrequest: fix data race in leastrequest picker #6587
Conversation
@zasweq : Could you please take a look? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
// There is some degree of raciness in Pick, this is allowed: "it may be argued that | ||
// it's better to accept some degree of raciness in the picker instead." - A48 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Delete this comment, I don't think you need to state it. If you do, post the full quote and wrap to 60 characters.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
deleted
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
60 characters
It's irrelevant now, but note that we wrap at 80 columns, not 60.
I need to add a test for concurrent RPCs (i.e. a bunch of RPCs happening in parallel on a channel with the least request lb present), as that would have caught this problem. Thanks for the contribution. |
@@ -162,18 +162,18 @@ func (p *picker) Pick(balancer.PickInfo) (balancer.PickResult, error) { | |||
pickedSC = &sc | |||
continue | |||
} | |||
if *sc.numRPCs < *pickedSC.numRPCs { | |||
if sc.numRPCs.Load() < pickedSC.numRPCs.Load() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Q for @zasweq: should we cache pickedSC.numRPCs
, or reload it on every iteration as we are currently doing?
for ... {
index := ...
sc := ...
scRPCs := sc.numRPCs.Load()
if pickedSC == nil || scRPCs < pickedSCRPCs{
pickedSC = &sc
pickedSCRPCs = scRPCs
}
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see what you're saying. I think both caching it and the way it is fall within the language described here: https://github.com/grpc/proposal/blob/master/A48-xds-least-request-lb-policy.md#outstanding-request-counter-raciness. Caching it before the iteration would get rid of the race where other threads and picks can add or subtract between iterations. "While reading the outstanding request counters of samples in the picker, previously read values may become outdated" - specifically, this language I think encompasses a. scs that have already been checked and b. the picked sc. As Eric mentioned offline, I think eventually this algorithm goes towards consistency even if the checks aren't perfect.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, just talked to Eric about this and he agrees with your comment. a. loads are expensive, particularly if you get a subsequent write in another Pick call, since it requires exclusive access similar to a rw lock. Thus, it is more performant to perform the operation once. b. Wrt atomics, Eric has the philosophy of only Loading once, as it makes the code easier to reason about and also prevents concurrent operations on that data breaking assumptions and logic about the code. After this is merged I'll change it and write a test for concurrent RPC's.
// There is some degree of raciness in Pick, this is allowed: "it may be argued that | ||
// it's better to accept some degree of raciness in the picker instead." - A48 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
60 characters
It's irrelevant now, but note that we wrap at 80 columns, not 60.
Also, this will need to be backported to the v1.58.x branch. |
This PR fixes a data race in least request picker which will make race detector happy by using atomic type.
Note: Even after this PR merged, there is still some raciness in this implementation, but as section Outstanding request counter raciness in A48 points out, it's on purpose for performance reason.
RELEASE NOTES: none