-
Notifications
You must be signed in to change notification settings - Fork 4.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
grpc: eliminate panics in server worker implementation #6856
Conversation
This PR eliminates two possible panics: - write to a closed channel - this was possible when the close of the serverWorkerChannel from Stop/GracefulStop was racing with handling of a new stream. - close of a closed channel - this was possible when Stop/GracefulStop was called more than once.
Codecov Report
Additional details and impacted files@@ Coverage Diff @@
## master #6856 +/- ##
==========================================
- Coverage 83.64% 83.61% -0.03%
==========================================
Files 286 286
Lines 30801 30786 -15
==========================================
- Hits 25763 25743 -20
- Misses 3978 3979 +1
- Partials 1060 1064 +4
|
test/server_test.go
Outdated
defer wg.Done() | ||
for { | ||
if _, err := client.EmptyCall(ctx, &testpb.Empty{}); err != nil { | ||
t.Logf("EmptyCall() failed: %v", err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We expect some to fail with unavailable, right? So maybe silently ignore those and Error
for other errors?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, what I see is that once the server listener is closed, the client channel moves to IDLE --> CONNECTING --> TRANSIENT_FAILURE. So, the RPCs fail with deadline exceeded.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, some fail with UNAVAILABLE as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That seems wrong, since they're not wait-for-ready RPCs. They should all fail immediately with UNAVAILABLE if there is no working connection to a server.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense now. Switched to test deadline for the RPCs, which ensures that there is enough time for the the LB policy to kick the subConn into connecting (after it goes IDLE on transport closure) and then into TF.
test/server_test.go
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These probably truly belong in /server_ext_test.go
instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the concept of these _ext_test.go
files?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The main point is that they're not in the grpc package, but they're adjacent to the code they're testing in the grpc package.
The test
package is too big. It has many tests of many sub-components, and we should be looking to move tests closer to the code they're testing. Also growing this one package means it doesn't scale, since the tests within it can't run in parallel.
Handled the comments. PTAL. Thanks. |
server_ext_test.go
Outdated
sCtx, sCancel := context.WithTimeout(ctx, defaultTestShortTimeout) | ||
defer sCancel() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is probably why you're getting the deadline exceeded errors. Just leave these using the base ctx and it should be fine I think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh crap, didn't see this comment. But figure out the same anyways :). Could have saved 30m if I had looked earlier, but I think the investigation helped.
server_ext_test.go
Outdated
} | ||
defer ccs[i].Close() | ||
client := testgrpc.NewTestServiceClient(ccs[i]) | ||
if _, err := client.EmptyCall(ctx, &testpb.Empty{}, grpc.WaitForReady(true)); err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: delete this WaitForReady
as it should not be necessary and could persist a mistaken understanding that it might be necessary. (We used to have a bug waaay back when that would have required it.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
This PR eliminates two possible panics:
Fixes #6854
RELEASE NOTES:
NumStreamWorkers
(experimental feature).