PubSub: Fix pubsub Streaming Pull shutdown on RetryError #7863
Closes #7709.
If a gRPC channel is in `TRANSIENT_FAILURE` state for too long, the retry timeout configured in subscriber client config kicks in, and a `RetryError` is raised in a background thread, but the client keeps running, and the error is not propagated to the top level code.

This PR makes sure that the following happens:

`future.result()`, allowing the user code a chance to catch the error and react to it.

How to test
I was not able to reproduce the actual error users reported in a real setup (a sample pubsub app deployed to K8s), but figured out what is probably happening and faked the error.
Steps to reproduce:

`grpc` dependency in your local Python environment, example:

`total_timeout_millis` setting in subscriber client config to 10 (seconds... in order to not wait for too long)

Actual result (before the fix):
A `RetryError` occurs in the background after ~10 seconds, some of the threads exit, but the subscriber client keeps running, and the error is not propagated to the main thread (the future returned by the `subscribe()` method is not resolved).

Expected result (after the fix):
Everything gets shut down cleanly, and `RetryError` is propagated to and raised in the main code.
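
To illustrate the expected behavior from the user's side, here is a minimal sketch of a background thread failing a future so that the error surfaces in `future.result()`. It does not use the real client: the `RetryError` class is a hypothetical stand-in for `google.api_core.exceptions.RetryError`, `concurrent.futures.Future` stands in for the future returned by `subscribe()`, and `start_fake_streaming_pull` and `run_subscriber` are made-up names for this example only.

```python
import threading
from concurrent.futures import Future


class RetryError(Exception):
    """Hypothetical stand-in for google.api_core.exceptions.RetryError."""


def start_fake_streaming_pull(fail_after=0.05):
    """Return (future, thread); the thread fails the future with RetryError."""
    future = Future()

    def worker():
        # Pretend the channel stayed in TRANSIENT_FAILURE until the
        # retry deadline passed.
        threading.Event().wait(fail_after)
        # Propagate the error to whoever blocks on future.result(),
        # instead of letting it die silently in this thread.
        future.set_exception(RetryError("Deadline exceeded while retrying"))

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    return future, thread


def run_subscriber():
    """Block on the future the way user code would, and handle the error."""
    future, thread = start_fake_streaming_pull()
    try:
        future.result(timeout=5)
    except RetryError as exc:
        # The user code gets a chance to react, e.g. log and re-subscribe.
        thread.join(timeout=5)
        return "caught: {}".format(exc)
    return "no error"


if __name__ == "__main__":
    print(run_subscriber())
```

With the real client, the same `try`/`except` around `future.result()` is where the propagated `RetryError` would be expected to show up once the fix is in place.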