
PubSub: Fix pubsub Streaming Pull shutdown on RetryError #7863

Merged: 1 commit merged into googleapis:master on May 7, 2019


@plamut (Contributor) commented May 6, 2019

Closes #7709.

If a gRPC channel stays in the TRANSIENT_FAILURE state for too long, the retry timeout configured in the subscriber client config kicks in and a RetryError is raised in a background thread. The client, however, keeps running, and the error is not propagated to the top-level code.

This PR makes sure that the following happens:

  • The streaming pull manager shutdown is triggered, shutting down all background threads.
  • The reason for the shutdown (RetryError) is propagated to the main thread that awaits future.result(), giving user code a chance to catch the error and react to it.
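Together, the two points above follow a common pattern: the background thread catches its own error, triggers the shutdown itself, and stores the error on the future the caller is blocked on. The following is a minimal stdlib sketch of that pattern. `ToyStreamingPullManager` and the `RetryError` stand-in are hypothetical and far simpler than the real streaming pull manager; this is not the code from this PR.

```python
import concurrent.futures
import threading
import time


class RetryError(Exception):
    """Stand-in for google.api_core.exceptions.RetryError (hypothetical here)."""


class ToyStreamingPullManager:
    """Hypothetical, heavily simplified model of a streaming pull manager."""

    def __init__(self, rpc):
        self._rpc = rpc  # callable invoked in a loop by the background thread
        self._stop = threading.Event()
        self._lock = threading.Lock()
        self._closed = False
        self._thread = None
        self.future = concurrent.futures.Future()

    def open(self):
        self._thread = threading.Thread(target=self._worker, daemon=True)
        self._thread.start()
        return self.future

    def _worker(self):
        try:
            while not self._stop.is_set():
                self._rpc()
        except Exception as exc:
            # Without this handler the error would only end this thread and
            # the future would never resolve. With it, any background error
            # (e.g. RetryError) triggers a shutdown that carries the reason.
            self.close(reason=exc)

    def close(self, reason=None):
        with self._lock:
            if self._closed:
                return  # shutdown was already triggered once
            self._closed = True
        self._stop.set()  # tells the background loop to exit
        if reason is None:
            self.future.set_result(None)
        else:
            # Whoever waits on future.result() gets the reason re-raised.
            self.future.set_exception(reason)


def failing_rpc():
    time.sleep(0.01)
    raise RetryError("Deadline exceeded while retrying StreamingPull")


manager = ToyStreamingPullManager(failing_rpc)
future = manager.open()
try:
    future.result(timeout=5)
except RetryError as exc:
    print("Subscriber stopped:", exc)
```

Calling `close()` from the worker thread (instead of joining threads there) avoids a thread trying to join itself; the lock makes a second shutdown request a no-op.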

How to test

I was not able to reproduce the error users reported in a real setup (a sample Pub/Sub app deployed to K8s), but I worked out what is probably happening and faked the error instead.

Steps to reproduce:

  • Patch the installed grpc/_channel.py so that every blocking unary call ends with an UNAVAILABLE status, faking a channel stuck in TRANSIENT_FAILURE:

```diff
--- /home/peter/workspace/google-cloud-python/venv-3.6/lib/python3.6/site-packages/grpc/_channel.py     2019-04-23 17:01:39.282064676 +0200
+++ /home/peter/workspace/google-cloud-python/venv-3.6/lib/python3.6/site-packages/grpc/_channel.py     2019-04-25 15:49:05.220317794 +0200
@@ -456,6 +456,11 @@
 
 
 def _end_unary_response_blocking(state, call, with_call, deadline):
+    state.code = grpc.StatusCode.UNAVAILABLE
+    state.details = "channel is in **fake** TRANSIENT_FAILURE state"
+    state.debug_error_string = (
+        "transient failure is faked during a fixed time window in an hour"
+    )
     if state.code is grpc.StatusCode.OK:
         if with_call:
             rendezvous = _Rendezvous(state, call, None, deadline)
```
  • Set total_timeout_millis in the subscriber client config to 10 seconds (10000, since the value is in milliseconds) so that you don't have to wait too long.
  • Run a sample subscriber script (example). Prerequisites: a Google service account configured for the subscriber client, and an active subscription to a topic.
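Why a short total_timeout_millis makes the error appear quickly can be seen from a simplified model of deadline-bounded retries. The sketch below assumes that transient errors are retried with exponential backoff until the next attempt would pass the total timeout, at which point a RetryError wrapping the last error is raised. It is not google.api_core's actual retry implementation. `call_with_retry`, its parameter names, `FakeClock`, and the exception classes are all hypothetical stand-ins.

```python
import time


class ServiceUnavailable(Exception):
    """Stand-in for a transient UNAVAILABLE error from the channel."""


class RetryError(Exception):
    """Stand-in for google.api_core.exceptions.RetryError."""

    def __init__(self, message, cause):
        super().__init__(message)
        self.cause = cause  # the last transient error that was retried


def call_with_retry(func, *, initial_delay_millis=100, multiplier=1.3,
                    max_delay_millis=60_000, total_timeout_millis=10_000,
                    now=time.monotonic, sleep=time.sleep):
    """Retry func on ServiceUnavailable with exponential backoff; raise
    RetryError once the next attempt would exceed the total timeout."""
    deadline = now() + total_timeout_millis / 1000.0
    delay = initial_delay_millis / 1000.0
    while True:
        try:
            return func()
        except ServiceUnavailable as exc:
            if now() + delay > deadline:
                raise RetryError(
                    "Deadline of {:.1f}s exceeded".format(total_timeout_millis / 1000.0),
                    exc,
                ) from exc
            sleep(delay)
            delay = min(delay * multiplier, max_delay_millis / 1000.0)


class FakeClock:
    """Deterministic clock so the demo finishes instantly."""

    def __init__(self):
        self.now = 0.0

    def time(self):
        return self.now

    def sleep(self, seconds):
        self.now += seconds


clock = FakeClock()


def stuck_channel():
    # Mimics the patched _channel.py: every call fails with UNAVAILABLE.
    raise ServiceUnavailable("channel is in **fake** TRANSIENT_FAILURE state")


try:
    call_with_retry(stuck_channel, total_timeout_millis=10_000,
                    now=clock.time, sleep=clock.sleep)
except RetryError as exc:
    print(exc, "after {:.2f}s of simulated time".format(clock.now))
```

Injecting `now` and `sleep` keeps the model testable without waiting in real time; with the defaults it uses `time.monotonic` and `time.sleep`.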

Actual result (before the fix):
A RetryError occurs in the background after ~10 seconds and some of the threads exit, but the subscriber client keeps running and the error is not propagated to the main thread (the future returned by the subscribe() method is never resolved).

Expected result (after the fix):
Everything gets shut down cleanly, and RetryError is propagated to and raised in the main code.

If a RetryError occurs, it is time to stop waiting for the underlying
gRPC channel to recover from a transient failure, and a clean shutdown
needs to be triggered.

This commit ensures that this indeed happens (previously it happened
on terminal channel errors only).
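The rule in the commit message can be sketched as a small routing function: terminal channel errors trigger a shutdown, and a RetryError now does too. The sketch also assumes that other transient errors simply cause the stream to be reopened. `handle_background_error`, `RECOVERABLE_ERRORS`, and the exception classes are hypothetical stand-ins, not the streaming pull manager's actual internals.

```python
class ServiceUnavailable(Exception):
    """Stand-in for a transient channel error the stream can reopen after."""


class PermissionDenied(Exception):
    """Stand-in for a terminal channel error."""


class RetryError(Exception):
    """Stand-in for google.api_core.exceptions.RetryError."""


# Hypothetical: transient errors after which the stream is simply reopened.
RECOVERABLE_ERRORS = (ServiceUnavailable,)


def handle_background_error(exc, reopen, shutdown):
    """Route an error raised in a background thread to reopen or shutdown."""
    if isinstance(exc, RetryError):
        # The retry deadline already ran out: stop waiting for the channel
        # to recover and pass the error along as the shutdown reason.
        shutdown(reason=exc)
        return "shutdown"
    if isinstance(exc, RECOVERABLE_ERRORS):
        reopen()
        return "reopen"
    shutdown(reason=exc)  # any other error is treated as terminal
    return "shutdown"


events = []
for error in (ServiceUnavailable("blip"),
              RetryError("deadline exceeded"),
              PermissionDenied("no access")):
    outcome = handle_background_error(
        error,
        reopen=lambda: events.append("reopen"),
        shutdown=lambda reason: events.append(("shutdown", type(reason).__name__)),
    )
    print(type(error).__name__, "->", outcome)
```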
@plamut plamut added the api: pubsub Issues related to the Pub/Sub API. label May 6, 2019
@plamut plamut requested a review from crwilcox as a code owner May 6, 2019 14:49
@googlebot googlebot added the cla: yes This human has signed the Contributor License Agreement. label May 6, 2019
@plamut plamut added the kokoro:force-run Add this label to force Kokoro to re-run the tests. label May 7, 2019
@yoshi-kokoro yoshi-kokoro removed the kokoro:force-run Add this label to force Kokoro to re-run the tests. label May 7, 2019
@plamut plamut added the kokoro:force-run Add this label to force Kokoro to re-run the tests. label May 7, 2019
@yoshi-kokoro yoshi-kokoro removed the kokoro:force-run Add this label to force Kokoro to re-run the tests. label May 7, 2019
@plamut (Contributor, Author) commented May 7, 2019

As discussed offline, the failing reCAPTCHA Enterprise build is not related, and we agreed to merge this.

@plamut plamut merged commit e00f6b3 into googleapis:master May 7, 2019
Labels: api: pubsub (Issues related to the Pub/Sub API.), cla: yes (This human has signed the Contributor License Agreement.)
Development

Successfully merging this pull request may close these issues.

Uncaught exceptions within the streaming pull code.
4 participants