Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PubSub: Prevent unhandled background error on SPM shutdown #8111

Merged
merged 1 commit into from
May 28, 2019

Conversation

plamut
Copy link
Contributor

@plamut plamut commented May 23, 2019

Closes #6130.

This PR fixes errors in background threads that might happen on user-initiated streaming pull shutdown, i.e. cancelling the future returned by the subscriber.subscribe() call.

How to test

Steps to reproduce:

  • Follow the steps in the issue description.
  • Preferably set the log level of the test script to logging.DEBUG. FWIW, I used the following config during development:
import logging

log_format = (
    "%(levelname)-8s [%(asctime)s] %(threadName)-33s "
    "[%(name)s] [%(filename)s:%(lineno)d][%(funcName)s] %(message)s"
)
logging.basicConfig(level=logging.DEBUG, format=log_format)

Actual result (before the fix):
An error can be observed in Thread-CallbackRequestDispatcher (the one listed under "Stacktrace 2" in the issue description). The issue is reproducible with high consistency:

AttributeError: 'NoneType' object has no attribute 'remove'

NOTE: The error listed under "Stacktrace 1" is not reproducible anymore, it has been fixed quite some time ago.

Expected result (after the fix):
The StreamingPullManager gets shut down cleanly, i.e. without the error reported. There might be a stacktrace of the Cancelled exception produced by api_core.bidi, but that is a harmless "expected exception":

Thread-ConsumeBidirectionalStream caught error 499 Locally cancelled by application! and will exit. Generally this is due to the RPC itself being cancelled and the error will be surfaced to the calling code
...
google.api_core.exceptions.Cancelled: 499 Locally cancelled by application!

@plamut plamut added the api: pubsub Issues related to the Pub/Sub API. label May 23, 2019
@plamut plamut requested a review from crwilcox as a code owner May 23, 2019 17:44
@googlebot googlebot added the cla: yes This human has signed the Contributor License Agreement. label May 23, 2019
@tseaver tseaver changed the title Prevent unhandled background error on SPM shutdown Pubsub: prevent unhandled background error on SPM shutdown May 23, 2019
@plamut
Copy link
Contributor Author

plamut commented May 24, 2019

The main problem here is that the leaser and dispatcher use each other's methods, thus cannot be shut down in a clear linear sequence (and breaking this inter-dependency would likely require substantial refactoring).

It's also tricky to synchronize the manager shutdown with any remaining work that dispatcher might have left in its queue (such as pending ACK requests). The thread that initiated the manager shutdown waits on the dispatcher thread to terminate, thus if the latter tries to sync with the manager shutdown through a shared lock, a deadlock can happen.

The current fix uses the fact that once in the shutdown mode, several operations on the manager have no effect (maybe_resume_consumer() / maybe_pause_consumer()), as do not the add() and remove() operations on a stopped leaser.

I still need to think of a meaningful yet understandable test to properly cover this.

@plamut plamut added the needs work This is a pull request that needs a little love. label May 24, 2019
@tseaver
Copy link
Contributor

tseaver commented May 24, 2019

Could we delay setting the attributes to None until the very end?

@plamut plamut changed the title Pubsub: prevent unhandled background error on SPM shutdown PubSub: Prevent unhandled background error on SPM shutdown May 27, 2019
@plamut
Copy link
Contributor Author

plamut commented May 27, 2019

@tseaver Definitely, that's exactly the fix that I am just about to submit. Much easier and cleaner than trying to refactor the streaming pull and trying to break the cyclic dependency between the dispatcher and the leaser.

@plamut plamut removed the needs work This is a pull request that needs a little love. label May 27, 2019
@plamut plamut requested a review from tseaver May 27, 2019 15:30
@plamut plamut merged commit f3d014c into googleapis:master May 28, 2019
@plamut plamut deleted the iss-6130 branch May 28, 2019 17:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api: pubsub Issues related to the Pub/Sub API. cla: yes This human has signed the Contributor License Agreement.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

PubSub: dispatch_callback errors after cancel() of a subscription
3 participants