Pub/Sub Messages stop being Processed #1579
Comments
Possibly seeing the same problem; we're doing subscription.pullAsync(..) and sometimes messages just stop being processed and build up until the consumer is restarted. Considering moving to pull() and managing threading / timeouts ourselves. |
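For illustration, a manual pull loop along those lines might look like the sketch below. `pullAndProcess` is a hypothetical stand-in for whatever blocking pull-and-handle call you end up writing, and the 30-second timeout is an arbitrary choice; the point is that the caller owns the timeout instead of the client library.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class ManualPullLoop {
    // Hypothetical stand-in for a synchronous pull + process call;
    // the real call would be whatever blocking pull method the client exposes.
    static int pullAndProcess() {
        return 0;
    }

    public static void main(String[] args) throws InterruptedException {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        Callable<Integer> task = ManualPullLoop::pullAndProcess;
        while (!Thread.currentThread().isInterrupted()) {
            Future<Integer> batch = executor.submit(task);
            try {
                // Bound each pull with our own timeout instead of relying on the library.
                batch.get(30, TimeUnit.SECONDS);
            } catch (TimeoutException e) {
                batch.cancel(true); // assume the pull is stuck; abandon it and retry
            } catch (ExecutionException e) {
                Thread.sleep(1000); // simple pause before retrying after an error
            }
        }
        executor.shutdownNow();
    }
}
```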
I filed a ticket with Cloud Support. The response I got, after some back and forth, was that it's an issue with their gRPC load balancers for Pub/Sub. They've been slowly rolling out a fix since yesterday.
--
Garrett
|
Good to know, thanks. Interesting that the MessageConsumer object doesn't expose an isAlive() or something similar to detect failure, so the assumption is that it manages this for you. I'd presume that even if the gRPC LBs are misbehaving, the gRPC client should handle the failure mode itself. |
Setting the timeout to infinity does indeed sound like a problem. The motivation for the infinite timeout seems to be #1157. We are currently working on a higher-performance Pub/Sub client, currently in the pubsub-hp branch. In this new implementation, the timeout is properly set. The difference is that the RPC will now return immediately if there's no message available instead of waiting. It will exponentially back off, though (max 10s). You might be making more RPCs with the new client, but I don't think the difference will be big enough to be a problem. Once streaming pull becomes available, the difference should become moot. @GEverding @nhoughto Do you think this will help your use case? |
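As a rough illustration of the behaviour described above (return immediately when there are no messages, then back off exponentially up to 10s), assuming a hypothetical `pullImmediately()` helper rather than the real client API:

```java
import java.util.List;
import java.util.concurrent.TimeUnit;

public class BackoffPollSketch {
    // Hypothetical stand-in for an RPC that returns immediately, possibly with no messages.
    static List<String> pullImmediately() {
        return List.of();
    }

    public static void main(String[] args) throws InterruptedException {
        long delayMs = 100;              // initial backoff
        final long maxDelayMs = 10_000;  // cap described above (max 10s)
        while (true) {
            List<String> messages = pullImmediately();
            if (messages.isEmpty()) {
                TimeUnit.MILLISECONDS.sleep(delayMs);
                delayMs = Math.min(delayMs * 2, maxDelayMs); // exponential backoff
            } else {
                delayMs = 100;           // reset once messages flow again
                // process messages here
            }
        }
    }
}
```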
Yep that sounds good. Happy with immediate return with a backoff (seems like a good tradeoff). Any idea on timelines for landing pubsub-hp or streaming pull changes? We are going to halt our pubsub roll out to new components until we sort this out. |
That seems like a much better idea. Is there an ETA for when that branch will be ready? |
We expect to release it next week (sometime Feb 7-Feb 10). |
I'm still encountering this issue. By changing the way I poll (calling pullAsync but with a limit and retrying automatically), I finally got this exception: java.util.concurrent.ExecutionException: com.google.cloud.pubsub.PubSubException: io.grpc.StatusRuntimeException: UNAVAILABLE: HTTP/2 error code: NO_ERROR |
@BodySplash I don't think this is immediately related. According to pubsub's error code doc, this seems to be an issue that "happens sometimes". That said, the new client will do the retry for you. |
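For anyone handling this manually in the meantime, a hedged sketch of retrying only on UNAVAILABLE, using grpc-java's `Status.fromThrowable`; the `Pull` interface is a hypothetical stand-in for the failing call, and the delays are arbitrary:

```java
import io.grpc.Status;
import io.grpc.StatusRuntimeException;

public class RetryOnUnavailable {
    // Hypothetical stand-in for the pull call that failed with UNAVAILABLE above.
    interface Pull {
        void run() throws StatusRuntimeException;
    }

    static void pullWithRetry(Pull pull, int maxAttempts) throws InterruptedException {
        long delayMs = 500;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                pull.run();
                return;
            } catch (StatusRuntimeException e) {
                // UNAVAILABLE usually signals a transient condition, so back off and retry.
                if (Status.fromThrowable(e).getCode() != Status.Code.UNAVAILABLE
                        || attempt == maxAttempts) {
                    throw e;
                }
                Thread.sleep(delayMs);
                delayMs = Math.min(delayMs * 2, 10_000);
            }
        }
    }
}
```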
The high-performance rewrite was merged and released. I recommend trying version 0.9.3-alpha. @nhoughto and @GEverding can you let us know how it works? |
We're integrating today. I'll get back to you
--
Garrett
|
Looking at 0.9.3-alpha, I assume the new high-performance changes are found in the com.google.cloud.pubsub.spi.v1 classes rather than the now-deprecated classes? Any examples of how the new classes should be used? The readme / examples all reference the deprecated stuff. |
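For reference, a minimal sketch of the new-style `Subscriber` usage, written against the current `com.google.cloud.pubsub.v1` shape of the API; at 0.9.3-alpha the classes lived under `com.google.cloud.pubsub.spi.v1` and builder/method names may have differed slightly, and the project/subscription names below are placeholders.

```java
import com.google.cloud.pubsub.v1.AckReplyConsumer;
import com.google.cloud.pubsub.v1.MessageReceiver;
import com.google.cloud.pubsub.v1.Subscriber;
import com.google.pubsub.v1.ProjectSubscriptionName;
import com.google.pubsub.v1.PubsubMessage;

public class SubscriberSketch {
    public static void main(String[] args) {
        // "my-project" and "my-subscription" are placeholders.
        ProjectSubscriptionName subscription =
                ProjectSubscriptionName.of("my-project", "my-subscription");

        MessageReceiver receiver = (PubsubMessage message, AckReplyConsumer consumer) -> {
            System.out.println("Received: " + message.getData().toStringUtf8());
            consumer.ack(); // ack once processing succeeds
        };

        Subscriber subscriber = Subscriber.newBuilder(subscription, receiver).build();
        subscriber.startAsync().awaitRunning();
        subscriber.awaitTerminated(); // block while messages stream in
    }
}
```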
Yep that's useful, so the SubscriptionListener exposes the listener failure so you can manually restart it? Rather than handling it for you and auto-reconnecting, as the buggy deprecated implementation did? So we shouldn't expect startAsync() to handle reconnects, and should build our own auto-reconnect logic with backoff, etc.?
|
To answer your question specifically, Subscriber does attempt some retries on its own. E.g., the Pub/Sub service replying with error code "UNAVAILABLE" usually signifies a transient condition; Subscriber knows this and will sleep for a bit, then try again. If it encounters a non-retryable error (or sees retryable errors too many times consecutively), it will call the |
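A hedged sketch of reacting to that failure callback, using the `ApiService.Listener` mechanism from the current client (the listener type mentioned in the thread may have had a different name in 0.9.3-alpha); the restart policy itself is left to the caller:

```java
import com.google.api.core.ApiService;
import com.google.cloud.pubsub.v1.Subscriber;
import com.google.common.util.concurrent.MoreExecutors;

public class FailureHandlingSketch {
    static void startWithRestart(Subscriber subscriber, Runnable restart) {
        subscriber.addListener(
                new ApiService.Listener() {
                    @Override
                    public void failed(ApiService.State from, Throwable failure) {
                        // Invoked after a non-retryable error (or too many retryable ones);
                        // decide here whether to log, alert, or rebuild the Subscriber.
                        restart.run();
                    }
                },
                MoreExecutors.directExecutor());
        subscriber.startAsync();
    }
}
```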
Yep seems reasonable, I'll give it a try 👍🏼
|
I'm going to go ahead and resolve this now. |
Hi,
Over the weekend I've been experiencing abnormal behaviour with Pub/Sub. Since Friday at 4pm ET I've experienced numerous cases where my services will stop processing messages. I can see in Stackdriver that messages are being queued but not processed. Restarting the services allows the messages to be processed until it stops again.
This seems to be an issue with v0.8.x and less so with v0.4.0. In v0.8.x there is an infinite timeout (DefaultPubSubRpc.java), and in v0.4.0 there doesn't seem to be one (DefaultPubSubRpc.java).
In my experience, setting an infinite timeout is rarely a good idea. Is there a reason why you do this?
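For comparison, a finite per-call deadline on the raw gRPC stubs would look roughly like the sketch below; credentials and channel configuration are omitted, the deadline value is arbitrary, and the resource names are placeholders.

```java
import com.google.pubsub.v1.PullRequest;
import com.google.pubsub.v1.PullResponse;
import com.google.pubsub.v1.SubscriberGrpc;
import io.grpc.ManagedChannel;
import io.grpc.ManagedChannelBuilder;
import java.util.concurrent.TimeUnit;

public class DeadlineSketch {
    public static void main(String[] args) {
        // Auth setup omitted; a real channel would also need call credentials.
        ManagedChannel channel =
                ManagedChannelBuilder.forTarget("pubsub.googleapis.com:443").build();

        PullResponse response = SubscriberGrpc.newBlockingStub(channel)
                // Explicit per-call deadline instead of waiting forever.
                .withDeadlineAfter(30, TimeUnit.SECONDS)
                .pull(PullRequest.newBuilder()
                        .setSubscription("projects/my-project/subscriptions/my-subscription")
                        .setMaxMessages(100)
                        .build());

        System.out.println("Pulled " + response.getReceivedMessagesCount() + " messages");
        channel.shutdown();
    }
}
```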