DEADLINE_EXCEEDED makes application stop receiving messages entirely #770
Comments
Hey @mahaben did this issue recently start happening? |
Hey @bcoe, first time after upgrading @google-cloud/pubsub to ^1.0.0. Any workaround to recreate the subscription after this error? |
We used to see these error messages; now we see these errors in all our projects that use it:
|
I can confirm this after upgrading to PubSub ^1.0.0; all our services stop sending Pub/Sub messages after the error occurs. The full stacktrace is:
Can I suggest raising the priority on this issue? |
None of our services using pubsub are working anymore either. We are using version 1.1.0. Getting this:
And this:
We have to restart our services every 10 minutes because of that. It also seems like it is storing more and more to disk, as disk usage goes up over time. |
We are also hitting this. It happens after an hour or two and our publishing stops completely. My only suspicion was that since we created and cached the topic in our constructor, the topic was timing out. We changed our implementation to call publish like this:
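A minimal sketch of the kind of change being described (the commenter's actual snippet isn't shown here, so the topic name and payload are placeholders):

```js
// Sketch only: resolve the topic at publish time instead of caching a Topic
// object created in a constructor. 'my-topic' and the payload are placeholders.
const {PubSub} = require('@google-cloud/pubsub');

const pubsub = new PubSub();

async function publishEvent(data) {
  const dataBuffer = Buffer.from(JSON.stringify(data));
  // Look the topic up on every call rather than holding a long-lived reference.
  return pubsub.topic('my-topic').publish(dataBuffer);
}
```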
Now I'm running some tests to see if that was it or not. If not, I am out of ideas, since our code matches the examples in this repo. Our platform is Node 12 on Alpine on GKE. |
Seeing the same error |
Same error here with "@google-cloud/storage": "^3.3.1". Having
and leaving the Node.js process running raises the error from time to time (roughly as sketched below).
|
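A rough sketch of the idle-process pattern described in the comment above (the commenter's actual snippet isn't shown; the bucket name and interval are placeholders):

```js
// Sketch only: create a Storage client and keep the Node.js process alive,
// making an occasional call. The DEADLINE_EXCEEDED error is reported to
// surface intermittently in setups like this.
const {Storage} = require('@google-cloud/storage');

const storage = new Storage();

setInterval(async () => {
  try {
    await storage.bucket('my-bucket').getFiles({maxResults: 1});
  } catch (err) {
    console.error('Storage call failed:', err);
  }
}, 60 * 1000);
```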
Nope. That didn't work. ;-( Still getting this.
|
Getting similar. To me it seems the Google Pub/Sub servers have a bug or have degraded to the point where they do not respond within the expected deadlines. |
I have now downgraded to @google-cloud/pubsub version
Seems like things are working for longer than 10 minutes now. |
Hello, |
Since grpc/grpc-node#1064 (comment) Using
as a temporary fix, works for me. |
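The temporary fix being referenced is commonly reported as passing the legacy native grpc module to the client; a minimal sketch, assuming that is the workaround meant here:

```js
// Sketch only (assumption): install the native `grpc` package and hand it to
// the PubSub constructor so the client does not use @grpc/grpc-js.
const grpc = require('grpc');
const {PubSub} = require('@google-cloud/pubsub');

const pubsub = new PubSub({grpc});
```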
I'm putting this issue at the top of my list. Would anyone be able to re-test with the latest version of gRPC? A release (v0.6.6) went out yesterday and it may or may not have a fix for this. All that should be needed is to delete any lock files you might have and re-install the PubSub client with the same version you currently have pinned. |
I believe we are hitting this one as well. After the application runs fine for several hours, we get the following logged from the subscription error handler. New messages that usually arrive once a minute had stopped arriving 5 minutes earlier. Wondering if this issue is related to this; @bcoe, are you thinking the same? Here are some environment details: GKE: 1.14.3-gke.11
|
@gae123 mind adding One thing that is jumping out at me immediately, though, is that you're not running |
@mahaben closing this for now, as we believe it is fixed with the latest version of PubSub we've released. @gae123 could I bother you to open a new issue? The dependency graph you're using pulls in PubSub in a variety of places, as a deep dependency, but none of the versions linked are up to date. I believe you are running into different issues related to older versions of the grpc library. |
@bcoe @callmehiphop I don't think this issue should be closed. It still doesn't work after upgrading to "@google-cloud/pubsub": "^1.1.1" |
@bcoe Why not release a new version using the This issue is hard to catch in dev environments, as you have to wait one hour for the error to be triggered. Therefore we can assume that at least some of them will end up pushing broken code to production. |
@MatthieuLemoine @MichaelMarkieta I have been running a PubSub consumer for
NAME                          READY   STATUS    RESTARTS   AGE
good-reader-8f5fbb755-jbf28   1/1     Running   0          4d15h
This is an issue hitting a percentage of our users, but is not hitting 100% of library users, and we are continuing to attempt to find the actual root cause. This is why we haven't opted to switch the |
This comment makes no sense from an optics standpoint, and I totally agree with @MatthieuLemoine: release a proper fix for this, or one with this 'workaround' built in. Asking customers to change their production code with some speculative 'fix' is irresponsible. What happens when this is actually fixed and this ends up causing more problems later? The updated documentation does not even mention what kinds of workloads would be better off using native or not, making the mere suggestion of using it even more confusing and potentially bug prone. |
This has been a workaround for us too; no issues in the last 24 hours. I'll keep monitoring....
|
We're very much trying to avoid this; I realize how frustrating the string of patches to Our libraries were designed to allow Now, even though we are advising that folks running into immediate issues switch to We are continuing to try to ascertain a consistent reproduction of the issue affecting users. What has made this difficult is that we see the default configuration (with Rather than continuing to float patches to
|
Have run with grpc since 10:30 AM today. Approx. 400' msgs read. No errors, no hangups. For information: messages like the one below (had approx. 200 of them during 46 hours) have also disappeared: (node:16) Error: Failed to add metadata entry A...: Mon, 14 Oct 2019 11:40:19 GMT. Metadata key "a..." contains illegal characters |
@gberth Thanks for letting us know! Just to give you some details, the metadata warnings are caused not by
|
Just adding some observations in case it helps. We have two services, both running pubsub 1.0.0. One is older and has no issues; one is newer and has the issue. We have not tried to roll back the new service; we are using the workaround for now. Here are the differences when running "npm ls @grpc/grpc-js": old: new
@bcoe
The issue doesn't start exactly one hour after startup for us, so I suspect it is related to connectivity to the pubsub servers and how those are rolled in and out of service for updates. |
I'd like to 👍 this going back to a P1. Providing a workaround isn't an acceptable response. Indeed, we have implemented the workaround and rolled out to Prod to see that while it successfully mitigated the lost pub/sub connection, it also introduces a memory leak that requires periodic restart of our k8s pods regardless. |
As mentioned on grpc/grpc-node#1064, this is also an issue that is surfacing when using scheduled Firebase Functions, where you do not have access to the grpc config value. When the connection bridge drops, the entire suite of hosted Firebase functions also begins to fail. The only workaround is to not have the scheduled function deployed, which is not an acceptable solution.
|
Has there been any progress? If not, and the workaround is the official way forward, are all the docs updated? |
Every time there has been a new version of @google-cloud/pubsub we've updated, but since grpc-js we've had increased latency in sending and/or receiving (dependent on the version used) or memory leaks. In the past weeks we've been pinning versions to work around various bugs. We've made the decision to go back to Could we have a newer pubsub release which enables grpc-js as an option, as this seems to be the most unstable part of this ecosystem? |
I just wanted to give an update before the weekend: we do have a version of
The reason I was holding off on this update was that we were doing more stress testing on the system that we had managed to reproduce this issue on. If anyone is still bumping into issues on
So far, with debug information, @murgatroid99 has been able to address issues almost immediately. If you do not want to share your logs publicly (understandably) you can open an issue through our issue tracker, and also email me ( https://issuetracker.google.com/savedsearches/559741 Now, if folks start using |
@bcoe I don't fully understand the fix. In order to get my system working, a rollback of pubsub to 29.1 wasn't enough. I had to roll back a bunch of other google projects I'm using (logging, storage, kms etc) also. I don't know what combination of rollbacks fixed the problem and so I'm worried about rolling forward now. What exactly is the proposed fix please? Which projects do I need to apply the fix to? |
I've done a fresh install of |
After a few more hours of testing, the behaviour has improved but I'm still seeing the odd message being lost and nacked back into the queue, which results in latency spikes. This does not happen with This means |
👋 As mentioned Friday, we're testing early this week with We ask that folks upgrade to @grpc/grpc-js@0.6.9, and let us know if you have any trouble doing so. If you continue to run into issues with this new version of the dependency, I ask that we:
1. create a new issue on this repo, which I will prioritize as P1 (@pmcnr-hx, I have already done so for the memory issue you've raised).
2. run your system with debugging enabled, so that we can ship logs to the gRPC folks (a sketch of this follows below).
3. share the logs with the engineers debugging this issue.
You can open an issue on the issue tracker here to deliver the logs, if there's anything you wish to keep private: https://issuetracker.google.com/savedsearches/559741 You can also send an email to If it becomes obvious that things are stable, we will start working on a more significant rollback early this week. |
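For item 2, a sketch of one way to turn on verbose gRPC logging in a Node process, assuming the standard GRPC_VERBOSITY and GRPC_TRACE environment variables honored by @grpc/grpc-js:

```js
// Sketch only: these variables are read when @grpc/grpc-js is loaded, so set
// them before requiring the client (or set them in the pod/environment spec).
process.env.GRPC_VERBOSITY = 'DEBUG';
process.env.GRPC_TRACE = 'all';

const {PubSub} = require('@google-cloud/pubsub');
const pubsub = new PubSub();
// ... run the normal workload; gRPC internals will now log verbosely.
```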
Environment details
Node.js version: v12.7.0
npm version: 6.10.0
@google-cloud/pubsub version: "^1.0.0",
Error:
insertId: "gnr3q1fz7eerd"
jsonPayload: {
  level: "error"
  message: "unhandledRejection"
  originalError: {
    ackIds: [1]
    code: 4
    details: "Deadline exceeded"
  }
}
After receiving this error, the app does not receive messages anymore and we have to exit the application to recreate the Kubernetes pod.
Any help would be appreciated!
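Not a fix, but a minimal sketch of the restart-style mitigation described above, attaching an error listener and exiting so the orchestrator recreates the pod (the subscription name is a placeholder):

```js
// Sketch only: exit on subscription errors so Kubernetes restarts the pod.
const {PubSub} = require('@google-cloud/pubsub');

const pubsub = new PubSub();
const subscription = pubsub.subscription('my-subscription');

subscription.on('message', message => {
  // ... handle the message ...
  message.ack();
});

subscription.on('error', err => {
  console.error('Subscription error, exiting so the pod is recreated:', err);
  process.exit(1);
});
```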