logging stops after a few hours with "retry total timeout exceeded" #596
Comments
We restarted the deployment and it happened again after a few hours. The cluster has three pods. The issue starts in one pod, which keeps sending "ERROR:" while the other pods continue working fine; eventually all three of them send only "ERROR:". This is really serious because it prevents other issues from being diagnosed. I also wonder why the console.log() messages continue working.
As another data point, we completely removed google-logging from our code base and now just use console.log() (for info and debug), console.warn() (for warning), and console.error() (for error). The problem has not recurred in 24 hours of observation. However, this is not the desirable solution for a number of reasons:
it is a workaround for now... environment:
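The console-only workaround described above could look roughly like the following. This is a minimal sketch, not the poster's code: the `makeConsoleLogger` name and the `sink` parameter are hypothetical and exist only so the mapping from severity to console method is explicit.

```javascript
// Hypothetical console-only logger. `sink` defaults to the global console;
// it is a parameter only so the mapping can be exercised in isolation.
function makeConsoleLogger(sink = console) {
  return {
    // info and debug both go to stdout via console.log()
    debug: (...args) => sink.log(...args),
    info: (...args) => sink.log(...args),
    // warnings and errors go to stderr
    warn: (...args) => sink.warn(...args),
    error: (...args) => sink.error(...args),
  };
}

const logger = makeConsoleLogger();
logger.info('service started');
logger.warn('falling back to console logging');
```

This keeps output flowing through stdout and stderr only, so it does not depend on the logging client's network calls.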
@alexander-fenster, in our meeting today you mentioned that this potentially relates to some ongoing issues with
Any update on this one? Thanks!!
@gae123 could you please try https://www.npmjs.com/package/@google-cloud/logging/v/5.4.1 and let us know whether this version still has problems?
With 5.4.1, the system has now run for about 36 hours with no issues. We had never gone past 24 hours before, so this is definitely helping!!
@gae123 that's great news. I'm going to keep this open for the time being because I expect we'll have one or two more patches for grpc-js in the near future. But it's good to know that some of the more critical issues are being addressed.
@gae123 please feel free to reopen this if you bump into this issue again. I think the issues with
(This is in continuation of #586)
Because of #586, we yanked out the Google winston logging library and now use this package directly:
"@google-cloud/logging": "5.3.1",
24 hours went by with no memory leak, BUT I just realized that logging stopped about 15 hours after it started. Every attempt to write an entry since then has produced the following (there are thousands of such messages in the log; the one below is the first).
These messages start appearing about 7 hours after the deployment, but between the 7th and the 15th hour I do see some of our log messages as well as the "ERROR: " messages.
The message is written using a console.log() in the error path of the promise. Here is the approximate code that produces this message:
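For readers following along, here is a minimal sketch of the pattern described: write an entry and report any failure with console.log() in the promise's error path. It is an illustration, not the poster's code. The `writeEntry` helper is hypothetical, and `log` is assumed to behave like a `Log` object from @google-cloud/logging, which exposes `entry()` and `write()`.

```javascript
// Hypothetical helper (not from the original report).
// `log` is assumed to expose entry(metadata, data) and write(entry),
// as a Log object from @google-cloud/logging does.
function writeEntry(log, severity, message) {
  const entry = log.entry({ severity }, message);
  return log.write(entry).catch((err) => {
    // Error path: the write failed, so report it on stdout instead.
    console.log('ERROR:', err);
  });
}

// Assumed usage with the real client (requires credentials, so commented out):
// const { Logging } = require('@google-cloud/logging');
// const log = new Logging().log('my-log'); // 'my-log' is a placeholder name
// writeEntry(log, 'INFO', 'hello from the service');
```

Because the fallback goes straight to stdout, the "ERROR:" lines would keep appearing even when every call to `write()` is rejected, which matches the behaviour reported here.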