Logging unstable under loss of connection #745
Comments
Thank you for the report. I believe this is related to #614: the CANCELED gRPC Netty futures have something holding on to them, which prevents garbage collection. I am looking into which layer of our stack is holding on to the objects.
@elefeint do you have any idea of the timeline for a fix? This is causing production outages for us. If a fix is going to take some time, we'll need to look at removing the offending code.
Did production issues start after a particular time / upgrade?
Not a particular upgrade; the problem appears to be caused by an unstable network (which we're looking into separately).
One thing I can recommend while we are looking at the issue is to switch from the direct API appender ("STACKDRIVER") to structured console logging ("CONSOLE_JSON"). Because the latter does not send logs directly to Cloud Logging, it should not result in the same issue when the network connection breaks.
If I've understood that correctly, it will just log to the console, and log events will not be forwarded to GCP for non-GCP (i.e. on-premise) deployments?
@gjferriercoats Sorry, I missed this. That's correct for on-prem / local -- the logs are only processed when running in a GCP environment.
@gjferriercoats There are two things that in combination will help:
These won't fix logging (I suspect due to an issue similar to googleapis/java-logging#645), but they will keep the rest of the application from becoming unstable. UPDATE: logging does recover after ~40 minutes.
Please re-open if it's still an issue.
Using the latest sample from spring-cloud-gcp-samples/spring-cloud-gcp-logging-sample I am able to make the implementation unstable (runaway memory, thread allocations) by disabling the network connection so that the attempt to log to GCP fails.
I was able to recreate the problem by adding the following class:
package com.example;

import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import org.springframework.scheduling.annotation.Scheduled;
import org.springframework.stereotype.Component;

@Component
public class OnASchedule {
}
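For readers without the sample project at hand, the idea behind the reproducer (emit a log line at a fixed interval so that logging keeps running across the outage) can be sketched with the JDK alone. This is a minimal sketch under assumptions, not the code from the report: the class name `HeartbeatLogger`, the method `logOnSchedule`, the message text, and the use of `java.util.logging` with a `ScheduledExecutorService` in place of Spring's `@Scheduled` and the GCP appender are all hypothetical.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.logging.Logger;

// Hypothetical stand-in for a scheduled logging component (names are assumptions).
class HeartbeatLogger {

    private static final Logger LOG = Logger.getLogger(HeartbeatLogger.class.getName());

    /** Logs `count` messages, one every `periodMillis`, and returns how many were logged. */
    static int logOnSchedule(int count, long periodMillis) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        AtomicInteger logged = new AtomicInteger();
        CountDownLatch done = new CountDownLatch(count);
        try {
            scheduler.scheduleAtFixedRate(() -> {
                // Guard so that ticks firing after the target is reached log nothing.
                if (logged.get() < count) {
                    LOG.info("heartbeat " + logged.incrementAndGet());
                    done.countDown();
                }
            }, 0, periodMillis, TimeUnit.MILLISECONDS);
            done.await(); // block until `count` messages have been logged
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for callers
        } finally {
            scheduler.shutdownNow();
        }
        return logged.get();
    }

    public static void main(String[] args) {
        // One message per second for five seconds.
        logOnSchedule(5, 1000);
    }
}
```

In a real reproduction the logger would be backed by the appender under test, and the interval would be chosen so that several log calls fall inside the network outage.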
Start the application and confirm that logs reach both the local console and GCP. Then terminate the network connection for 5 minutes (locally this was done by simply turning off the laptop's wifi; in a server setting, by using iptables to drop packets for logging.googleapis.com). When the network connection is restored, I expect the logs to flow to the console and GCP again, but this is not the case. Instead, exceptions are thrown repeatedly and memory and thread allocation grow massively. In my latest test, VisualVM showed ~7000 threads and ~75M objects of the types com.google.common.util.concurrent.AbstractFuture$Listener, com.google.common.util.concurrent.AggregateFuture$Listener and com.google.api.core.ApiFutureToListenableFuture.
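The thread and memory growth described above can also be watched from inside the JVM, without attaching VisualVM. The following is a rough sketch using the standard `java.lang.management` API; the class name `ThreadCountProbe` and its methods are hypothetical helpers, not part of the sample project, and the heap figure is only a coarse approximation of what a profiler reports.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

// Hypothetical helper for observing thread and heap growth during the outage.
class ThreadCountProbe {

    private static final ThreadMXBean THREADS = ManagementFactory.getThreadMXBean();

    /** Current number of live threads (daemon and non-daemon). */
    static int liveThreads() {
        return THREADS.getThreadCount();
    }

    /** Highest live thread count since the JVM started (or since the peak was reset). */
    static int peakThreads() {
        return THREADS.getPeakThreadCount();
    }

    /** Approximate heap in use, in megabytes. */
    static long usedHeapMb() {
        Runtime rt = Runtime.getRuntime();
        return (rt.totalMemory() - rt.freeMemory()) / (1024 * 1024);
    }

    /** One-line summary suitable for printing at a fixed interval. */
    static String snapshot() {
        return "threads=" + liveThreads()
                + " peak=" + peakThreads()
                + " usedHeapMb=" + usedHeapMb();
    }

    public static void main(String[] args) {
        System.out.println(snapshot());
    }
}
```

Printing `snapshot()` periodically before, during, and after the outage would show whether the thread count keeps climbing once the connection is restored.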
If necessary I can provide a PR that can be used to demonstrate.