This repository has been archived by the owner on Dec 23, 2023. It is now read-only.

Stackdriver Monitoring exporter receives Goaway after running an hour #869

Closed
rakyll opened this issue Dec 3, 2017 · 11 comments

@rakyll

rakyll commented Dec 3, 2017

About an hour after I start a server that exports to Stackdriver Monitoring, the worker thread throws the following exception:

WARNING: Exception thrown when exporting TimeSeries.
com.google.api.gax.rpc.UnavailableException: io.grpc.StatusRuntimeException: UNAVAILABLE: HTTP/2 error code: NO_ERROR
Received Goaway
max_age
	at com.google.api.gax.rpc.ApiExceptionFactory.createException(ApiExceptionFactory.java:69)
	at com.google.api.gax.grpc.GrpcExceptionCallable$ExceptionTransformingFuture.setException(GrpcExceptionCallable.java:118)
	at com.google.api.gax.grpc.GrpcExceptionCallable$ExceptionTransformingFuture.onFailure(GrpcExceptionCallable.java:101)
	at com.google.api.core.ApiFutures$1.onFailure(ApiFutures.java:61)
	at com.google.common.util.concurrent.Futures$4.run(Futures.java:1123)
	at com.google.common.util.concurrent.MoreExecutors$DirectExecutor.execute(MoreExecutors.java:435)
	at com.google.common.util.concurrent.AbstractFuture.executeListener(AbstractFuture.java:900)
	at com.google.common.util.concurrent.AbstractFuture.complete(AbstractFuture.java:811)
	at com.google.common.util.concurrent.AbstractFuture.setException(AbstractFuture.java:675)
	at io.grpc.stub.ClientCalls$GrpcFuture.setException(ClientCalls.java:491)
	at io.grpc.stub.ClientCalls$UnaryStreamToFuture.onClose(ClientCalls.java:466)
	at io.grpc.ForwardingClientCallListener.onClose(ForwardingClientCallListener.java:41)
	at io.grpc.internal.CensusStatsModule$StatsClientInterceptor$1$1.onClose(CensusStatsModule.java:663)
	at io.grpc.ForwardingClientCallListener.onClose(ForwardingClientCallListener.java:41)
	at io.grpc.internal.CensusTracingModule$TracingClientInterceptor$1$1.onClose(CensusTracingModule.java:392)
	at io.grpc.internal.ClientCallImpl.closeObserver(ClientCallImpl.java:443)
	at io.grpc.internal.ClientCallImpl.access$300(ClientCallImpl.java:63)
	at io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl.close(ClientCallImpl.java:525)
	at io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl.access$600(ClientCallImpl.java:446)
	at io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1StreamClosed.runInContext(ClientCallImpl.java:557)
	at io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)
	at io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:123)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
Caused by: io.grpc.StatusRuntimeException: UNAVAILABLE: HTTP/2 error code: NO_ERROR
Received Goaway
max_age
	at io.grpc.Status.asRuntimeException(Status.java:526)
	... 19 more
@sebright added the bug label Dec 3, 2017
@rakyll
Author

rakyll commented Dec 3, 2017

FWIW, the failures do not consistently start only after a long period of time. Sometimes it takes just minutes before the exception appears.

@songy23 self-assigned this Dec 3, 2017
@songy23
Contributor

songy23 commented Dec 12, 2017

I tried running the Stackdriver Monitoring client directly (not the exporter) for ~1 hour, and I also saw this exception. After searching for documentation on this, I don't think it is an error. A quote from the HTTP/2 spec (http://httpwg.org/specs/rfc7540.html#NO_ERROR):

NO_ERROR (0x0):
The associated condition is not a result of an error. For example, a GOAWAY might include this code to indicate graceful shutdown of a connection.

@songy23 closed this as completed Dec 12, 2017
@bogdandrutu
Contributor

Should we reconnect? Is the data still being uploaded? What is the impact?

@bogdandrutu reopened this Dec 12, 2017
@rakyll
Author

rakyll commented Dec 12, 2017

If there is no impact, is it expected to log it as a warning? I'd expect us to be more silent.

@songy23
Contributor

songy23 commented Dec 12, 2017

I'm not 100% sure but I think gRPC will do the retry in this case (from googleapis/google-cloud-java#1579). @zhangkun83 could you confirm that?

I assume this has no impact and is not a bug in the exporter, so skipping the logging of this kind of error makes sense to me. I'll post a PR to improve logging once this is confirmed.
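
For illustration, a minimal sketch of what "skip logging this kind of error" could look like. This is not the exporter's actual code: `GoawayLogLevel`, `isGracefulGoaway`, and `levelFor` are hypothetical names, and matching on the message text from the stack trace above is a heuristic chosen only to keep the example free of gRPC dependencies. Real code would more likely inspect the gRPC status than the message string.

```java
import java.util.logging.Level;

// Hypothetical helper, not part of OpenCensus, gax, or gRPC.
final class GoawayLogLevel {
    private static final int MAX_CAUSE_DEPTH = 16;

    // Returns true if any exception in the cause chain carries the
    // "NO_ERROR ... Received Goaway" message seen in the trace above.
    static boolean isGracefulGoaway(Throwable t) {
        // Bound the walk so a cyclic cause chain cannot loop forever.
        for (int depth = 0; t != null && depth < MAX_CAUSE_DEPTH; depth++) {
            String msg = t.getMessage();
            if (msg != null
                    && msg.contains("HTTP/2 error code: NO_ERROR")
                    && msg.contains("Received Goaway")) {
                return true;
            }
            t = t.getCause();
        }
        return false;
    }

    // Graceful GOAWAYs are demoted to FINE; everything else stays WARNING.
    static Level levelFor(Throwable t) {
        return isGracefulGoaway(t) ? Level.FINE : Level.WARNING;
    }

    public static void main(String[] args) {
        Throwable cause = new RuntimeException(
                "UNAVAILABLE: HTTP/2 error code: NO_ERROR\nReceived Goaway\nmax_age");
        Throwable wrapped = new RuntimeException("export failed", cause);
        System.out.println(levelFor(wrapped));
        System.out.println(levelFor(new RuntimeException("some other failure")));
    }
}
```

The exception is still logged, only at a lower level, so it remains available when debugging.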

@zhangkun83
Contributor

/cc @ejona86

@bogdandrutu
Contributor

@ejona86 ping on this.

@ejona86

ejona86 commented Jan 17, 2018

I don't understand the questions here. What do you mean by reconnect and retry? Is there a reason you think something is going wrong (as in, behaviorally you see something is broken)? Is the failing call a long-lived RPC?

max_age is the server saying the connection is too old. This is typically gracefully handled by the server (long story) and not seen by applications, but it would be normal to see it (as an error) on long-lived RPCs. The Channel handles connections like normal; issue a new RPC and things should still be functioning. You have to figure out for yourself how to recover from the failed RPC (replay, resume from checkpoint, etc.).
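
As a rough sketch of the "replay" option mentioned above: the failed call is simply issued again as a new RPC, a bounded number of times. `ReplayOnFailure` and `callWithReplay` are hypothetical names, the `Callable` stands in for whatever unary call the application makes, and the sketch assumes the call is safe to repeat, which is not something this thread establishes for any particular RPC.

```java
import java.util.concurrent.Callable;
import java.util.function.Predicate;

// Hypothetical helper, not part of gRPC or any Google client library.
final class ReplayOnFailure {
    // Invokes rpc up to maxAttempts times, replaying only when the
    // failure matches the retryable predicate.
    static <T> T callWithReplay(
            Callable<T> rpc, Predicate<Exception> retryable, int maxAttempts)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        for (int attempt = 1; ; attempt++) {
            try {
                return rpc.call();
            } catch (Exception e) {
                // Give up on the last attempt or on a non-retryable failure.
                if (attempt >= maxAttempts || !retryable.test(e)) {
                    throw e;
                }
            }
        }
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        // Simulated RPC: fails once with an UNAVAILABLE-style message, then succeeds.
        String result = callWithReplay(
                () -> {
                    if (++calls[0] == 1) {
                        throw new IllegalStateException("UNAVAILABLE: Received Goaway");
                    }
                    return "ok";
                },
                e -> e.getMessage() != null && e.getMessage().contains("UNAVAILABLE"),
                3);
        System.out.println(result + " after " + calls[0] + " attempt(s)");
    }
}
```

A production version would normally add a delay between attempts; that is left out here to keep the sketch short.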

@songy23
Contributor

songy23 commented Jan 20, 2018

I filed googleapis/google-cloud-java#2795 against the GCP Java client for further investigation.

@songy23
Contributor

songy23 commented Jan 26, 2018

Update: Stackdriver thinks this is related to googleapis/google-cloud-java#2066.

@songy23
Contributor

songy23 commented Mar 16, 2018

googleapis/google-cloud-java#2066 is fixed. I tried running the StackdriverStatsExporter recently and haven't seen this issue again. Closing for now; feel free to reopen if you see this issue again.
