Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gRPC trace missing #2564

Closed
asarkar opened this issue Mar 11, 2021 · 23 comments
Closed

gRPC trace missing #2564

asarkar opened this issue Mar 11, 2021 · 23 comments
Labels
bug Something isn't working

Comments

@asarkar
Copy link

asarkar commented Mar 11, 2021

Spin off from #1550. gRPC client interceptor missing trace. Quoting @anuraaga's comment:

have a feeling this is because of the switch off of using grpc context directly for OTel. So it's time to implement a proper context bridge it'll take some time.

Attached app can be used to reproduce the issue. Extract and run ./gradlew test from the root directory.

otel-grpc-tracing.zip

@asarkar asarkar added the bug Something isn't working label Mar 11, 2021
@asarkar
Copy link
Author

asarkar commented Mar 20, 2021

@anuraaga Any idea which release this might go into? Seems like a major problem to me if trace disappears in a mundane client interceptor.

@anuraaga
Copy link
Contributor

Hoping it gets in soon - I started with a prerequisite for it but have some tricky issues

#2601

@ankitgaur-cer
Copy link

ankitgaur-cer commented Apr 19, 2021

@anuraaga please provide us an update on this.

@anuraaga
Copy link
Contributor

Hi @ankitgaur-cer @asarkar - sorry for not getting to this until now. Actually 1.1.0 includes the context bridge, and I can run the repro and the test passes! @asarkar please confirm that the issue has been solved.

@asarkar
Copy link
Author

asarkar commented Apr 20, 2021

@anuraaga Using io.opentelemetry.instrumentation:opentelemetry-instrumentation-bom-alpha:1.1.0-alpha, I see the trace, but also a lot of the following error:

Context was not attached when detaching java.lang.Throwable: null
	at i.o.i.g.v.i.ContextStorageBridge.detach(ContextStorageBridge.java:86)
	at i.g.o.ContextStorageOverride.detach(ContextStorageOverride.java:26)
	at io.grpc.Context.detach(Context.java:463)
	at i.g.k.GrpcContextElement.restoreThreadContext(GrpcContextElement.kt:36)
	at i.g.k.GrpcContextElement.restoreThreadContext(GrpcContextElement.kt:27)
	... 15 frames truncated

java.lang.Throwable: null
	at io.opentelemetry.instrumentation.grpc.v1_5.internal.ContextStorageBridge.detach(ContextStorageBridge.java:86)
	at io.grpc.override.ContextStorageOverride.detach(ContextStorageOverride.java:26)
	at io.grpc.Context.detach(Context.java:463)
	at io.grpc.kotlin.GrpcContextElement.restoreThreadContext(GrpcContextElement.kt:36)
	at io.grpc.kotlin.GrpcContextElement.restoreThreadContext(GrpcContextElement.kt:27)
	at kotlinx.coroutines.internal.ThreadContextKt$restoreState$1.invoke(ThreadContext.kt:55)
	at kotlinx.coroutines.internal.ThreadContextKt$restoreState$1.invoke(ThreadContext.kt)
	at kotlin.coroutines.CombinedContext.fold(CoroutineContextImpl.kt:131)
	at kotlin.coroutines.CombinedContext.fold(CoroutineContextImpl.kt:131)
	at kotlin.coroutines.CombinedContext.fold(CoroutineContextImpl.kt:131)
	at kotlin.coroutines.CombinedContext.fold(CoroutineContextImpl.kt:131)
	at kotlin.coroutines.CombinedContext.fold(CoroutineContextImpl.kt:131)
	at kotlinx.coroutines.internal.ThreadContextKt.restoreThreadContext(ThreadContext.kt:89)
	at kotlinx.coroutines.DispatchedTask.run(DispatchedTask.kt:230)
	at java.base/java.util.concurrent.ForkJoinTask$RunnableExecuteAction.exec(ForkJoinTask.java:1426)
	at java.base/java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:290)
	at java.base/java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(ForkJoinPool.java:1020)
	at java.base/java.util.concurrent.ForkJoinPool.scan(ForkJoinPool.java:1656)
	at java.base/java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1594)
	at java.base/java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:177)

@anuraaga
Copy link
Contributor

Hi @asarkar - looks like it was a bug in kotlin coroutines up to 1.4.2. I updated your code to 1.4.3 and the error goes away.

https://github.com/Kotlin/kotlinx.coroutines/releases/tag/1.4.3

ThreadContextElements are now restored in the opposite order from update (Kotlin/kotlinx.coroutines#2195)

@asarkar
Copy link
Author

asarkar commented Apr 21, 2021

Seems to be working. I'll leave this ticket open until we get a chance to verify more thoroughly. Thank you.

@iNikem
Copy link
Contributor

iNikem commented Apr 28, 2021

@asarkar Any news on "verify more thoroughly"? :) Can we close this ticket now?

@asarkar
Copy link
Author

asarkar commented Apr 28, 2021

@iNikem Sorry, the detail QA verification is supposed to happen next week. I understand our scheduling is not your problem, so if you must, go ahead and close it, and if something horrible comes up, I'll reopen it. However, it'd be nice if you could wait one more week. Up to you.

@denis111
Copy link

I see this with 1.1.0 in GCP and app can't start:

[opentelemetry.auto.trace 2021-04-29 15:00:52:239 +0000] [main] ERROR io.opentelemetry.javaagent.shaded.instrumentation.grpc.v1_5.internal.ContextStorageBridge - Context was not attached when detaching
java.lang.Throwable
at io.opentelemetry.javaagent.shaded.instrumentation.grpc.v1_5.internal.ContextStorageBridge.detach(ContextStorageBridge.java:86)
at io.grpc.Context.detach(Context.java:463)
at io.opencensus.trace.CurrentSpanUtils$ScopeInSpan.close(CurrentSpanUtils.java:98)
at com.google.api.client.http.HttpRequest.execute(HttpRequest.java:1046)
at com.google.cloud.ServiceOptions.getAppEngineProjectIdFromMetadataServer(ServiceOptions.java:503)
at com.google.cloud.ServiceOptions.getAppEngineProjectId(ServiceOptions.java:472)
at com.google.cloud.ServiceOptions.getDefaultProjectId(ServiceOptions.java:379)
at com.google.cloud.spring.core.DefaultGcpProjectIdProvider.getProjectId(DefaultGcpProjectIdProvider.java:36)
at com.google.cloud.spring.logging.StackdriverJsonLayout.start(StackdriverJsonLayout.java:200)

@anuraaga
Copy link
Contributor

@denis111 That is a log message, are you also seeing an exception causing the app to fail to start?

@anuraaga
Copy link
Contributor

@denis111 Also we fixed an issue in #2876 if you can confirm with snapshots that would be very helpful.

@skoppers
Copy link

skoppers commented May 7, 2021

@anuraaga I seem to be having the same problem as @denis111 using kotlin, spring boot and the Google cloud trace exporter (0.0.15-alpha). I'm using kotlin coroutines version 1.4.3, so that shouldn't be an issue.
I tried using the latest snapshot: https://oss.sonatype.org/content/repositories/snapshots/io/opentelemetry/javaagent/opentelemetry-javaagent/1.2.0-SNAPSHOT/opentelemetry-javaagent-1.2.0-20210507.042949-11.jar
However, the error still occurs.

@anuraaga
Copy link
Contributor

anuraaga commented May 8, 2021

Hi @skoppers is your app also failing to startup? If so, do you have an exception where it actually fails? That would be helpful to know.

Also does it only happen on GCP, not localhost presumably since it triggers that code path?

@skoppers
Copy link

Yes it happens at startup, there are actually 2 exception: 1 from the javaagent and 1 from the spring application:

spring application:

Caused by: java.lang.NullPointerException: null
	at io.opentelemetry.javaagent.shaded.instrumentation.grpc.v1_5.internal.ContextStorageBridge.detach(ContextStorageBridge.java:90) ~[na:na]
	at io.grpc.Context.detach(Context.java:463) ~[grpc-context-1.36.0.jar!/:1.36.0]
	at io.opencensus.trace.CurrentSpanUtils$ScopeInSpan.close(CurrentSpanUtils.java:98) ~[opencensus-api-0.24.0.jar!/:0.24.0]
	at com.google.api.client.http.HttpRequest.execute(HttpRequest.java:1043) ~[google-http-client-1.34.2.jar!/:1.34.2]
	at com.google.cloud.ServiceOptions.getAppEngineProjectIdFromMetadataServer(ServiceOptions.java:480) ~[google-cloud-core-1.91.3.jar!/:1.91.3]
	at com.google.cloud.ServiceOptions.getAppEngineProjectId(ServiceOptions.java:449) ~[google-cloud-core-1.91.3.jar!/:1.91.3]
	at com.google.cloud.ServiceOptions.getDefaultProjectId(ServiceOptions.java:356) ~[google-cloud-core-1.91.3.jar!/:1.91.3]
	at com.google.cloud.ServiceOptions.getDefaultProject(ServiceOptions.java:333) ~[google-cloud-core-1.91.3.jar!/:1.91.3]
	at com.google.cloud.ServiceOptions.<init>(ServiceOptions.java:283) ~[google-cloud-core-1.91.3.jar!/:1.91.3]
	at com.google.cloud.storage.StorageOptions.<init>(StorageOptions.java:83) ~[google-cloud-storage-1.107.0.jar!/:1.107.0]
	at com.google.cloud.storage.StorageOptions.<init>(StorageOptions.java:31) ~[google-cloud-storage-1.107.0.jar!/:1.107.0]
	at com.google.cloud.storage.StorageOptions$Builder.build(StorageOptions.java:78) ~[google-cloud-storage-1.107.0.jar!/:1.107.0]
	at com.google.cloud.storage.StorageOptions.getDefaultInstance(StorageOptions.java:126) ~[google-cloud-storage-1.107.0.jar!/:1.107.0]
	at com.vlinderstorm.fisso.service.StorageService.<init>(StorageService.kt:46) ~[classes!/:na]
	at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) ~[na:na]
	at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(Unknown Source) ~[na:na]
	at java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(Unknown Source) ~[na:na]
	at java.base/java.lang.reflect.Constructor.newInstance(Unknown Source) ~[na:na]
	at kotlin.reflect.jvm.internal.calls.CallerImpl$Constructor.call(CallerImpl.kt:41) ~[kotlin-reflect-1.4.31.jar!/:1.4.31-release-344 (1.4.31)]
	at kotlin.reflect.jvm.internal.KCallableImpl.call(KCallableImpl.kt:108) ~[kotlin-reflect-1.4.31.jar!/:1.4.31-release-344 (1.4.31)]
	at kotlin.reflect.jvm.internal.KCallableImpl.callDefaultMethod$kotlin_reflection(KCallableImpl.kt:159) ~[kotlin-reflect-1.4.31.jar!/:1.4.31-release-344 (1.4.31)]
	at kotlin.reflect.jvm.internal.KCallableImpl.callBy(KCallableImpl.kt:112) ~[kotlin-reflect-1.4.31.jar!/:1.4.31-release-344 (1.4.31)]
	at org.springframework.beans.BeanUtils$KotlinDelegate.instantiateClass(BeanUtils.java:855) ~[spring-beans-5.3.1.jar!/:5.3.1]
	at org.springframework.beans.BeanUtils.instantiateClass(BeanUtils.java:197) ~[spring-beans-5.3.1.jar!/:5.3.1]
	... 30 common frames omitted

javaagent:

java.lang.Throwable
	at io.opentelemetry.javaagent.shaded.instrumentation.grpc.v1_5.internal.ContextStorageBridge.detach(ContextStorageBridge.java:86)
	at io.grpc.Context.detach(Context.java:463)
	at io.opencensus.trace.CurrentSpanUtils$ScopeInSpan.close(CurrentSpanUtils.java:98)
	at com.google.api.client.http.HttpRequest.execute(HttpRequest.java:1043)
	at com.google.cloud.ServiceOptions.getAppEngineProjectIdFromMetadataServer(ServiceOptions.java:480)
	at com.google.cloud.ServiceOptions.getAppEngineProjectId(ServiceOptions.java:449)
	at com.google.cloud.ServiceOptions.getDefaultProjectId(ServiceOptions.java:356)
	at com.google.cloud.ServiceOptions.getDefaultProject(ServiceOptions.java:333)
	at com.google.cloud.ServiceOptions.<init>(ServiceOptions.java:283)
	at com.google.cloud.storage.StorageOptions.<init>(StorageOptions.java:83)
	at com.google.cloud.storage.StorageOptions.<init>(StorageOptions.java:31)
	at com.google.cloud.storage.StorageOptions$Builder.build(StorageOptions.java:78)
	at com.google.cloud.storage.StorageOptions.getDefaultInstance(StorageOptions.java:126)
	at com.vlinderstorm.fisso.service.StorageService.<init>(StorageService.kt:46)
	at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(Unknown Source)
	at java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(Unknown Source)
	at java.base/java.lang.reflect.Constructor.newInstance(Unknown Source)
	at kotlin.reflect.jvm.internal.calls.CallerImpl$Constructor.call(CallerImpl.kt:41)
	at kotlin.reflect.jvm.internal.KCallableImpl.call(KCallableImpl.kt:108)
	at kotlin.reflect.jvm.internal.KCallableImpl.callDefaultMethod$kotlin_reflection(KCallableImpl.kt:159)
	at kotlin.reflect.jvm.internal.KCallableImpl.callBy(KCallableImpl.kt:112)
	at org.springframework.beans.BeanUtils$KotlinDelegate.instantiateClass(BeanUtils.java:855)
	at org.springframework.beans.BeanUtils.instantiateClass(BeanUtils.java:197)
	at org.springframework.beans.factory.support.SimpleInstantiationStrategy.instantiate(SimpleInstantiationStrategy.java:117)
	at org.springframework.beans.factory.support.ConstructorResolver.instantiate(ConstructorResolver.java:310)
	at org.springframework.beans.factory.support.ConstructorResolver.autowireConstructor(ConstructorResolver.java:180)
	at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.autowireConstructor(AbstractAutowireCapableBeanFactory.java:1356)
	at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.createBeanInstance(AbstractAutowireCapableBeanFactory.java:1206)
	at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.doCreateBean(AbstractAutowireCapableBeanFactory.java:571)
	at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.createBean(AbstractAutowireCapableBeanFactory.java:531)
	at org.springframework.beans.factory.support.AbstractBeanFactory.lambda$doGetBean$0(AbstractBeanFactory.java:335)
	at org.springframework.beans.factory.support.DefaultSingletonBeanRegistry.getSingleton(DefaultSingletonBeanRegistry.java:234)
	at org.springframework.beans.factory.support.AbstractBeanFactory.doGetBean(AbstractBeanFactory.java:333)
	at org.springframework.beans.factory.support.AbstractBeanFactory.getBean(AbstractBeanFactory.java:208)
	at org.springframework.beans.factory.support.DefaultListableBeanFactory.preInstantiateSingletons(DefaultListableBeanFactory.java:944)
	at org.springframework.context.support.AbstractApplicationContext.finishBeanFactoryInitialization(AbstractApplicationContext.java:925)
	at org.springframework.context.support.AbstractApplicationContext.refresh(AbstractApplicationContext.java:588)
	at org.springframework.boot.web.servlet.context.ServletWebServerApplicationContext.refresh(ServletWebServerApplicationContext.java:144)
	at org.springframework.boot.SpringApplication.refresh(SpringApplication.java:767)
	at org.springframework.boot.SpringApplication.refresh(SpringApplication.java:759)
	at org.springframework.boot.SpringApplication.refreshContext(SpringApplication.java:426)
	at org.springframework.boot.SpringApplication.run(SpringApplication.java:326)
	at org.springframework.boot.SpringApplication.run(SpringApplication.java:1309)
	at org.springframework.boot.SpringApplication.run(SpringApplication.java:1298)
	at com.vlinderstorm.fisso.FissoApplicationKt.main(FissoApplication.kt:18)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
	at java.base/java.lang.reflect.Method.invoke(Unknown Source)
	at org.springframework.boot.loader.MainMethodRunner.run(MainMethodRunner.java:49)
	at org.springframework.boot.loader.Launcher.launch(Launcher.java:107)
	at org.springframework.boot.loader.Launcher.launch(Launcher.java:58)
	at org.springframework.boot.loader.JarLauncher.main(JarLauncher.java:88)

I'm currently running into some issues trying to run it locally. However, I think this error will also occur then, as you need to setup credentials for the google cloud exporter to work properly.

@anuraaga
Copy link
Contributor

Thanks for the stack traces. I've currently not been able to repro locally but if you are able to come up with one that will be really great! In the meantime, I have tried an approach to make the bridge more robust in

#2948

The next nightly snapshot will have it so if you are able to give it a try that would be very helpful.

@skoppers
Copy link

Ah, seems like last nights build failed. In the meantime, I'll see if I can reproduce it in a new project.

@anuraaga
Copy link
Contributor

@skoppers
Copy link

@anuraaga Just tried the new snapshot. The application doesn't crash anymore and seems to export some traces. However, the java agent still throw a lot of errors.

This is thrown a couple of times before startup:

[opentelemetry.auto.trace 2021-05-12 07:57:56:442 +0000] [Subscriber-SE-1-4] ERROR io.opentelemetry.javaagent.shaded.instrumentation.grpc.v1_6.internal.ContextStorageBridge - Context was not attached when detaching

java.lang.Throwable
	at io.opentelemetry.javaagent.shaded.instrumentation.grpc.v1_6.internal.ContextStorageBridge.detach(ContextStorageBridge.java:71)
	at io.grpc.Context.detach(Context.java:463)
	at io.opencensus.trace.CurrentSpanUtils$ScopeInSpan.close(CurrentSpanUtils.java:98)
	at com.google.api.client.http.HttpRequest.execute(HttpRequest.java:1043)
	at com.google.auth.oauth2.ServiceAccountCredentials.refreshAccessToken(ServiceAccountCredentials.java:429)
	at com.google.auth.oauth2.OAuth2Credentials.refresh(OAuth2Credentials.java:157)
	at com.google.auth.oauth2.OAuth2Credentials.getRequestMetadata(OAuth2Credentials.java:145)
	at com.google.auth.Credentials.blockingGetToCallback(Credentials.java:112)
	at com.google.auth.Credentials$1.run(Credentials.java:98)
	at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
	at java.base/java.util.concurrent.FutureTask.run(Unknown Source)
	at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(Unknown Source)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
	at java.base/java.lang.Thread.run(Unknown Source)

After startup, it seems that the same error triggers every time the google cloud storage library is used, but I need to look into this further.

"java.lang.Throwable
	at io.opentelemetry.javaagent.shaded.instrumentation.grpc.v1_6.internal.ContextStorageBridge.detach(ContextStorageBridge.java:71)
	at io.grpc.Context.detach(Context.java:463)
	at io.opencensus.trace.CurrentSpanUtils$ScopeInSpan.close(CurrentSpanUtils.java:98)
	at com.google.api.client.http.HttpRequest.execute(HttpRequest.java:1043)
	at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:541)
	at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:474)
	at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.execute(AbstractGoogleClientRequest.java:591)
	at com.google.cloud.storage.spi.v1.HttpStorageRpc.list(HttpStorageRpc.java:360)
	at com.google.cloud.storage.StorageImpl$8.call(StorageImpl.java:376)
	at com.google.cloud.storage.StorageImpl$8.call(StorageImpl.java:373)
	at com.google.api.gax.retrying.DirectRetryingExecutor.submit(DirectRetryingExecutor.java:105)
	at com.google.cloud.RetryHelper.run(RetryHelper.java:76)
	at com.google.cloud.RetryHelper.runWithRetries(RetryHelper.java:50)
	at com.google.cloud.storage.StorageImpl.listBlobs(StorageImpl.java:372)
	at com.google.cloud.storage.StorageImpl.list(StorageImpl.java:328)
       ....

@anuraaga
Copy link
Contributor

Thanks for verifying @skoppers! That log message is incorrect, I've sent https://github.com/open-telemetry/opentelemetry-java-instrumentation/pull/2970/files to remove it, there should otherwise not be any harm to your app which sounds correct based on your verification.

@anuraaga
Copy link
Contributor

@asarkar It's been a while since adding the fix for the original part of this issue, let's go ahead and close this as it's gotten a bit off topic (no problem, but actually these were two different issues). If anything comes up feel free to reopen this or open another issue.

@skoppers If you come up with any updates, possibly after a snapshot with that PR is released, do you mind sharing on #2959? Let's use that issue for what you're seeing.

@skoppers
Copy link

@anuraaga Yes will do! Thanks so much for the help!

@asarkar
Copy link
Author

asarkar commented May 14, 2021

@anuraaga With Kotlin 1.4.32 and Coroutines 1.4.3, I don't see the exception anymore. Thanks for your help.
When do you plan on releasing a new version?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

6 participants