opentelemetry-otlp async pipeline stuck after FramedWrite #473
Btw, I have pushed my testing code for this issue to clux/tracing-otlp-test. It has a few extra things in there relating to fetching trace ids, but that's not relevant to the issue at hand (it's just to help me debug against tempo; the outcome is the same).
Ok, more testing. This does not happen with grpc-sys:

```diff
-opentelemetry-otlp = { version = "0.5.0", features = ["async", "tokio"] }
+opentelemetry-otlp = { version = "0.5.0", features = ["tokio", "grpc-sys", "openssl"], default-features = false }
```

works. It sends spans to tempo, they are visible there, and the app doesn't hang. The endpoint also changes:

```diff
-http://0.0.0.0:55680
+0.0.0.0:55680
```
currently does not work with tonic for some reason: open-telemetry/opentelemetry-rust#473
Thanks for the detailed report and example! This example also seems to be working if the span is not reported via …
Hm, I can't actually reproduce the behavior. Both the code in the description and cloning and running https://github.com/clux/tracing-otlp-test/blob/master/blah.rs seem to run and exit normally. @clux, is anything specific needed to show the hanging behavior?
@jtescher I was able to reproduce the issue after I spun up an otel-collector container and used the example to connect to it.
Oh I see the issue now (my problem was that I hadn't bound the ports properly when starting the otlp docker container). So this looks like the issue is that this configuration is actually trying to trace itself: the exporter's own internal tonic/h2 spans are picked up by the subscriber again, completing the cycle from the top-level span back into the export pipeline. The internal tracing spans are at debug level, so they pass the filter in:

```rust
let collector = tracing_subscriber::registry()
    .with(tracing_opentelemetry::layer().with_tracer(tracer))
    .with(tracing_subscriber::EnvFilter::from("DEBUG"))
    .with(tracing_subscriber::fmt::layer());
```

Can likely come up with a better solution for detecting this case.
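One way to break the cycle (a minimal sketch of a workaround, not something proposed in this thread; `tracer` is assumed to come from the otlp pipeline as in the snippet above) is to keep `DEBUG` as the default level but silence the crates the exporter itself uses, so exporting spans never generates new spans for the exporter:

```rust
use tracing_subscriber::{filter::EnvFilter, layer::SubscriberExt};

// Default to DEBUG, but turn off the events/spans emitted by the exporter's
// own transport stack so they are never fed back into the pipeline.
let filter = EnvFilter::new("debug")
    .add_directive("h2=off".parse().expect("valid directive"))
    .add_directive("hyper=off".parse().expect("valid directive"))
    .add_directive("tonic=off".parse().expect("valid directive"));

let collector = tracing_subscriber::registry()
    .with(tracing_opentelemetry::layer().with_tracer(tracer))
    .with(filter)
    .with(tracing_subscriber::fmt::layer());
```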
Ah, a tracing cycle. Interesting. I am thinking that this might be a bit more insidious in a bigger example where dependencies are shared. If you are interested in the bigger example, keep in mind it requires a k8s cluster to talk to, plus installing a CRD before running; otherwise I'm not sure how much I can really remove, as it's already an example controller. I could also try to make the original tracing-otlp-test repo bigger so it doesn't require quite as much setup as controller-rs.
Just a small update, but no success this time: I tried to make the bug repo re-trigger the bug with an updated EnvFilter, because the issue is still present on controller-rs. I tried a few things, like including a hyper `.get` inside the traced async fn and spawning a long-running task outside, but neither triggered the bug, so I haven't pushed anything new there. I guess there's something funky going on with the combination of a long-running task and actix in controller-rs. Ultimately I had to give up on it for now, and am stuck on grpcio for the time being.
Ok, I have finally managed to make it work. I'm still not sure why it's hanging in my more advanced app, but I have at least found out that it's related to the simple exporter (which I probably shouldn't use anyway), and I am able to work around it without switching out tonic: the hang went away once I enabled the "tokio-support" feature and stopped using the simple exporter.
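For reference, a sketch of what the batch (non-simple) setup looks like. The builder API here (`tracing()`, `new_exporter().tonic()`, `install_batch(opentelemetry::runtime::Tokio)`) comes from later opentelemetry-otlp releases than the 0.5.0 used in this thread, so treat it as an assumption about the shape of the workaround rather than the commenter's exact code:

```rust
use opentelemetry_otlp::WithExportConfig;

// Batch export on the Tokio runtime instead of the simple (per-span) exporter:
// spans are queued and flushed by a background task rather than exported
// inline at the point where they end.
let tracer = opentelemetry_otlp::new_pipeline()
    .tracing()
    .with_exporter(
        opentelemetry_otlp::new_exporter()
            .tonic()
            .with_endpoint("http://0.0.0.0:55680"),
    )
    .install_batch(opentelemetry::runtime::Tokio)
    .expect("failed to install otlp batch pipeline");
```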
Closing, as we can track a proper fix via #761.
The following minimal otlp example with the latest `tracing`, `tracing-subscriber`, `tracing-opentelemetry`, `opentelemetry`, and `opentelemetry-otlp` gets stuck inside tonic or h2 when pushing spans in an async context; it prints a few lines of output and then hangs.
deps: `opentelemetry-otlp = { version = "0.5.0", features = ["async", "tokio"] }`, alongside current releases of the tracing crates listed above.
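A rough sketch of the kind of setup being described, i.e. an otlp tracer installed behind a global `tracing` subscriber and used inside a Tokio runtime. The builder methods shown are from later opentelemetry-otlp releases (0.5.0's API differs), so this is an approximation rather than the original reproduction code:

```rust
use opentelemetry_otlp::WithExportConfig;
use tracing_subscriber::layer::SubscriberExt;

#[tokio::main]
async fn main() {
    // Simple (per-span) exporter over tonic -- the configuration that hangs
    // according to this issue.
    let tracer = opentelemetry_otlp::new_pipeline()
        .tracing()
        .with_exporter(
            opentelemetry_otlp::new_exporter()
                .tonic()
                .with_endpoint("http://0.0.0.0:55680"),
        )
        .install_simple()
        .expect("failed to install otlp pipeline");

    // Global subscriber combining the otel layer, an env filter, and fmt output.
    let collector = tracing_subscriber::registry()
        .with(tracing_opentelemetry::layer().with_tracer(tracer))
        .with(tracing_subscriber::EnvFilter::from("DEBUG"))
        .with(tracing_subscriber::fmt::layer());
    tracing::subscriber::set_global_default(collector).expect("set global subscriber");

    // Creating and dropping a span here is where the export gets stuck.
    let span = tracing::info_span!("test_span");
    let _enter = span.enter();
    tracing::info!("inside the span");
}
```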
Not sure what exactly is going on here, but have done the following debugging to pinpoint a few ways that work:
Bypassing opentelemetry_otlp
The entire pipeline works perfectly without `opentelemetry_otlp` by replacing it with an opentelemetry stdout pipeline. Spans printed fine (albeit with two logging layers), and the program exits.
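A sketch of what that stdout replacement might look like, assuming the `opentelemetry::sdk::export::trace::stdout` pipeline from later 0.1x releases of `opentelemetry` (not the author's exact code):

```rust
use tracing_subscriber::layer::SubscriberExt;

// Same subscriber stack, but the tracer writes spans to stdout, so no
// network export (and no tonic/h2) is involved.
let tracer = opentelemetry::sdk::export::trace::stdout::new_pipeline().install_simple();

let collector = tracing_subscriber::registry()
    .with(tracing_opentelemetry::layer().with_tracer(tracer))
    .with(tracing_subscriber::fmt::layer());
tracing::subscriber::set_global_default(collector).expect("set global subscriber");
```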
Bypassing a global subscriber
The following works end-to-end using `opentelemetry-otlp`, and shows that my endpoint (a port-forwarded grafana-traces-agent (tempo)) is able to receive the data normally (the spans show up in its interface after program exit). But this only works in non-async contexts afaict, and looks awkward to integrate with a larger application.
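A sketch of the scoped-subscriber pattern being described, using `tracing::subscriber::with_default`, which installs a subscriber only for the duration of a closure. The `tracer` is assumed to come from an `opentelemetry_otlp` pipeline built as in the earlier sketch; this is not the author's exact code:

```rust
use tracing_subscriber::layer::SubscriberExt;

// No global subscriber: the otel layer is only active inside the closure,
// and everything runs outside of any async runtime.
let subscriber = tracing_subscriber::registry()
    .with(tracing_opentelemetry::layer().with_tracer(tracer));

tracing::subscriber::with_default(subscriber, || {
    let span = tracing::info_span!("scoped_span");
    let _enter = span.enter();
    tracing::info!("inside the scoped span");
});

// Shut down the provider so any remaining spans are flushed before exit.
opentelemetry::global::shutdown_tracer_provider();
```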
..so. Am I doing something obviously wrong here? Why is `opentelemetry_otlp` getting stuck in an async context with a global subscriber, and doing perfectly fine in a non-async context using a scoped subscriber?