Infinite trace generator #761

hamiltop · 2022-03-14T23:27:35Z

When I use the opentelemetry_datadog library with reqwest as the client, each batch of spans will generate an additional 2 spans. These spans look like this:

[[{'service': 'test', 'name': 'opentelemetry-datadog', 'resource': 'encode_headers', 'trace_id': 9313706354471130378, 'span_id': 2181458161362850175, 'parent_id': 0, 'start': 1647298645595811251, 'duration': 156890, 'error': 0, 'meta': {'busy_ns': '66163', 'code.namespace': 'hyper::proto::h1::role', 'code.filepath': '/home/hamiltop/.cargo/registry/src/github.com-1ecc6299db9ec823/hyper-0.14.17/src/proto/h1/role.rs', 'code.lineno': '1061', 'idle_ns': '90678'}}], [{'service': 'test', 'name': 'opentelemetry-datadog', 'resource': 'parse_headers', 'trace_id': 11785853278508097401, 'span_id': 9996945950193117364, 'parent_id': 0, 'start': 1647298645596617264, 'duration': 159966, 'error': 0, 'meta': {'code.namespace': 'hyper::proto::h1::role', 'code.lineno': '1183', 'busy_ns': '119852', 'code.filepath': '/home/hamiltop/.cargo/registry/src/github.com-1ecc6299db9ec823/hyper-0.14.17/src/proto/h1/role.rs', 'idle_ns': '36137'}}]]

I believe this is from the http request to datadog to save the original spans. Now, the request to send these 2 spans also generates 2 new spans, which generate 2 new spans, etc.

To reproduce:

Run this script as a mock datadog agent: https://gist.github.com/kongo2002/a4d560dca89e24bf27c8d59dc63f4b1e This will echo all incoming traces to stdout for easy inspecting.
Run this code:

use opentelemetry_datadog::{new_pipeline, ApiVersion};
use tracing::trace_span;
use tracing_subscriber::layer::SubscriberExt;

#[tokio::main]
async fn main() {
    let _tracer = new_pipeline()
        .with_agent_endpoint("http://localhost:8126")
        .with_service_name("test")
        .with_version(ApiVersion::Version03)
        .install_batch(opentelemetry::runtime::TokioCurrentThread)
        .unwrap();

    let telemetry = tracing_opentelemetry::layer().with_tracer(_tracer);

    let subscriber = tracing_subscriber::Registry::default().with(telemetry);

    tracing::subscriber::set_global_default(subscriber)
        .expect("Failed to install `tracing` subscriber.");

    trace_span!({ "test" }, "test span").in_scope(|| {
        println!("Hello, world!");
    });

    tokio::time::sleep(tokio::time::Duration::from_secs(60)).await;

    opentelemetry::global::shutdown_tracer_provider();
}

Watch output in script (1) to see the encode_headers/parse_headers traces get generated over and over again.

I would expect to have some way to disable tracing of the tracer. Is this a bug? Do I just need to configure trace levels more precisely?

The text was updated successfully, but these errors were encountered:

jherico · 2022-05-26T22:31:17Z

I think the answer here is that you're seeing a lot of spam in the output because you haven't included any filtering in the registry stack. The default behavior is to include everything, even items that are tagged with trace (here meaning the logging level that's one step more verbose than debug, not to be confused with service traces).

In this case the hyper crate being used for transport itself uses the tracing facade and can produce events. This might be equivalent to configuring Java logging for the Apache HTTP package to spit out logs for all headers.

A typical application will have some filtering mechanism baked into the registry and would normally pick INFO or WARN as the default logging level rather than TRACE, and only increase logging as needed for specific packages. See below for an example which would respect the RUST_LOG environment variable as specified by the logging crate, or if none exists default to excluding all output below the level of INFO

    let subscriber = tracing_subscriber::Registry::default().with(telemetry)
      .with(tracing_subscriber::EnvFilter::try_from_default_env().unwrap_or_else(|_| EnvFilter::new("info")))

cijothomas · 2023-08-07T23:38:16Z

Just found this issue, before i opened #1171. Same solution should work for both, I think..

hdost · 2024-02-21T10:47:48Z

Sounds like this might be related to #473

lalitb · 2024-03-04T17:14:27Z

There was some attempt made to fix this as part of #1330, if someone wants to have a look further. Otherwise, I will get back to this issue shortly. The fix was to suppress the logs invoked in the context of the (async) task doing the export.

NickLarsenNZ · 2024-04-20T20:55:19Z

@jherico, filtering is one way (in fact, what I'm having to do right now, eg: h2=off to stop this problem occurring), but that means I miss out on h2 traces from my application.

Hopefully something like #1330 can fix this properly?

lalitb · 2024-05-21T01:46:48Z

Trying to document few options we discussed in various forums to address this issue:

Create a Tracing Filter: Use a filter to drop all events (logs/traces) emitted by the crates used in specific exporters in the opentelemetry pipeline. This solution is straightforward but may not be ideal as it will also filter out events from these crates when used within the application.
Use a Suppression Flag with Thread-Local Storage: Add a suppression flag in opentelemetry-sdk as thread-local storage. Define a struct that wraps the future and ensures this flag is set before polling the future. We could use the existing WithContext struct for this, as it already handles OtelContext. However, this flag would only be set within the first async task invoked by the exporter. Ensuring that any async task created in external crates also sets this flag would be challenging. This approach was attempted in PR [WIP] Suppression of nested logs from dependencies #1330.
Single-Threaded Export with Suppression Flag: Similar to the second option of adding the suppression flag and additionally ensure the exporter invokes all async tasks under a single thread by explicitly using the Tokio current_thread runtime or async_std::task::block_on. This confines changes to opentelemetry-rust but may impact performance due to single-threaded execution.
Use Tokio Task-Local Storage: Implement the suppression flag using Tokio's task-local storage. This approach still requires scoping the variable when calling async tasks, necessitating cooperation from other crates to scope this flag properly. The only minimal advantage over option 2 and 3 is that no explicit Future wrapper is needed. And it adds hard dependency on Tokio runtime, unless async-std has such feature too.
Utilize Tracing Context and Span Extensions: Store the suppression flag as a span extension in the Context provided by the tracing crate. External crates must ensure to copy this flag from the current active span to any newly created spans. This approach leverages the existing context propagation supported by most tracing-instrumented crates.

Would like to discuss this further in the community meeting.

cijothomas · 2024-05-21T16:24:41Z

From 5/21 Community Call:

We need to see how other tracing layers/subscribers are solving this problem (or have they solved it?)
What do we need to ensure this is done? Lets write down the exact thing the authors of hyper, etc. must do to make sure [WIP] Suppression of nested logs from dependencies #1330 works.
If 2 is not feasible (for example, if hyper don't want to depend on opentelemetry), what is the alternate? Could we see if tracing can expose something to help us.
Proposal to Adopt Tokio Tracing as the OTel Tracing API #1689 will influence our decision OR be influenced by our decision!

It causes infinite cascading events to be emitted. See open-telemetry/opentelemetry-rust#761

* fix(stackable-telemetry): disable export of h2 events via OTLP It causes infinite cascading events to be emitted. See open-telemetry/opentelemetry-rust#761 * chore(stackable-telemetry): update changelog

mladedav · 2024-06-02T16:26:39Z

This feels like it boils down to propagating a piece of information through a stack of spawned tasks. Or having access to parent tasks inside a spawned task to get information like this. Something like looking up ppid of a process to check that it was not spawned by the exporter task/process.

I don't know about anything like that in tokio but it might be useful in other cases so it might be worth it following up there.

But otherwise, I don't think it's currently possible to propagate information through tokio::spawn and std::thread::spawn which the exporter can call whenever and however it wants.

lalitb · 2024-06-02T17:22:14Z

This feels like it boils down to propagating a piece of information through a stack of spawned tasks. Or having access to parent tasks inside a spawned task to get information like this. Something like looking up ppid of a process to check that it was not spawned by the exporter task/process.

Yes @mladedav, this is something we are trying to achieve. I think the problem is not specific to tracing-opentelemetry, infact any tracing subscriber may want to suppress the internal traces generated through the chain of the spawned tasks.

Most of the OpenTelemetry language implementations provide their own tracing library for instrumentation, and they add this piece of information (as a suppression flag) into the context. Then all the instrumented libraries will propagate the context into spawned tasks, and this flag can be eventually checked in the subscriber and dropped. As done in opentelemetry-js:

API to suppress/unsuppress flag in the context: https://github.com/open-telemetry/opentelemetry-js/blob/v1.24.1/packages/opentelemetry-core/src/trace/suppress-tracing.ts
And set this flag while calling the exporter: https://github.com/open-telemetry/opentelemetry-js/blob/v1.24.1/packages/opentelemetry-sdk-trace-base/src/export/BatchSpanProcessorBase.ts#L180

In the case of rust, the libraries are instrumented using tokio-rs tracing, do you think similar flag can be added in tracing context which can be used by the subscribers? Sorry if this doesn't make sense, I haven't yet gone through this library in depth. But I can try to dig further if this is the right direction.

I don't know about anything like that in tokio but it might be useful in other cases so it might be worth it following up there.

opentelemetry-rust provides an abstraction over the async runtimes, so any such mechanism if available should be there across all the async implementations. Tokio runtime provides an option (as mentioned in option 3 in the listed suggestions), but this has its performance overhead, and not sure if a similar is there in async-std.

mladedav · 2024-06-02T17:47:20Z

Other subscribers I've worked with usually don't need a spawned task or don't produce any further tracing information. See the tracing-tree example I've linked in the other discussion.

I think the tasks can be assigned a slightly different subscriber that would suppress the otel layer, but it would ideally leave other layers intact. I'm not sure if that's possible, maybe the task would have to just have its tracing completely disabled.

But then the libraries also need to propagate the subscriber if they spawn further tasks. I'm not exactly sure about the details of this though, I vaguely remember someone advising not to do that, but I don't remember why.

I know that otel is runtime agnostic, but some kind of local-storage preserving some kind of structural concurrency ownership concept might be interesting to all of them. Not sure though. Maybe it's a bad idea, maybe there would be technical limitations.

cijothomas · 2024-06-03T19:59:45Z

Other subscribers I've worked with usually don't need a spawned task or don't produce any further tracing information

Any subscriber which need to make a grpc/http call should face this. For example: https://github.com/krojew/tracing-elastic-apm/blob/master/src/apm_client.rs#L103-L104

(not just grpc/http call, but pretty much any API that is already instrumented with tracing. grpc/http is showing up now, as they are instrumented.)

mladedav · 2024-06-03T21:42:02Z

I didn't mean it as that the problem isn't valid or that no one has tried to solve it before, it's just I don't have personal experience with it.

By the way the example you linked seems to also go the way of spawning a single-threaded runtime, but I think it should be possible to use standard multi-threaded one with multiple threads and on_thread_start where you set the NoSubscriber as the current thread subscriber.

It might still be a bit wasteful since you're starting new threads for the runtime but at least it seems we're not constrained to a single thread.

TommyCpp added the A-trace Area: issues related to tracing label Mar 16, 2022

hdost added this to the Tracing API And SDK Stable milestone Mar 2, 2023

hdost added the component:contrib label Oct 17, 2023

cijothomas added the A-log Area: Issues related to logs label Oct 19, 2023

cijothomas assigned lalitb Oct 24, 2023

cijothomas mentioned this issue Oct 24, 2023

Need a way to suppress telemetry from SDKs own operation #1171

Closed

hdost added the release:required-for-stable Must be resolved before GA release, or nice to have before GA. label Oct 24, 2023

hdost removed the component:contrib label Nov 12, 2023

stormshield-fabs mentioned this issue Mar 4, 2024

SimpleSpanProcessor is delaying traces (and sometimes crashing) #1143

Closed

cijothomas mentioned this issue Apr 23, 2024

Path to Stable Release: Roadmap #1678

Open

This was referenced May 13, 2024

Modify loggerprovider shutdown, flush to return single Result and handle repeat shutdown calls #1750

Merged

Show working example for OTLP Http #1756

Merged

lalitb mentioned this issue May 18, 2024

Remove opentelemetry-jaeger #1782

Merged

cijothomas mentioned this issue May 21, 2024

Prevents logs from own operation #1797

Closed

NickLarsenNZ added a commit to stackabletech/operator-rs that referenced this issue May 24, 2024

fix(stackable-telemetry): disable export of h2 events via OTLP

d7bde99

It causes infinite cascading events to be emitted. See open-telemetry/opentelemetry-rust#761

NickLarsenNZ mentioned this issue May 24, 2024

fix(stackable-telemetry): Disable export of h2 events via OTLP stackabletech/operator-rs#796

Merged

lalitb mentioned this issue Jul 2, 2024

[WIP] Suppression of nested logs from dependencies #1330

Closed

cijothomas mentioned this issue Sep 23, 2024

Improve internal opentelemetry logging #2128

Merged

4 tasks

cijothomas mentioned this issue Nov 5, 2024

Simplified self-diagnostics example #2274

Merged

cijothomas mentioned this issue Dec 11, 2024

opentelemetry-otlp async pipeline stuck after FramedWrite #473

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Infinite trace generator #761

Infinite trace generator #761

hamiltop commented Mar 14, 2022

jherico commented May 26, 2022 •

edited

Loading

cijothomas commented Aug 7, 2023

hdost commented Feb 21, 2024

lalitb commented Mar 4, 2024 •

edited

Loading

NickLarsenNZ commented Apr 20, 2024

lalitb commented May 21, 2024

cijothomas commented May 21, 2024

mladedav commented Jun 2, 2024

lalitb commented Jun 2, 2024 •

edited

Loading

mladedav commented Jun 2, 2024 •

edited

Loading

cijothomas commented Jun 3, 2024

mladedav commented Jun 3, 2024

Infinite trace generator #761

Infinite trace generator #761

Comments

hamiltop commented Mar 14, 2022

jherico commented May 26, 2022 • edited Loading

cijothomas commented Aug 7, 2023

hdost commented Feb 21, 2024

lalitb commented Mar 4, 2024 • edited Loading

NickLarsenNZ commented Apr 20, 2024

lalitb commented May 21, 2024

cijothomas commented May 21, 2024

mladedav commented Jun 2, 2024

lalitb commented Jun 2, 2024 • edited Loading

mladedav commented Jun 2, 2024 • edited Loading

cijothomas commented Jun 3, 2024

mladedav commented Jun 3, 2024

jherico commented May 26, 2022 •

edited

Loading

lalitb commented Mar 4, 2024 •

edited

Loading

lalitb commented Jun 2, 2024 •

edited

Loading

mladedav commented Jun 2, 2024 •

edited

Loading