[exporterhelper] Use span links when tracing across queue #12212

jade-guiton-dd · 2025-01-30T15:45:02Z

Is your feature request related to a problem? Please describe.

The observability requirements for stable components requires the processing performance of a component to be observable.

At the moment, this is not quite true for the OTLP exporter (cf. telemetry review), and prevents its stabilization. Using the exporterhelper, it emits a span for the exporting operation, (including retries), but there is no link with the initial trace that enqueued the data, which prevents analysis of latency and queueing times.

Update: In the case of the in-memory queue and with the fix at #12225, there will actually be a parent-child relationship. But even in that case, I think a span link would be more appropriate for the reasons below.

Describe the solution you'd like

When the queue is enabled in exporterhelper, I think a span should be emitted, its SpanContext should be saved in the queue (potentially serialized if using the persistent queue), and the span for the export operation should have a span link linking it to the queueing operation. Note that the links may be N to N, because the batching exporter may fuse or split batches. Adding an attribute on each link describing how many items were processed from an initial batch may be useful.

Describe alternatives you've considered

It would also be possible to measure the latency using a new metric, but this would provide less information, and would still require some context propagation through the queue.

Additional context

Related issues: #11740, #8804

The text was updated successfully, but these errors were encountered:

bogdandrutu · 2025-01-31T05:11:16Z

This is only a problem for the persistent queue implementation. The in-memory queue propagates everything but the deadline. See https://github.com/open-telemetry/opentelemetry-collector/blob/main/exporter/exporterhelper/internal/queue_sender.go#L167C15-L167C28

jade-guiton-dd · 2025-01-31T14:31:52Z

@bogdandrutu In my tests, I think the traces were broken even with the memory queue, but I guess that must be a bug then. I'll check again.

jade-guiton-dd · 2025-01-31T15:35:27Z

It looks like my observations are because of a bug with the batching code: when batching is not enabled, data goes through the DisabledBatcher, which discards the provided Context and creates a new one with context.Background for some reason. I filed a PR to fix that: #12225 (Edit: Fix merged in #12231)

However, even with we fix that bug, I think there is still an improvement to be made with the in-memory queue: it currently creates a parent-child relationship between the previous component's span and the export operation span, which will not always make sense because of the asynchronous nature of the queue, and the potentially N-to-N relation between enqueue operations and export operations caused by the batcher. This is why I am arguing we should use span links instead. I'll rename the issue.

jade-guiton-dd · 2025-01-31T15:49:37Z

To give an example of what I mean by the parent-child relationship not making sense, this is what traces look like with a fix to the above bug:

The exporter/otlphttp/bar/traces span is not contained within the parent receiver/otlp/foo/TracesDataReceived span, and could hypothetically be arbitrary long after if the queue weren't empty.

bogdandrutu · 2025-01-31T20:33:30Z

The exporter/otlphttp/bar/traces span is not contained within the parent receiver/otlp/foo/TracesDataReceived span, and could hypothetically be arbitrary long after if the queue weren't empty.

Is there a strong requirement for this?

bogdandrutu · 2025-01-31T20:37:16Z

However, even with we fix that bug, I think there is still an improvement to be made with the in-memory queue: it currently creates a parent-child relationship between the previous component's span and the export operation span, which will not always make sense because of the asynchronous nature of the queue, and the potentially N-to-N relation between enqueue operations and export operations caused by the batcher. This is why I am arguing we should use span links instead. I'll rename the issue.

If:

queue and batch are disabled -> simple child is ok
queue is disabled and batch enabled -> In general we need a new trace and links to "parents", but if the batch is only one request do we do a child Span?
If queue is enabled and batch is disabled -> Is it "that" wrong if this is a child?

jade-guiton-dd · 2025-02-03T11:45:03Z

As you said, we have a technical requirement to use span links when an export operation has multiple parent enqueue operations because of batching (because a span can only have one parent, but can have multiple links). I agree that we could technically use child spans in the other cases (batching disabled, or enabled but no merging was performed), but I think it would be good to do the same thing in all cases to avoid confusion.

Moreover, while I agree it's not "that" wrong to use child spans when batching is disabled but the queue is enabled, I think span links would give easier-to-interpret output, notably under load where there might be a considerable delay between the enqueue operation and the export operation. It sounds like span links were designed specifically for this sort of asynchronous processing according to the docs:

For example, let’s say we have a distributed system where some operations are tracked by a trace. In response to some of these operations, an additional operation is queued to be executed, but its execution is asynchronous. We can track this subsequent operation with a trace as well. We would like to associate the trace for the subsequent operations with the first trace, but we cannot predict when the subsequent operations will start. We need to associate these two traces, so we will use a span link.

jade-guiton-dd added area:exporter enhancement New feature or request labels Jan 30, 2025

jade-guiton-dd added this to Collector: v1 Jan 30, 2025

github-project-automation bot moved this to Todo in Collector: v1 Jan 30, 2025

jade-guiton-dd mentioned this issue Jan 30, 2025

[exporter/otlp] Review telemetry #11140

Open

jade-guiton-dd changed the title ~~[exporterhelper] Preserve trace context through queue~~ [exporterhelper] Use span links when tracing across queue Jan 31, 2025

mx-psi added this to the Self observability milestone Feb 5, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[exporterhelper] Use span links when tracing across queue #12212

[exporterhelper] Use span links when tracing across queue #12212

jade-guiton-dd commented Jan 30, 2025 •

edited

Loading

bogdandrutu commented Jan 31, 2025

jade-guiton-dd commented Jan 31, 2025 •

edited

Loading

jade-guiton-dd commented Jan 31, 2025 •

edited

Loading

jade-guiton-dd commented Jan 31, 2025 •

edited

Loading

bogdandrutu commented Jan 31, 2025

bogdandrutu commented Jan 31, 2025

jade-guiton-dd commented Feb 3, 2025 •

edited

Loading

[exporterhelper] Use span links when tracing across queue #12212

[exporterhelper] Use span links when tracing across queue #12212

Comments

jade-guiton-dd commented Jan 30, 2025 • edited Loading

bogdandrutu commented Jan 31, 2025

jade-guiton-dd commented Jan 31, 2025 • edited Loading

jade-guiton-dd commented Jan 31, 2025 • edited Loading

jade-guiton-dd commented Jan 31, 2025 • edited Loading

bogdandrutu commented Jan 31, 2025

bogdandrutu commented Jan 31, 2025

jade-guiton-dd commented Feb 3, 2025 • edited Loading

jade-guiton-dd commented Jan 30, 2025 •

edited

Loading

jade-guiton-dd commented Jan 31, 2025 •

edited

Loading

jade-guiton-dd commented Jan 31, 2025 •

edited

Loading

jade-guiton-dd commented Jan 31, 2025 •

edited

Loading

jade-guiton-dd commented Feb 3, 2025 •

edited

Loading