[exporterhelper] Use span links when tracing across queue #12212
Comments
This is only a problem for the persistent queue implementation. The in-memory queue propagates everything but the deadline. See https://github.com/open-telemetry/opentelemetry-collector/blob/main/exporter/exporterhelper/internal/queue_sender.go#L167C15-L167C28
@bogdandrutu In my tests, I think the traces were broken even with the memory queue, but I guess that must be a bug then. I'll check again.
It looks like my observations are because of a bug with the batching code: when batching is not enabled, data goes through the … However, even if we fix that bug, I think there is still an improvement to be made with the in-memory queue. It currently creates a parent-child relationship between the previous component's span and the export operation span. That will not always make sense, because the queue is asynchronous and because the batcher can create an N-to-N relation between enqueue operations and export operations. This is why I am arguing we should use span links instead. I'll rename the issue.
Is there a strong requirement for this?
If:
As you said, we have a technical requirement to use span links when an export operation has multiple parent enqueue operations because of batching (a span can only have one parent, but can have multiple links). I agree that we could technically use child spans in the other cases (batching disabled, or enabled but no merging was performed), but I think it would be good to do the same thing in all cases to avoid confusion. Moreover, while I agree it's not "that" wrong to use child spans when batching is disabled but the queue is enabled, I think span links would give easier-to-interpret output, notably under load, where there might be a considerable delay between the enqueue operation and the export operation. According to the docs, span links were designed specifically for this sort of asynchronous processing.
Is your feature request related to a problem? Please describe.
The observability requirements for stable components require the processing performance of a component to be observable.
At the moment, this is not quite true for the OTLP exporter (cf. telemetry review), which prevents its stabilization. Using the exporterhelper, it emits a span for the export operation (including retries), but there is no link to the initial trace that enqueued the data, which prevents analysis of latency and queueing times.
Update: In the case of the in-memory queue and with the fix at #12225, there will actually be a parent-child relationship. But even in that case, I think a span link would be more appropriate for the reasons below.
Describe the solution you'd like
When the queue is enabled in exporterhelper, I think a span should be emitted for the queueing operation, its SpanContext should be saved in the queue (potentially serialized if using the persistent queue), and the span for the export operation should have a span link to the queueing operation. Note that the links may be N to N, because the batching exporter may fuse or split batches. Adding an attribute on each link describing how many items were processed from an initial batch may be useful.
Describe alternatives you've considered
It would also be possible to measure the latency using a new metric, but this would provide less information, and would still require some context propagation through the queue.
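Either approach needs the enqueue span's context to survive the queue. Below is a minimal, stdlib-only sketch of that propagation for the persistent queue case. It assumes the context is stored as a string in the W3C traceparent layout next to the payload, and that one link per merged request carries an item count. All types and functions here (SpanContext, QueuedRequest, Link, LinksForBatch) are hypothetical stand-ins for illustration, not the collector's or the OpenTelemetry Go SDK's actual types.

```go
package main

import (
	"encoding/hex"
	"errors"
	"fmt"
	"strings"
)

// SpanContext is a hypothetical stand-in for the identifying part of a span.
// It is not the OpenTelemetry Go type.
type SpanContext struct {
	TraceID [16]byte
	SpanID  [8]byte
	Flags   byte
}

// Encode renders the context in the W3C traceparent layout
// (version-traceid-spanid-flags) so it can be persisted with the payload.
func (sc SpanContext) Encode() string {
	return fmt.Sprintf("00-%s-%s-%02x",
		hex.EncodeToString(sc.TraceID[:]), hex.EncodeToString(sc.SpanID[:]), sc.Flags)
}

// Decode parses a string produced by Encode.
func Decode(s string) (SpanContext, error) {
	var sc SpanContext
	parts := strings.Split(s, "-")
	if len(parts) != 4 || parts[0] != "00" {
		return sc, errors.New("unsupported traceparent: " + s)
	}
	traceID, err := hex.DecodeString(parts[1])
	if err != nil || len(traceID) != len(sc.TraceID) {
		return sc, errors.New("bad trace ID: " + parts[1])
	}
	spanID, err := hex.DecodeString(parts[2])
	if err != nil || len(spanID) != len(sc.SpanID) {
		return sc, errors.New("bad span ID: " + parts[2])
	}
	flags, err := hex.DecodeString(parts[3])
	if err != nil || len(flags) != 1 {
		return sc, errors.New("bad flags: " + parts[3])
	}
	copy(sc.TraceID[:], traceID)
	copy(sc.SpanID[:], spanID)
	sc.Flags = flags[0]
	return sc, nil
}

// QueuedRequest is a hypothetical persisted queue entry: the serialized
// context of the enqueue span plus the number of items it carried.
type QueuedRequest struct {
	Traceparent string
	Items       int
}

// Link is a hypothetical span link with a single item-count attribute.
type Link struct {
	Context   SpanContext
	ItemCount int
}

// LinksForBatch builds one link per queued request merged into an export
// batch, so a single export span can point back at several enqueue spans.
func LinksForBatch(reqs []QueuedRequest) ([]Link, error) {
	links := make([]Link, 0, len(reqs))
	for _, r := range reqs {
		sc, err := Decode(r.Traceparent)
		if err != nil {
			return nil, err
		}
		links = append(links, Link{Context: sc, ItemCount: r.Items})
	}
	return links, nil
}

func main() {
	a := SpanContext{TraceID: [16]byte{1}, SpanID: [8]byte{2}, Flags: 1}
	b := SpanContext{TraceID: [16]byte{3}, SpanID: [8]byte{4}, Flags: 1}
	// Two enqueue operations from different traces merged into one export.
	batch := []QueuedRequest{
		{Traceparent: a.Encode(), Items: 10},
		{Traceparent: b.Encode(), Items: 5},
	}
	links, err := LinksForBatch(batch)
	if err != nil {
		panic(err)
	}
	for _, l := range links {
		fmt.Println(l.Context.Encode(), l.ItemCount)
	}
}
```

In the collector itself, the decoded contexts would presumably be attached when the export span is started, for example through the OpenTelemetry Go API's trace.WithLinks option. When the batcher splits a request instead of merging, the same enqueue context would simply appear on several export spans.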
Additional context
Related issues: #11740, #8804