Sampler API V2: What, Why, and How? #4044
Labels:
area:sampling (Related to trace sampling)
sig-issue (A specific SIG should look into this before discussing at the spec)
spec:trace (Related to the specification/trace directory)
triage:accepted:ready-with-sponsor (Ready to be implemented and has a specification sponsor assigned)
[Filing this per the discussion in the Sampling SIG. The goal is to improve clarity for everyone involved (including me) by summarizing in one place why we need to enhance the Sampling API: what problems we need to solve and how we will solve them.]
Note: This is a work in progress. Feedback and corrections are welcome, and I will iterate on this text.
V2 of the Sampler API: What, Why, and How?
Executive Summary
Version 1 of the Sampler and Span Processor APIs was defined in the OpenTelemetry specification a few years ago. Since then, customers and community members have shared a great deal of feedback on how to improve them.
This feedback covers areas such as deferring the dropping of spans, making additional fields available to the sampling decision, better support for consistent sampling of linked traces, and isolated processor/exporter pipelines.
Hence, we need to introduce a V2 of the Sampling API that solves these problems, making sampling more powerful and flexible for OpenTelemetry users.
Problems with the current Sampling API
Here are a few problems that have been identified with the current sampling API:
1. No support for deferred dropping of spans
Currently, once a sampler decides to drop a span, the span is discarded before it reaches any span processor or exporter. There is no way to defer dropping such spans to a later stage of data collection, for example an out-of-process collector. Why is this a problem? There are two reasons:
For more details, see the related issues.
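The contrast between immediate and deferred dropping can be sketched in a few lines of Python. This is a minimal illustration, not the OpenTelemetry SDK: the Decision enum, the DEFER_DROP value, the Span class, the "sampling.deferred_drop" attribute, and the run_pipeline and collector_stage functions are all hypothetical names. With today's semantics a DROP decision means the span never exists downstream; with a hypothetical deferred drop, the span is recorded and marked, and a later stage makes the final call.

```python
from dataclasses import dataclass, field
from enum import Enum


class Decision(Enum):
    # DROP and RECORD_AND_SAMPLE echo two of the spec's sampling decisions
    # (RECORD_ONLY is omitted for brevity). DEFER_DROP is hypothetical.
    DROP = "drop"
    RECORD_AND_SAMPLE = "record_and_sample"
    DEFER_DROP = "defer_drop"


@dataclass
class Span:
    name: str
    attributes: dict = field(default_factory=dict)


def sample(span_name: str, keep_names: set) -> Decision:
    """Toy sampler: keeps names in keep_names and wants to drop the rest."""
    if span_name in keep_names:
        return Decision.RECORD_AND_SAMPLE
    return Decision.DROP


def run_pipeline(names, keep_names, defer: bool):
    """Return the spans that reach the exporter stage.

    defer=False: DROP discards the span immediately (current behavior).
    defer=True: DROP becomes DEFER_DROP, so the span is still exported,
    tagged so that a later stage can decide what to do with it.
    """
    exported = []
    for name in names:
        decision = sample(name, keep_names)
        if decision is Decision.DROP and defer:
            decision = Decision.DEFER_DROP
        if decision is Decision.DROP:
            continue  # never seen by processors or exporters
        span = Span(name)
        if decision is Decision.DEFER_DROP:
            # Hypothetical marker attribute read by a downstream stage.
            span.attributes["sampling.deferred_drop"] = True
        exported.append(span)
    return exported


def collector_stage(spans, override=lambda span: False):
    """Downstream stage: drops marked spans unless `override` keeps them."""
    return [
        s for s in spans
        if not s.attributes.get("sampling.deferred_drop") or override(s)
    ]
```

Running run_pipeline with defer=True and then collector_stage with no override yields the same final spans as immediate dropping; the difference is that the drop now happens at a later stage, where an override can still keep some of the marked spans.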
2. No support for customizing behavior per exporter
The tracing SDK currently offers no clean way to set up multiple processing and export pipelines with isolated behavior. When multiple processors are configured, each subsequent processor sees the changes made by the processors before it. However, there are situations where you want independent sampling behavior and independent processing and exporting of spans.
For example, the Metrics SDK's reader/exporter model makes it possible to have independent MetricReader and MetricExporter pipelines.
The tracing SDK needs a similar way to give each exporter isolated control over its own sampling decision and custom processing.
For more details, see open-telemetry/opentelemetry-specification issue #3284, "Sampling: Each exporter should have isolated control over its sampler decision and custom processing".
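The difference between today's shared processor chain and isolated pipelines can be illustrated with a small Python sketch. The Pipeline class, the fan_out and shared_chain helpers, and the "user.email" attribute are hypothetical illustrations, not SDK types. The sketch isolates pipelines by giving each one a private deep copy of the span, so a processor in one pipeline cannot affect what another pipeline samples or exports.

```python
import copy
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Span:
    name: str
    attributes: dict = field(default_factory=dict)


# A processor mutates a span in place; a sampler returns True to keep it.
Processor = Callable[[Span], None]
Sampler = Callable[[Span], bool]


@dataclass
class Pipeline:
    """Hypothetical isolated pipeline: its own sampler, processors, and sink."""
    sampler: Sampler
    processors: List[Processor]
    exported: List[Span] = field(default_factory=list)

    def handle(self, span: Span) -> None:
        if not self.sampler(span):
            return
        for process in self.processors:
            process(span)
        self.exported.append(span)  # stands in for an exporter


def shared_chain(span: Span, processors: List[Processor]) -> Span:
    """Current-style chain: every processor sees earlier processors' changes."""
    for process in processors:
        process(span)
    return span


def fan_out(span: Span, pipelines: List[Pipeline]) -> None:
    """Give each pipeline a private deep copy so its changes stay local."""
    for pipeline in pipelines:
        pipeline.handle(copy.deepcopy(span))


def redact(span: Span) -> None:
    """Example processor: removes a sensitive attribute."""
    span.attributes.pop("user.email", None)
```

For example, an "audit" pipeline that keeps every span untouched can run alongside a "public" pipeline that skips health checks and redacts user data, and neither sees the other's decisions. Copying each span per pipeline is the simplest way to get isolation; a real design would likely need something cheaper, such as copy-on-write.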
3. Certain fields are not available when making a sampling decision
A few fields are not available to the sampler today. Each is listed below with a summary of why it would be helpful:
TraceFlags: ShouldSample currently has no way to know the new span's TraceFlags, so it cannot determine whether the Random Trace ID flag is set. We should therefore consider taking TraceFlags as an additional parameter. For more details, see the related issues.
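The following Python sketch shows what a sampler could do if it received TraceFlags. The should_sample signature and the fallback parameter are hypothetical. Two facts it relies on come from W3C Trace Context Level 2: the random flag is bit 0x02 of the trace flags, and when it is set, the rightmost 7 bytes (56 bits) of the trace ID are random. The threshold comparison is one assumed way to turn those bits into a consistent decision: every sampler that sees the same trace ID and probability reaches the same answer.

```python
RANDOM_FLAG = 0x02    # W3C Trace Context Level 2 "random" trace flag
RANDOMNESS_BITS = 56  # rightmost 7 bytes of the trace ID


def should_sample(trace_id: int, trace_flags: int, probability: float,
                  fallback=lambda: False) -> bool:
    """Hypothetical ShouldSample variant that also receives TraceFlags."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be in [0, 1]")
    if trace_flags & RANDOM_FLAG:
        # The flag guarantees these 56 bits are random, so they are
        # uniform in [0, 2**56) and P(randomness >= threshold) == probability.
        randomness = trace_id & ((1 << RANDOMNESS_BITS) - 1)
        threshold = int((1.0 - probability) * (1 << RANDOMNESS_BITS))
        return randomness >= threshold
    # Without the flag the trace ID's randomness is not guaranteed, so this
    # sketch defers to a caller-supplied (possibly inconsistent) fallback.
    return fallback()
```

Without the TraceFlags parameter, the first branch cannot be taken safely, because the sampler cannot tell whether the trace ID bits are actually random.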
4. The sampler description is immutable, which limits its usefulness
Currently, a sampler's description is immutable. Ideally, it should be allowed to change, so that it reflects the sampler's current behavior (e.g., its current sampling rate, if that rate was updated by a config service) and can be used programmatically or for debugging.
For more details, see open-telemetry/opentelemetry-specification issue #2095, "Remove unreasonable restriction on Sampler's description to be immutable".
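A mutable description can be as simple as recomputing it from live state on every call. This Python sketch uses a hypothetical RateSampler class with hypothetical update_rate and get_description methods; it is not an SDK type. The lock keeps a concurrent rate update from interleaving with a description read.

```python
import threading


class RateSampler:
    """Hypothetical sampler whose description tracks its live configuration."""

    def __init__(self, rate: float) -> None:
        self._lock = threading.Lock()
        self._rate = 0.0
        self.update_rate(rate)

    def update_rate(self, rate: float) -> None:
        # Could be called, for example, when a config service pushes a new rate.
        if not 0.0 <= rate <= 1.0:
            raise ValueError("rate must be in [0, 1]")
        with self._lock:
            self._rate = rate

    def get_description(self) -> str:
        # Built on each call instead of fixed at construction time,
        # so callers always see the current rate.
        with self._lock:
            return f"RateSampler{{rate={self._rate}}}"
```

After RateSampler(0.25).update_rate(0.1), the description changes from "RateSampler{rate=0.25}" to "RateSampler{rate=0.1}". One consequence of relaxing immutability is that callers can no longer safely cache the description.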
5. Composing samplers in a consistent manner is difficult
With the current sampler model, it is hard to compose samplers in a way that respects consistent probability sampling requirements. For a specific example of the problem, see this comment:
specification #2047 (comment, on tracestate)
This is being addressed in an OTEP.
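As a hedged illustration of why a shared threshold model can make composition tractable, the Python sketch below assumes that every sampler in a composition compares the same 56-bit randomness value against its own rejection threshold, keeping a span if and only if randomness >= threshold. All names are hypothetical, and this is not the API proposed in the OTEP. Under that assumption, "keep if any child keeps" reduces to the minimum threshold and "keep only if every child keeps" reduces to the maximum. The composite is therefore itself a consistent threshold sampler with a computable probability.

```python
from typing import List

RANDOMNESS_BITS = 56
MAX_THRESHOLD = 1 << RANDOMNESS_BITS  # a threshold that rejects everything


def threshold_for(probability: float) -> int:
    """Rejection threshold: keep a span iff randomness >= threshold."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be in [0, 1]")
    return int((1.0 - probability) * MAX_THRESHOLD)


def any_of(thresholds: List[int]) -> int:
    """Keep if any child would keep: the lowest threshold wins."""
    return min(thresholds, default=MAX_THRESHOLD)


def all_of(thresholds: List[int]) -> int:
    """Keep only if every child would keep: the highest threshold wins."""
    return max(thresholds, default=0)


def keeps(threshold: int, randomness: int) -> bool:
    return randomness >= threshold


def effective_probability(threshold: int) -> float:
    """Fraction of uniform 56-bit randomness values the threshold keeps."""
    return (MAX_THRESHOLD - threshold) / MAX_THRESHOLD
```

With children at probabilities 0.5 and 0.25, any_of keeps 50% of traces and all_of keeps 25%. Every process holding the same trace reaches the same composite decision. If each child instead flipped its own independent coin, the composite decision would differ from process to process, and its probability would depend on how the coins happened to correlate.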
Solution Approach
This section is TBD (work in progress).