This repository has been archived by the owner on Dec 6, 2024. It is now read-only.

Probability sampler composition rules #175

Merged
merged 74 commits into from
Oct 14, 2021
Changes from 5 commits
14bd54e
Specify how to propagate head sampling probability
Jul 23, 2021
1d5d60a
edit
Jul 23, 2021
c741f7e
version
Jul 23, 2021
6adbd1a
links to OTEP 148 are TODOs
Jul 23, 2021
11206d7
rename
Jul 27, 2021
4085972
Add a tracestate variation
Jul 27, 2021
5cd3b9a
redraft using tracestate and two values
Jul 28, 2021
5aedc9c
edits
Jul 28, 2021
32544ea
Drop mention of inflationary
Jul 28, 2021
aa22609
detail about samplers
Jul 28, 2021
73f3b6f
edit
Jul 29, 2021
2fbcb30
change the format to otel=k1:v;k2:v; explain geometric distribution
Aug 10, 2021
695025c
followup from feedback and this week's SIG
Aug 20, 2021
fb75d9c
edits
Aug 20, 2021
8f7ad73
Let 2^61 be the min probability; leaves one unused value to represent…
Aug 23, 2021
765bd12
worked example (draft)
Sep 3, 2021
56910bd
corner cases
Sep 8, 2021
e06a7cf
corner case edits
Sep 8, 2021
0804649
corner case edits
Sep 8, 2021
cb068a2
edit
Sep 8, 2021
c9fa24f
from @oertl feedback especially
Sep 8, 2021
1b3ae23
clarify
Sep 8, 2021
d0c2697
Apply suggestions from code review
jmacd Sep 9, 2021
98f6403
rewrite explaination for r-value
Sep 9, 2021
16947f7
more
Sep 9, 2021
d9a4d59
example
Sep 9, 2021
34ec604
selection probability -> probabilty of r
Sep 9, 2021
f94c2d5
Merge branch 'main' of github.com:open-telemetry/oteps into jmacd/tra…
Sep 9, 2021
48123fe
typos
Sep 9, 2021
139f248
another example
Sep 10, 2021
2a37c4c
off-by-ones
Sep 10, 2021
a9c7500
discuss naming
Sep 10, 2021
b11f70e
Apply suggestions from code review
jmacd Sep 10, 2021
2a59cfc
off-by-zero
Sep 13, 2021
bb92360
Merge branch 'jmacd/traceprop' of github.com:jmacd/oteps into jmacd/t…
Sep 13, 2021
57b9a55
add composition rules (draft)
Sep 14, 2021
ac3447b
edits
Sep 14, 2021
6ef69e0
give examples
Sep 14, 2021
0706963
cut examples
Sep 14, 2021
23120ec
whitespace
Sep 14, 2021
3097dcb
lint
Sep 15, 2021
0acc729
lint
Sep 15, 2021
fa2ded1
Remove log_head_adjusteed_count; remove the +1 bias for p-values; r n…
Sep 21, 2021
46c7377
Remove log_head_adjusteed_count; remove the +1 bias for p-values; r n…
Sep 21, 2021
99793cb
edits
Sep 21, 2021
d119c57
Use 7/16
Sep 21, 2021
5ea047e
Use 7/16
Sep 21, 2021
28779fe
Use 7/16
Sep 21, 2021
04b37e4
Merge branch 'main' into jmacd/traceprop
jmacd Sep 21, 2021
32c384e
5%
Sep 28, 2021
efc4bb0
Merge branch 'jmacd/traceprop' of github.com:jmacd/oteps into jmacd/t…
Sep 28, 2021
f6ffd02
mention w3c trace context issue 467 (randomess bit); move issue 463 t…
Sep 28, 2021
549a049
Merge branch 'jmacd/traceprop' of github.com:jmacd/oteps into jmacd/1…
Sep 28, 2021
89868b3
remove accidental file add
Sep 29, 2021
84406fb
Merge branch 'main' of github.com:open-telemetry/oteps into jmacd/170…
Sep 29, 2021
8189c80
oertl's feedback on 168
Sep 29, 2021
e477dad
Define adjusted count in OTEP 168, linked to OTEP 170
Sep 29, 2021
752704b
head -> parent
Sep 29, 2021
86620dc
Revise wording of TraceIDRatioBased
Sep 29, 2021
8ab9f5e
clarify head vs parent trace sampling
Sep 29, 2021
174cda9
improve def of consistent sampling
Sep 29, 2021
f6489cd
log_adjusted_count
Sep 29, 2021
3938f94
misspell
Sep 29, 2021
1e54c51
off-by-one (agin); use 'parent sampling probability'
Sep 30, 2021
5101cd9
use 'parent sampling probability' in 170
Sep 30, 2021
be98b31
edit consistent sampling def
Sep 30, 2021
d82c8f7
seven
Sep 30, 2021
d211687
Update text/trace/0168-sampling-propagation.md
jmacd Oct 4, 2021
b537764
typo fix
Oct 5, 2021
10f77b8
Merge branch 'jmacd/170composition' of github.com:jmacd/oteps into jm…
Oct 5, 2021
89be84e
Update text/trace/0168-sampling-propagation.md
jmacd Oct 11, 2021
c2bb667
Merge branch 'main' into jmacd/170composition
jmacd Oct 11, 2021
7a2b745
Merge branch 'main' of github.com:open-telemetry/oteps into jmacd/170…
Oct 11, 2021
3513f9d
Merge branch 'jmacd/170composition' of github.com:jmacd/oteps into jm…
Oct 11, 2021
201 changes: 196 additions & 5 deletions text/trace/0170-sampling-probability.md
@@ -586,7 +586,7 @@ unset. Thus, the 0 value shall mean unknown adjusted count.

The OTEP 168 proposal for _propagating_ head sampling probability uses
6 bits of information, with 62 ordinary values, one zero value, and a
single unused value.
single Unknown value.

Here, we propose a biased encoding for head sampling probability equal
to 1 plus the `P` value as proposed in OTEP 168. The proposed span
@@ -617,10 +617,201 @@ stream of Span data with non-zero values in the `log_head_adjusted_count`
field can approximately and accurately count Spans using adjusted
counts.

Non-probabilistic Samplers such as the [Leaky-bucket rate-limited
sampler](https://github.com/open-telemetry/opentelemetry-specification/issues/1769)
SHOULD set the `log_head_adjusted_count` field to zero to indicate an
unknown adjusted count.
### Span data model changes

The OpenTelemetry trace data "model" is not currently specified in a
stand-alone way, as it has been for logs and metrics. This
OTEP calls for the creation of a new `trace/datamodel.md` file to
capture this sort of specification detail.

Addition to the Span data model:

```
### Definitions Used in this Document

#### Sampler

A Sampler provides configurable logic, used by the SDK, for selecting
which Spans are "recorded" and/or "sampled" in a tracing client
library. To "record" a span means to build a representation of it in
the client's memory, which makes it eligible for being exported. To
"sample" a span implies setting the W3C `sampled` flag and recording
the span for export.

OpenTelemetry supports spans that are "recorded" and not "sampled"
for "live" observability of spans (e.g., z-pages).

#### Parent-based sampling

A Sampler that makes its decision to sample based on the W3C `sampled`
flag is said to use parent-based sampling.

#### Head sampling

In a tracing context, Head sampling refers to the initial decision to
sample a span or a trace, which determines the W3C `sampled` flag of
the child context. The OpenTelemetry tracing data model currently
supports only head sampling.

#### Probability sampler

A probability Sampler is a Sampler that knows immediately, for each
of its decisions, the probability that the span had of being selected.

Sampling probability is defined as a number less than or equal to 1
and greater than 0 (i.e., `0 < probability <= 1`). The case of 0
probability is treated as a special, non-probabilistic case.

#### Consistent probability sampler

A consistent probability sampler is a Sampler that for a configured
sampling probability will make the same decision as another sampler
with the same or greater probability. In OpenTelemetry, consistent
probability samplers are limited to power-of-two probabilities.

Consistent probability sampling is defined in terms of a "p-value"
and an "r-value", both of which are propagated via the context to assist
in making consistent sampling decisions.

#### Always-on sampler

An always-on sampler is another name for a consistent probability
sampler with probability equal to one.

#### Always-off sampler

An always-off Sampler has the effect of disabling a span completely,
effectively excluding it from the population. This is not defined as
a probability sampler with zero probability, because these spans are
effectively uncountable.

#### Non-probability sampler

A non-probability sampler is a Sampler that makes its decisions not
based on chance, but instead uses arbitrary logic and internal state
to make its decisions. Because OpenTelemetry specifies the use of
consistent probability samplers, any sampler other than a parent-based
sampler that does not meet all the requirements for consistent probability
sampling is termed a non-probability sampler.

#### Adjusted count

Adjusted count is defined as a measure of representivity, the number
of spans in the population that are represented by the individually
sampled span. Span-to-metrics pipelines may be built by adding the
adjusted count of each sample span to a counter of matching spans,
observing the duration of each sample span in a histogram as many
times as its adjusted count, and so on.

An adjusted count of 1 means one-to-one sampling was in effect.
Adjusted counts greater than 1 indicate the use of a probability
sampler. Adjusted counts are unknown when using a non-probability
sampler.

Zero adjusted count is defined in a way to support composition of
probability and non-probability samplers. In effect, spans that are
"recorded" but not "sampled" have adjusted count of zero.

#### Unbiased probability sampling

The statistical term "unbiased" is a requirement applied to the
adjusted count of a span, which states that the expected value of the
sum of adjusted counts across all exported spans MUST equal the true
number of spans in the population. Statistical bias, a measure of the
difference between an estimate and its true value, should equal zero
for the estimated span count in the population. Moreover, this
requirement must be true for all subsets of the span population for a
sampler to be considered an unbiased probability sampler.

It is easier to define probability sampling by what it is not. Here
are several samplers that should be categorized as non-probability
samplers because they cannot record unbiased adjusted counts:

- A traditional form of "leaky-bucket" sampler applies a rate limit to
the starting of new sampled traces. When the configured limit is
not exceeded, all spans pass through with adjusted count 1. When
the configured rate limit is exceeded, it is impossible to set
adjusted count without introducing bias because future arrivals are
not known.
- An "every-N" sampler records spans on a regular interval, but instead
of making a probabilistic decision it makes an exact decision
(e.g., every 10,000 spans). This sampler knows the representivity
of the spans it samples, but the selection process is biased.
- An "at least once per time period" sampler remembers the last time
each distinct span name exported a span. When a span occurs after
more than the specified interval, it samples one (e.g., to ensure
that receivers know about these spans). This sampler introduces
bias because spans that happen between the intervals do not receive
consideration.
- The "always off" sampler is biased by definition. Since it exports
no spans, the sum of adjusted count is always zero.
```
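
The adjusted count definition above can be illustrated with a minimal
sketch of a span-to-metrics counter. It assumes the encoding described
in this proposal (field value equal to 1 plus the OTEP 168 `P` value,
so that a value `p` in `[1,62]` implies sampling probability
`2**-(p-1)`), with `p=0` for unknown and `p=63` for known zero adjusted
count. The function names are hypothetical, and skipping spans with
unknown adjusted count is one possible policy, not a requirement stated
here.

```python
from typing import Iterable, Optional

UNKNOWN_P = 0  # unknown adjusted count
ZERO_P = 63    # known zero adjusted count


def adjusted_count(p: int) -> Optional[int]:
    """Map an encoded p-value to an adjusted count, or None when unknown."""
    if not 0 <= p <= 63:
        raise ValueError(f"p-value out of range: {p}")
    if p == UNKNOWN_P:
        return None
    if p == ZERO_P:
        return 0
    # Assumption: encoded p = 1 + P, with sampling probability 2**-P,
    # so the span represents 2**(p - 1) spans in the population.
    return 2 ** (p - 1)


def estimate_span_count(p_values: Iterable[int]) -> int:
    """Sum adjusted counts, skipping spans whose adjusted count is unknown."""
    total = 0
    for p in p_values:
        count = adjusted_count(p)
        if count is not None:
            total += count
    return total
```

For example, two spans with `p=3` each count for four spans, while a
span with `p=63` is received but contributes nothing to the estimate.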

### Proposed `Sampler` composition rules

When combining multiple Samplers, the natural outcome is that a span
will be recorded and sampled if any one of the Samplers says to record
or sample the span. To combine Samplers in a way that preserves
adjusted count requires first classifying Samplers into one of the
following categories:

1. Parent-based (`ParentBased`)
2. Known non-zero probability (`TraceIDRatio`, `AlwaysOn`)
3. Non-probability based (`AlwaysOff`, all other Samplers)

The Parent-based sampler always reduces to one of the other two at
runtime, based on whether the parent context includes a known head
probability.

Here are the rules for combining Sampler decisions from each of these
categories that may be used to construct composite samplers.

#### Composing two consistent probability samplers

When two consistent probability samplers are used, the Sampler with
the larger probability by definition includes every span the smaller
probability sampler would select. The result is a consistent sampler
with the minimum p-value.
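
The nesting property can be sketched as follows. This is an
illustration only: it models consistency with a single shared random
number per trace and a threshold test, which is an assumption made for
the example and not the r-value mechanism of OTEP 168. The `select`
helper and its parameter names are hypothetical.

```python
import random
from typing import List, Set


def select(randomness: List[float], log2_inverse_probability: int) -> Set[int]:
    """Return indexes selected by a sampler with probability 2**-k.

    Every sampler compares the same per-trace random number against its
    own threshold, which is what makes the decisions consistent.
    """
    threshold = 2.0 ** -log2_inverse_probability
    return {i for i, u in enumerate(randomness) if u < threshold}


rng = random.Random(42)  # fixed seed so the sketch is repeatable
trace_randomness = [rng.random() for _ in range(10_000)]

selected_half = select(trace_randomness, 1)     # probability 1/2
selected_quarter = select(trace_randomness, 2)  # probability 1/4

# Sampling when either sampler says so yields exactly the larger set.
composite = selected_half | selected_quarter
```

Because every span chosen at probability 1/4 is also chosen at
probability 1/2, the composite behaves like the 1/2 sampler alone,
which is the sampler with the smaller p-value under the encoding
assumed in this proposal.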

#### Composing a probability sampler and a non-probability sampler

When a probability sampler is composed with a non-probability sampler,
the effect is to change an unknown probability into a known
probability. When the probability sampler selects the span, its
adjusted count will be used. When the probability sampler does not
select a span, zero adjusted count will be used.

The use of zero adjusted count allows recording spans that an unbiased
probability sampler did not select, allowing those spans to be
received at the backend without introducing statistical bias.
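
A small simulation may help show why the zero adjusted count avoids
bias. It is a sketch under stated assumptions: the probability sampler
uses probability 1/4 (adjusted count 4), and the non-probability
sampler is a hypothetical "every 10th span" rule. Neither is prescribed
by this proposal.

```python
import random
from typing import Tuple

PROBABILITY = 0.25
ADJUSTED_COUNT = 4  # 1 / PROBABILITY


def simulate(population: int, seed: int) -> Tuple[int, int, int]:
    """Return (exported, probability_selected, adjusted_total)."""
    rng = random.Random(seed)
    exported = 0
    probability_selected = 0
    adjusted_total = 0
    for index in range(population):
        by_probability = rng.random() < PROBABILITY
        by_rule = index % 10 == 0  # hypothetical non-probability rule
        if by_probability:
            probability_selected += 1
        if by_probability or by_rule:
            exported += 1
            # Spans kept only by the rule carry zero adjusted count.
            adjusted_total += ADJUSTED_COUNT if by_probability else 0
    return exported, probability_selected, adjusted_total
```

The extra spans exported by the rule increase `exported` but leave
`adjusted_total` equal to what the probability sampler alone would have
produced, so the estimate of the population size is unchanged.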

#### Composition rules summary

To create a composite Sampler, first express the result of each
Sampler in terms of the p-value and `sampled` flag. Note that
p-values fall into three categories:

1. Unknown p-value (`p=0`) indicates unknown adjusted count
2. Known non-zero p-value (in the range `[1,62]`) indicates known non-zero adjusted count
3. Known zero p-value (`p=63`) indicates known zero adjusted count

While non-probability samplers always return `p=0` and may set
`sampled=true` or `sampled=false`, a probability sampler is restricted
to returning either `p∈[1,62]` with `sampled=true` or to returning
`p=63` with `sampled=false`. No individual sampler can return `p=63`
with `sampled=true`, but this condition MAY result from composition of
`p=63` and `p=0`.

A composite sampler can be computed by starting with an initial state
(`p=0`, `sampled=false`) and updating the composite result to the
value in the table below, where columns are the input state and rows
are the Sampler decision.

| Composition input<br> \ <br>Sampler decision | Unknown adjusted count<br>(p<sub>0</sub>=0, sampled<sub>0</sub>∈{true,false}) | Known adjusted count<br>(p<sub>0</sub>∈[1,63], sampled<sub>0</sub>∈{true,false}) |
| -- | -- | -- |
| <b>Non-probability sampler<br>(p<sub>1</sub>=0, sampled<sub>1</sub>∈{true,false})</b> | p=0<br>sampled=logicalOr(sampled<sub>0</sub>, sampled<sub>1</sub>) | p=p<sub>0</sub><br>sampled=logicalOr(sampled<sub>0</sub>, sampled<sub>1</sub>) |
| <b>Probability sampler<br>(p<sub>1</sub>∈[1,62], sampled<sub>1</sub>=true) or<br>(p<sub>1</sub>=63, sampled<sub>1</sub>=false)</b> | p=p<sub>1</sub><br>sampled=logicalOr(sampled<sub>0</sub>, sampled<sub>1</sub>) | p=min(p<sub>0</sub>, p<sub>1</sub>)<br>sampled=logicalOr(sampled<sub>0</sub>, sampled<sub>1</sub>) |
| | | |
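
The table can be read as a fold over Sampler decisions. The following
is a minimal sketch of that reading, not an SDK API: the `Decision`
type and the `compose` and `compose_all` functions are hypothetical
names introduced for illustration.

```python
from typing import Iterable, NamedTuple


class Decision(NamedTuple):
    p: int         # 0 = unknown, 1..62 = known non-zero, 63 = known zero
    sampled: bool  # W3C sampled flag


INITIAL = Decision(p=0, sampled=False)


def compose(state: Decision, decision: Decision) -> Decision:
    """Apply one Sampler decision to the composite state, per the table."""
    sampled = state.sampled or decision.sampled
    if decision.p == 0:
        # Non-probability sampler: the p-value of the input is kept.
        return Decision(state.p, sampled)
    if state.p == 0:
        # Probability sampler applied to an unknown adjusted count.
        return Decision(decision.p, sampled)
    # Probability sampler applied to a known adjusted count.
    return Decision(min(state.p, decision.p), sampled)


def compose_all(decisions: Iterable[Decision]) -> Decision:
    """Fold a sequence of Sampler decisions starting from the initial state."""
    state = INITIAL
    for decision in decisions:
        state = compose(state, decision)
    return state
```

For instance, composing a probability sampler that declined a span
(`p=63`, `sampled=false`) with a non-probability sampler that selected
it (`p=0`, `sampled=true`) yields `p=63` with `sampled=true`, the case
that no individual sampler can return on its own.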

### Proposed `Span` field documentation
