This repository has been archived by the owner on Dec 6, 2024. It is now read-only.

Specify how to propagate consistent head sampling probability #168

Merged: 46 commits, Sep 29, 2021
46 commits:

- `14bd54e` Specify how to propagate head sampling probability (Jul 23, 2021)
- `1d5d60a` edit (Jul 23, 2021)
- `c741f7e` version (Jul 23, 2021)
- `6adbd1a` links to OTEP 148 are TODOs (Jul 23, 2021)
- `11206d7` rename (Jul 27, 2021)
- `4085972` Add a tracestate variation (Jul 27, 2021)
- `5cd3b9a` redraft using tracestate and two values (Jul 28, 2021)
- `5aedc9c` edits (Jul 28, 2021)
- `32544ea` Drop mention of inflationary (Jul 28, 2021)
- `aa22609` detail about samplers (Jul 28, 2021)
- `73f3b6f` edit (Jul 29, 2021)
- `2fbcb30` change the format to otel=k1:v;k2:v; explain geometric distribution (Aug 10, 2021)
- `695025c` followup from feedback and this week's SIG (Aug 20, 2021)
- `fb75d9c` edits (Aug 20, 2021)
- `8f7ad73` Let 2^61 be the min probability; leaves one unused value to represent… (Aug 23, 2021)
- `765bd12` worked example (draft) (Sep 3, 2021)
- `56910bd` corner cases (Sep 8, 2021)
- `e06a7cf` corner case edits (Sep 8, 2021)
- `0804649` corner case edits (Sep 8, 2021)
- `cb068a2` edit (Sep 8, 2021)
- `c9fa24f` from @oertl feedback especially (Sep 8, 2021)
- `1b3ae23` clarify (Sep 8, 2021)
- `d0c2697` Apply suggestions from code review (jmacd, Sep 9, 2021)
- `98f6403` rewrite explaination for r-value (Sep 9, 2021)
- `16947f7` more (Sep 9, 2021)
- `d9a4d59` example (Sep 9, 2021)
- `34ec604` selection probability -> probabilty of r (Sep 9, 2021)
- `f94c2d5` Merge branch 'main' of github.com:open-telemetry/oteps into jmacd/tra… (Sep 9, 2021)
- `48123fe` typos (Sep 9, 2021)
- `139f248` another example (Sep 10, 2021)
- `2a37c4c` off-by-ones (Sep 10, 2021)
- `a9c7500` discuss naming (Sep 10, 2021)
- `b11f70e` Apply suggestions from code review (jmacd, Sep 10, 2021)
- `2a59cfc` off-by-zero (Sep 13, 2021)
- `bb92360` Merge branch 'jmacd/traceprop' of github.com:jmacd/oteps into jmacd/t… (Sep 13, 2021)
- `3097dcb` lint (Sep 15, 2021)
- `0acc729` lint (Sep 15, 2021)
- `fa2ded1` Remove log_head_adjusteed_count; remove the +1 bias for p-values; r n… (Sep 21, 2021)
- `d119c57` Use 7/16 (Sep 21, 2021)
- `5ea047e` Use 7/16 (Sep 21, 2021)
- `28779fe` Use 7/16 (Sep 21, 2021)
- `04b37e4` Merge branch 'main' into jmacd/traceprop (jmacd, Sep 21, 2021)
- `32c384e` 5% (Sep 28, 2021)
- `efc4bb0` Merge branch 'jmacd/traceprop' of github.com:jmacd/oteps into jmacd/t… (Sep 28, 2021)
- `f6ffd02` mention w3c trace context issue 467 (randomess bit); move issue 463 t… (Sep 28, 2021)
- `0a296b5` whitespace (Sep 29, 2021)

File: `text/trace/0000-sampling-propagation.md` (126 additions, 0 deletions)

# Propagate head trace sampling probability

This OTEP proposes extending the W3C trace context `traceparent` header to convey the head trace sampling probability.

## Motivation

Propagating the head trace sampling probability lets child contexts
record the effective sampling probability in child spans. [OTEP
148](TODO) establishes semantic conventions for conveying the
adjusted count of a span via attributes recorded with the span. When
a sampling decision is based on the parent's context, the effective
sampling probability, which determines the child's adjusted count,
cannot be recorded unless it is propagated through the context.

We propose to propagate the trace sampling probability that is in
effect alongside the [W3C
sampled](https://www.w3.org/TR/trace-context/#sampled-flag) flag by
extending the `traceparent`.

## Explanation

To limit the cost of this extension, to ensure that it is widely
supported, and for statistical reasons documented below, we propose to
limit head sampling probabilities to powers of two. This restricts the
available head sampling probabilities to 1, 1/2, 1/4, 1/8, and so on,
which we can compactly encode as small integers using the negative
base-2 logarithm of the effective probability.

For example, the value 2 corresponds to 1-in-4 sampling, and the value
10 corresponds to 1-in-1024 sampling.
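The mapping between a power-of-two probability and its encoded integer can be sketched in Python as follows. The helper names are hypothetical, not an existing OpenTelemetry API; `math.frexp` is used here only as one way to check that a probability is exactly a power of two.

```python
import math


def log_count_from_probability(p: float) -> int:
    """Return k such that p == 2**-k; reject anything that is not a power of two."""
    if not 0.0 < p <= 1.0:
        raise ValueError(f"probability out of range: {p}")
    # frexp splits p into mantissa * 2**exponent with 0.5 <= mantissa < 1.
    mantissa, exponent = math.frexp(p)
    if mantissa != 0.5:
        raise ValueError(f"not a power of two: {p}")
    # p == 0.5 * 2**exponent == 2**(exponent - 1), so k == 1 - exponent.
    return 1 - exponent


def probability_from_log_count(log_count: int) -> float:
    """Inverse mapping: an encoded value k means sampling probability 2**-k."""
    return 2.0 ** -log_count


def adjusted_count(log_count: int) -> int:
    """A span sampled with probability 2**-k represents 2**k spans."""
    return 1 << log_count


print(log_count_from_probability(1 / 4))     # 2
print(log_count_from_probability(1 / 1024))  # 10
print(adjusted_count(10))                    # 1024
```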

The [version-0 W3C trace context `traceparent`
header](https://www.w3.org/TR/trace-context/#examples-of-http-traceparent-headers)
is a concatenation of four fields:

```
traceparent: (version)-(trace-id)-(parent-id)-(trace-flags)
```

This proposal would upgrade `traceparent` to version 1 with a fifth
field named `log-count`:

```
traceparent: (version)-(trace-id)-(parent-id)-(trace-flags)-(log-count)
```

where `log-count` is the negative base-2 logarithm of the sampling
probability, encoded as two hexadecimal digits. Equivalently, it is
the base-2 logarithm of the adjusted count of a child span created in
this context (i.e., the logarithm of the effective count, thus
"log-count"). To compute the adjusted count of a child span created
in this context, use `2^log-count`. A log-count of `0` corresponds to
`2^0 = 1`, so 0 conveys a context with probability 1.
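As an illustration, a strict parser for the proposed version-1 header might look like the following Python sketch. `parse_traceparent_v1` and `TraceParentV1` are hypothetical names. The fixed-width lowercase-hex and non-zero-ID checks follow the W3C version-0 rules; reading a fifth two-hex-digit `log-count` field is this proposal's assumption, not a ratified W3C format.

```python
from typing import NamedTuple

_HEX_DIGITS = frozenset("0123456789abcdef")


def _is_lower_hex(field: str, width: int) -> bool:
    # Trace context fields are fixed-width lowercase hexadecimal.
    return len(field) == width and set(field) <= _HEX_DIGITS


class TraceParentV1(NamedTuple):
    trace_id: str
    parent_id: str
    trace_flags: int
    log_count: int

    @property
    def sampled(self) -> bool:
        return bool(self.trace_flags & 0x01)

    @property
    def adjusted_count(self) -> int:
        # Adjusted count of a child span created in this context: 2^log-count.
        return 1 << self.log_count


def parse_traceparent_v1(value: str) -> TraceParentV1:
    """Parse the hypothetical version-1 traceparent proposed in this OTEP."""
    fields = value.strip().split("-")
    if len(fields) != 5:
        raise ValueError("expected 5 dash-separated fields")
    widths = (2, 32, 16, 2, 2)  # version, trace-id, parent-id, trace-flags, log-count
    for field, width in zip(fields, widths):
        if not _is_lower_hex(field, width):
            raise ValueError(f"malformed field: {field!r}")
    version, trace_id, parent_id, flags, log_count = fields
    if version != "01":
        raise ValueError(f"unsupported version: {version}")
    if trace_id == "0" * 32 or parent_id == "0" * 16:
        raise ValueError("all-zero trace-id or parent-id is invalid")
    return TraceParentV1(trace_id, parent_id, int(flags, 16), int(log_count, 16))


tp = parse_traceparent_v1("01-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01-05")
print(tp.sampled, tp.log_count, tp.adjusted_count)  # True 5 32
```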

The sampling probability of a context is independent of whether it is
sampled. We consider it [useful to convey sampling probability even
when unsampled]() because it can be used to estimate the potential
overhead of starting new sampled traces.

### Examples

These are extended [from the W3C
examples](https://www.w3.org/TR/trace-context/#examples-of-http-traceparent-headers):

```
Value = 01-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01-05
base16(version) = 01
base16(trace-id) = 4bf92f3577b34da6a3ce929d0e0e4736
base16(parent-id) = 00f067aa0ba902b7
base16(trace-flags) = 01 // sampled
base16(log-count) = 05 // head probability is 2^-5.
```

```
Value = 01-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-00-11
base16(version) = 01
base16(trace-id) = 4bf92f3577b34da6a3ce929d0e0e4736
base16(parent-id) = 00f067aa0ba902b7
base16(trace-flags) = 00 // not sampled
base16(log-count) = 11 // head probability is 2^-17
```

We can express sampling probabilities as small as 2^-255 (log-count
`ff`) at a cost of just 3 bytes per `traceparent`: one `-` separator
plus two hexadecimal digits.
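To make the size claim concrete, the following Python sketch serializes a version-1 value and compares its length with the corresponding W3C version-0 example. `format_traceparent_v1` is a hypothetical helper that assumes the field layout proposed in this OTEP.

```python
def format_traceparent_v1(trace_id: str, parent_id: str, sampled: bool, log_count: int) -> str:
    """Build the hypothetical version-1 traceparent value proposed in this OTEP."""
    if not 0 <= log_count <= 0xFF:
        # Two hex digits hold 0..255, so the smallest probability is 2^-255.
        raise ValueError("log-count must be in 0..255")
    flags = 0x01 if sampled else 0x00
    return f"01-{trace_id}-{parent_id}-{flags:02x}-{log_count:02x}"


v0 = "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-00"
v1 = format_traceparent_v1("4bf92f3577b34da6a3ce929d0e0e4736", "00f067aa0ba902b7", False, 17)
print(v1)                 # 01-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-00-11
print(len(v1) - len(v0))  # 3
```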

## Internal details

The reasoning behind restricting the set of sampling rates is that it:

- Lowers the cost of propagating head sampling probability.
- Makes math involving partial traces tractable.

A use case known as "inflationary sampling" from Google's Dapper
system is documented in [OTEP 148](TODO). It is used to justify
propagating the head sampling probability even when unsampled.

[A published algorithm for making statistical inferences from
partially-sampled traces](https://arxiv.org/pdf/2107.07703.pdf)
explains how to work with power-of-2 sampling rates.

## Trade-offs and mitigations

Restricting head sampling rates to powers of two does not prevent tail
Samplers from using arbitrary probabilities.

Restricting head sampling rates to powers of two does not prevent
Samplers from achieving arbitrary effective probabilities over a
period of time. For example, choosing 1/2 sampling half of the time
and 1/4 sampling the other half leads to an effective sampling rate of
(1/2)(1/2) + (1/2)(1/4) = 3/8.
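The arithmetic in this example generalizes to any target rate. Below is a minimal Python sketch, with hypothetical helper names that are not part of any OpenTelemetry API, of how a Sampler restricted to power-of-two head probabilities might split its time between two adjacent powers of two to reach an arbitrary long-run rate `p`. The two-level alternation is an assumed strategy for illustration; this proposal does not specify one.

```python
import math
from typing import Iterable, Tuple


def effective_probability(schedule: Iterable[Tuple[float, int]]) -> float:
    """Time-weighted sampling probability for (fraction_of_time, log_count) pairs."""
    return sum(fraction * 2.0 ** -log_count for fraction, log_count in schedule)


def mix_for_target(p: float) -> Tuple[int, float]:
    """Return (k, f): use 2^-k a fraction f of the time and 2^-(k+1) otherwise,
    so that the time-averaged sampling probability equals p."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must be in (0, 1]")
    k = math.floor(-math.log2(p))  # chosen so that 2^-(k+1) < p <= 2^-k
    lower = 2.0 ** -(k + 1)
    # Solve f * 2^-k + (1 - f) * 2^-(k+1) = p; since 2^-k = 2 * lower, f = (p - lower) / lower.
    f = (p - lower) / lower
    return k, f


print(effective_probability([(0.5, 1), (0.5, 2)]))  # 0.375, i.e. 3/8
print(mix_for_target(3 / 8))                        # (1, 0.5)
```

Mixing only two adjacent powers of two keeps every individual head decision within a factor of two of the target rate.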

## Prior art and alternatives

Google's Dapper system propagated a field in its trace context called
"inverse_probability", which is equivalent to adjusted count. This
proposal uses the base-2 logarithm of the adjusted count to save space.

## Open questions

This OTEP suggests how to modify the W3C trace context to accommodate
sampling in OpenTelemetry. [OTEP 148](TODO) suggests semantic
conventions for encoding adjusted count in a Span, but neither text
specifies how to modify the built-in Samplers to produce the proposed
new `traceparent` field so that the `ParentBased` Sampler can
correctly set the proposed `sampler.adjusted_count` attribute. This
is left for future work.