Add OpenTelemetry sampling conventions #793

Closed
wants to merge 32 commits into from
e767013
Add OpenTelemetry sampling conventions
jmacd Mar 5, 2024
0126c1d
chlog
jmacd Mar 5, 2024
8646a41
lint
jmacd Mar 5, 2024
157e07b
wip
jmacd Mar 6, 2024
f3c5da1
move into registry
jmacd Mar 6, 2024
1f1ca45
address intended user for each attribute
jmacd Mar 6, 2024
b5a65c4
address term implementations
jmacd Mar 6, 2024
a4c2068
give user perspective
jmacd Mar 6, 2024
b1574bd
clarify attributes can be used for spans
jmacd Mar 6, 2024
42e47f9
finish sentence
jmacd Mar 6, 2024
9badfa4
remove some bits
jmacd Mar 6, 2024
7e56498
apply suggestion
jmacd Mar 6, 2024
e466de8
yamllint
jmacd Mar 7, 2024
c50081e
Update .chloggen/793.yaml
jmacd Mar 7, 2024
73b0571
merge
jmacd Mar 7, 2024
4e8870d
add a tail-sampler example
jmacd Mar 7, 2024
4a1b1df
toc lint
jmacd Mar 7, 2024
21d9b99
Merge branch 'main' into jmacd/sampling_convs
jmacd Mar 7, 2024
e317e02
toc lint
jmacd Mar 7, 2024
e451ed2
Merge branch 'main' of github.com:open-telemetry/semantic-conventions…
jmacd Mar 11, 2024
e114c50
OTEP 235 ref
jmacd Mar 11, 2024
a28f48e
expand on sampling.priority
jmacd Mar 11, 2024
a55667f
expand on sampling.randomness
jmacd Mar 11, 2024
79275a3
expand on logs interpreting tracestate from span context
jmacd Mar 11, 2024
c3650d2
be more generic: sampler not tracer
jmacd Mar 11, 2024
9a7c46b
Apply suggestions from code review
jmacd Mar 11, 2024
c1d6c78
Merge branch 'main' into jmacd/sampling_convs
joaopgrassi Mar 13, 2024
9304f29
remove sampling priority
jmacd Mar 25, 2024
1cb4153
Merge branch 'jmacd/sampling_convs' of github.com:jmacd/semantic-conv…
jmacd Mar 25, 2024
8db652a
all the way removed
jmacd Mar 27, 2024
9ccd8d7
Update docs/sampling/README.md
jmacd May 31, 2024
157fdca
Update docs/sampling/README.md
jmacd May 31, 2024
24 changes: 24 additions & 0 deletions .chloggen/793.yaml
@@ -0,0 +1,24 @@
# Use this changelog template to create an entry for release notes.
#
# If your change doesn't affect end users you should instead start
# your pull request title with [chore] or use the "Skip Changelog" label.

# One of 'breaking', 'deprecation', 'new_component', 'enhancement', 'bug_fix'
change_type: new_component

# The name of the area of concern in the attributes-registry, (e.g. http, cloud, db)
component: sampling

# A brief description of the change. Surround your text with quotes ("") if it needs to start with a backtick (`).
note: Introduce attributes describing probability sampling.

# Mandatory: One or more tracking issues related to the change. You can use the PR number here if no issue exists.
# The values here must be integers.
issues: [793]

# (Optional) One or more lines of additional information to render under the primary note.
# These lines will be padded with 2 spaces and then inserted directly into the document.
# Use pipe (|) for multiline entries.
subtext: |
  Adds `sampling.randomness` and `sampling.threshold` from [OTEP 235][OTEP235].
  [OTEP235]: https://github.com/open-telemetry/oteps/blob/main/text/trace/0235-sampling-threshold-in-trace-state.md
1 change: 1 addition & 0 deletions docs/README.md
@@ -32,6 +32,7 @@ Semantic Conventions are defined for the following areas:
* [Messaging](messaging/README.md): Semantic Conventions for messaging operations and systems.
* [Object Stores](object-stores/README.md): Semantic Conventions for object stores operations.
* [RPC](rpc/README.md): Semantic Conventions for RPC client and server operations.
* [Sampling](sampling/README.md): Sampling Semantic Conventions.
* [System](system/README.md): System Semantic Conventions.

Semantic Conventions by signals:
1 change: 1 addition & 0 deletions docs/attributes-registry/README.md
@@ -48,6 +48,7 @@ Currently, the following namespaces exist:
* [OS](os.md)
* [Process](process.md)
* [RPC](rpc.md)
* [Sampling](sampling.md)
* [Server](server.md)
* [Source](source.md)
* [Thread](thread.md)
20 changes: 20 additions & 0 deletions docs/attributes-registry/sampling.md
@@ -0,0 +1,20 @@
<!--- Hugo front matter used to generate the website version of this page:
linkTitle: Sampling
--->

# Sampling

## Sampling attributes

The following attributes are recognized for telemetry in general.

<!-- semconv registry.sampling(omit_requirement_level) -->
| Attribute | Type | Description | Examples |
|---|---|---|---|
| `sampling.randomness` | string | The source of randomness for making probability sampling decisions, when it is not otherwise recorded. [1] | `ce929d0e0e4736` |
| `sampling.threshold` | string | Sampling probability as specified by OpenTelemetry. [2] | `c`; `ff8` |

**[1]:** This attribute is an optional way to express trace randomness, especially for cases where the TraceID is missing or known to be not random. Sampler components set and consume this value. The value is a hex-coded string containing 14 hex digits (56 bits) of randomness. Setting this attribute indicates the source of randomness that was used (and may be used again) for probability sampling. This field is taken to have the same meaning as the OpenTelemetry tracestate "R-value" for probability sampling, which is an alternative to deriving trace randomness from the TraceID. For details, see OTEP 235.

**[2]:** This attribute is set to convey sampling probability. Sampler components set and consume this value, which is taken to have the same meaning as the OpenTelemetry tracestate "T-value" for probability sampling. This attribute contains a hexadecimal-coded value containing 1 to 14 hex digits of precision, defining the threshold used to reject, depending on the random variable. This value can be converted into sampling probability. For details, see OTEP 235.
<!-- endsemconv -->
262 changes: 262 additions & 0 deletions docs/sampling/README.md
@@ -0,0 +1,262 @@
<!--- Hugo front matter used to generate the website version of this page:
linkTitle: Sampling
--->

# Semantic Conventions for Sampling

**Status**: [Experimental][DocumentStatus]

<!-- toc -->

- [Probability sampling](#probability-sampling)
- [Overriding sampling decisions](#overriding-sampling-decisions)
- [Overriding sampling randomness](#overriding-sampling-randomness)
- [Sampling threshold](#sampling-threshold)
- [Sampling randomness](#sampling-randomness)
- [No definition for Scope and Resource attributes](#no-definition-for-scope-and-resource-attributes)
- [Span sampling attributes](#span-sampling-attributes)
- [Logs sampling attributes](#logs-sampling-attributes)
- [Examples](#examples)
- [Head sampling](#head-sampling)
- [Tail sampling](#tail-sampling)

<!-- tocstop -->

These attributes reflect the effect of sampling in a telemetry
collection pipeline. They describe how items of telemetry were
collected, making it possible for Span-to-Metrics pipelines and
Logs-to-Metrics pipelines to estimate, in a probabilistic sense, how
many Spans and Log Records existed before sampling.

These attributes MAY be modified by components in a collection
pipeline to convey successive sampling that has been carried out for a
particular item of telemetry, using the conventions for consistent
sampling described here. In that sense, telemetry consumers
should see these attributes as telemetry metadata.

## Probability sampling

The OpenTelemetry sampling decision is defined in terms of a Threshold
value and a Randomness value, each containing 56 bits of information.

A constant known as _maximum adjusted count_ (`MaxAdjustedCount`),
with value `0x100000000000000` (which can also be expressed as
`0x1p+56`, `math.Pow(2, 56)`, or `math.Ldexp(1, 56)`), defines the
exclusive upper limit of these values.

Review comment on `MaxAdjustedCount`: It could be just me, but I think that max-something suggests inclusiveness, so this can be confusing. How about AdjustedCountLimit?

Reply: I am fine with MaxAdjustedCount. It is inclusive with respect to the adjusted count. However, I understand that it can be a little confusing as it is also used as an exclusive upper limit for the threshold and the random value.

Logically, both Threshold (`T`) and Randomness (`R`) are represented
as unsigned integers in the range `0` through `0xffffffffffffff` or
`MaxAdjustedCount - 1`. Items of telemetry are selected (i.e.,
"sampled") when their Threshold value is less than or equal to their
Randomness value, or `T <= R`.

Sampling probability is defined by the following expression:

```
Probability = (MaxAdjustedCount - Threshold) / MaxAdjustedCount
```
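
As a minimal sketch of the decision rule and the probability expression above, the following Python shows both as plain integer arithmetic. The function and constant names are illustrative only and are not part of any OpenTelemetry API.

```python
# Exclusive upper limit for Threshold and Randomness (2**56).
MAX_ADJUSTED_COUNT = 1 << 56


def is_sampled(threshold: int, randomness: int) -> bool:
    """Return True when the item is selected, i.e. when T <= R."""
    return threshold <= randomness


def sampling_probability(threshold: int) -> float:
    """Probability = (MaxAdjustedCount - Threshold) / MaxAdjustedCount."""
    return (MAX_ADJUSTED_COUNT - threshold) / MAX_ADJUSTED_COUNT
```

For instance, a Threshold of `0xc0000000000000` rejects three quarters of the Randomness range, so `sampling_probability` returns 0.25.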

In a Span-to-Metrics or Logs-to-Metrics pipeline, each item of
telemetry is representative of an _adjusted count_ number of items in
the original population. Adjusted count is the inverse of sampling
probability, and `MaxAdjustedCount` (defined above) is the inverse of
the smallest supported sampling probability (which can also be
represented as `0x1p-56`, `math.Pow(2, -56)`, or `math.Ldexp(1,
-56)`).
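
To illustrate how adjusted count might be used when counting, here is a small Python sketch. The helper names (`adjusted_count`, `estimate_population`) are hypothetical and assume each item's Threshold has already been decoded to a 56-bit integer.

```python
from typing import Iterable

MAX_ADJUSTED_COUNT = 1 << 56  # 2**56


def adjusted_count(threshold: int) -> float:
    """Inverse of sampling probability for a 56-bit Threshold."""
    if not 0 <= threshold < MAX_ADJUSTED_COUNT:
        raise ValueError("threshold out of range")
    return MAX_ADJUSTED_COUNT / (MAX_ADJUSTED_COUNT - threshold)


def estimate_population(thresholds: Iterable[int]) -> float:
    """Estimate the pre-sampling item count by summing adjusted counts."""
    return sum(adjusted_count(t) for t in thresholds)
```

Under these assumptions, three sampled items that each carry a 25% threshold stand in for an estimated twelve items in the original population.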

For the tracing signal, Threshold and Randomness propagate via W3C
Trace Context `tracestate`. When they appear in the `tracestate`, the
Threshold and Randomness properties are called "T-value" and
"R-value"; they are represented in the OpenTelemetry section of the
`tracestate` (having vendor tag `ot`), using properties named `th` and
`rv`, respectively.
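
The following Python sketch shows one way to read the `th` and `rv` properties out of a `tracestate` header. It assumes the OpenTelemetry entry uses `key:value` properties separated by `;` (for example `ot=th:c;rv:ce929d0e0e4736`, an illustrative value); the function name is hypothetical and the parsing is deliberately lenient.

```python
def parse_ot_tracestate(tracestate: str) -> dict[str, str]:
    """Return the properties of the `ot` entry, or {} if it is absent."""
    # W3C tracestate is a comma-separated list of key=value members.
    for member in tracestate.split(","):
        key, sep, value = member.strip().partition("=")
        if sep and key == "ot":
            fields = {}
            # Assumed layout of the OpenTelemetry value: name:value;name:value
            for prop in value.split(";"):
                name, colon, field = prop.partition(":")
                if colon:
                    fields[name] = field
            return fields
    return {}
```

A sampler could then look up `fields.get("th")` for the T-value and `fields.get("rv")` for the R-value.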

For the logs signal, which generally does not record the W3C Trace
Context `tracestate`, sampling attributes are meant to be expressed
using log record attributes with the same definition as T-value and
R-value.

For more information on how to perform and interpret probability
sampling based on these properties, [consult OTEP 235][OTEP235].

### Overriding sampling decisions

Samplers and sampler delegates are encouraged to force a sampling
decision by configuring a 100% sampling threshold, rather than
bypassing sampling logic. Forced sampling decisions will have T-value
"0", indicating 100% sampling.

Forcing a sampling decision in the other direction would, in
principle, call for a threshold corresponding with zero probability.
However, since a "do not sample" decision means the record is not
exported, there is no specified way to encode "zero probability".
Ultimately, the decision not to sample is not a probabilistic decision.

### Overriding sampling randomness

Sampling system designers are able to override sampling randomness,
which may be done for several reasons, including situations where
there is no TraceID defined.

When the operators of a tracing system purposely use TraceIDs that do
not follow the W3C Trace Context Level 2 specification for TraceID
randomness, and they wish to use OpenTelemetry sampling components,
they can insert explicit randomness to prevent randomness from being
erroneously taken from the TraceID.

Another use for overriding sampling randomness is to configure a
different unit of sampling consistency. For example, multiple traces
can be given the same randomness value to ensure that either all or
none of them are sampled consistently.

In the tracing signal, sampling randomness can be overridden by
setting an R-value in the tracestate. Once set in a Context, however,
the R-value should not be modified. In the logging signal, sampling
randomness can be overridden by setting the `sampling.randomness`
attribute.

### Sampling threshold

When determining the Threshold value from an item of telemetry,
sampler implementations SHOULD evaluate the following in order:

- use the OpenTelemetry T-value field (`th`) in `tracestate` from a live SpanContext (in context)
- use the OpenTelemetry T-value field (`th`) in `tracestate` from a Span tracestate (spans data)
- use the `sampling.threshold` attribute value, if present in the record attributes (logs data)

In all cases, the Threshold value is represented by one to 14
hexadecimal digits, allowing the use of variable-precision sampling
probability. When fewer than 14 digits are given, the string is
padded with trailing zeros to make up the full 14 digits (56 bits).

The zero Threshold value (encoded by a single `0`) corresponds with
100% sampling.
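
The padding rule can be sketched in Python as follows. The function name `parse_threshold` is illustrative, and accepting upper-case hex digits is a leniency of this sketch rather than something stated here.

```python
import string


def parse_threshold(t_value: str) -> int:
    """Decode a 1 to 14 digit hex T-value into a 56-bit Threshold."""
    if not 1 <= len(t_value) <= 14:
        raise ValueError("T-value must have 1 to 14 hex digits")
    if any(c not in string.hexdigits for c in t_value):
        raise ValueError("T-value must contain only hex digits")
    # Pad with trailing zeros to 14 digits (56 bits), then parse as hex.
    return int(t_value.ljust(14, "0"), 16)
```

With this decoding, `"c"` becomes `0xc0000000000000` and `"0"` becomes `0`, the 100% sampling threshold.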

When Threshold is not provided, no information about probability
sampling is available.

### Sampling randomness

When determining the Randomness value from an item of telemetry,
sampler implementations SHOULD evaluate the following in order:

- use the OpenTelemetry R-value field (`rv`) in `tracestate` from a live SpanContext (in context)
- use the OpenTelemetry R-value field (`rv`) in `tracestate` from a Span tracestate (spans data)
- use the `sampling.randomness` attribute value if it is present (logs data), or
- use the least significant 56 bits of the W3C Trace Context TraceID, as described in the W3C Trace Context Level 2 specification (in context, spans data, and logs data)

In the first three cases, where Randomness is explicitly encoded, the
value is represented by exactly 14 hexadecimal digits.
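
The evaluation order above could be sketched in Python like this. The function and parameter names are hypothetical; the caller is assumed to pass the `rv` value found in the `tracestate` (from the live context or from span data), the `sampling.randomness` attribute, and the hex-encoded TraceID, using `None` for anything that is absent.

```python
import string
from typing import Optional

RANDOMNESS_MASK = (1 << 56) - 1  # selects the least significant 56 bits


def _parse_explicit_randomness(value: str) -> int:
    """Parse an explicitly encoded Randomness value: exactly 14 hex digits."""
    if len(value) != 14 or any(c not in string.hexdigits for c in value):
        raise ValueError(f"invalid randomness value: {value!r}")
    return int(value, 16)


def resolve_randomness(
    tracestate_rv: Optional[str] = None,
    randomness_attr: Optional[str] = None,
    trace_id_hex: Optional[str] = None,
) -> Optional[int]:
    """Pick the Randomness source in the documented order of preference."""
    if tracestate_rv is not None:
        return _parse_explicit_randomness(tracestate_rv)
    if randomness_attr is not None:
        return _parse_explicit_randomness(randomness_attr)
    if trace_id_hex is not None:
        # Fall back to the low 56 bits of the TraceID.
        return int(trace_id_hex, 16) & RANDOMNESS_MASK
    return None  # no source of randomness is available
```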

Sampler implementations SHOULD NOT require trace flags to have the Trace
Context Level 2 Random flag set, in case the Trace ID is used as the
source of randomness. Because the Random flag is not widely available
at this time, and because the W3C Trace Context Level 2 specification
was designed for widespread compliance with existing systems, it is
recommended to assume there are 56 bits of randomness.

When the operators of a system knowingly use TraceIDs that do not
conform to the W3C Trace Context Level 2 specification and wish to
perform sampling with OpenTelemetry components, they SHOULD synthesize
a random R-value and store it in the `tracestate` (Spans) or in the
`sampling.randomness` attribute (Log Records).
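
As a sketch of how such a value might be synthesized, the Python below draws 56 random bits and formats them as 14 lower-case hex digits. The function name is hypothetical, and the choice of the `secrets` module as the random source is an assumption of this example, not a requirement stated here.

```python
import secrets


def new_randomness_value() -> str:
    """Return 56 random bits as a zero-padded, 14-digit hex string."""
    return format(secrets.randbits(56), "014x")
```

The resulting string can be stored as the `rv` property of the `tracestate` or as the `sampling.randomness` attribute.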

### No definition for Scope and Resource attributes

We recognize that in some configurations, sampling probability and
even sampling randomness may be set to a constant value.

The `sampling.threshold` and `sampling.randomness` attributes are not
defined for use as Scope or Resource attributes in the present
specification, because there is an existing need to encode per-item
sampling probability, stemming from prioritization schemes.

## Span sampling attributes

No attributes are recognized for Spans. The `tracestate` fields should
be used to convey sampling for Span data.

## Logs sampling attributes

The following attributes are recognized for Logs.

<!-- semconv logs.sampling(full) -->
| Attribute | Type | Description | Examples | Requirement Level |
|---|---|---|---|---|
| [`sampling.randomness`](../attributes-registry/sampling.md) | string | The source of randomness for making probability sampling decisions, when it is not otherwise recorded. [1] | `ce929d0e0e4736` | Conditionally Required: [2] |
| [`sampling.threshold`](../attributes-registry/sampling.md) | string | Sampling probability as specified by OpenTelemetry. [3] | `c`; `ff8` | Conditionally Required: [4] |

**[1]:** This attribute is an optional way to express trace randomness, especially for cases where the TraceID is missing or known to be not random. Sampler components set and consume this value. The value is a hex-coded string containing 14 hex digits (56 bits) of randomness. Setting this attribute indicates the source of randomness that was used (and may be used again) for probability sampling. This field is taken to have the same meaning as the OpenTelemetry tracestate "R-value" for probability sampling, which is an alternative to deriving trace randomness from the TraceID. For details, see OTEP 235.

**[2]:** When a `sampling.threshold` is provided, the corresponding 56-bit randomness value is also recorded.

**[3]:** This attribute is set to convey sampling probability. Sampler components set and consume this value, which is taken to have the same meaning as the OpenTelemetry tracestate "T-value" for probability sampling. This attribute contains a hexadecimal-coded value containing 1 to 14 hex digits of precision, defining the threshold used to reject, depending on the random variable. This value can be converted into sampling probability. For details, see OTEP 235.

**[4]:** When a 56-bit consistent probability sampler is used.
<!-- endsemconv -->

## Examples

### Head sampling

For example, a span that was selected by a 25% probability sampler
using randomness from the TraceID has field values like:

```
trace_id: 4bf92f3577b34da6a3ce929d0e0e4736
tracestate: ot=th:c
```

We can verify that the sampling decision was made correctly as follows.

The trailing 14 hex-digits of randomness are extracted from the
TraceID, forming the Randomness value `0xce929d0e0e4736`. The T-value
`c` is extended with 13 zeros, forming the Threshold value
`0xc0000000000000`. Since `T <= R` is true, the span was correctly sampled.
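
The same check can be written as a short Python sketch. The function name is illustrative; it assumes a 32-digit hex TraceID and a T-value of 1 to 14 hex digits.

```python
def verify_head_sample(trace_id: str, t_value: str) -> tuple[int, int, bool]:
    """Return (randomness, threshold, sampled) for a span's field values."""
    # Randomness: the trailing 14 hex digits (56 bits) of the TraceID.
    randomness = int(trace_id[-14:], 16)
    # Threshold: the T-value padded with trailing zeros to 14 hex digits.
    threshold = int(t_value.ljust(14, "0"), 16)
    # The span is correctly sampled when T <= R.
    return randomness, threshold, threshold <= randomness


r, t, sampled = verify_head_sample("4bf92f3577b34da6a3ce929d0e0e4736", "c")
print(hex(r), hex(t), sampled)
```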

For a log record, which does not include the `tracestate` field, the
same can be expressed as:

```
trace_id: 4bf92f3577b34da6a3ce929d0e0e4736
attributes:
  sampling.threshold: c
```

A log record that does not define the trace_id and was sampled by a
probability sampler requires explicit randomness. For example:

```
attributes:
  sampling.threshold: c
  sampling.randomness: ce929d0e0e4736
```

### Tail sampling

A span that is received with no sampling information (i.e., no
`tracestate` field) is selected by a tail sampler at 10% probability.
A `tracestate` entry is created:

```
trace_id: 4bf92f3577b34da6a3fe929d0e0e4736
tracestate: ot=th:e66
```

A log record containing a TraceID is received with no sampling
attributes and is selected by a tail sampler at 10% probability. A
sampling threshold is inserted:

```
trace_id: 4bf92f3577b34da6a3fe929d0e0e4736
attributes:
  sampling.threshold: e66
```

In both cases, the Threshold value `e66` corresponds with rejecting a
fraction equal to `0xe66 / 0x1000`, or 0.89990234375, which leaves an
exact sampling probability of 0.10009765625. Had 5 digits of
precision been used (`e6666`), the sampling probability would be
approximately 0.10000038147.
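
The relationship between a target probability, the chosen precision, and the resulting T-value can be sketched in Python. Both function names are hypothetical, and the simple round-to-nearest encoding shown here is an assumption of this sketch; OTEP 235 is the reference for the recommended rounding procedure.

```python
def probability_to_t_value(probability: float, precision: int = 3) -> str:
    """Encode a sampling probability as a T-value with `precision` hex digits."""
    if not 0 < probability <= 1:
        raise ValueError("probability must be in (0, 1]")
    if not 1 <= precision <= 14:
        raise ValueError("precision must be 1 to 14 hex digits")
    scale = 16 ** precision
    # The threshold is the rejected fraction, scaled to the chosen precision.
    threshold = min(round((1 - probability) * scale), scale - 1)
    # Trailing zeros carry no information because decoding pads with zeros.
    return format(threshold, f"0{precision}x").rstrip("0") or "0"


def t_value_to_probability(t_value: str) -> float:
    """Decode a T-value and return the exact probability it represents."""
    threshold = int(t_value.ljust(14, "0"), 16)
    return ((1 << 56) - threshold) / (1 << 56)
```

Under these assumptions, a 10% target encodes as `e66` at three digits and `e6666` at five digits, and a 25% target collapses to the single digit `c`.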

[DocumentStatus]: https://github.com/open-telemetry/opentelemetry-specification/tree/v1.26.0/specification/document-status.md
[OTEP235]: https://github.com/open-telemetry/oteps/blob/main/text/trace/0235-sampling-threshold-in-trace-state.md
11 changes: 11 additions & 0 deletions model/logs/sampling.yaml
@@ -0,0 +1,11 @@
groups:
  - id: logs.sampling
    type: attribute_group
    brief: 'Semantic convention describing trace attributes related to sampling.'
    attributes:
      - ref: sampling.randomness
        requirement_level:
          conditionally_required: When a `sampling.threshold` is provided, the corresponding 56-bit randomness value is also recorded.
      - ref: sampling.threshold
        requirement_level:
          conditionally_required: When a 56-bit consistent probability sampler is used.
37 changes: 37 additions & 0 deletions model/registry/sampling.yaml
@@ -0,0 +1,37 @@
groups:
  - id: registry.sampling
    type: attribute_group
    brief: Semantic conventions describing Sampling for telemetry data.
    prefix: sampling
    attributes:
      - id: randomness
        type: string
        brief: The source of randomness for making probability sampling decisions, when it is not otherwise recorded.
        note: >
          This attribute is an optional way to express trace
          randomness, especially for cases where the TraceID is
          missing or known to be not random. Sampler components set
          and consume this value. The value is a hex-coded string
          containing 14 hex digits (56 bits) of randomness. Setting
          this attribute indicates the source of randomness that was
          used (and may be used again) for probability sampling. This
          field is taken to have the same meaning as the OpenTelemetry
          tracestate "R-value" for probability sampling, which is an
          alternative to deriving trace randomness from the TraceID.
          For details, see OTEP 235.
        examples: ["ce929d0e0e4736"]

      - id: threshold
        type: string
        brief: Sampling probability as specified by OpenTelemetry.
        note: >
          This attribute is set to convey sampling probability.
          Sampler components set and consume this value, which is
          taken to have the same meaning as the OpenTelemetry
          tracestate "T-value" for probability sampling. This
          attribute contains a hexadecimal-coded value containing 1 to
          14 hex digits of precision, defining the threshold used to
          reject, depending on the random variable. This value can be
          converted into sampling probability. For details, see OTEP
          235.
        examples: ["c", "ff8"]