Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New probabilistic sampler processor tests for legacy compatibility #32360

Merged
merged 3 commits into from
Apr 19, 2024

Conversation

jmacd
Copy link
Contributor

@jmacd jmacd commented Apr 12, 2024

Description: Adds new hard-coded test of the TraceID hashing function in the probabilisticsamplerprocessor. This will ensure that changes do not inadvertently modify the hashing function or associated logic for spans. Note that the Logs sampler logic includes a test with exact counts of sampled log records, which serves the same purpose.

Link to tracking Issue: #31918

Testing: This is a test added ahead of #31946, which refactors the hash-based decision but should not change its results.

Documentation: n/a

@jmacd jmacd requested a review from jpkrohling as a code owner April 12, 2024 21:46
@jmacd jmacd requested a review from a team April 12, 2024 21:46
@jmacd jmacd added the Skip Changelog PRs that do not require a CHANGELOG.md entry label Apr 12, 2024
@github-actions github-actions bot added the processor/probabilisticsampler Probabilistic Sampler processor label Apr 12, 2024
@jmacd jmacd changed the title Jmacd/newtests New probabilistic sampler processor tests for legacy compatibility Apr 12, 2024
dmitryax pushed a commit that referenced this pull request Apr 16, 2024
**Description:** This component has only one maintainer, and I have
become very familiar with the component's code.


#32360

#31946

#31894

**Documentation:** n/a
@jpkrohling jpkrohling merged commit 38c77fb into open-telemetry:main Apr 19, 2024
170 checks passed
@github-actions github-actions bot added this to the next release milestone Apr 19, 2024
jpkrohling pushed a commit that referenced this pull request May 15, 2024
…ation, prepare for OTEP 235 support (#31946)

**Description:**

Refactors the probabilistic sampling processor to prepare it for more
OTEP 235 support.

This clarifies existing inconsistencies between tracing and logging
samplers, see the updated README. The tracing priority mechanism applies
a 0% or 100% sampling override (e.g., "1" implies 100% sampling),
whereas the logging sampling priority mechanism supports
variable-probability override (e.g., "1" implies 1% sampling).

This pins down cases where no randomness is available, and organizes the
code to improve readability. A new type called `randomnessNamer` carries
the randomness information (from the sampling pacakge) and a name of the
policy that derived it. When sampling priority causes the effective
sampling probability to change, the value "sampling.priority" replaces
the source of randomness, which is currently limited to "trace_id_hash"
or the name of the randomess-source attribute, for logs.

While working on #31894, I discovered that some inputs fall through to
the hash function with zero bytes of input randomness. The hash
function, computed on an empty input (for logs) or on 16 bytes of zeros
(which OTel calls an invalid trace ID), would produce a fixed random
value. So, for example, when logs are sampled and there is no TraceID
and there is no randomness attribute value, the result will be sampled
at approximately 82.9% and above.

In the refactored code, an error is returned when there is no input
randomness. A new boolean configuration field determines the outcome
when there is an error extracting randomness from an item of telemetry.
By default, items of telemetry with errors will not pass through the
sampler. When `FailClosed` is set to false, items of telemetry with
errors will pass through the sampler.

The original hash function, which uses 14 bits of information, is
structured as an "acceptance threshold", ultimately the test for
sampling translated into a positive decision when `Randomness <
AcceptThreshold`. In the OTEP 235 scheme, thresholds are rejection
thresholds--this PR modifies the original 14-bit accept threshold into a
56-bit reject threshold, using Threshold and Randomness types from the
sampling package. Reframed in this way, in the subsequent PR (i.e.,
#31894) the effective sampling probability will be seamlessly conveyed
using OTEP 235 semantic conventions.

Note, both traces and logs processors are now reduced to a function like
this:

```
				return commonSamplingLogic(
					ctx,
					l,
					lsp.sampler,
					lsp.failClosed,
					lsp.sampler.randomnessFromLogRecord,
					lsp.priorityFunc,
					"logs sampler",
					lsp.logger,
				)
```

which is a generic function that handles the common logic on a per-item
basis and ends in a single metric event. This structure makes it clear
how traces and logs are processed differently and have different
prioritization schemes, currently. This structure also makes it easy to
introduce new sampler modes, as shown in #31894. After this and #31940
merge, the changes in #31894 will be relatively simple to review as the
third part in a series.

**Link to tracking Issue:**

Depends on #31940.
Part of #31918.

**Testing:** Added. Existing tests already cover the exact random
behavior of the current hashing mechanism. Even more testing will be
introduced with the last step of this series. Note that
#32360
is added ahead of this test to ensure refactoring does not change
results.

**Documentation:** Added.

---------

Co-authored-by: Kent Quirk <kentquirk@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
processor/probabilisticsampler Probabilistic Sampler processor Skip Changelog PRs that do not require a CHANGELOG.md entry
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants