Tail sampling processor "count_traces_sampled" metric is inaccurate #25882

lbogdan · 2023-08-18T11:57:10Z

Component(s)

processor/tailsampling

Describe the issue you're reporting

It looks like the "count_traces_sampled" metric of the tail sampling processor is inaccurate, or at least it's not calculated how I would expect.

Specifically, given two policies named "policy1" and "policy2", the metric values for {policy="policy1",sampled="false"} and {policy="policy2",sampled="false"}, and those for {policy="policy1",sampled="true"} and {policy="policy2",sampled="true"} are always equal, although I would expect them to be different, for different policies.

Indeed, looking at the code, they are all calculated based on finalDecision, and not each policy's decision:

opentelemetry-collector-contrib/processor/tailsamplingprocessor/processor.go

Lines 264 to 288 in 06f1f57

    
           for _, p := range tsp.policies { 
        
           	switch finalDecision { 
        
           	case sampling.Sampled: 
        
           		// any single policy that decides to sample will cause the decision to be sampled 
        
           		// the nextConsumer will get the context from the first matching policy 
        
           		if matchingPolicy == nil { 
        
           			matchingPolicy = p 
        
           		} 
        
           		_ = stats.RecordWithTags( 
        
           			p.ctx, 
        
           			[]tag.Mutator{tag.Upsert(tagSampledKey, "true")}, 
        
           			statCountTracesSampled.M(int64(1)), 
        
           		) 
        
           		metrics.decisionSampled++ 
        
           	case sampling.NotSampled: 
        
           		_ = stats.RecordWithTags( 
        
           			p.ctx, 
        
           			[]tag.Mutator{tag.Upsert(tagSampledKey, "false")}, 
        
           			statCountTracesSampled.M(int64(1)), 
        
           		) 
        
           		metrics.decisionNotSampled++ 
        
           	} 
        
           }

Why I think that's a bug, and not intended behavior, is that per-policy decisions are actually stored in trace.Decisions[i] during evaluation, but then never used anywhere (that I could find).

So I think

	for _, p := range tsp.policies {
		switch finalDecision {

should actually be

	for i, p := range tsp.policies {
		switch trace.Decisions[i] {

The text was updated successfully, but these errors were encountered:

github-actions · 2023-08-18T11:57:28Z

Pinging code owners:

processor/tailsampling: @jpkrohling

See Adding Labels via Comments if you do not have permissions to add labels yourself.

jpkrohling · 2023-08-22T09:13:18Z

Sounds reasonable, would you be open to sending in a PR?

lbogdan · 2023-08-24T21:23:11Z

Sure, I'll get to it this weekend!

jmsnll · 2023-09-13T11:21:01Z

How is the count_traces_sampled metric meant to be interpreted for each policy? Would you expect each "sampling decision" to only be recorded once and by the policy that resulted in the trace being sampled? Or recorded multiple times for each policy that would have resulted in the trace being sampled?

For example with two probabilistic policies like so, assuming perfect distribution:

processors:
  tail_sampling:
    policies:
      - name: 50-percent-probabilistic
        type: probabilistic
        probabilistic:
          sampling_percentage: 50
      - name: 20-percent-probabilistic
        type: probabilistic
        probabilistic:
          sampling_percentage: 20

If 100 traces passed through, would you expect the metric counts to read like so with each decision only counted once?

otelcol_processor_tail_sampling_count_traces_sampled{policy="50-percent-probabilistic",sampled="false"} 50
otelcol_processor_tail_sampling_count_traces_sampled{policy="50-percent-probabilistic",sampled="true"} 50
otelcol_processor_tail_sampling_count_traces_sampled{policy="20-percent-probabilistic",sampled="false"} 40
otelcol_processor_tail_sampling_count_traces_sampled{policy="20-percent-probabilistic",sampled="true"} 10

Or multiple times?

otelcol_processor_tail_sampling_count_traces_sampled{policy="50-percent-probabilistic",sampled="false"} 50
otelcol_processor_tail_sampling_count_traces_sampled{policy="50-percent-probabilistic",sampled="true"} 50
otelcol_processor_tail_sampling_count_traces_sampled{policy="20-percent-probabilistic",sampled="false"} 80
otelcol_processor_tail_sampling_count_traces_sampled{policy="20-percent-probabilistic",sampled="true"} 20

lbogdan · 2023-09-14T09:33:11Z

With my suggested code change, it would be the latter - every decision would increment every policy, either the sampled="true" or the sampled="false" metric, depending on the outcome.

Also, thinking about it, a total metric (without a policy tag, or with policy="final-decision" or similar) would be helpful, too, as this cannot be inferred from the existing metrics. Currently all policy metrics equal this total metric.

edwintye · 2023-09-14T16:56:20Z

While we (me and @jmsnll) were trying to work out a solution internally (to see the various output of the metric), we realized a potential bug in that the tail sampler does not check for uniqueness in the policy name. More concretely, if we use a config as below, where someone accidentally use identical name for two different policies

receivers:
  otlp:
    protocols:
      grpc:
processors:
  tail_sampling:
    decision_wait: 5s
    policies:
      - name: status-code-policy
        type: status_code
        status_code:
          status_codes: [ERROR]
      - name: status-code-policy
        type: probabilistic
        probabilistic:
          sampling_percentage: 50
exporters:
  logging:
service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [tail_sampling]
      exporters: [logging]

then firing off 100 traces with STATUS_CODE_UNSET result in

otelcol_processor_tail_sampling_count_traces_sampled{policy="status-code-policy",sampled="true",service_name="otelcontribcol",service_version="0.84.0-dev"} 49

which is true as per the config but deceiving to anyone without looking at the config. As the label is taken from the policy context in stats.RecordWithTags and there is no check during the initial unmarshal, we can in theory give every single policy the same name which result in the metric being incorrect. This issue with non-unique policy name remains regardless of whether this issue is fixed (and may have wider impact).

One would argue that no engineer should make the mistake of having non-unique policy name. However, if a new (silence) tag say final-decision is introduced, it may clash with a config that is using that exact policy name final-decision. I am not sure how likely this scenario is but I thought it is worth flagging as we have already made that mistake ourselves.

jpkrohling · 2023-09-20T08:42:08Z

bug in that the tail sampler does not check for uniqueness in the policy name

😱 , I'll send a quick PR to get that fixed. Thanks for pointing this out.

One would argue that no engineer should make the mistake of having non-unique policy name.

That should not be an assumption.

…ampled` metric (#26724) **Description:** Previously, the calculation of `count_traces_sampled` was incorrect, resulting in all per-policy metric counts being the same. ``` count_traces_sampled{policy="policy-a", sampled="false"} 50 count_traces_sampled{policy="policy-a", sampled="true"} 50 count_traces_sampled{policy="policy-b", sampled="false"} 50 count_traces_sampled{policy="policy-b", sampled="true"} 50 ``` This issue stemmed from the metrics being incremented based on the final decision of trace sampling, rather than being attributed to the policies causing the sampling. With the fix implemented, the total sampling count can no longer be inferred from the metrics alone. To address this, a new metric, `global_count_traces_sampled`, was introduced to accurately capture the global count of sampled traces. ``` count_traces_sampled{policy="policy-a", sampled="false"} 50 count_traces_sampled{policy="policy-a", sampled="true"} 50 count_traces_sampled{policy="policy-b", sampled="false"} 24 count_traces_sampled{policy="policy-b", sampled="true"} 76 global_count_traces_sampled{sampled="false"} 24 global_count_traces_sampled{sampled="true"} 76 ``` Reusing the `count_traces_sampled` policy metric for this purpose would have either meant: 1. Leaving the policy field empty, which didn't feel very explicit. 2. Setting a global placeholder policy name (such as `final-decision`), which could then potentially collide with existing user-configured policy names without validation. **Link to tracking Issue:** Fixes #25882 **Testing:** Tested with various combinations of `probabilistic`, `latency`, `status_code` and attribute policies (including combinations of `and` and `composite` policies). No unit tests were added, open to suggestions for adding tests for metrics but I couldn't find any examples of it being done elsewhere. **Documentation:** No documentation changes.

…ampled` metric (open-telemetry#26724) **Description:** Previously, the calculation of `count_traces_sampled` was incorrect, resulting in all per-policy metric counts being the same. ``` count_traces_sampled{policy="policy-a", sampled="false"} 50 count_traces_sampled{policy="policy-a", sampled="true"} 50 count_traces_sampled{policy="policy-b", sampled="false"} 50 count_traces_sampled{policy="policy-b", sampled="true"} 50 ``` This issue stemmed from the metrics being incremented based on the final decision of trace sampling, rather than being attributed to the policies causing the sampling. With the fix implemented, the total sampling count can no longer be inferred from the metrics alone. To address this, a new metric, `global_count_traces_sampled`, was introduced to accurately capture the global count of sampled traces. ``` count_traces_sampled{policy="policy-a", sampled="false"} 50 count_traces_sampled{policy="policy-a", sampled="true"} 50 count_traces_sampled{policy="policy-b", sampled="false"} 24 count_traces_sampled{policy="policy-b", sampled="true"} 76 global_count_traces_sampled{sampled="false"} 24 global_count_traces_sampled{sampled="true"} 76 ``` Reusing the `count_traces_sampled` policy metric for this purpose would have either meant: 1. Leaving the policy field empty, which didn't feel very explicit. 2. Setting a global placeholder policy name (such as `final-decision`), which could then potentially collide with existing user-configured policy names without validation. **Link to tracking Issue:** Fixes open-telemetry#25882 **Testing:** Tested with various combinations of `probabilistic`, `latency`, `status_code` and attribute policies (including combinations of `and` and `composite` policies). No unit tests were added, open to suggestions for adding tests for metrics but I couldn't find any examples of it being done elsewhere. **Documentation:** No documentation changes.

lbogdan added the needs triage New item requiring triage label Aug 18, 2023

github-actions bot added the processor/tailsampling Tail sampling processor label Aug 18, 2023

bryan-aguilar added help wanted Extra attention is needed priority:p2 Medium and removed needs triage New item requiring triage labels Aug 23, 2023

This was referenced Sep 18, 2023

[processor/tailsamplingprocessor] Improve accuracy of count_traces_sampled metric #26724

Merged

[processor/tailsamplingprocessor] config allows duplicate policy names resulting in metrics collision #26726

Closed

jpkrohling mentioned this issue Sep 20, 2023

Duplicate policy names should yield an error #27016

Closed

jpkrohling closed this as completed in #26724 Sep 27, 2023

0x006EA1E5 mentioned this issue Oct 11, 2023

metric otelcol_processor_tail_sampling_count_traces_sampled showing wrong values based on policy, sampled labels #27567

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Tail sampling processor "count_traces_sampled" metric is inaccurate #25882

Tail sampling processor "count_traces_sampled" metric is inaccurate #25882

lbogdan commented Aug 18, 2023 •

edited

Loading

github-actions bot commented Aug 18, 2023

jpkrohling commented Aug 22, 2023

lbogdan commented Aug 24, 2023

jmsnll commented Sep 13, 2023

lbogdan commented Sep 14, 2023 •

edited

Loading

edwintye commented Sep 14, 2023

jpkrohling commented Sep 20, 2023

Tail sampling processor "count_traces_sampled" metric is inaccurate #25882

Tail sampling processor "count_traces_sampled" metric is inaccurate #25882

Comments

lbogdan commented Aug 18, 2023 • edited Loading

Component(s)

Describe the issue you're reporting

github-actions bot commented Aug 18, 2023

jpkrohling commented Aug 22, 2023

lbogdan commented Aug 24, 2023

jmsnll commented Sep 13, 2023

lbogdan commented Sep 14, 2023 • edited Loading

edwintye commented Sep 14, 2023

jpkrohling commented Sep 20, 2023

lbogdan commented Aug 18, 2023 •

edited

Loading

lbogdan commented Sep 14, 2023 •

edited

Loading