Adjust Cardinality Limit to Accommodate Internal Reserves #5382

rajkumar-rangaraj · 2024-02-23T01:25:57Z

Changes

Currently, out of the cardinality limit, we internally reserve one spot for zero tags and another for overflow if the emit overflow experimental flag is enabled. A few customers have raised concerns that they receive one or two fewer dimensions than the specified cardinality limit. For example, consider a user who sets the cardinality limit to 10. If the emit overflow experimental flag is enabled, we currently set aside 2 of those spots: one for zero tags and one for overflow. The user therefore gets only 8 of their 10 slots for their dimensions.

This proposed change aims to address concerns regarding the effective reduction of the cardinality limit available to users. The key concept here is that the cardinality limit should apply exclusively to metrics that include dimensions, treating metrics with zero dimensions and overflow metrics as special cases that are exempt from the user's specified limit. Currently, the emit overflow attribute functionality is behind an experimental flag, but it is planned for incorporation into the stable SDK. To simplify future adjustments and to provide users with the full extent of their specified limits, we propose adding two additional spots on top of the cardinality limit.

Note: From the user's perspective, it is important to understand that the cardinality limit they set is intended to apply only to metrics with dimensions. This ensures users can fully utilize their specified limit for dimensioned metrics, without having to factor in special cases into their limit calculations.

Example:

Imagine a user sets a cardinality limit of 10, expecting to track metrics across up to 10 unique dimension combinations. Under the proposed change, to ensure the user's limit is fully applicable to dimensioned metrics, we would internally adjust the limit to 12. This adjustment accounts for one spot reserved for metrics with zero dimensions and another for overflow metrics, ensuring that the user has a full quota of 10 slots available for their dimensioned metrics. This way, regardless of any special cases, the user effectively retains the full capacity of their specified cardinality limit for metrics that include dimensions.
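
The arithmetic in this example can be sketched roughly as follows. This is an illustration only, not the SDK's implementation: `CardinalityMath` and its members are hypothetical names, and the sketch assumes both reserved slots are always added on top of the user's limit, as proposed above.

```csharp
using System;

// Hypothetical helper for illustration; not part of the OpenTelemetry .NET SDK.
public static class CardinalityMath
{
    // One slot for measurements with zero tags, one for the overflow attribute.
    private const int ReservedForZeroTags = 1;
    private const int ReservedForOverflow = 1;

    // Proposed behavior: allocate the user's limit plus the two reserved slots,
    // so the whole configured limit stays available for dimensioned metrics.
    public static int InternalCapacity(int userCardinalityLimit)
    {
        if (userCardinalityLimit < 1)
        {
            throw new ArgumentOutOfRangeException(
                nameof(userCardinalityLimit),
                "The cardinality limit must be a positive integer.");
        }

        return userCardinalityLimit + ReservedForZeroTags + ReservedForOverflow;
    }

    // Behavior described as current: the reserved slots are carved out of the
    // user's limit, which leaves one or two fewer slots for dimensioned metrics.
    public static int UsableSlotsUnderCurrentBehavior(int userCardinalityLimit, bool emitOverflow)
    {
        int reserved = ReservedForZeroTags + (emitOverflow ? ReservedForOverflow : 0);
        return Math.Max(0, userCardinalityLimit - reserved);
    }
}
```

With a limit of 10 and the overflow flag enabled, `UsableSlotsUnderCurrentBehavior(10, true)` gives 8, while `InternalCapacity(10)` gives 12, leaving all 10 slots for dimensioned metrics.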

Merge requirement checklist

CONTRIBUTING guidelines followed (license requirements, nullable enabled, static analysis, etc.)
Unit tests added/updated
[?] Appropriate CHANGELOG.md files updated for non-trivial changes

codecov · 2024-02-23T01:31:21Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 84.65%. Comparing base (6250307) to head (8f7e799).
Report is 116 commits behind head on main.

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #5382      +/-   ##
==========================================
+ Coverage   83.38%   84.65%   +1.27%     
==========================================
  Files         297      281      -16     
  Lines       12531    12098     -433     
==========================================
- Hits        10449    10242     -207     
+ Misses       2082     1856     -226

Flag	Coverage Δ
unittests	`?`
unittests-Solution-Experimental	`84.64% <100.00%> (?)`
unittests-Solution-Stable	`84.39% <100.00%> (?)`

Files	Coverage Δ
src/OpenTelemetry/Metrics/AggregatorStore.cs	`85.02% <100.00%> (+4.63%)`	⬆️
src/OpenTelemetry/Metrics/MetricReaderExt.cs	`91.26% <100.00%> (+1.26%)`	⬆️
...OpenTelemetry/Metrics/MetricStreamConfiguration.cs	`88.46% <ø> (+13.46%)`	⬆️

... and 54 files with indirect coverage changes

src/OpenTelemetry/Metrics/AggregatorStore.cs

reyang · 2024-02-23T17:10:22Z

I feel the key here is less about dealing with off-by-one (or off-by-two), it is more about user experience/education/expectation. @rajkumar-rangaraj could you update

opentelemetry-dotnet/src/OpenTelemetry/Metrics/MetricStreamConfiguration.cs

Lines 103 to 114 in f5b7f9c

    
    /// Gets or sets a positive integer value defining the maximum number of
    /// data points allowed for the metric managed by the view.
    /// </summary>
    /// <remarks>
    /// <para><b>WARNING</b>: This is an experimental API which might change or
    /// be removed in the future. Use at your own risk.</para>
    /// <para>Spec reference: <see
    /// href="https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/metrics/sdk.md#cardinality-limits">Cardinality
    /// limits</see>.</para>
    /// Note: If not set the default MeterProvider cardinality limit of 2000
    /// will apply.
    /// </remarks>

and the doc in this PR so we can align on what exactly do we want the users to understand/do before making the code change?

rajkumar-rangaraj · 2024-02-27T06:43:27Z

> the doc in this PR so we can align on what exactly do we want the users to understand/do before making the code change?

Could you please review the updated PR description? If you agree with the changes, I will proceed to update the code comments.

reyang · 2024-02-27T15:55:20Z

> the doc in this PR so we can align on what exactly do we want the users to understand/do before making the code change?

> Could you please review the updated PR description? If you agree with the changes, I will proceed to update the code comments.

The direction looks good to me 👍

cijothomas · 2024-02-27T16:11:16Z

@rajkumar-rangaraj
https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/metrics/sdk.md#overflow-attribute

> The SDK MUST create an Aggregator with the overflow attribute set prior to reaching the cardinality limit and use it to aggregate events for which the correct Aggregator could not be created. The maximum number of distinct, non-overflow attributes is one less than the limit, as a result.

If we were to stick with the spec wording (not yet stable, still..), then we should count the overflow attribute as contributing to the overall limit. Maybe just do the zero tag case in this PR, and wait for the spec to finalize.
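
A toy model of the quoted spec wording may help when comparing the two readings. It is a sketch under stated assumptions, not the SDK's `AggregatorStore`: `SpecStyleCounterStore` and its members are hypothetical, attribute sets are modeled as plain strings, and the overflow aggregator is created up front so that it counts toward the limit.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical toy store; it only mimics the spec wording quoted above.
public sealed class SpecStyleCounterStore
{
    // The spec's overflow attribute is otel.metric.overflow = true.
    public const string OverflowKey = "otel.metric.overflow=true";

    private readonly Dictionary<string, long> sums = new Dictionary<string, long>();
    private readonly int cardinalityLimit;

    public SpecStyleCounterStore(int cardinalityLimit)
    {
        if (cardinalityLimit < 1)
        {
            throw new ArgumentOutOfRangeException(nameof(cardinalityLimit));
        }

        this.cardinalityLimit = cardinalityLimit;

        // The overflow aggregator exists before the limit is reached,
        // so it occupies one of the slots.
        this.sums[OverflowKey] = 0;
    }

    // Number of distinct attribute sets tracked, excluding the overflow slot.
    public int DistinctNonOverflowCount => this.sums.Count - 1;

    public long OverflowSum => this.sums[OverflowKey];

    public void Add(string attributeSet, long value)
    {
        // A new attribute set that would exceed the limit is folded into overflow.
        if (!this.sums.ContainsKey(attributeSet) && this.sums.Count >= this.cardinalityLimit)
        {
            attributeSet = OverflowKey;
        }

        this.sums.TryGetValue(attributeSet, out long current);
        this.sums[attributeSet] = current + value;
    }
}
```

In this model a limit of 3 leaves room for only 2 distinct non-overflow attribute sets, and an empty attribute set (zero tags) would take one of those slots like any other combination.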

utpilla · 2024-02-27T19:28:57Z

> If we were to stick with the spec wording (not yet stable, still..), then we should count the overflow attribute as contributing to the overall limit. Maybe just do the zero tag case in this PR, and wait for the spec to finalize.

> SDKs SHOULD support being configured with a cardinality limit. The number of unique combinations of attributes is called cardinality.

If we were to stick with spec wording, then we should not treat zero tag as a special case either. That should also contribute to the overall limit.

cijothomas · 2024-02-27T19:33:01Z

> If we were to stick with the spec wording (not yet stable, still..), then we should count the overflow attribute as contributing to the overall limit. Maybe just do the zero tag case in this PR, and wait for the spec to finalize.

> SDKs SHOULD support being configured with a cardinality limit. The number of unique combinations of attributes is called cardinality.

> If we were to stick with spec wording, then we should not treat zero tag as a special case either. That should also contribute to the overall limit.

Spec is not saying anything about zero tags, but spec explicitly talks about overflow contributing to the overall limit.

utpilla · 2024-02-27T19:39:09Z

> Spec is not saying anything about zero tags, but spec explicitly talks about overflow contributing to the overall limit.

Spec says "The number of unique combinations of attributes is called cardinality."

A measurement with zero tags qualifies as a unique combination.

cijothomas · 2024-02-27T19:57:24Z

> Spec is not saying anything about zero tags, but spec explicitly talks about overflow contributing to the overall limit.

> Spec says "The number of unique combinations of attributes is called cardinality."

> A measurement with zero tags qualifies as a unique combination.

Got it. Our only option is then to break the spec compliance and/or influence the spec to soften the wording there, to allow us to special-case. After all, what we are doing is aligned with the OTel mission.

Kielek · 2024-02-28T05:34:10Z

> Spec is not saying anything about zero tags, but spec explicitly talks about overflow contributing to the overall limit.

> Spec says "The number of unique combinations of attributes is called cardinality."
> A measurement with zero tags qualifies as a unique combination.

> Got it. Our only option is then to break the spec compliance and/or influence the spec to soften the wording there, to allow us to special-case. After all, what we are doing is aligned with the OTel mission.

+1 to adjust specification. We should avoid any differences where possible. Especially if the part of the specification is still unstable and can be easily modified.

cijothomas · 2024-02-28T17:09:49Z

> Spec is not saying anything about zero tags, but spec explicitly talks about overflow contributing to the overall limit.

> Spec says "The number of unique combinations of attributes is called cardinality."
> A measurement with zero tags qualifies as a unique combination.

> Got it. Our only option is then to break the spec compliance and/or influence the spec to soften the wording there, to allow us to special-case. After all, what we are doing is aligned with the OTel mission.

open-telemetry/opentelemetry-specification#3904 (comment) should give us what we need.

src/OpenTelemetry/Metrics/AggregatorStore.cs

CodeBlanch · 2024-03-04T20:01:05Z

@rajkumar-rangaraj

Hey FYI we have this block of code currently:

opentelemetry-dotnet/src/OpenTelemetry/Metrics/MetricReaderExt.cs

Lines 183 to 190 in 28ead76

    
        if (isEmitOverflowAttributeKeySet)
        {
            // We need at least two metric points. One is reserved for zero tags and the other one for overflow attribute
            if (cardinalityLimit > 1)
            {
                this.emitOverflowAttribute = true;
            }
        }

If we are going to auto-expand whatever the user chooses for cardinalityLimit you can probably adjust that to just...

-        if (isEmitOverflowAttributeKeySet)
-        {
-            // We need at least two metric points. One is reserved for zero tags and the other one for overflow attribute
-            if (cardinalityLimit > 1)
-            {
-                this.emitOverflowAttribute = true;
-            }
-        }
+        this.emitOverflowAttribute = isEmitOverflowAttributeKeySet;

src/OpenTelemetry/Metrics/AggregatorStore.cs

utpilla · 2024-03-07T01:48:43Z

test/OpenTelemetry.Tests/Metrics/MetricApiTestsBase.cs

@@ -1429,7 +1431,7 @@ int MetricPointCount()
        }

        meterProvider.ForceFlush(MaxTimeToAllowForFlush);
-        Assert.Equal(MeterProviderBuilderSdk.DefaultCardinalityLimit, MetricPointCount());
+        Assert.Equal(MeterProviderBuilderSdk.DefaultCardinalityLimit + additionalReserve, MetricPointCount());


This test should verify that we don't allow the user to exceed the cardinality cap. The right thing to test here now would be to check that the count of metric points exported (excluding zero tags and the overflow attribute) is equal to MeterProviderBuilderSdk.DefaultCardinalityLimit. We probably need to update the MetricPointCount() method to return the count of unreserved MetricPoints instead of all the MetricPoints that were exported in that particular Collect cycle.

The overflow is guarded with an experimental flag. In cases where the overflow experimental attribute is not set, the value will be the cardinality limit + 1.

It looks like you have now updated MetricPointCount() method definition to account for that.
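
One way to picture counting only the unreserved metric points is sketched below. It does not reproduce the test's actual `MetricPointCount()` helper: `MetricPointCounting` and `CountUnreserved` are hypothetical names, and each exported metric point is assumed to be represented only by its tag dictionary.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper for illustration; the real test works on exported Metric objects.
public static class MetricPointCounting
{
    // Attribute key the spec uses to mark the overflow metric point.
    public const string OverflowTagKey = "otel.metric.overflow";

    // Counts points that carry at least one tag and are not the overflow point,
    // i.e. the points that should be compared against the cardinality limit.
    public static int CountUnreserved(IEnumerable<IReadOnlyDictionary<string, object>> exportedPointTags)
    {
        return exportedPointTags.Count(tags => tags.Count > 0 && !IsOverflow(tags));
    }

    private static bool IsOverflow(IReadOnlyDictionary<string, object> tags)
    {
        return tags.TryGetValue(OverflowTagKey, out object value) && value is bool flag && flag;
    }
}
```

A test written this way would assert that `CountUnreserved(...)` equals the configured limit, whether or not the overflow flag is enabled.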

test/OpenTelemetry.Tests/Metrics/MetricOverflowAttributeTestsBase.cs

rajkumar-rangaraj · 2024-03-07T03:00:40Z

Thanks @utpilla for the very detailed review. I appreciate the time you spent on the quality review. I’ve sent an update, please take a look when you get a chance.

rajkumar-rangaraj added 2 commits February 22, 2024 17:18

Enhance cardinality handling by adding reserve space to default/user-…

0cff860

…specified limit

merge changes from main.

0837ecd

rajkumar-rangaraj requested a review from a team February 23, 2024 01:25

cijothomas reviewed Feb 23, 2024

src/OpenTelemetry/Metrics/AggregatorStore.cs

Fix message in comment.

cda0a5d

CodeBlanch reviewed Feb 23, 2024

src/OpenTelemetry/Metrics/AggregatorStore.cs

rajkumar-rangaraj added 2 commits February 27, 2024 12:27

Update code comment on MetricStreamConfiguration's CardinalityLimit.

75050e9

Resolve merge conflicts

d248fb6

reyang reviewed Mar 1, 2024

src/OpenTelemetry/Metrics/AggregatorStore.cs

reyang reviewed Mar 1, 2024

src/OpenTelemetry/Metrics/AggregatorStore.cs

reyang approved these changes Mar 1, 2024

cijothomas reviewed Mar 1, 2024

src/OpenTelemetry/Metrics/AggregatorStore.cs

rajkumar-rangaraj added 2 commits March 1, 2024 14:27

pr feedback, rename varaible and drop constant.

18a8fc9

fix tests

2ab0efb

reyang approved these changes Mar 1, 2024

Merge branch 'main' into rajrang/enhance-cardinality-limit

153e0f0

rajkumar-rangaraj added 2 commits March 4, 2024 14:32

Remove check in isEmitOverflowAttributeKeySet in MetricReaderExt

910d921

Merge changes from main

fe7551a

CodeBlanch and others added 2 commits March 5, 2024 15:34

Merge from main.

bffc21f

Merge branch 'main' into rajrang/enhance-cardinality-limit

a09fcfc