How is the cardinality limit applied when attributes are being filtered? #3798

MrAlias · 2023-12-19T21:42:49Z

When a view is applied to a metric pipeline that contains an attribute filter, how should the cardinality limit be applied? Prior to filtering attributes, or post?

The answer we come up with will affect the output of user data. For example, if measurements for the following attributes are made:

{path: "/", code: 200, query: ""}
{path: "/", code: 400, query: "user=bob"}
{path: "/admin", code: 200, query: ""}

If an attribute filter is applied so that only the path attribute is retained and cardinality limit of 3 is set, if the filtering is applied prior to checking the cardinality limit the following attributes will be kept on the output metric streams:

{path: "/"}
{path: "admin"}

However, if the cardinality limit is applied prior to filtering the following attributes will be kept on the output metric streams:

{path: "/"}
{otel.metric.overflow: true}

Filter before limiting

Given the cardinality limit feature was added to limit the number of resources and SDK uses during measurements, if filtering is to be applied prior to the cardinality limit being applied it will need to be done in the "hot path" of making a measurement.

Pros:

The correct number of attributes (<= cardinality limit) are ensured to be on the output metric streams

Cons:

Requires filtering for every measurement made

Limit before filtering

Limiting without doing any filtering means the filtering process can be delayed to the collection of metric streams. This is a substantial performance issue given the filtering process will need to run at most M times (for M being the number of distinct attributes recorded) rather than N times (for N being the number of measurements made).

Pros:

Performance on the "hot path" is not impacted by filtering

Cons

It is possible that there will be less than the cardinality limit many distinct attributes exported on the output metric streams (possibly even just otel.metric.overflow attribute in some cases)

The text was updated successfully, but these errors were encountered:

jsuereth · 2023-12-20T15:40:08Z

The limit is intended to be against the amount of in memory storage you have. Given that, I believe the filtering MUST be done on your hot-path. The intention is to avoid Denial-of-Service attacks via high cardinality inputs that impact metrics (as many users will attach attributes that come from requests, even our semconv recommend this).

I don't think we have any rules about forcing filtering to be in the hot path or afterwards.

To me this means you have to limit before you hit any in-memory storage. Whether this means, in Go, you start doing filtering in the hot path, is up to you. I'd argue that the con of having less cardinality than your limit in some cases is better than the alternative of running out of memory :)

MrAlias · 2023-12-20T15:47:13Z

Given that, I believe the filtering MUST be done on your hot-path.

@jsuereth did you mean "Given that, I believe the [limiting] MUST be done on your hot-path"? Not necessarily the filtering, right?

MrAlias · 2023-12-20T15:47:46Z

To me this means you have to limit before you hit any in-memory storage. Whether this means, in Go, you start doing filtering in the hot path, is up to you. I'd argue that the con of having less cardinality than your limit in some cases is better than the alternative of running out of memory :)

👍

jsuereth · 2023-12-20T15:59:19Z

@jsuereth did you mean "Given that, I believe the [limiting] MUST be done on your hot-path"? Not necessarily the filtering, right?

Yep! Limiting MUST be on hot path, you can pick how you filter for best UX + performance.

MrAlias · 2023-12-20T16:08:15Z

@jsuereth what are you thoughts on potential inconsistencies across languages here?

If one implementation filters prior to the limiting and another filters afterwards they could produce different outputs. Is consistency here going to be an issue?

jsuereth · 2023-12-20T16:25:43Z

If one implementation filters prior to the limiting and another filters afterwards they could produce different outputs. Is consistency here going to be an issue?

I think regarding cardinality limits we're talking about error scenarios and worst-case behavior. We already have a lot of inconssitencies in how failures are handled due to runtime limitations. We try to be consistent, but when it comes to extraordinary/error scenarios, I think some inconsistencies between SDKs is ok.

jsuereth · 2023-12-20T16:26:57Z

Another way to phrase it -> I think users, if given a choice, would prefer lower o11y overhead per-language over perfect consistency.

jack-berg · 2023-12-20T18:21:00Z

Java checks whether the cardinality exceeds the limit after attribute filtering has occurred. I think this is the right behavior because its the least surprising to users, e.g. its surprising if I set my cardinality limit to 100 but only see 20 series. Both the attribute filter and cardinality limit are mechanisms to manage cardinality. The should work together and not be at odds with each other.

MrAlias · 2023-12-21T00:14:59Z

Java checks whether the cardinality exceeds the limit after attribute filtering has occurred. I think this is the right behavior because its the least surprising to users, e.g. its surprising if I set my cardinality limit to 100 but only see 20 series. Both the attribute filter and cardinality limit are mechanisms to manage cardinality. The should work together and not be at odds with each other.

Doesn't this mean that Java needs to filter every measurement the user makes though? And do so in the "hot path" of telemetry recording?

jack-berg · 2023-12-21T00:27:06Z

Doesn't this mean that Java needs to filter every measurement the user makes though? And do so in the "hot path" of telemetry recording?

Yes.

Resolve open-telemetry#3798

reyang · 2024-02-27T00:03:13Z

I've clarified the SDK cardinality limit in #3856. The spec now says "For a given metric, the cardinality limit is a hard limit on the number of metric points that can be collected during a collection cycle". I also sent another editorial PR to clean up metric points #3906.

@MrAlias Do you think this issue can be marked as resolved? (I've provided more info regarding how I want to address a set of problems in #3866).

austinlparker · 2024-04-30T20:36:35Z

@MrAlias is this complete?

cijothomas · 2024-09-24T17:28:52Z

I think this can be closed with the conclusion that Filtering of attributes should be done before applying limits. This requires filtering of attributes in hot path. Some languages may vary in implementation (for perf reasons), but that is considered okay.

reyang · 2024-09-25T05:32:45Z

@MrAlias please let me know if you want to reopen it. I haven't seen an update from you after #3798 (comment).

MrAlias · 2024-09-25T13:39:13Z

I think this can be closed with the conclusion that Filtering of attributes should be done before applying limits. This requires filtering of attributes in hot path. Some languages may vary in implementation (for perf reasons), but that is considered okay.

The conclusion that @jsuereth mentioned above is different than this.

This issue may need to be reopened and a change made to the specification if there is still confusion about the permissiveness of what is allowed.

cijothomas · 2024-09-25T16:58:18Z

I think this can be closed with the conclusion that Filtering of attributes should be done before applying limits. This requires filtering of attributes in hot path. Some languages may vary in implementation (for perf reasons), but that is considered okay.

The conclusion that @jsuereth mentioned above is different than this.

This issue may need to be reopened and a change made to the specification if there is still confusion about the permissiveness of what is allowed.

What part do you see misalignment? Is that one of the below, or different?

Filtering of attributes should be done before applying limits
OR
Depending on implementation, some languages maybe deviating from this, and that is okay

(Separately, I am curious to learn more on why filtering in hot-path is considered expensive. If unwanted tags are not dropped in hot path early enough, then one has to process entire set of incoming attributes (sort, de-dup,hash-lookup, all of which have a cost proportional to the number of attributes), and hence it would be less performant...So it feels counter-intuitive how filtering in hot path is more expensive.

We have done some optimizations in OTel .NET to do that (https://github.com/open-telemetry/opentelemetry-dotnet/pull/3864/files/3f47828869a6bc42478bd5468b864382d147cff1#r1013461213), and I am in need of implementing the same in a performant way in OTel Rust too, so want to check all possible ways to get high perf

We can chat about this offline if that is easier.
)

MrAlias · 2024-09-25T17:05:19Z

I mistook your statement as meaning there was a normative recommendation to filter prior to applying limits. I think if you are making a non-normative recommendation there, then things look aligned. If this is supposed to be normative, I think we need to codify that in the specification.

MrAlias · 2024-09-25T17:07:13Z

If unwanted tags are not dropped in hot path early enough, then one has to process entire set of incoming attributes (sort, de-dup,hash-lookup, all of which have a cost proportional to the number of attributes)

This processing is done outside of the hot-path. It is not performed when the attribute/span is recorded. If designed correctly this can be done concurrently while the API continues to record.

cijothomas · 2024-09-25T17:32:22Z

I mistook your statement as meaning there was a normative recommendation to filter prior to applying limits. I think if you are making a non-normative recommendation there, then things look aligned. If this is supposed to be normative, I think we need to codify that in the specification.

Got it. Do you think its best if we make a normative statement in the spec like "If SDK is enforcing cardinality limits, it SHOULD be done after filtering of attributes, if any".

(It felt obvious to me that filtering should be done prior to cardinality capping (maybe because I was only thinking from .NET implementation!) but based on this issue/discussion, it may be worth adding it to spec!)

MrAlias · 2024-09-25T17:55:33Z

I mistook your statement as meaning there was a normative recommendation to filter prior to applying limits. I think if you are making a non-normative recommendation there, then things look aligned. If this is supposed to be normative, I think we need to codify that in the specification.

Got it. Do you think its best if we make a normative statement in the spec like "If SDK is enforcing cardinality limits, it SHOULD be done after filtering of attributes, if any".

(It felt obvious to me that filtering should be done prior to cardinality capping (maybe because I was only thinking from .NET implementation!) but based on this issue/discussion, it may be worth adding it to spec!)

I'm fine either way.

The Go SIG was taking the comments in this issue and the lack of specification language to mean there was a decided and intentional ambiguity. If we want to change that and make a normative recommendation, that would be fine as well. The Go SIG would need to evaluate more objectively the trade-offs and decide if we would want to comply with the recommendation.

I support unifying behavior, but I also support @jsuereth's point that this is an error situation and ambiguity here is also fine.

trask · 2024-09-25T19:32:03Z

Re-opening since seeing so many comments on a closed issue gives me anxiety 😅.

trask · 2024-09-25T20:22:48Z

I would personally prefer this:

make a normative statement in the spec like "If SDK is enforcing cardinality limits, it SHOULD be done after filtering of attributes, if any"

e.g. we're using this behavior in Java instrumentation: #3546 (comment)

I also think it gives users the cleanest way to recover from a cardinality limit error condition, by dropping a (high-cardinality) attribute via metric view and getting full fidelity of all the remaining attributes

cijothomas · 2024-09-25T20:43:27Z

I'll send a PR with that wording, and then close this issue with the PR. (assuming PR gets approved by all!)

reyang · 2024-09-26T00:40:12Z

I'm fine if we want to have additional clarification in the spec.
Do you think there will be confusion at all? I don't see the confusion as we have this statement in the spec "... a hard limit on the number of metric points..." after #3856. Only attributes that are not filtered out will make to metric points during aggregation. @MrAlias do you see any risk that people will interpret it in the opposite way?

MrAlias added the spec:metrics Related to the specification/metrics directory label Dec 19, 2023

github-actions bot assigned jmacd Dec 19, 2023

MrAlias mentioned this issue Dec 19, 2023

Investigate if attribute filtering should be in the instrument or aggregator open-telemetry/opentelemetry-go#3011

Open

reyang added the [label deprecated] triaged-accepted [label deprecated] Issue triaged and accepted by OTel community, can proceed with creating a PR label Dec 20, 2023

reyang assigned reyang and unassigned jmacd Dec 20, 2023

MrAlias added a commit to MrAlias/opentelemetry-specification that referenced this issue Jan 4, 2024

Specify limiting and filtering interaction

9ffa45a

Resolve open-telemetry#3798

This was referenced Jan 4, 2024

Specify cardinality limiting and attribute filtering interaction #3803

Closed

Split collection limit out of cardinality limit #3813

Open

cijothomas mentioned this issue Feb 26, 2024

Stabilize Overflow attribute section under Cardinality Limits #3904

Closed

austinlparker added triage:accepted:ready-with-sponsor Ready to be implemented and has a specification sponsor assigned and removed [label deprecated] triaged-accepted [label deprecated] Issue triaged and accepted by OTel community, can proceed with creating a PR labels Apr 30, 2024

austinlparker added this to 🔭 Main Backlog Jul 16, 2024

austinlparker moved this to Spec - Accepted in 🔭 Main Backlog Jul 16, 2024

austinlparker moved this from Spec - Accepted to Spec - In Progress in 🔭 Main Backlog Jul 16, 2024

reyang closed this as completed Sep 25, 2024

trask reopened this Sep 25, 2024

cijothomas assigned cijothomas and unassigned reyang Sep 25, 2024

cijothomas mentioned this issue Sep 25, 2024

Clarify cardinality capping should be after filtering #4228

Merged

5 tasks

reyang mentioned this issue Sep 26, 2024

Mark cardinality limits as Stable #4222

Merged

5 tasks

reyang closed this as completed in #4228 Sep 27, 2024

austinlparker moved this from Spec - In Progress to Spec - Closed in 🔭 Main Backlog Oct 22, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

How is the cardinality limit applied when attributes are being filtered? #3798

How is the cardinality limit applied when attributes are being filtered? #3798

MrAlias commented Dec 19, 2023 •

edited

Loading

jsuereth commented Dec 20, 2023

MrAlias commented Dec 20, 2023

MrAlias commented Dec 20, 2023

jsuereth commented Dec 20, 2023

MrAlias commented Dec 20, 2023 •

edited

Loading

jsuereth commented Dec 20, 2023

jsuereth commented Dec 20, 2023

jack-berg commented Dec 20, 2023

MrAlias commented Dec 21, 2023

jack-berg commented Dec 21, 2023

reyang commented Feb 27, 2024 •

edited

Loading

austinlparker commented Apr 30, 2024

cijothomas commented Sep 24, 2024

reyang commented Sep 25, 2024

MrAlias commented Sep 25, 2024

cijothomas commented Sep 25, 2024

MrAlias commented Sep 25, 2024

MrAlias commented Sep 25, 2024

cijothomas commented Sep 25, 2024

MrAlias commented Sep 25, 2024

trask commented Sep 25, 2024

trask commented Sep 25, 2024 •

edited

Loading

cijothomas commented Sep 25, 2024

reyang commented Sep 26, 2024 •

edited

Loading

How is the cardinality limit applied when attributes are being filtered? #3798

How is the cardinality limit applied when attributes are being filtered? #3798

Comments

MrAlias commented Dec 19, 2023 • edited Loading

Filter before limiting

Limit before filtering

jsuereth commented Dec 20, 2023

MrAlias commented Dec 20, 2023

MrAlias commented Dec 20, 2023

jsuereth commented Dec 20, 2023

MrAlias commented Dec 20, 2023 • edited Loading

jsuereth commented Dec 20, 2023

jsuereth commented Dec 20, 2023

jack-berg commented Dec 20, 2023

MrAlias commented Dec 21, 2023

jack-berg commented Dec 21, 2023

reyang commented Feb 27, 2024 • edited Loading

austinlparker commented Apr 30, 2024

cijothomas commented Sep 24, 2024

reyang commented Sep 25, 2024

MrAlias commented Sep 25, 2024

cijothomas commented Sep 25, 2024

MrAlias commented Sep 25, 2024

MrAlias commented Sep 25, 2024

cijothomas commented Sep 25, 2024

MrAlias commented Sep 25, 2024

trask commented Sep 25, 2024

trask commented Sep 25, 2024 • edited Loading

cijothomas commented Sep 25, 2024

reyang commented Sep 26, 2024 • edited Loading

MrAlias commented Dec 19, 2023 •

edited

Loading

MrAlias commented Dec 20, 2023 •

edited

Loading

reyang commented Feb 27, 2024 •

edited

Loading

trask commented Sep 25, 2024 •

edited

Loading

reyang commented Sep 26, 2024 •

edited

Loading