
How is the cardinality limit applied when attributes are being filtered? #3798

Closed
MrAlias opened this issue Dec 19, 2023 · 24 comments · Fixed by #4228
Labels: spec:metrics (Related to the specification/metrics directory), triage:accepted:ready-with-sponsor (Ready to be implemented and has a specification sponsor assigned)

@MrAlias (Contributor) commented Dec 19, 2023:

When a view that contains an attribute filter is applied to a metric pipeline, how should the cardinality limit be applied? Prior to filtering attributes, or after?

The answer we come up with will affect the output of user data. For example, suppose measurements are made with the following attributes:

  1. {path: "/", code: 200, query: ""}
  2. {path: "/", code: 400, query: "user=bob"}
  3. {path: "/admin", code: 200, query: ""}

Suppose an attribute filter is applied so that only the path attribute is retained, and a cardinality limit of 3 is set. If the filtering is applied prior to checking the cardinality limit, the following attributes will be kept on the output metric streams:

  • {path: "/"}
  • {path: "/admin"}

However, if the cardinality limit is applied prior to filtering, the following attributes will be kept on the output metric streams (both orderings are worked through in the sketch after these lists):

  • {path: "/"}
  • {otel.metric.overflow: true}
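To make the two orderings concrete, here is a minimal, self-contained Go sketch contrasting them on the example above. It is not any SDK's implementation; applyLimit and the string encoding of attribute sets are invented for illustration, and it assumes, as the bullet lists do, that the otel.metric.overflow point itself counts toward the limit.

```go
package main

import "fmt"

// Each measurement is modeled as a pair: the full attribute set and the
// set left after filtering down to only the path attribute.
var measurements = [][2]string{
	{`{path: "/", code: 200, query: ""}`, `{path: "/"}`},
	{`{path: "/", code: 400, query: "user=bob"}`, `{path: "/"}`},
	{`{path: "/admin", code: 200, query: ""}`, `{path: "/admin"}`},
}

const overflowSet = `{otel.metric.overflow: true}`

// applyLimit admits distinct attribute sets until limit-1 are stored and
// routes everything else to the overflow set, so the overflow point
// itself counts toward the limit.
func applyLimit(sets []string, limit int) []string {
	admitted := map[string]bool{}
	overflow := false
	for _, s := range sets {
		switch {
		case admitted[s]: // already stored, nothing to do
		case len(admitted) < limit-1:
			admitted[s] = true
		default:
			overflow = true
		}
	}
	out := make([]string, 0, len(admitted)+1)
	for s := range admitted {
		out = append(out, s)
	}
	if overflow {
		out = append(out, overflowSet)
	}
	return out
}

func main() {
	var full, filtered []string
	for _, m := range measurements {
		full = append(full, m[0])
		filtered = append(filtered, m[1])
	}

	// Filter first: only two distinct filtered sets exist, so both fit.
	fmt.Println(applyLimit(filtered, 3)) // {path: "/"} and {path: "/admin"}

	// Limit first: three distinct raw sets trip the limit, and the two
	// admitted raw sets collapse into one once filtered at collection.
	result := map[string]bool{}
	for _, s := range applyLimit(full, 3) {
		if s == overflowSet {
			result[s] = true
			continue
		}
		for _, m := range measurements {
			if m[0] == s {
				result[m[1]] = true
			}
		}
	}
	fmt.Println(result) // {path: "/"} and {otel.metric.overflow: true}
}
```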

Filter before limiting

Given that the cardinality limit feature was added to limit the resources an SDK uses during measurements, applying filtering prior to the cardinality limit means the filtering needs to be done in the "hot path" of making a measurement.

Pros:

  • The expected number of attribute sets (at most the cardinality limit) is guaranteed to be on the output metric streams

Cons:

  • Requires filtering for every measurement made

Limit before filtering

Limiting without doing any filtering means the filtering process can be deferred to the collection of metric streams. This is a substantial performance difference in favor of deferral: the filtering process then needs to run at most M times (for M being the number of distinct attribute sets recorded) rather than N times (for N being the number of measurements made), as sketched after the lists below.

Pros:

  • Performance on the "hot path" is not impacted by filtering

Cons:

  • It is possible that fewer distinct attribute sets than the cardinality limit will be exported on the output metric streams (possibly even just the otel.metric.overflow attribute in some cases)
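The M-versus-N difference can be seen with a toy count of filter invocations under each strategy; the workload shape below (a million measurements over three distinct attribute sets) is assumed for illustration and is typical of request-scoped metrics.

```go
package main

import "fmt"

func main() {
	// Assumed workload: N = 1,000,000 measurements spread over only
	// M = 3 distinct raw attribute sets.
	raw := []string{
		`{path: "/", code: 200}`,
		`{path: "/", code: 400}`,
		`{path: "/admin", code: 200}`,
	}
	const n = 1_000_000

	// Filter in the hot path: the filter runs once per measurement.
	hotPathFilterCalls := 0
	for i := 0; i < n; i++ {
		hotPathFilterCalls++ // one filter invocation per recorded measurement
	}

	// Defer filtering to collection: aggregate on the raw attribute sets
	// during recording, then filter once per distinct set when collecting.
	aggregated := map[string]int{}
	for i := 0; i < n; i++ {
		aggregated[raw[i%len(raw)]]++
	}
	collectFilterCalls := len(aggregated) // one invocation per distinct set

	fmt.Println(hotPathFilterCalls, "vs", collectFilterCalls) // 1000000 vs 3
}
```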
@jsuereth (Contributor) commented:

The limit is intended to be against the amount of in-memory storage you have. Given that, I believe the filtering MUST be done on your hot path. The intention is to avoid denial-of-service attacks via high-cardinality inputs that impact metrics (as many users will attach attributes that come from requests; even our semconv recommends this).

I don't think we have any rules about forcing filtering to be in the hot path or afterwards.

To me this means you have to limit before you hit any in-memory storage. Whether this means, in Go, you start doing filtering in the hot path, is up to you. I'd argue that the con of having less cardinality than your limit in some cases is better than the alternative of running out of memory :)
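A hedged sketch of that principle: enforce the limit before any new series is allocated, so attacker-controlled attribute values cannot grow memory without bound. boundedStore and its Record method are invented for illustration and are not an SDK API.

```go
package main

import (
	"fmt"
	"sync"
)

const overflowKey = "otel.metric.overflow=true"

// boundedStore caps the number of stored series at creation time: the
// check runs in the hot path, before any in-memory storage can grow.
type boundedStore struct {
	mu     sync.Mutex
	points map[string]int64 // aggregation keyed by encoded attribute set
	limit  int
}

func (s *boundedStore) Record(attrKey string, value int64) {
	s.mu.Lock()
	defer s.mu.Unlock()
	if _, ok := s.points[attrKey]; !ok && len(s.points) >= s.limit-1 {
		// Never allocate a series past the limit: route the measurement
		// to overflow, which itself occupies the final slot.
		attrKey = overflowKey
	}
	s.points[attrKey] += value
}

func main() {
	s := &boundedStore{points: map[string]int64{}, limit: 3}
	// A request-derived value stuffed into an attribute cannot create
	// more than limit series, no matter how many distinct values arrive.
	for i := 0; i < 10_000; i++ {
		s.Record(fmt.Sprintf("user_id=%d", i), 1)
	}
	fmt.Println(len(s.points), s.points[overflowKey]) // 3 9998
}
```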

@MrAlias (Contributor, Author) commented Dec 20, 2023:

> Given that, I believe the filtering MUST be done on your hot-path.

@jsuereth did you mean "Given that, I believe the [limiting] MUST be done on your hot-path"? Not necessarily the filtering, right?

@MrAlias (Contributor, Author) commented Dec 20, 2023:

> To me this means you have to limit before you hit any in-memory storage. Whether this means, in Go, you start doing filtering in the hot path, is up to you. I'd argue that the con of having less cardinality than your limit in some cases is better than the alternative of running out of memory :)

👍

@jsuereth (Contributor) commented:

> @jsuereth did you mean "Given that, I believe the [limiting] MUST be done on your hot-path"? Not necessarily the filtering, right?

Yep! Limiting MUST be on the hot path; you can pick how you filter for best UX + performance.

@MrAlias (Contributor, Author) commented Dec 20, 2023:

@jsuereth what are your thoughts on potential inconsistencies across languages here?

If one implementation filters prior to the limiting and another filters afterwards they could produce different outputs. Is consistency here going to be an issue?

@jsuereth (Contributor) commented:

> If one implementation filters prior to the limiting and another filters afterwards they could produce different outputs. Is consistency here going to be an issue?

I think regarding cardinality limits we're talking about error scenarios and worst-case behavior. We already have a lot of inconsistencies in how failures are handled due to runtime limitations. We try to be consistent, but when it comes to extraordinary/error scenarios, I think some inconsistency between SDKs is OK.

@jsuereth (Contributor) commented:

Another way to phrase it: I think users, if given a choice, would prefer lower o11y overhead per language over perfect consistency.

@jack-berg (Member) commented:

Java checks whether the cardinality exceeds the limit after attribute filtering has occurred. I think this is the right behavior because it's the least surprising to users, e.g. it's surprising if I set my cardinality limit to 100 but only see 20 series. Both the attribute filter and the cardinality limit are mechanisms to manage cardinality. They should work together and not be at odds with each other.

@MrAlias (Contributor, Author) commented Dec 21, 2023:

> Java checks whether the cardinality exceeds the limit after attribute filtering has occurred. I think this is the right behavior because it's the least surprising to users, e.g. it's surprising if I set my cardinality limit to 100 but only see 20 series. Both the attribute filter and the cardinality limit are mechanisms to manage cardinality. They should work together and not be at odds with each other.

Doesn't this mean that Java needs to filter every measurement the user makes though? And do so in the "hot path" of telemetry recording?

@jack-berg (Member) commented:

> Doesn't this mean that Java needs to filter every measurement the user makes though? And do so in the "hot path" of telemetry recording?

Yes.

@reyang (Member) commented Feb 27, 2024:

I've clarified the SDK cardinality limit in #3856. The spec now says "For a given metric, the cardinality limit is a hard limit on the number of metric points that can be collected during a collection cycle". I also sent another editorial PR to clean up metric points (#3906).

@MrAlias Do you think this issue can be marked as resolved? (I've provided more info regarding how I want to address a set of problems in #3866).

@austinlparker (Member) commented:

@MrAlias is this complete?

@cijothomas (Member) commented:

I think this can be closed with the conclusion that filtering of attributes should be done before applying limits. This requires filtering of attributes in the hot path. Some languages may vary in implementation (for perf reasons), but that is considered okay.

@reyang (Member) commented Sep 25, 2024:

@MrAlias please let me know if you want to reopen it. I haven't seen an update from you after #3798 (comment).

@reyang closed this as completed Sep 25, 2024
@MrAlias (Contributor, Author) commented Sep 25, 2024:

> I think this can be closed with the conclusion that filtering of attributes should be done before applying limits. This requires filtering of attributes in the hot path. Some languages may vary in implementation (for perf reasons), but that is considered okay.

The conclusion that @jsuereth mentioned above is different than this.

This issue may need to be reopened and a change made to the specification if there is still confusion about the permissiveness of what is allowed.

@cijothomas (Member) commented:

> > I think this can be closed with the conclusion that filtering of attributes should be done before applying limits. This requires filtering of attributes in the hot path. Some languages may vary in implementation (for perf reasons), but that is considered okay.
>
> The conclusion that @jsuereth mentioned above is different than this.
>
> This issue may need to be reopened and a change made to the specification if there is still confusion about the permissiveness of what is allowed.

In what part do you see misalignment? Is it one of the two statements below, or something different?

Filtering of attributes should be done before applying limits
OR
Depending on implementation, some languages may be deviating from this, and that is okay

(Separately, I am curious to learn more about why filtering in the hot path is considered expensive. If unwanted tags are not dropped early enough in the hot path, then one has to process the entire set of incoming attributes (sort, de-dup, hash-lookup, all of which have a cost proportional to the number of attributes), and hence it would be less performant... so it feels counter-intuitive that filtering in the hot path would be more expensive.

We have done some optimizations in OTel .NET to do that (https://github.com/open-telemetry/opentelemetry-dotnet/pull/3864/files/3f47828869a6bc42478bd5468b864382d147cff1#r1013461213), and I need to implement the same in a performant way in OTel Rust too, so I want to check all possible ways to get high perf. We can chat about this offline if that is easier.)
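The parenthetical's intuition can be sketched with toy types (not OTel code): the sort and de-dup work that builds an aggregation key is proportional to the number of attributes it touches, so dropping unwanted keys first shrinks that work.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

type kv struct{ k, v string }

// aggregationKey sorts and de-duplicates attributes to build a stable
// lookup key; its cost grows with the number of attributes it touches.
func aggregationKey(kvs []kv) string {
	sort.Slice(kvs, func(i, j int) bool { return kvs[i].k < kvs[j].k })
	var b strings.Builder
	prev := ""
	for _, p := range kvs {
		if p.k == prev {
			continue // drop duplicate keys
		}
		prev = p.k
		fmt.Fprintf(&b, "%s=%s;", p.k, p.v)
	}
	return b.String()
}

func main() {
	raw := []kv{
		{"query", "user=bob"}, {"code", "400"}, {"path", "/"},
		{"trace_id", "abc123"}, {"user_agent", "curl/8.0"},
	}
	allowed := map[string]bool{"path": true, "code": true}

	// Dropping unwanted attributes first means the O(k log k) sort and
	// the de-dup pass touch 2 attributes instead of 5.
	kept := raw[:0]
	for _, p := range raw {
		if allowed[p.k] {
			kept = append(kept, p)
		}
	}
	fmt.Println(aggregationKey(kept)) // code=400;path=/;
}
```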

@MrAlias (Contributor, Author) commented Sep 25, 2024:

I mistook your statement as meaning there was a normative recommendation to filter prior to applying limits. I think if you are making a non-normative recommendation there, then things look aligned. If this is supposed to be normative, I think we need to codify that in the specification.

@MrAlias (Contributor, Author) commented Sep 25, 2024:

> If unwanted tags are not dropped early enough in the hot path, then one has to process the entire set of incoming attributes (sort, de-dup, hash-lookup, all of which have a cost proportional to the number of attributes)

This processing is done outside of the hot path. It is not performed when the attribute/span is recorded. If designed correctly, this can be done concurrently while the API continues to record.
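A hedged sketch of that design (invented types, not the OTel Go SDK): the hot path does only a lookup and an atomic add keyed on the full attribute set, while Collect filters each distinct set, at most M times, and can run concurrently with recording.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
	"sync/atomic"
)

// instrument aggregates on the full attribute-set key; filtering is
// deferred to collection time.
type instrument struct{ points sync.Map } // string -> *int64

// Record is the hot path: no filtering, just a lookup and an atomic add.
func (i *instrument) Record(attrKey string, v int64) {
	c, _ := i.points.LoadOrStore(attrKey, new(int64))
	atomic.AddInt64(c.(*int64), v)
}

// Collect filters each distinct attribute set once and merges the counts.
// sync.Map lets it iterate while other goroutines keep calling Record.
func (i *instrument) Collect(filter func(string) string) map[string]int64 {
	out := map[string]int64{}
	i.points.Range(func(k, v any) bool {
		out[filter(k.(string))] += atomic.LoadInt64(v.(*int64))
		return true
	})
	return out
}

func main() {
	inst := &instrument{}
	inst.Record("path=/;code=200", 1)
	inst.Record("path=/;code=400", 1)
	inst.Record("path=/admin;code=200", 1)

	// Toy filter: keep only the leading path=... portion of the key.
	pathOnly := func(k string) string { return strings.SplitN(k, ";", 2)[0] }
	fmt.Println(inst.Collect(pathOnly)) // map[path=/:2 path=/admin:1]
}
```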

@cijothomas (Member) commented:

> I mistook your statement as meaning there was a normative recommendation to filter prior to applying limits. I think if you are making a non-normative recommendation there, then things look aligned. If this is supposed to be normative, I think we need to codify that in the specification.

Got it. Do you think it's best if we make a normative statement in the spec like "If SDK is enforcing cardinality limits, it SHOULD be done after filtering of attributes, if any"?

(It felt obvious to me that filtering should be done prior to cardinality capping (maybe because I was only thinking of the .NET implementation!), but based on this issue/discussion, it may be worth adding it to the spec!)

@MrAlias (Contributor, Author) commented Sep 25, 2024:

> > I mistook your statement as meaning there was a normative recommendation to filter prior to applying limits. I think if you are making a non-normative recommendation there, then things look aligned. If this is supposed to be normative, I think we need to codify that in the specification.
>
> Got it. Do you think it's best if we make a normative statement in the spec like "If SDK is enforcing cardinality limits, it SHOULD be done after filtering of attributes, if any"?
>
> (It felt obvious to me that filtering should be done prior to cardinality capping (maybe because I was only thinking of the .NET implementation!), but based on this issue/discussion, it may be worth adding it to the spec!)

I'm fine either way.

The Go SIG was taking the comments in this issue and the lack of specification language to mean there was a decided and intentional ambiguity. If we want to change that and make a normative recommendation, that would be fine as well. The Go SIG would need to evaluate more objectively the trade-offs and decide if we would want to comply with the recommendation.

I support unifying behavior, but I also support @jsuereth's point that this is an error situation and ambiguity here is also fine.

@trask (Member) commented Sep 25, 2024:

Re-opening since seeing so many comments on a closed issue gives me anxiety 😅.

@trask reopened this Sep 25, 2024
@trask (Member) commented Sep 25, 2024:

I would personally prefer this:

> make a normative statement in the spec like "If SDK is enforcing cardinality limits, it SHOULD be done after filtering of attributes, if any"

e.g. we're using this behavior in Java instrumentation: #3546 (comment)

I also think it gives users the cleanest way to recover from a cardinality limit error condition: drop a (high-cardinality) attribute via a metric view and get full fidelity for all the remaining attributes.
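For reference, dropping an attribute via a view looks roughly like this in the Go SDK. Hedged: NewView, Stream.AttributeFilter, and attribute.NewAllowKeysFilter exist in go.opentelemetry.io/otel/sdk/metric and go.opentelemetry.io/otel/attribute at the time of writing, but the instrument name below is a placeholder; check the current SDK docs.

```go
package main

import (
	"context"

	"go.opentelemetry.io/otel/attribute"
	sdkmetric "go.opentelemetry.io/otel/sdk/metric"
)

func main() {
	// Keep only http.route on the (placeholder) instrument, so a
	// high-cardinality attribute such as user_id no longer counts
	// toward the cardinality limit.
	view := sdkmetric.NewView(
		sdkmetric.Instrument{Name: "http.server.request.count"},
		sdkmetric.Stream{
			AttributeFilter: attribute.NewAllowKeysFilter("http.route"),
		},
	)
	provider := sdkmetric.NewMeterProvider(sdkmetric.WithView(view))
	defer func() { _ = provider.Shutdown(context.Background()) }()
}
```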

@cijothomas (Member) commented:

I'll send a PR with that wording and then close this issue with the PR (assuming the PR gets approved by all!).

@reyang (Member) commented Sep 26, 2024:

I'm fine if we want to have additional clarification in the spec.
Do you think there will be confusion at all? I don't see the confusion, as we have this statement in the spec after #3856: "... a hard limit on the number of metric points ...". Only attributes that are not filtered out will make it into metric points during aggregation. @MrAlias do you see any risk that people will interpret it in the opposite way?
