Define low/high-cardinality #2996

lmolkova · 2022-11-30T22:44:59Z

What are you trying to achieve?

Currently, we recommend using low-cardinality span names in all trace conventions.

It would be great to have a definition of cardinality and the idea of what low and high mean so we can refer to it from different semantic conventions.

Additional context.

It's partially explained today in the metrics supplemental guidelines and the trace API.

    Now that I think about it (unrelated, but): "low/high" cardinality is a topic that comes up often and not everyone understands it. It would be cool if the spec defined this "once and for all".

Originally posted by @joaopgrassi in #2957 (comment)


joaopgrassi · 2022-12-01T10:38:52Z

Thanks for creating the issue @lmolkova ! Some context on why I commented that in the PR:

I often speak with former colleagues from when I was a "full-time back-end developer". I ask them to try OTel and tell me their pain points and their general impression of the spec. One thing that always comes up is cardinality. None of them had much idea what it was or, even worse, how to tell whether the things they are instrumenting/recording suffer from high cardinality.

Plus, during the messaging SIG meetings, the topic of high cardinality has come up multiple times, for example when we discussed span names and what to use for them. I remember us going through the usual "can't use this because it's high-cardinality", and immediately after, people asking: but why not? Why/where is the problem with approach X?

I thought about it and have some ideas, so I will just "dump" them here. What I had in mind is either a completely new page or a section somewhere (e.g., the glossary) with a structure like this:

Cardinality

Goals:

Explain "cardinality" in a general, easy-to-grasp way. For example, I found this article on SQL cardinality well structured,
and maybe we could take some ideas from it: https://en.wikipedia.org/wiki/Cardinality_(SQL_statements).

I would refrain from using complex mathematical definitions, as they don't help newcomers understand it.

Why is high cardinality a problem?

Goals:

Explain what high cardinality ultimately causes for users, with clear and easy-to-understand examples.
For example, their queries/dashboards will provide less useful output, they will incur higher costs, etc.

High-cardinality in traces

Goals: Explain with examples why it's a problem for traces

High-cardinality in metrics

Goals: Explain with examples why it's a problem for metrics
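
As one possible example for this section, here is a minimal sketch of how the number of time series for a single metric can grow. It assumes the worst case, where every combination of attribute values is observed, so the series count is the product of the distinct values per attribute. The attribute names and value counts below are illustrative assumptions, not measurements.

```python
from math import prod


def series_count(distinct_values: dict[str, int]) -> int:
    """Worst-case number of time series for one metric.

    Assumes every combination of attribute values occurs, so the
    result is the product of the distinct-value counts.
    """
    return prod(distinct_values.values())


# Illustrative counts (assumptions): each attribute has a small, bounded set of values.
bounded = {
    "http.request.method": 5,
    "http.route": 20,
    "http.response.status_code": 10,
}

# Adding one unbounded attribute (here: a per-user identifier) multiplies everything.
unbounded = {**bounded, "user.id": 100_000}

print(series_count(bounded))
print(series_count(unbounded))
```

With these assumed numbers, a single per-user attribute multiplies the worst-case series count by 100,000, which is the kind of growth that drives up storage and query costs.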

How do I achieve low cardinality?

Goals:

Here we can give best practices on how to achieve this, for example that one should consider
using bounded values for attributes (categories, enums). Again, the goal is to provide guidance
in easy-to-understand language and with as many real-world examples as possible,
so folks actually using OTel and adding instrumentation have a solid foundation to base their instrumentation on.

Curious to see what the community thinks about this. :)
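
A minimal sketch of what "bounded values" could look like in practice, written in plain Python without any OTel API. The helper names (`span_name`, `status_class`) and the URL shapes are hypothetical; the idea is only to replace unbounded values (raw IDs in a path, individual status codes) with a small fixed set of categories.

```python
import re

# Matches a purely numeric path segment such as "/42" (assumption: IDs are numeric).
_NUMERIC_SEGMENT = re.compile(r"/\d+(?=/|$)")


def span_name(method: str, path: str) -> str:
    """Build a name from a templated path so the set of names stays bounded."""
    template = _NUMERIC_SEGMENT.sub("/{id}", path)
    return f"{method} {template}"


def status_class(status_code: int) -> str:
    """Collapse individual status codes into a handful of classes (2xx, 4xx, ...)."""
    return f"{status_code // 100}xx"


# 1000 requests, each for a different user and order.
raw_paths = [f"/users/{i}/orders/{i * 7}" for i in range(1000)]

high_cardinality_names = {f"GET {p}" for p in raw_paths}
low_cardinality_names = {span_name("GET", p) for p in raw_paths}

print(len(high_cardinality_names), len(low_cardinality_names))
print(status_class(404))
```

Using the raw path yields one distinct name per request, while the templated version collapses all of them into a single name. The raw IDs are not lost: they can still be recorded as attributes on the individual span instead of in its name.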

lmolkova · 2023-08-10T23:10:08Z

Related: open-telemetry/semantic-conventions#205 (comment)

Low cardinality requirements apply to collection and storage, but query-time cardinality could also be important for user experience (for example, http.route has low-ish cardinality within one service, but could be much higher across all services in the system).
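
To illustrate the http.route point, here is a small sketch that counts distinct routes per service and across the whole system. The service names and routes are made up for illustration.

```python
from collections import defaultdict

# Hypothetical (service, http.route) observations.
observations = [
    ("checkout", "/cart"),
    ("checkout", "/cart/{id}"),
    ("checkout", "/pay"),
    ("catalog", "/items"),
    ("catalog", "/items/{id}"),
    ("accounts", "/users/{id}"),
    ("accounts", "/login"),
    ("checkout", "/cart"),  # repeated observation, adds no new value
]

routes_by_service: dict[str, set[str]] = defaultdict(set)
for service, route in observations:
    routes_by_service[service].add(route)

# Cardinality as seen within each service (collection-time view).
per_service = {service: len(routes) for service, routes in routes_by_service.items()}

# Cardinality when querying across all services at once (query-time view).
system_wide = len({route for _, route in observations})

print(per_service)
print(system_wide)
```

Each service sees only two or three routes, but a query over the whole system has to deal with all seven, and that number grows with every service added.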

joaopgrassi · 2023-10-18T09:55:18Z

The TAG Observability white paper has definitions/explanations of metric cardinality: https://github.com/cncf/tag-observability/blob/whitepaper-v1.0.0/whitepaper.md#metric-cardinality. Maybe we could borrow things from there to finally fix this?

lmolkova added the spec:miscellaneous label Nov 30, 2022

github-actions bot assigned jmacd Nov 30, 2022

lmolkova mentioned this issue Nov 30, 2022

Refactor messaging attributes and specify per-message attributes #2957

Merged

potiuk mentioned this issue Feb 18, 2023

Emit DataDog statsd metrics with metadata tags apache/airflow#28961

Merged

lmolkova mentioned this issue Jun 6, 2023

Clarify cardinality requirements of a span name #3534

Open

dyladan added the triage:deciding:community-feedback label May 21, 2024

dyladan unassigned jmacd May 21, 2024
