Define low/high-cardinality #2996

Open
lmolkova opened this issue Nov 30, 2022 · 3 comments
Labels
spec:miscellaneous, triage:deciding:community-feedback

Comments

@lmolkova
Contributor

What are you trying to achieve?

Currently, we recommend using low-cardinality span names in all trace conventions.

It would be great to have a definition of cardinality and a shared idea of what "low" and "high" mean, so we can refer to it from the different semantic conventions.

Additional context.

It's partially explained today in the metrics supplemental guidelines and the trace API.

    Now thinking (not related but), "low/high" cardinality is a topic that comes often and not everyone understands it. Would be cool if the spec defined this "once and for all".

Originally posted by @joaopgrassi in #2957 (comment)

@lmolkova added the spec:miscellaneous label on Nov 30, 2022
@joaopgrassi
Member

joaopgrassi commented Dec 1, 2022

Thanks for creating the issue @lmolkova ! Some context on why I commented that in the PR:

I often speak with former colleagues from the time I was a "full-time back-end developer". I ask them to try OTel and tell me their pain points and their general impression of the spec. One thing that always comes up is cardinality. None of them had much idea what it was and, even worse, how to tell whether the things they are instrumenting/recording suffer from high cardinality.

Plus, during the messaging SIG meetings, the topic of high cardinality has come up multiple times, for example when we discussed span names and what to use for them. I remember us going through the usual "can't use this because it's high-cardinality" and then, immediately after, people asking: but why not? Why/where is the problem with approach X?

I thought about it and have some ideas, so I will just "dump" them here. What I had in mind is either a completely new page for it or a section somewhere (e.g., glossary) with a structure like this:


Cardinality

Goals:

Explain "Cardinality" in a general and "easy to grasp" way. For example, I found this one for SQL well structured,
and maybe we could take some ideas from it: https://en.wikipedia.org/wiki/Cardinality_(SQL_statements).

I would try to refrain from using complex, mathematical definitions, as those don't help newcomers understand it.

Why is high cardinality a problem?

Goals:

Explain what high cardinality will ultimately cause for users, with clear and easy-to-understand examples.
For example, their queries/dashboards will provide less useful output, they will have high costs, etc.

High-cardinality in traces

Goals: Explain with examples why it's a problem for traces
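
As one possible illustration for this section, here is a minimal Python sketch of how span names behave when they are built from raw request paths versus route templates. The request list and the naming scheme are made up for the example and are not taken from any semantic convention.

```python
# Hypothetical request log: (HTTP method, raw path, matched route template).
requests = [
    ("GET", "/users/1", "/users/{id}"),
    ("GET", "/users/2", "/users/{id}"),
    ("GET", "/users/3", "/users/{id}"),
    ("POST", "/orders", "/orders"),
]

# Naming spans after the raw path gives one distinct name per user ID,
# so the number of names grows with the number of users.
raw_names = {f"{method} {path}" for method, path, _ in requests}

# Naming spans after the route template gives one distinct name per endpoint,
# so the number of names stays bounded by the number of routes.
route_names = {f"{method} {route}" for method, _, route in requests}

print(len(raw_names), len(route_names))
```

With the raw path, every new user ID creates a new span name, which makes grouping spans by name in a backend far less useful.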

High-cardinality in metrics

Goals: Explain with examples why it's a problem for metrics
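
A rough sketch of why this matters for metrics: in the worst case, the number of time series for one metric is the product of the number of distinct values of each attribute. The counts below are assumptions chosen for illustration, and `max_series` is a hypothetical helper, not part of any OTel API.

```python
from math import prod

# Assumed number of distinct values per attribute on a single metric.
distinct_values = {
    "http.request.method": 5,
    "http.route": 20,
    "http.response.status_code": 10,
}


def max_series(cardinalities):
    """Upper bound on time series: product of per-attribute distinct counts."""
    return prod(cardinalities.values())


bounded = max_series(distinct_values)

# Adding one unbounded attribute (here an assumed 100,000 user IDs)
# multiplies the upper bound by that many.
unbounded = max_series({**distinct_values, "user.id": 100_000})

print(bounded, unbounded)
```

Real systems rarely hit the full product because not every combination occurs, but the bound shows how a single unbounded attribute dominates everything else.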

How do I achieve low cardinality?

Goals:

Here we can give best practices on how to achieve this. For example, mentioning that one should consider
using bounded values for attributes (categories, enums). Again, the goal is to provide guidance
in easy-to-understand language and with as many real-world examples as possible,
so folks actually using OTel and adding instrumentation have a solid foundation to base their instrumentation on.
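
To make "bounded values" concrete, here is a small Python sketch of two common approaches: collapsing a numeric value into a fixed set of classes, and mapping free-form input onto a known list with a catch-all. The function names and the tier list are hypothetical and only serve the example.

```python
# Hypothetical allow-list of values we are willing to record as-is.
KNOWN_TIERS = {"free", "pro", "enterprise"}


def status_class(status_code):
    """Collapse an HTTP status code into one of a few classes, e.g. 404 -> '4xx'."""
    return f"{status_code // 100}xx"


def bounded_tier(value):
    """Keep known values and map anything unexpected to a single 'other' bucket."""
    return value if value in KNOWN_TIERS else "other"


print(status_class(404), bounded_tier("pro"), bounded_tier("trial-2022-promo"))
```

Both helpers guarantee that the attribute can only ever take a small, known number of values, regardless of what the application sees at runtime.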


Curious to see what the community thinks about this. :)

@lmolkova
Contributor Author

Related: open-telemetry/semantic-conventions#205 (comment)

Low cardinality requirements apply to collection and storage, but query-time cardinality could also be important for user experience (for example, http.route has low-ish cardinality within one service, but could be much higher across all services in the system).
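
A minimal sketch of that difference, using made-up service names and routes: the same `http.route` values are counted once per service and once across the whole system.

```python
from collections import defaultdict

# Hypothetical (service name, http.route) pairs observed across a system.
observations = [
    ("checkout", "/cart"),
    ("checkout", "/pay"),
    ("catalog", "/items"),
    ("catalog", "/items/{id}"),
    ("users", "/users/{id}"),
]

# Distinct routes seen by each individual service.
per_service = defaultdict(set)
for service, route in observations:
    per_service[service].add(route)
per_service_counts = {service: len(routes) for service, routes in per_service.items()}

# Distinct routes seen when querying across all services at once.
system_wide = len({route for _, route in observations})

print(per_service_counts, system_wide)
```

Each service sees only a couple of routes, while a query that spans all services has to deal with the union of them.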

@joaopgrassi
Member

The TAG Observability white paper has definitions/explanations of metric cardinality: https://github.com/cncf/tag-observability/blob/whitepaper-v1.0.0/whitepaper.md#metric-cardinality. Maybe we could borrow things from there to finally fix this?

4 participants