
Promtail generates very high cardinality prometheus metrics #5553

Closed
m1keil opened this issue Mar 7, 2022 · 8 comments · Fixed by #5603

@m1keil (Contributor) commented Mar 7, 2022

Describe the bug
We use a consulagent_sd_configs scrape config to automatically detect and tail logs on our instances, which run Nomad clients that schedule all the workloads.
While inspecting the metrics exported by Promtail via /metrics, I can see large sets of data points labelled with the file path being tracked by Promtail:

$ curl --silent 172.30.61.148:26017/metrics | grep "^promtail_" | sort | grep -E 'path|filename' | cut -d{ -f1 | uniq -c
     22 promtail_file_bytes_total
    153 promtail_log_entries_bytes_bucket
     17 promtail_log_entries_bytes_count
     17 promtail_log_entries_bytes_sum
     22 promtail_read_bytes_total
     17 promtail_read_lines_total
      2 promtail_stream_lag_seconds

I have only 5 services running on this node, but this generates up to 153 different label sets (due to the metric being a summary/histogram).
Each service can have:

  • stdout & stderr log files
  • multiple log files (log rotation)
  • a different allocation ID (a unique ID Nomad uses for scheduling and execution)

Here's a verbose example:

promtail_read_bytes_total{path="/opt/nomad/alloc/3d2e6c91-b769-38be-8910-094fb6510e36/alloc/logs/app.stderr.10"} 1.0478718e+07
promtail_read_bytes_total{path="/opt/nomad/alloc/3d2e6c91-b769-38be-8910-094fb6510e36/alloc/logs/app.stderr.11"} 1.0482271e+07
promtail_read_bytes_total{path="/opt/nomad/alloc/3d2e6c91-b769-38be-8910-094fb6510e36/alloc/logs/app.stderr.12"} 2.391382e+06
promtail_read_bytes_total{path="/opt/nomad/alloc/3d2e6c91-b769-38be-8910-094fb6510e36/alloc/logs/app.stderr.3"} 1.0482602e+07
promtail_read_bytes_total{path="/opt/nomad/alloc/3d2e6c91-b769-38be-8910-094fb6510e36/alloc/logs/app.stderr.4"} 1.0476908e+07
promtail_read_bytes_total{path="/opt/nomad/alloc/3d2e6c91-b769-38be-8910-094fb6510e36/alloc/logs/app.stderr.5"} 1.048399e+07
promtail_read_bytes_total{path="/opt/nomad/alloc/3d2e6c91-b769-38be-8910-094fb6510e36/alloc/logs/app.stderr.6"} 1.0485677e+07
promtail_read_bytes_total{path="/opt/nomad/alloc/3d2e6c91-b769-38be-8910-094fb6510e36/alloc/logs/app.stderr.7"} 1.04807e+07
promtail_read_bytes_total{path="/opt/nomad/alloc/3d2e6c91-b769-38be-8910-094fb6510e36/alloc/logs/app.stderr.8"} 1.0482844e+07
promtail_read_bytes_total{path="/opt/nomad/alloc/3d2e6c91-b769-38be-8910-094fb6510e36/alloc/logs/app.stderr.9"} 1.0482479e+07
promtail_read_bytes_total{path="/opt/nomad/alloc/3d2e6c91-b769-38be-8910-094fb6510e36/alloc/logs/app.stdout.0"} 0

To make matters worse, the total number of possible label values is multiplied by the N nodes that run Nomad.
This can generate so many label values that it can bring a decent Prometheus cluster to its knees.

As far as I can tell, there is no way to disable this detailed monitoring, and the only option I have at the moment is to drop these metrics at scrape time. It would be useful to have such detailed instrumentation enabled on demand rather than by default (e.g. via a --server.detailed-instrumentation flag).
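
For reference, the scrape-time workaround would look roughly like this on the Prometheus side (the job name is made up, the target is the one from the example above, and the regex would need adjusting to whichever series you want to drop):

scrape_configs:
  - job_name: promtail  # hypothetical job name
    static_configs:
      - targets: ['172.30.61.148:26017']
    metric_relabel_configs:
      # Drop the per-file series after scraping, before ingestion into the TSDB.
      - source_labels: [__name__]
        regex: 'promtail_(file_bytes_total|read_bytes_total|read_lines_total|log_entries_bytes.*|stream_lag_seconds)'
        action: drop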

@dannykopping (Contributor)

Thanks for reporting this @m1keil.

It's a fair point, but I don't think we will make the change you're suggesting and export metrics of different verbosity based on a flag. It adds complexity and raises an unanswerable question: "what is a detailed metric?". We can't answer this for everyone in every scenario, so we'd prefer to export everything and have users filter out what they don't need.

On that note:
I would suggest adding a match stage to drop the metrics for rotated files.
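
Something along these lines, as a rough sketch (it assumes Promtail's default filename label and guesses at a numeric rotation suffix):

pipeline_stages:
  # Drop log entries coming from rotated files (numeric suffix in the filename).
  - match:
      selector: '{filename=~".*\\.[0-9]+"}'
      action: drop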

How does that sound?

@m1keil (Contributor, Author) commented Mar 7, 2022

We use the match stage, but I don't see how it can help here, given that:

  • Log rotation in Nomad works by creating a new file and incrementing the index by 1, i.e. the latest log file is the one with the highest index, not necessarily 0.
  • It doesn't address the high cardinality caused by the allocation ID (the UUID in the path).

If there's something I'm missing here let me know.

We can't answer this for everyone in every scenario, so we'd prefer to export everything and have users filter out what they don't need.

I guess that's fair enough, but I do think Promtail needs to consider sane defaults first. High cardinality Prometheus metrics are a known issue that should be handled with care; some people might enable this without realising it's a problem.
There is no harm in exposing less by default and only exposing everything on explicit demand, but the other way around isn't true.

@dannykopping (Contributor) commented Mar 7, 2022

OK, then I think you need to drop the metrics at scrape time; I can't think of an alternative solution for you.

There is no harm in exposing less by default and only exposing everything on explicit demand, but the other way around isn't true.

The problem is choosing what to cut out. You never know what metric would be critically useful to an operator.

In this case, Promtail could not expose any fewer metrics without understanding the specifics of your system.
I think it is the responsibility of the operator to ensure the health of their own Prometheus instance.

@m1keil (Contributor, Author) commented Mar 7, 2022

The problem is choosing what to cut out.

I think in this case the answer can be simple: detailed = potentially unbounded cardinality.

This is a common idiom in the Prometheus world; for example, cAdvisor and node_exporter don't enable all of their metadata or metrics by default.

I think it is the responsibility of the operator to ensure the health of their own Prometheus instance.

This is true regardless of how Promtail behaves by default, but sane defaults can prevent mistakes that operators (will) make, in the same way we don't ship software to prod with DEBUG logging.

@dannykopping (Contributor)

We're missing each other on a fundamental point:
If Promtail didn't expose metrics with potentially unbounded cardinality, we wouldn't be able to expose any metrics with file paths.

@m1keil (Contributor, Author) commented Mar 7, 2022

Well, yes, but I'm not saying don't allow exposing them at all; all I'm saying is "maybe you shouldn't do that by default".

Looking at the existing per-file metrics:

  • promtail_file_bytes_total: the docs say it's the number of bytes read, but I think it's actually the file size(?)
  • promtail_read_bytes_total: bytes read (per file)
  • promtail_read_lines_total: lines read (per file)
  • promtail_stream_lag_seconds: undocumented
  • promtail_log_entries_bytes: I guess this is a distribution of log line sizes per file(?)

Now, are these metrics helpful? Probably. Are they all important enough to be on by default? Maybe a subset. It would be nice if
promtail_read_bytes_total, promtail_read_lines_total or promtail_stream_lag_seconds had a "total" (non per-file) version, but I do understand this adds complexity to the code.
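
As a stopgap, an aggregated view can be approximated on the Prometheus side with recording rules along these lines (the rule names are just illustrative):

groups:
  - name: promtail_totals
    rules:
      # Aggregate away the per-file labels so dashboards/alerts don't touch the raw series.
      - record: instance:promtail_read_bytes:sum
        expr: sum without (path, filename) (promtail_read_bytes_total)
      - record: instance:promtail_read_lines:sum
        expr: sum without (path, filename) (promtail_read_lines_total)

This only helps on the query side, though; Prometheus still has to ingest and store all the per-path series.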

Maybe as a compromise, we can at least add a note about the potential high cardinality in the docs?

@dannykopping (Contributor)

Seems like a reasonable compromise to me.

Can you submit the PR? I want to give you credit for this suggestion.

@dannykopping (Contributor)

Closing this one, we can reference it in the PR 👍

KMiller-Grafana added a commit that referenced this issue May 20, 2022
* Hint about potential high cardinality metrics

Add a warning about potentially high-cardinality metrics caused by the file path being included in some of the metrics.
Depending on the setup, such a config can result in a very large number of label values, which is not recommended by Prometheus.
This can lead to increased storage, slower queries, extra costs and so on.

Address what has been discussed in #5553.

* Reword observability.md

Co-authored-by: Karen Miller <84039272+KMiller-Grafana@users.noreply.github.com>
