Change the instance name for standard pod scraping to be unique #261
Conversation
Any of the potentially many containers in a pod can expose one or more ports with Prometheus metrics. However, with our current target labels, all of these targets get the same instance label (just the pod name), which leads to the dreaded `PrometheusOutOfOrderTimestamps` alert, see grafana/deployment_tools#3441. (In fact, if we get the alert, we are already lucky, because the problem can go unnoticed until someone actually needs one of the time series that receive samples from different targets, rendering them useless.)

In practice, we rarely have more than one port to scrape per pod, but it does happen, it's entirely within the intended usage pattern of K8s, and it can happen more often at any time. The two examples I'm aware of:

- Kube-state-metrics (KSM) has only one container in its pod, but that container exposes two metrics ports (http-metrics and self-metrics).
- Consul pods run a container with the consul-exporter and a container with the statsd-exporter, each exposing their metrics on a different port. Both ports are named http-metrics, which is possible because they are exposed by different containers. (This is the case that triggered the above linked issue.)

To avoid the metric duplication, we could add a container and a port label, but it is a Prometheus convention that the instance label alone should be unique within a job. Which brings us to what I'm proposing in this commit: create the instance label by joining pod name, container name, and port name with `:` in between. In most cases, the resulting instance value will appear redundant, but I believe the consistency has some value. Applying some magic to shorten the instance label where possible would add complexity and remove that consistency.
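To make that concrete, here is a minimal sketch of a relabeling rule that would build such an instance label. It is written as a plain relabel_config object in jsonnet and is only an illustration; the actual structure of the scrape config in this library may look different, and it assumes Kubernetes service discovery with role `pod`:

```jsonnet
// Sketch only: build `instance` as `<pod>:<container>:<port>` by joining the
// Kubernetes SD meta labels with ':' as the separator.
{
  source_labels: [
    '__meta_kubernetes_pod_name',
    '__meta_kubernetes_pod_container_name',
    '__meta_kubernetes_pod_container_port_name',
  ],
  separator: ':',
  target_label: 'instance',
}
```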
Hrm, having hit this before, I see the value. But damn, it can potentially break things in subtle ways. Joins would be difficult b/w cAdvisor data and metrics, if we ever want to do that. Hrm. Not sure, but we should fix this imo.
Just thinking out loud: we could break this, but realised …
That's a good point. Joins with cAdvisor metrics aren't properly possible at the moment anyway, because we do not attach the container name anywhere, i.e. in the consul case, you couldn't join because cAdvisor would give you metrics for the … About the convention: I guess it is often helpful in grouping and label matching to know that …
I'll add a commit that also adds …
This allows joining with cAdvisor metrics.
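Roughly, that follow-up could look like this (again plain relabel_config objects in jsonnet, shown only as an illustration of the idea, not the exact code of the commit):

```jsonnet
// Sketch only: add explicit pod and container target labels so that
// application metrics share join keys with cAdvisor, KSM, and Kubelet metrics.
[
  { source_labels: ['__meta_kubernetes_pod_name'], target_label: 'pod' },
  { source_labels: ['__meta_kubernetes_pod_container_name'], target_label: 'container' },
]
```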
@beorn7 can you follow up to ensure the Loki scrape config is consistent?
I did a quick check that we don't have any regular application metrics that have a …
Working on it.
As this needs a vendor update to push it to production, I'm merging this one already. The big and scary change will be the vendoring update for this and the Loki changes.
This is triggered by grafana/jsonnet-libs#261 . The above PR changes the `instance` label to be actually unique within a scrape config. It also adds a `pod` and a `container` target label so that metrics can easily be joined with metrics from cAdvisor, KSM, and the Kubelet. This commit adds the same to the Loki scrape config. It also removes the `container_name` label. It is the same as the `container` label and was already added to Loki previously. However, the `container_name` label is deprecated and has disappeared in K8s 1.16, so that it will soon become useless for direct joining.
@tomwilkie Follow-up for Loki: grafana/loki#2091
This is triggered by grafana/jsonnet-libs#261 . The above PR removes the `instance` label. As it has turned out (see PR linked above), a sane `instance` label in Prometheus has to be unique, and that includes the case where a single container exposes metrics on two different endpoints. However, that scenario would still only result in one log stream for Loki to scrape. Therefore, Loki and Prometheus need to sync via target labels uniquely identifying a container (rather than a metrics endpoint). Those labels are namespace, pod, and container, which are also added here. This commit removes the `container_name` label. It is the same as the `container` label and was already added to Loki previously. However, the `container_name` label is deprecated and has disappeared in K8s 1.16, so it will soon become useless for direct joining.
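For illustration, a sketch of what the corresponding relabeling on the Loki side could look like (hypothetical relabel_config objects in jsonnet; the actual scrape config in the Loki repo may differ). It sets namespace, pod, and container, and deliberately does not set the deprecated container_name:

```jsonnet
// Sketch only: label log streams with namespace, pod, and container so they
// line up with the Prometheus target labels. container_name is deprecated
// (gone in K8s 1.16), so it is intentionally not set here.
[
  { source_labels: ['__meta_kubernetes_namespace'], target_label: 'namespace' },
  { source_labels: ['__meta_kubernetes_pod_name'], target_label: 'pod' },
  { source_labels: ['__meta_kubernetes_pod_container_name'], target_label: 'container' },
]
```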
grafana/jsonnet-libs#261 updates labels to make instance labels unique. This commit syncs with that change, but also takes the opportunity for an overdue pass through all the dashboards to fix various issues with them. The new dashboards are compatible with the new labeling scheme and also fix some problems:

1. Make sure the unloved Agent and Agent Prometheus Remote Write dashboards run correct queries and account for the instance_name labels.
2. Use proper graph label values in Agent Operational.
3. Allow filtering the Agent Operational graph by container.

As part of making the Agent dashboard useful, a new metric has been added to track samples added to the WAL over time. Closes #73.
@tomwilkie @woodsaj @malcolmholmes Please have a careful look here. This is a biggie. It changes almost every metric we have. I went through all the code underneath deployment_tools/ksonnet and tried to find any code that depends on the current instance naming. I found only grafana/loki#2080 , but of course, this is subtle enough that there might be many more code paths that break due to this change. However, we have to do something about it, and I think what I propose here is the way to go.