Add initial e2e tests #539

Merged: 2 commits merged into kubernetes-sigs:master from the e2e branch on Jan 19, 2023

Conversation

olivierlemasle (Member) commented on Nov 9, 2022:

This adds E2E tests to prometheus-adapter.
These tests:

  • bring up a kind cluster (or re-use an existing k8s cluster)
  • build the prometheus-adapter image
  • deploy prometheus-operator and a prometheus instance
  • deploy prometheus-adapter using the manifests in deploy/manifests

The tests check that everything can be deployed and, as a first assertion, verify that:

  • node metrics report positive values
  • some pod metrics are present

It also prints the prometheus-adapter logs.
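
As an illustration of the first check, an assertion on node metrics served by prometheus-adapter through the metrics.k8s.io API could look roughly like the sketch below. The helper name and structure are illustrative assumptions, not the PR's actual test code; it only assumes the k8s.io/metrics clientset that the test setup returns.

package e2e

import (
	"context"
	"testing"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	metricsclientset "k8s.io/metrics/pkg/client/clientset/versioned"
)

// checkNodeMetrics lists node metrics through the metrics.k8s.io API served by
// prometheus-adapter and asserts that the reported CPU and memory usage are positive.
// Illustrative sketch only; names and structure are not taken from the PR.
func checkNodeMetrics(ctx context.Context, t *testing.T, mc metricsclientset.Interface) {
	nodeMetrics, err := mc.MetricsV1beta1().NodeMetricses().List(ctx, metav1.ListOptions{})
	if err != nil {
		t.Fatalf("listing node metrics: %v", err)
	}
	if len(nodeMetrics.Items) == 0 {
		t.Fatal("expected metrics for at least one node")
	}
	for _, nm := range nodeMetrics.Items {
		// Usage is a corev1.ResourceList; Cpu() and Memory() return resource.Quantity values.
		if nm.Usage.Cpu().Sign() <= 0 || nm.Usage.Memory().Sign() <= 0 {
			t.Errorf("node %s reported non-positive usage: cpu=%s, memory=%s",
				nm.Name, nm.Usage.Cpu(), nm.Usage.Memory())
		}
	}
}

The pod-metrics check can follow the same pattern with PodMetricses(namespace).List, asserting that the returned list is non-empty.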

@k8s-ci-robot added the cncf-cla: yes label (the PR's author has signed the CNCF CLA) on Nov 9, 2022
@k8s-ci-robot added the size/L label (100-499 changed lines, ignoring generated files) on Nov 9, 2022
@k8s-ci-robot added the size/XL label (500-999 changed lines, ignoring generated files) and removed size/L on Nov 10, 2022
@olivierlemasle force-pushed the e2e branch 2 times, most recently from dede4e4 to 594e8a5 on December 8, 2022
test/e2e/e2e_test.go (outdated review thread, resolved)

Review thread on test/e2e/e2e_test.go:

	return clientSet, metricsClientSet
}

func waitForPrometheusReady(ctx context.Context, t *testing.T, client dynamic.Interface, namespace string, name string) error {
A reviewer (Member) suggested that this helper could perhaps be simplified by reusing existing code.

olivierlemasle (Member, Author) replied:

That's a good idea. I tried, but got stuck in dependency hell, as prometheus-operator brings in Kubernetes, Prometheus and OpenTelemetry dependencies at versions different from prometheus-adapter's.
However, I could use the prometheus-operator client (which is a separate module with fewer dependencies) to make this function less verbose, if you think that's better.

Reviewer (Member):

> That's a good idea. I tried, but got stuck in dependency hell, as prometheus-operator brings in Kubernetes, Prometheus and OpenTelemetry dependencies at versions different from prometheus-adapter's.

Oh, that's unfortunate. I'm fine with the current approach; I just thought it could maybe be simplified by importing existing code.

> However, I could use the prometheus-operator client (which is a separate module with fewer dependencies) to make this function less verbose, if you think that's better.

Yeah, that's a good idea.

olivierlemasle (Member, Author):

I refactored the tests to use the prometheus-operator client.
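
For reference, a version of waitForPrometheusReady based on the prometheus-operator typed client (the separate github.com/prometheus-operator/prometheus-operator/pkg/client module) could look roughly like the sketch below. The polling interval and the readiness condition (available replicas reaching spec.replicas) are assumptions for illustration, not necessarily the exact logic merged here.

package e2e

import (
	"context"
	"time"

	monitoringclient "github.com/prometheus-operator/prometheus-operator/pkg/client/versioned"
	apierrors "k8s.io/apimachinery/pkg/api/errors"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/util/wait"
)

// waitForPrometheusReady polls the Prometheus custom resource through the
// prometheus-operator typed client until the expected number of replicas is available.
func waitForPrometheusReady(ctx context.Context, client monitoringclient.Interface, namespace, name string) error {
	return wait.PollImmediateUntilWithContext(ctx, 5*time.Second, func(ctx context.Context) (bool, error) {
		prom, err := client.MonitoringV1().Prometheuses(namespace).Get(ctx, name, metav1.GetOptions{})
		if apierrors.IsNotFound(err) {
			return false, nil // not created yet, keep polling
		}
		if err != nil {
			return false, err
		}
		expected := int32(1)
		if prom.Spec.Replicas != nil {
			expected = *prom.Spec.Replicas
		}
		return prom.Status.AvailableReplicas >= expected, nil
	})
}

Using the typed client avoids the unstructured conversions that a dynamic.Interface-based helper needs, which is what makes the function less verbose.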

test/e2e/e2e_test.go (two more outdated review threads, resolved)

Review thread on a new ServiceMonitor manifest:

@@ -0,0 +1,101 @@
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
Reviewer (Member):

Could you make this ServiceMonitor more minimal? We shouldn't need all the relabeling rules here, and the only path that we want to scrape on the kubelet is /metrics/resource.

olivierlemasle (Member, Author):

I made the ServiceMonitor more minimal. However, I'm still scraping /metrics/cadvisor rather than /metrics/resource, because /metrics/resource does not provide container_cpu_usage_seconds_total{id='/'}.
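
For context, a trimmed-down kubelet ServiceMonitor along these lines could look roughly like the following sketch. The selector label, port name, and TLS/authentication settings are placeholders that depend on how the kubelet Service is exposed in the test cluster; this is not the exact manifest from the PR.

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: kubelet
  namespace: kube-system
spec:
  endpoints:
  - port: https-metrics            # placeholder: must match the kubelet Service port name
    scheme: https
    path: /metrics/cadvisor        # the path discussed here; /metrics/resource is the lighter alternative
    interval: 30s
    bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
    tlsConfig:
      insecureSkipVerify: true
  selector:
    matchLabels:
      app.kubernetes.io/name: kubelet   # placeholder label
  namespaceSelector:
    matchNames:
    - kube-system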

Reviewer (Member):

It was renamed to container_cpu_usage_seconds in kubernetes/kubernetes#86282. Since /metrics/resource is a more lightweight version of /metrics/cadvisor, it would be better to use the new endpoint and the new metric.

olivierlemasle (Member, Author):

Actually, kubernetes/kubernetes#86282 was largely reverted by kubernetes/kubernetes#89540 in Kubernetes 1.19, so the metric is named container_cpu_usage_seconds_total (for k8s < 1.18 and k8s >= 1.19), and that is the metric that is documented.

What I meant is that, using the path /metrics/resource, the query container_cpu_usage_seconds_total{id='/'} returns nothing, because the label id is never empty (it does work on the /metrics/cadvisor path, however).

So I've updated the configmap manifest to use node_cpu_usage_seconds_total and node_memory_working_set_bytes instead. I can make a separate PR for that if you prefer; I was unsure because you yourself suggested in #531 (comment) using {id='/'} for node queries.
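
To illustrate what that configmap change implies, resource rules built on the kubelet metrics could look roughly like the sketch below, following the documented prometheus-adapter resourceRules format. The label overrides, matchers, and window are illustrative, not the exact manifest in this PR.

resourceRules:
  cpu:
    containerQuery: sum(rate(container_cpu_usage_seconds_total{<<.LabelMatchers>>,container!="",pod!=""}[3m])) by (<<.GroupBy>>)
    nodeQuery: sum(rate(node_cpu_usage_seconds_total{<<.LabelMatchers>>}[3m])) by (<<.GroupBy>>)
    resources:
      overrides:
        # assumes the scrape config attaches node/namespace/pod labels to the series
        node: {resource: "node"}
        namespace: {resource: "namespace"}
        pod: {resource: "pod"}
    containerLabel: container
  memory:
    containerQuery: sum(container_memory_working_set_bytes{<<.LabelMatchers>>,container!="",pod!=""}) by (<<.GroupBy>>)
    nodeQuery: sum(node_memory_working_set_bytes{<<.LabelMatchers>>}) by (<<.GroupBy>>)
    resources:
      overrides:
        node: {resource: "node"}
        namespace: {resource: "namespace"}
        pod: {resource: "pod"}
    containerLabel: container
  window: 3m

With rules like these the adapter answers metrics.k8s.io queries from kubelet data alone, which is why node-exporter is not strictly required.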

Reviewer (Member):

> What I meant is that, using the path /metrics/resource, the query container_cpu_usage_seconds_total{id='/'} returns nothing, because the label id is never empty (it does work on the /metrics/cadvisor path, however).

Oh, that's interesting: I wasn't aware that id isn't included in the metrics exposed at /metrics/resource. I guess an alternative could then be to group by the node label, since it is injected by prometheus-operator, but at this point I'm not really sure what's best between that and depending on node-exporter. For sure we should move away from /metrics/cadvisor, since it is bound to disappear.

test/prometheus-manifests/prometheus.yaml (outdated review thread, resolved)

@k8s-ci-robot added the size/L label and removed size/XL on Dec 16, 2022
@olivierlemasle force-pushed the e2e branch 2 times, most recently from 7ea504c to 33e5bc9 on December 18, 2022
@k8s-ci-robot added the size/XL label and removed size/L on Dec 18, 2022
dgrisonnet (Member) commented on Jan 19, 2023:

Thinking about it again, let's move away from /metrics/cadvisor for node metrics and rely on node-exporter instead, since we can't get the node-level information from /metrics/resource. That's what most of the community is using anyway, and we've already had requests to switch: #516

So for the tests we would want:

  1. ServiceMonitor on kubelet /metrics/resource for CPU queries
  2. ServiceMonitor on node-exporter for Node queries

olivierlemasle (Member, Author) commented on Jan 19, 2023:

@dgrisonnet I'm not sure I understand the issue with the way it works in this PR.

node_cpu_usage_seconds_total and node_memory_working_set_bytes are node metrics provided by the kubelet itself (cf. https://github.com/kubernetes/kubernetes/blob/8f94681cd294aa8cfd3407b8191f6c70214973a4/pkg/kubelet/metrics/collectors/resource_metrics.go#L30-L42), so we don't need node-exporter, do we?

That being said, I guess that prometheus-adapter is often used in conjunction with node-exporter (in kube-prometheus or OpenShift's cluster-monitoring-operator), so I'm not against using node-exporter in the default manifests and in the E2E tests.

dgrisonnet (Member):

😩 I totally forgot that node-level metrics were introduced, sorry about that.

IIRC node-exporter and the kubelet are getting the data from the same place, so reducing the dependencies to just the kubelet would definitely be better.

dgrisonnet (Member) left a comment:

Thank you for bearing with me on this one. As a follow-up we should enable the tests in CI.

/lgtm
/approve

@k8s-ci-robot added the lgtm label ("looks good to me", the PR is ready to be merged) on Jan 19, 2023
k8s-ci-robot (Contributor):

[APPROVALNOTIFIER] This PR is APPROVED

This pull request has been approved by: dgrisonnet, olivierlemasle

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot added the approved label (the PR has been approved by an approver from all required OWNERS files) on Jan 19, 2023
@k8s-ci-robot merged commit f607905 into kubernetes-sigs:master on Jan 19, 2023
@olivierlemasle deleted the e2e branch on January 19, 2023, 11:30
olivierlemasle (Member, Author):

Thank you @dgrisonnet 🎉

I've just updated kubernetes/test-infra#27948

Labels: approved, cncf-cla: yes, lgtm, size/XL
3 participants