Add initial e2e tests #539
Conversation
Force-pushed from dede4e4 to 594e8a5.
test/e2e/e2e_test.go (outdated)
return clientSet, metricsClientSet
}

func waitForPrometheusReady(ctx context.Context, t *testing.T, client dynamic.Interface, namespace string, name string) error {
What do you think about reusing the prometheus-operator e2e test framework for that? https://github.com/prometheus-operator/prometheus-operator/blob/1dfecab7c2706f3591d6bb6d5eb8f6b2518f1b45/test/framework/prometheus.go#L309-L320
That's a good idea. I tried, but got stuck in dependency hell, as prometheus-operator brings in Kubernetes, Prometheus, and OpenTelemetry dependencies at versions different from prometheus-adapter's.
However, I could use the prometheus-operator client (which is a separate module with fewer dependencies) to make this function less verbose, if you think that's better.
> That's a good idea. I tried, but got stuck in dependency hell, as prometheus-operator brings in Kubernetes, Prometheus, and OpenTelemetry dependencies at versions different from prometheus-adapter's.

Oh, that's unfortunate. I am fine with the current approach, I just thought that it could maybe be simplified by importing existing code.

> However, I could use the prometheus-operator client (which is a separate module with fewer dependencies) to make this function less verbose, if you think that's better.

Yeah, that's a good idea.
I refactored the tests to use the prometheus-operator client.
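A minimal sketch of what such a readiness wait can look like with the prometheus-operator typed client is shown below. This is not the PR's actual code: the function signature, polling interval, timeout, and readiness condition (all desired replicas available) are assumptions.

```go
package e2e

import (
	"context"
	"fmt"
	"testing"
	"time"

	monitoringclient "github.com/prometheus-operator/prometheus-operator/pkg/client/versioned"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/util/wait"
	"k8s.io/client-go/rest"
)

// waitForPrometheusReady polls the Prometheus custom resource through the
// prometheus-operator typed client until all desired replicas report as
// available, or the timeout expires. Interval, timeout, and the readiness
// condition are illustrative choices, not taken from the PR.
func waitForPrometheusReady(ctx context.Context, t *testing.T, cfg *rest.Config, namespace, name string) error {
	client, err := monitoringclient.NewForConfig(cfg)
	if err != nil {
		return fmt.Errorf("creating prometheus-operator client: %w", err)
	}
	return wait.PollUntilContextTimeout(ctx, 5*time.Second, 5*time.Minute, true,
		func(ctx context.Context) (bool, error) {
			p, err := client.MonitoringV1().Prometheuses(namespace).Get(ctx, name, metav1.GetOptions{})
			if err != nil {
				t.Logf("getting Prometheus %s/%s: %v", namespace, name, err)
				return false, nil // treat errors as "not ready yet" and keep polling
			}
			return p.Status.Replicas > 0 && p.Status.AvailableReplicas == p.Status.Replicas, nil
		})
}
```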
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
Could you make this ServiceMonitor more minimal? We shouldn't need all the relabeling rules here, and the only path that we want to scrape on the kubelet is /metrics/resource.
I made the ServiceMonitor more minimal. However, I'm still scraping /metrics/cadvisor, not /metrics/resource, because the latter does not provide container_cpu_usage_seconds_total{id='/'}.
It was renamed to container_cpu_usage_seconds in kubernetes/kubernetes#86282. Since /metrics/resource is a more lightweight version of /metrics/cadvisor, it would be better to use the new endpoint and the new metric.
Actually, kubernetes/kubernetes#86282 was kind of reverted by kubernetes/kubernetes#89540 in Kubernetes 1.19, so the metric is named container_cpu_usage_seconds_total (for k8s < 1.18 and k8s >= 1.19), and that's the metric that is documented.

What I meant is that, using the path /metrics/resource, the query container_cpu_usage_seconds_total{id='/'} returns nothing, because the label id is never empty (it does work on the path /metrics/cadvisor, however).

So I've updated the configmap manifest to use node_cpu_usage_seconds_total and node_memory_working_set_bytes instead. I can make a separate PR for that if you prefer. I was unsure because you yourself suggested in #531 (comment) to use {id='/'} for node queries.
> What I meant is that, using the path /metrics/resource, the query container_cpu_usage_seconds_total{id='/'} returns nothing, because the label id is never empty (it does work on the path /metrics/cadvisor, however).

Oh, that's interesting, I wasn't aware that id wasn't included in the metrics exposed at /metrics/resource. I guess an alternative could then be to group by the node label, since it is injected by prometheus-operator, but at this point I am not really sure what's best between that and depending on node-exporter. For sure we should move away from /metrics/cadvisor since it is bound to disappear.
Force-pushed from 7ea504c to 33e5bc9.
Thinking about it again, let's move away from /metrics/cadvisor for node metrics and rely on node-exporter instead, since we can't get the node-level information from /metrics/resource. That's what most of the community is using anyway, and we've already had asks to switch: #516. So for the tests we would want:
@dgrisonnet I'm not sure I understand the issue with the way it works in this PR.
That being said, I guess that prometheus-adapter is often used in conjunction with node-exporter (in kube-prometheus or OpenShift's cluster-monitoring-operator), so I'm not against using node-exporter in the default manifests and in E2E.
😩 I totally forgot that node-level metrics were introduced, sorry about that. IIRC node-exporter and the kubelet are getting the data from the same place, so reducing the dependencies to just the kubelet would definitely be better.
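To make that concrete, a minimal kubelet ServiceMonitor that scrapes only /metrics/resource could look like the sketch below, expressed here with the prometheus-operator Go types the tests already import. The names, namespace, port, and label selector are assumptions based on the kubelet Service conventions used by prometheus-operator, and TLS/authorization settings are deliberately omitted; this is not the manifest from the PR.

```go
package e2e

import (
	monitoringv1 "github.com/prometheus-operator/prometheus-operator/pkg/apis/monitoring/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// minimalKubeletServiceMonitor returns a ServiceMonitor that scrapes only the
// kubelet's /metrics/resource endpoint, without any relabeling rules.
// Names, labels, and namespaces are assumptions; TLS and authorization
// settings are omitted for brevity and would be required in a real cluster.
func minimalKubeletServiceMonitor() *monitoringv1.ServiceMonitor {
	return &monitoringv1.ServiceMonitor{
		ObjectMeta: metav1.ObjectMeta{
			Name:      "kubelet",
			Namespace: "monitoring",
		},
		Spec: monitoringv1.ServiceMonitorSpec{
			JobLabel: "app.kubernetes.io/name",
			Endpoints: []monitoringv1.Endpoint{{
				Port:   "https-metrics",
				Scheme: "https",
				Path:   "/metrics/resource",
			}},
			NamespaceSelector: monitoringv1.NamespaceSelector{
				MatchNames: []string{"kube-system"},
			},
			Selector: metav1.LabelSelector{
				MatchLabels: map[string]string{"app.kubernetes.io/name": "kubelet"},
			},
		},
	}
}
```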
Thank you for bearing with me on this one. As a follow-up we should enable the tests in CI.
/lgtm
/approve
[APPROVALNOTIFIER] This PR is APPROVED

This pull request has been approved by: dgrisonnet, olivierlemasle. The full list of commands accepted by this bot can be found here. The pull request process is described here.

Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing /approve in a comment.
Thank you @dgrisonnet 🎉 I've just updated kubernetes/test-infra#27948.
This PR adds E2E tests to prometheus-adapter.
These tests deploy prometheus-adapter using the manifests in deploy/manifests.
The test checks that everything can be deployed and, as a first test, checks that:
It also prints the prometheus-adapter logs.
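As an illustration of the kind of check such a test can perform (a sketch, not the PR's actual assertions; the function name and exact conditions are assumptions), node resource metrics served through the metrics.k8s.io API can be verified with the Kubernetes metrics client:

```go
package e2e

import (
	"context"
	"testing"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	metricsclient "k8s.io/metrics/pkg/client/clientset/versioned"
)

// checkNodeMetrics asserts that the metrics.k8s.io API (served here by
// prometheus-adapter) returns non-zero CPU and memory usage for every node.
// The function name and the exact assertions are illustrative.
func checkNodeMetrics(ctx context.Context, t *testing.T, client metricsclient.Interface) {
	nodeMetrics, err := client.MetricsV1beta1().NodeMetricses().List(ctx, metav1.ListOptions{})
	if err != nil {
		t.Fatalf("listing node metrics: %v", err)
	}
	if len(nodeMetrics.Items) == 0 {
		t.Fatal("expected metrics for at least one node, got none")
	}
	for _, nm := range nodeMetrics.Items {
		cpu := nm.Usage[corev1.ResourceCPU]
		mem := nm.Usage[corev1.ResourceMemory]
		if cpu.IsZero() || mem.IsZero() {
			t.Errorf("node %s reports zero CPU or memory usage", nm.Name)
		}
	}
}
```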