Add initial e2e tests #539

Merged: 2 commits merged into kubernetes-sigs:master from the e2e branch on Jan 19, 2023

Conversation

olivierlemasle (Member) commented on Nov 9, 2022:

This adds E2E tests to prometheus-adapter.
These tests:

  • bring up a kind cluster (or re-use an existing k8s cluster)
  • build the prometheus-adapter image
  • deploy prometheus-operator and a prometheus instance
  • deploy prometheus-adapter using the manifests in deploy/manifests

The tests check that everything can be deployed and, as a first assertion, verify that:

  • node metrics report positive values
  • some pod metrics are present

It also prints the prometheus-adapter logs.
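
As an illustration of the first check, an assertion on node metrics served by prometheus-adapter through the metrics.k8s.io API could look roughly like the sketch below. The helper name and structure are illustrative assumptions, not the PR's actual test code; it only assumes the k8s.io/metrics clientset that the test setup returns.

package e2e

import (
	"context"
	"testing"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	metricsclientset "k8s.io/metrics/pkg/client/clientset/versioned"
)

// checkNodeMetrics lists node metrics through the metrics.k8s.io API served by
// prometheus-adapter and asserts that the reported CPU and memory usage are positive.
// Illustrative sketch only; names and structure are not taken from the PR.
func checkNodeMetrics(ctx context.Context, t *testing.T, mc metricsclientset.Interface) {
	nodeMetrics, err := mc.MetricsV1beta1().NodeMetricses().List(ctx, metav1.ListOptions{})
	if err != nil {
		t.Fatalf("listing node metrics: %v", err)
	}
	if len(nodeMetrics.Items) == 0 {
		t.Fatal("expected metrics for at least one node")
	}
	for _, nm := range nodeMetrics.Items {
		// Usage is a corev1.ResourceList; Cpu() and Memory() return resource.Quantity values.
		if nm.Usage.Cpu().Sign() <= 0 || nm.Usage.Memory().Sign() <= 0 {
			t.Errorf("node %s reported non-positive usage: cpu=%s, memory=%s",
				nm.Name, nm.Usage.Cpu(), nm.Usage.Memory())
		}
	}
}

The pod-metrics check can follow the same pattern with PodMetricses(namespace).List, asserting that the returned list is non-empty.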

@k8s-ci-robot added the cncf-cla: yes label (the PR's author has signed the CNCF CLA) on Nov 9, 2022
@k8s-ci-robot added the size/L label (100-499 changed lines, ignoring generated files) on Nov 9, 2022
@k8s-ci-robot added the size/XL label (500-999 changed lines, ignoring generated files) and removed size/L on Nov 10, 2022
@olivierlemasle force-pushed the e2e branch 2 times, most recently from dede4e4 to 594e8a5 on December 8, 2022
test/e2e/e2e_test.go (outdated review thread, resolved)

Review thread on test/e2e/e2e_test.go:

	return clientSet, metricsClientSet
}

func waitForPrometheusReady(ctx context.Context, t *testing.T, client dynamic.Interface, namespace string, name string) error {
A reviewer (Member) suggested that this helper could perhaps be simplified by reusing existing code.

olivierlemasle (Member, Author) replied:

That's a good idea. I tried, but got stuck in dependency hell, as prometheus-operator brings in Kubernetes, Prometheus and OpenTelemetry dependencies at versions different from prometheus-adapter's.
However, I could use the prometheus-operator client (which is a separate module with fewer dependencies) to make this function less verbose, if you think that's better.

Reviewer (Member):

> That's a good idea. I tried, but got stuck in dependency hell, as prometheus-operator brings in Kubernetes, Prometheus and OpenTelemetry dependencies at versions different from prometheus-adapter's.

Oh, that's unfortunate. I'm fine with the current approach; I just thought it could maybe be simplified by importing existing code.

> However, I could use the prometheus-operator client (which is a separate module with fewer dependencies) to make this function less verbose, if you think that's better.

Yeah, that's a good idea.

olivierlemasle (Member, Author):

I refactored the tests to use the prometheus-operator client.
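
For reference, a version of waitForPrometheusReady based on the prometheus-operator typed client (the separate github.com/prometheus-operator/prometheus-operator/pkg/client module) could look roughly like the sketch below. The polling interval and the readiness condition (available replicas reaching spec.replicas) are assumptions for illustration, not necessarily the exact logic merged here.

package e2e

import (
	"context"
	"time"

	monitoringclient "github.com/prometheus-operator/prometheus-operator/pkg/client/versioned"
	apierrors "k8s.io/apimachinery/pkg/api/errors"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/util/wait"
)

// waitForPrometheusReady polls the Prometheus custom resource through the
// prometheus-operator typed client until the expected number of replicas is available.
func waitForPrometheusReady(ctx context.Context, client monitoringclient.Interface, namespace, name string) error {
	return wait.PollImmediateUntilWithContext(ctx, 5*time.Second, func(ctx context.Context) (bool, error) {
		prom, err := client.MonitoringV1().Prometheuses(namespace).Get(ctx, name, metav1.GetOptions{})
		if apierrors.IsNotFound(err) {
			return false, nil // not created yet, keep polling
		}
		if err != nil {
			return false, err
		}
		expected := int32(1)
		if prom.Spec.Replicas != nil {
			expected = *prom.Spec.Replicas
		}
		return prom.Status.AvailableReplicas >= expected, nil
	})
}

Using the typed client avoids the unstructured conversions that a dynamic.Interface-based helper needs, which is what makes the function less verbose.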

test/e2e/e2e_test.go (two more outdated review threads, resolved)

Review thread on a new ServiceMonitor manifest:

@@ -0,0 +1,101 @@
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
Reviewer (Member):

Could you make this ServiceMonitor more minimal? We shouldn't need all the relabeling rules here, and the only path that we want to scrape on the kubelet is /metrics/resource.

olivierlemasle (Member, Author):

I made the ServiceMonitor more minimal. However, I'm still scraping /metrics/cadvisor rather than /metrics/resource, because /metrics/resource does not provide container_cpu_usage_seconds_total{id='/'}.
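
For context, a trimmed-down kubelet ServiceMonitor along these lines could look roughly like the following sketch. The selector label, port name, and TLS/authentication settings are placeholders that depend on how the kubelet Service is exposed in the test cluster; this is not the exact manifest from the PR.

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: kubelet
  namespace: kube-system
spec:
  endpoints:
  - port: https-metrics            # placeholder: must match the kubelet Service port name
    scheme: https
    path: /metrics/cadvisor        # the path discussed here; /metrics/resource is the lighter alternative
    interval: 30s
    bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
    tlsConfig:
      insecureSkipVerify: true
  selector:
    matchLabels:
      app.kubernetes.io/name: kubelet   # placeholder label
  namespaceSelector:
    matchNames:
    - kube-system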

Reviewer (Member):

It was renamed to container_cpu_usage_seconds in kubernetes/kubernetes#86282. Since /metrics/resource is a more lightweight version of /metrics/cadvisor, it would be better to use the new endpoint and the new metric.

olivierlemasle (Member, Author):

Actually, kubernetes/kubernetes#86282 was largely reverted by kubernetes/kubernetes#89540 in Kubernetes 1.19, so the metric is named container_cpu_usage_seconds_total (for k8s < 1.18 and k8s >= 1.19), and that is the metric that is documented.

What I meant is that, using the path /metrics/resource, the query container_cpu_usage_seconds_total{id='/'} returns nothing, because the label id is never empty (it does work on the /metrics/cadvisor path, however).

So I've updated the configmap manifest to use node_cpu_usage_seconds_total and node_memory_working_set_bytes instead. I can make a separate PR for that if you prefer; I was unsure because you yourself suggested in #531 (comment) using {id='/'} for node queries.
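
To illustrate what that configmap change implies, resource rules built on the kubelet metrics could look roughly like the sketch below, following the documented prometheus-adapter resourceRules format. The label overrides, matchers, and window are illustrative, not the exact manifest in this PR.

resourceRules:
  cpu:
    containerQuery: sum(rate(container_cpu_usage_seconds_total{<<.LabelMatchers>>,container!="",pod!=""}[3m])) by (<<.GroupBy>>)
    nodeQuery: sum(rate(node_cpu_usage_seconds_total{<<.LabelMatchers>>}[3m])) by (<<.GroupBy>>)
    resources:
      overrides:
        # assumes the scrape config attaches node/namespace/pod labels to the series
        node: {resource: "node"}
        namespace: {resource: "namespace"}
        pod: {resource: "pod"}
    containerLabel: container
  memory:
    containerQuery: sum(container_memory_working_set_bytes{<<.LabelMatchers>>,container!="",pod!=""}) by (<<.GroupBy>>)
    nodeQuery: sum(node_memory_working_set_bytes{<<.LabelMatchers>>}) by (<<.GroupBy>>)
    resources:
      overrides:
        node: {resource: "node"}
        namespace: {resource: "namespace"}
        pod: {resource: "pod"}
    containerLabel: container
  window: 3m

With rules like these the adapter answers metrics.k8s.io queries from kubelet data alone, which is why node-exporter is not strictly required.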

Reviewer (Member):

> What I meant is that, using the path /metrics/resource, the query container_cpu_usage_seconds_total{id='/'} returns nothing, because the label id is never empty (it does work on the /metrics/cadvisor path, however).

Oh, that's interesting: I wasn't aware that id isn't included in the metrics exposed at /metrics/resource. I guess an alternative could then be to group by the node label, since it is injected by prometheus-operator, but at this point I'm not really sure what's best between that and depending on node-exporter. For sure we should move away from /metrics/cadvisor, since it is bound to disappear.

test/prometheus-manifests/prometheus.yaml (outdated review thread, resolved)

@k8s-ci-robot added the size/L label and removed size/XL on Dec 16, 2022
@olivierlemasle force-pushed the e2e branch 2 times, most recently from 7ea504c to 33e5bc9 on December 18, 2022
@k8s-ci-robot added the size/XL label and removed size/L on Dec 18, 2022
dgrisonnet (Member) commented on Jan 19, 2023:

Thinking about it again, let's move away from /metrics/cadvisor for node metrics and rely on node-exporter instead, since we can't get the node-level information from /metrics/resource. That's what most of the community is using anyway, and we've already had requests to switch: #516

So for the tests we would want:

  1. ServiceMonitor on kubelet /metrics/resource for CPU queries
  2. ServiceMonitor on node-exporter for Node queries

olivierlemasle (Member, Author) commented on Jan 19, 2023:

@dgrisonnet I'm not sure I understand the issue with the way it works in this PR.

node_cpu_usage_seconds_total and node_memory_working_set_bytes are node metrics provided by the kubelet itself (cf. https://github.com/kubernetes/kubernetes/blob/8f94681cd294aa8cfd3407b8191f6c70214973a4/pkg/kubelet/metrics/collectors/resource_metrics.go#L30-L42), so we don't need node-exporter, do we?

That being said, I guess that prometheus-adapter is often used in conjunction with node-exporter (in kube-prometheus or OpenShift's cluster-monitoring-operator), so I'm not against using node-exporter in the default manifests and in the E2E tests.

dgrisonnet (Member):

😩 I totally forgot that node-level metrics were introduced, sorry about that.

IIRC node-exporter and the kubelet are getting the data from the same place, so reducing the dependencies to just the kubelet would definitely be better.

dgrisonnet (Member) left a comment:

Thank you for bearing with me on this one. As a follow-up we should enable the tests in CI.

/lgtm
/approve

@k8s-ci-robot added the lgtm label ("looks good to me", the PR is ready to be merged) on Jan 19, 2023
k8s-ci-robot (Contributor):

[APPROVALNOTIFIER] This PR is APPROVED

This pull request has been approved by: dgrisonnet, olivierlemasle

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot added the approved label (the PR has been approved by an approver from all required OWNERS files) on Jan 19, 2023
@k8s-ci-robot merged commit f607905 into kubernetes-sigs:master on Jan 19, 2023
@olivierlemasle deleted the e2e branch on January 19, 2023, 11:30
olivierlemasle (Member, Author):

Thank you @dgrisonnet 🎉

I've just updated kubernetes/test-infra#27948

Labels: approved, cncf-cla: yes, lgtm, size/XL
3 participants