feat: fix app label of metrics svc for ServiceMonitor discovery #229
Conversation
Signed-off-by: Leo Tomas <tomasl@rcvs.org.uk>
[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: leotomas837

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing `/approve` in a comment.

Hi @leotomas837. Thanks for your PR. I'm waiting for a cert-manager member to verify that this patch is reasonable to test. If it is, they should reply with `/ok-to-test`. Once the patch is verified, the new status will be reflected by the `ok-to-test` label. I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
Thanks @leotomas837
This looks ok to me, but could you add some instructions to the PR description explaining how to recreate the problem and how to test this improvement?
/ok-to-test
Instructions added.
Did you get the chance to have a look at this? It should be straightforward. Let me know if you need anything else from me.
@leotomas837 Sorry for the delay. After a few failed attempts to get the Prometheus operator installed, I finally got it working this morning and was able to recreate the problem you described. As you can probably tell, I don't know much about Prometheus, but it seemed strange to me that the ServiceMonitor should show the duplicate endpoints despite us specifying a `targetPort`. So I tried changing it:

```diff
$ git diff
diff --git a/deploy/charts/approver-policy/templates/metrics-servicemonitor.yaml b/deploy/charts/approver-policy/templates/metrics-servicemonitor.yaml
index 43f1751..2e72caa 100644
--- a/deploy/charts/approver-policy/templates/metrics-servicemonitor.yaml
+++ b/deploy/charts/approver-policy/templates/metrics-servicemonitor.yaml
@@ -20,7 +20,7 @@ spec:
     matchNames:
     - {{ .Release.Namespace }}
   endpoints:
-  - targetPort: {{ .Values.app.metrics.port }}
+  - port: metrics
     path: "/metrics"
     interval: {{ .Values.app.metrics.service.servicemonitor.interval }}
     scrapeTimeout: {{ .Values.app.metrics.service.servicemonitor.scrapeTimeout }}
```

What do you think? Are there any downsides to that approach?
More discussion here about the confusing behaviour of `targetPort`.
I suggest we change `targetPort` to `port: metrics`, which has the same effect. It seems to allow the Prometheus operator's ServiceMonitor controller to select only the Services that refer to ports with the name `metrics`.
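For illustration, a minimal sketch of how `port: metrics` pairs with the Service definition. This is a simplified, hypothetical rendering of the chart templates (names and port numbers taken from elsewhere in this thread), not the actual chart output:

```yaml
# The ServiceMonitor endpoint refers to the Service port *name*,
# so only Services that declare a port named "metrics" are scraped.
apiVersion: v1
kind: Service
metadata:
  name: cert-manager-approver-policy-metrics
spec:
  ports:
  - name: metrics        # this name is what `port: metrics` matches
    port: 9402
    targetPort: 9402
---
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: cert-manager-approver-policy
spec:
  endpoints:
  - port: metrics        # Service port name, not a container port number
    path: /metrics
```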
```diff
@@ -5,7 +5,7 @@ metadata:
   name: {{ include "cert-manager-approver-policy.name" . }}-metrics
   namespace: {{ .Release.Namespace | quote }}
   labels:
-    app: {{ include "cert-manager-approver-policy.name" . }}
+    app: {{ include "cert-manager-approver-policy.name" . }}-metrics
```
The disadvantage of this is that I can no longer easily discover all the resources related to this app. E.g.

```console
$ kubectl get all --namespace cert-manager --selector app=cert-manager-approver-policy
NAME                                               READY   STATUS    RESTARTS   AGE
pod/cert-manager-approver-policy-c68dc44c9-5785f   1/1     Running   0          107m

NAME                                           TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)    AGE
service/cert-manager-approver-policy           ClusterIP   10.96.10.146    <none>        443/TCP    107m
service/cert-manager-approver-policy-metrics   ClusterIP   10.96.133.139   <none>        9402/TCP   107m

NAME                                                     DESIRED   CURRENT   READY   AGE
replicaset.apps/cert-manager-approver-policy-c68dc44c9   1         1         1       107m
```
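As a hypothetical aside (not something proposed in this PR): both properties could be kept by giving the metrics Service a unique `app` label for the ServiceMonitor selector plus a shared label for discovery. The `app.kubernetes.io/part-of` key is one of the standard Kubernetes recommended labels; its use here is an assumption for illustration:

```yaml
# Sketch of a label layout that separates selection from discovery.
metadata:
  labels:
    app: cert-manager-approver-policy-metrics                # unique: ServiceMonitor selector key
    app.kubernetes.io/part-of: cert-manager-approver-policy  # shared: resource discovery
```

All resources could then still be listed with `kubectl get all --selector app.kubernetes.io/part-of=cert-manager-approver-policy`.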
I was looking at the same GitHub issue, and indeed you are right: `targetPort` sets up Prometheus's scrape config to scrape the container directly (for each Service discovered, hence the duplicated targets).

So there are 2 solutions: tightening the service discovery (my first approach), or targeting the Service directly by port name rather than the container (your approach).

Happy to implement your approach; I like to think of a ServiceMonitor as being set up with Service attributes rather than container ones...
Great. Then please modify the PR to use the `port: metrics` approach and ping me for another review.
PR needs rebase. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Supplanted by #471
The ServiceMonitor targets both the webhook Service and the metrics Service, yet only the metrics Service must be scraped (the probe could be scraped too via the blackbox-exporter, but that is a subject for another PR). The webhook Service must not be discovered by Prometheus.
Prerequisites
A Kubernetes cluster running with the cert-manager/approver-policy chart installed and a Prometheus instance ready to scrape metrics.
How to reproduce
I have 2 replicas, hence the (4/4 up) in the targets, but the point is that for a given replica, there are currently 2 Services scraped: the webhook Service and the metrics Service. The metrics Service should obviously be scraped, but not the webhook Service (which is not even listening on port `9402` but on port `443`), as it is not serving any metrics.

That is happening because the ServiceMonitor here selects Services on the `app` label, yet both the metrics Service and the webhook Service have this label (check here and here).
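For illustration, a simplified sketch of the current selector (a hypothetical rendering of the chart template, assuming the labels shown in the diffs above):

```yaml
# Both the webhook Service and the metrics Service carry
# app: cert-manager-approver-policy, so this selector matches both,
# producing duplicate scrape targets.
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
spec:
  selector:
    matchLabels:
      app: cert-manager-approver-policy
```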
How to test
By changing the `app` label of the metrics Service and setting the ServiceMonitor to select the new `app` label, only the metrics Service gets scraped, and you should see (2/2 up) for 2 replicas in your Prometheus UI.

Simply apply the following JSON Patches to your chart installed in namespace `cert-manager` (I am using Helmfile, so this is Helmfile syntax, but any tool applying the following JSON Patches obviously works):
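The original patch snippet did not survive the page export. A sketch of what such Helmfile JSON Patches could look like is shown below; the release name, chart reference, target names, and patch paths are assumptions reconstructed from the diffs in this PR, not the author's original snippet:

```yaml
# Hypothetical Helmfile release applying the two label changes as JSON Patches.
releases:
- name: cert-manager-approver-policy
  namespace: cert-manager
  chart: jetstack/cert-manager-approver-policy   # assumed chart reference
  jsonPatches:
  # 1. Give the metrics Service a unique app label.
  - target:
      version: v1
      kind: Service
      name: cert-manager-approver-policy-metrics
    patch:
    - op: replace
      path: /metadata/labels/app
      value: cert-manager-approver-policy-metrics
  # 2. Point the ServiceMonitor selector at the new label.
  - target:
      group: monitoring.coreos.com
      version: v1
      kind: ServiceMonitor
      name: cert-manager-approver-policy
    patch:
    - op: replace
      path: /spec/selector/matchLabels/app
      value: cert-manager-approver-policy-metrics
```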