feat: fix app label of metrics svc for ServiceMonitor discovery #229
Conversation
Signed-off-by: Leo Tomas <tomasl@rcvs.org.uk>
[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: leotomas837

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing `/approve` in a comment.

Hi @leotomas837. Thanks for your PR. I'm waiting for a cert-manager member to verify that this patch is reasonable to test. If it is, they should reply with `/ok-to-test`. Once the patch is verified, the new status will be reflected by the `ok-to-test` label. I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
Thanks @leotomas837
This looks ok to me, but could you add some instructions to the PR description explaining how to recreate the problem and how to test this improvement?
/ok-to-test
Instructions added.
Did you get the chance to have a look at this? It should be straightforward. Let me know if you need anything else from me.
@leotomas837 Sorry for the delay. After a few failed attempts to get the Prometheus operator installed, I finally got it working this morning and was able to recreate the problem you described. As you can probably tell, I don't know much about Prometheus, but it seemed strange to me that the ServiceMonitor should show the duplicate endpoints despite us specifying a `targetPort`. So I tried changing it:

```diff
$ git diff
diff --git a/deploy/charts/approver-policy/templates/metrics-servicemonitor.yaml b/deploy/charts/approver-policy/templates/metrics-servicemonitor.yaml
index 43f1751..2e72caa 100644
--- a/deploy/charts/approver-policy/templates/metrics-servicemonitor.yaml
+++ b/deploy/charts/approver-policy/templates/metrics-servicemonitor.yaml
@@ -20,7 +20,7 @@ spec:
     matchNames:
     - {{ .Release.Namespace }}
   endpoints:
-  - targetPort: {{ .Values.app.metrics.port }}
+  - port: metrics
     path: "/metrics"
     interval: {{ .Values.app.metrics.service.servicemonitor.interval }}
     scrapeTimeout: {{ .Values.app.metrics.service.servicemonitor.scrapeTimeout }}
```

What do you think? Are there any downsides to that approach?
More discussion here about the confusing behaviour of `targetPort`.
I suggest we change `targetPort` to `port: metrics`, which has the same effect. It seems to allow the Prometheus operator's ServiceMonitor controller to select only the Services that refer to ports with the name `metrics`.
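For illustration, a minimal sketch of how `port: metrics` pairs with the Service definition. This is a simplified, hypothetical rendering of the chart templates (names and port numbers taken from elsewhere in this thread), not the actual chart output:

```yaml
# The ServiceMonitor endpoint refers to the Service port *name*,
# so only Services that declare a port named "metrics" are scraped.
apiVersion: v1
kind: Service
metadata:
  name: cert-manager-approver-policy-metrics
spec:
  ports:
  - name: metrics        # this name is what `port: metrics` matches
    port: 9402
    targetPort: 9402
---
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: cert-manager-approver-policy
spec:
  endpoints:
  - port: metrics        # Service port name, not a container port number
    path: /metrics
```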
```diff
@@ -5,7 +5,7 @@ metadata:
   name: {{ include "cert-manager-approver-policy.name" . }}-metrics
   namespace: {{ .Release.Namespace | quote }}
   labels:
-    app: {{ include "cert-manager-approver-policy.name" . }}
+    app: {{ include "cert-manager-approver-policy.name" . }}-metrics
```
The disadvantage of this is that I can no longer easily discover all the resources related to this app. E.g.

```console
$ kubectl get all --namespace cert-manager --selector app=cert-manager-approver-policy
NAME                                               READY   STATUS    RESTARTS   AGE
pod/cert-manager-approver-policy-c68dc44c9-5785f   1/1     Running   0          107m

NAME                                           TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)    AGE
service/cert-manager-approver-policy           ClusterIP   10.96.10.146    <none>        443/TCP    107m
service/cert-manager-approver-policy-metrics   ClusterIP   10.96.133.139   <none>        9402/TCP   107m

NAME                                                     DESIRED   CURRENT   READY   AGE
replicaset.apps/cert-manager-approver-policy-c68dc44c9   1         1         1       107m
```
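As a hypothetical aside (not something proposed in this PR): both properties could be kept by giving the metrics Service a unique `app` label for the ServiceMonitor selector plus a shared label for discovery. The `app.kubernetes.io/part-of` key is one of the standard Kubernetes recommended labels; its use here is an assumption for illustration:

```yaml
# Sketch of a label layout that separates selection from discovery.
metadata:
  labels:
    app: cert-manager-approver-policy-metrics                # unique: ServiceMonitor selector key
    app.kubernetes.io/part-of: cert-manager-approver-policy  # shared: resource discovery
```

All resources could then still be listed with `kubectl get all --selector app.kubernetes.io/part-of=cert-manager-approver-policy`.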
I was looking at the same GitHub issue, and indeed you are right: `targetPort` sets up Prometheus's scrape config to scrape the container directly (for each Service discovered, hence the duplicated targets).

So there are 2 solutions: tightening the service discovery (my first approach), or targeting the Service directly by port name rather than the container (your approach).

Happy to implement your approach; I like to think of a ServiceMonitor as being set up with Service attributes rather than container ones...
Great. Then please modify the PR to use the `port: metrics` approach and ping me for another review.
PR needs rebase. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Supplanted by #471
The ServiceMonitor targets both the webhook Service and the metrics Service, yet only the metrics Service must be scraped (the probe could be scraped too via the blackbox-exporter, but that is a subject for another PR). The webhook Service must not be discovered by Prometheus.
Prerequisites
A Kubernetes cluster running with the cert-manager/approver-policy chart installed and a Prometheus instance ready to scrape metrics.
How to reproduce
I have 2 replicas, hence the (4/4 up) in the targets, but the point is that for a given replica, there are currently 2 Services scraped: the webhook Service and the metrics Service. The metrics Service should obviously be scraped, but not the webhook Service (which is not even listening on port `9402` but on port `443`), as it is not serving any metrics.

That is happening because the ServiceMonitor here selects Services on the `app` label, yet both the metrics Service and the webhook Service have this label (check here and here).
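For illustration, a simplified sketch of the current selector (a hypothetical rendering of the chart template, assuming the labels shown in the diffs above):

```yaml
# Both the webhook Service and the metrics Service carry
# app: cert-manager-approver-policy, so this selector matches both,
# producing duplicate scrape targets.
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
spec:
  selector:
    matchLabels:
      app: cert-manager-approver-policy
```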
How to test
By changing the `app` label of the metrics Service and setting the ServiceMonitor to select the new `app` label, only the metrics Service gets scraped, and you should see (2/2 up) for 2 replicas in your Prometheus UI.

Simply apply the following JSON Patches to your chart installed in namespace `cert-manager` (I am using Helmfile, so this is Helmfile syntax, but any tool applying the following JSON Patches obviously works):
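The original patch snippet did not survive the page export. A sketch of what such Helmfile JSON Patches could look like is shown below; the release name, chart reference, target names, and patch paths are assumptions reconstructed from the diffs in this PR, not the author's original snippet:

```yaml
# Hypothetical Helmfile release applying the two label changes as JSON Patches.
releases:
- name: cert-manager-approver-policy
  namespace: cert-manager
  chart: jetstack/cert-manager-approver-policy   # assumed chart reference
  jsonPatches:
  # 1. Give the metrics Service a unique app label.
  - target:
      version: v1
      kind: Service
      name: cert-manager-approver-policy-metrics
    patch:
    - op: replace
      path: /metadata/labels/app
      value: cert-manager-approver-policy-metrics
  # 2. Point the ServiceMonitor selector at the new label.
  - target:
      group: monitoring.coreos.com
      version: v1
      kind: ServiceMonitor
      name: cert-manager-approver-policy
    patch:
    - op: replace
      path: /spec/selector/matchLabels/app
      value: cert-manager-approver-policy-metrics
```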