"no values found for nginx metric request-success-rate" with Prometheus Operator and nginx provider #421

Closed · mkorejo opened this issue Feb 1, 2020 · 10 comments · Label: question

mkorejo commented Feb 1, 2020

I installed Flagger with Flux as follows:

apiVersion: helm.fluxcd.io/v1
kind: HelmRelease
metadata:
  name: flagger
  namespace: nginx-ingress
spec:
  releaseName: flagger
  chart:
    repository: https://flagger.app
    name: flagger
    version: 0.22.0
  values:
    crd:
      create: true
    meshProvider: nginx
    metricsServer: http://prometheus-operator-prometheus.prometheus-operator:9090

nginx-ingress is installed as follows:

apiVersion: helm.fluxcd.io/v1
kind: HelmRelease
metadata:
  name: nginx-ingress
  namespace: nginx-ingress
spec:
  releaseName: nginx-ingress
  chart:
    repository: https://kubernetes-charts.storage.googleapis.com/
    name: nginx-ingress
    version: 1.24.4
  values:
    controller:
      extraArgs:
        publish-service: nginx-ingress/nginx-ingress-controller
        default-ssl-certificate: k1analyzer/k1analyzer-clusterwide-letsencrypt-secret
      metrics:
        enabled: true
        serviceMonitor:
          enabled: true
          additionalLabels:
            release: prometheus-operator
      service:
        externalTrafficPolicy: "Local"
  valuesFrom:
  - configMapKeyRef:
      name: nginx-ingress-config
      optional: true

We are also using the prometheus-operator. I can confirm from the Prometheus dashboard that nginx metrics are being collected, and I have also confirmed that Flagger can connect to the metricsServer endpoint specified in the HelmRelease.

I made some changes to the podinfo Helm chart to support creating an Ingress and referencing it from the canary spec. My canary spec:

> kg canary -o yaml 
apiVersion: v1
items:
- apiVersion: flagger.app/v1alpha3
  kind: Canary
  metadata:
    annotations:
      flux.weave.works/antecedent: flagger:helmrelease/podinfo-frontend
    creationTimestamp: "2020-01-30T22:13:32Z"
    generation: 8
    labels:
      app: frontend
      chart: podinfo-3.1.0
      heritage: Tiller
      release: podinfo-frontend
    name: podinfo-frontend
    namespace: flagger
    resourceVersion: "4526699"
    selfLink: /apis/flagger.app/v1alpha3/namespaces/flagger/canaries/podinfo-frontend
    uid: c09d2cd6-43ad-11ea-b8b2-7222c6d53b77
  spec:
    canaryAnalysis:
      interval: 15s
      maxWeight: 50
      metrics:
      - interval: 1m
        name: request-success-rate
        threshold: 99
      - interval: 1m
        name: request-duration
        threshold: 500
      stepWeight: 5
      threshold: 10
      webhooks:
      - metadata:
          cmd: curl -sd 'test' http://podinfo-frontend-canary.flagger:9898/token |
            grep token
          type: bash
        name: acceptance-test
        timeout: 30s
        type: pre-rollout
        url: http://flagger-loadtester.flagger/
      - metadata:
          cmd: hey -z 1m -q 5 -c 2 http://podinfo-frontend.flagger:9898
        name: load-test-get
        timeout: 5s
        url: http://flagger-loadtester.flagger/
      - metadata:
          cmd: 'hey -z 1m -q 5 -c 2 -m POST -d ''{"test": true}'' http://podinfo-frontend.flagger:9898/echo'
        name: load-test-post
        timeout: 5s
        url: http://flagger-loadtester.flagger/
    ingressRef:
      apiVersion: extensions/v1beta1
      kind: Ingress
      name: podinfo-frontend
    progressDeadlineSeconds: 60
    provider: nginx
    service:
      port: 9898
    targetRef:
      apiVersion: apps/v1
      kind: Deployment
      name: podinfo-frontend
  status:
    canaryWeight: 0
    conditions:
    - lastTransitionTime: "2020-02-01T01:42:13Z"
      lastUpdateTime: "2020-02-01T01:42:13Z"
      message: Canary analysis failed, deployment scaled to zero.
      reason: Failed
      status: "False"
      type: Promoted
    failedChecks: 0
    iterations: 0
    lastAppliedSpec: "3118295861456058183"
    lastTransitionTime: "2020-02-01T01:42:13Z"
    phase: Failed
    trackedConfigs:
      configmap/podinfo-frontend: 270c8d855a0c1374
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""

The podinfo-frontend HelmRelease:

apiVersion: helm.fluxcd.io/v1
kind: HelmRelease
metadata:
  name: podinfo-frontend
  namespace: flagger
spec:
  forceUpgrade: true
  rollback:
    enable: true
    force: true
    wait: true
  releaseName: podinfo-frontend
  chart:
    # repository: https://flagger.app
    # name: podinfo
    # version: 3.1.0
    git: https://github.com/mkorejo/flagger.git
    path: charts/podinfo
    ref: master
  values:
    # backend: http://podinfo-backend:9898/echo
    canary:
      enabled: true
      provider: nginx
      acceptancetest:
        enabled: true
        url: http://flagger-loadtester.flagger/
      loadtest:
        enabled: true
        url: http://flagger-loadtester.flagger/
    hpa:
      enabled: false
      minReplicas: 2
      maxReplicas: 4
      cpu: 80
      memory: 512Mi
    image:
      tag: 3.1.1
    ingress:
      enabled: true
      hostname: podinfo.poc.k1analyzer-nonprod.com
      annotations:
        kubernetes.io/ingress.class: "nginx"
        cert-manager.io/cluster-issuer: "az-60e-letsencrypt"
      tls:
        - hosts:
            - podinfo.poc.k1analyzer-nonprod.com
          secretName: podinfo.poc.k1analyzer-nonprod.com-tls
    nameOverride: frontend

My issue: every canary progression fails with:
Halt advancement no values found for nginx metric request-success-rate probably podinfo-frontend.flagger is not receiving traffic

I confirmed that the hey load testing works from the flagger-loadtester pod. Any thoughts as to what's going on? Thanks very much.

stefanprodan (Member) commented:

You need to run the load test against the public address so that traffic goes through nginx; see https://docs.flagger.app/usage/nginx-progressive-delivery
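
For example, the load-test webhooks could point hey at the public URL instead of the in-cluster service. A minimal sketch, reusing the ingress hostname from the podinfo-frontend values above:

  canaryAnalysis:
    webhooks:
    - name: load-test-get
      timeout: 5s
      url: http://flagger-loadtester.flagger/
      metadata:
        # hit the public hostname so requests pass through the nginx ingress controller
        cmd: "hey -z 1m -q 5 -c 2 http://podinfo.poc.k1analyzer-nonprod.com/"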

stefanprodan (Member) commented:

Or use the ClusterIP address of your nginx ingress and set the Host header in hey.
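
Roughly like this, assuming the controller Service created by the nginx-ingress release above (nginx-ingress-controller in the nginx-ingress namespace) and hey's -host flag to set the Host header:

  canaryAnalysis:
    webhooks:
    - name: load-test-get
      timeout: 5s
      url: http://flagger-loadtester.flagger/
      metadata:
        # send traffic to the ingress controller's ClusterIP service and
        # set the Host header so nginx routes it to the podinfo ingress
        cmd: "hey -z 1m -q 5 -c 2 -host podinfo.poc.k1analyzer-nonprod.com http://nginx-ingress-controller.nginx-ingress/"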


mkorejo commented Feb 1, 2020

Hi @stefanprodan, thanks for the quick reply. Also great work on Flagger!

Unfortunately, I'm still having issues even after switching the load tests to hit the public IP/hostname:

> kd canary
Name:         podinfo-frontend
Namespace:    flagger
Labels:       app=frontend
              chart=podinfo-3.1.0
              heritage=Tiller
              release=podinfo-frontend
Annotations:  flux.weave.works/antecedent: flagger:helmrelease/podinfo-frontend
API Version:  flagger.app/v1alpha3
Kind:         Canary
Metadata:
  Creation Timestamp:  2020-01-30T22:13:32Z
  Generation:          10
  Resource Version:    4708962
  Self Link:           /apis/flagger.app/v1alpha3/namespaces/flagger/canaries/podinfo-frontend
  UID:                 c09d2cd6-43ad-11ea-b8b2-7222c6d53b77
Spec:
  Canary Analysis:
    Interval:    15s
    Max Weight:  50
    Metrics:
      Interval:   1m
      Name:       request-success-rate
      Threshold:  99
      Interval:   1m
      Name:       request-duration
      Threshold:  500
    Step Weight:  5
    Threshold:    10
    Webhooks:
      Metadata:
        Cmd:    curl -sd 'test' http://podinfo-frontend-canary.flagger:9898/token | grep token
        Type:   bash
      Name:     acceptance-test
      Timeout:  30s
      Type:     pre-rollout
      URL:      http://flagger-loadtester.flagger/
      Metadata:
        Cmd:    hey -z 1m -q 5 -c 2 http://podinfo.poc.k1analyzer-nonprod.com
      Name:     load-test-get
      Timeout:  5s
      URL:      http://flagger-loadtester.flagger/
      Metadata:
        Cmd:    hey -z 1m -q 5 -c 2 -m POST -d '{"test": true}' http://podinfo.poc.k1analyzer-nonprod.com/echo
      Name:     load-test-post
      Timeout:  5s
      URL:      http://flagger-loadtester.flagger/
  Ingress Ref:
    API Version:              extensions/v1beta1
    Kind:                     Ingress
    Name:                     podinfo-frontend
  Progress Deadline Seconds:  60
  Provider:                   nginx
  Service:
    Port:  9898
    Traffic Policy:
      Tls:
        Mode:  DISABLE
  Target Ref:
    API Version:  apps/v1
    Kind:         Deployment
    Name:         podinfo-frontend
Status:
  Canary Weight:  0
  Conditions:
    Last Transition Time:  2020-02-01T18:10:43Z
    Last Update Time:      2020-02-01T18:10:43Z
    Message:               Canary analysis failed, deployment scaled to zero.
    Reason:                Failed
    Status:                False
    Type:                  Promoted
  Failed Checks:           0
  Iterations:              0
  Last Applied Spec:       3118295861456058183
  Last Transition Time:    2020-02-01T18:10:43Z
  Phase:                   Failed
  Tracked Configs:
    configmap/podinfo-frontend:  270c8d855a0c1374
Events:
  Type     Reason  Age                   From     Message
  ----     ------  ----                  ----     -------
  Warning  Synced  11m (x3 over 16h)     flagger  Rolling back podinfo-frontend.flagger failed checks threshold reached 10
  Warning  Synced  11m (x3 over 16h)     flagger  Canary failed! Scaling down podinfo-frontend.flagger
  Normal   Synced  3m59s (x4 over 16h)   flagger  New revision detected! Scaling up podinfo-frontend.flagger
  Normal   Synced  3m44s (x13 over 16h)  flagger  Starting canary analysis for podinfo-frontend.flagger
  Normal   Synced  3m44s (x3 over 16h)   flagger  Pre-rollout check acceptance-test passed
  Normal   Synced  3m44s (x3 over 16h)   flagger  Advance podinfo-frontend.flagger canary weight 5
  Warning  Synced  119s (x27 over 16h)   flagger  Halt advancement no values found for nginx metric request-success-rate probably podinfo-frontend.flagger is not receiving traffic

I updated the Flagger HelmRelease to install another Prometheus (prometheus.install=true) and this seems to be working. I need to dig into how to get this to work with Prometheus Operator.
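
For reference, a sketch of that change against the Flagger HelmRelease at the top of this issue; the external metricsServer value is dropped here on the assumption that the chart-bundled Prometheus takes over:

apiVersion: helm.fluxcd.io/v1
kind: HelmRelease
metadata:
  name: flagger
  namespace: nginx-ingress
spec:
  releaseName: flagger
  chart:
    repository: https://flagger.app
    name: flagger
    version: 0.22.0
  values:
    crd:
      create: true
    meshProvider: nginx
    # Assumption: with the bundled Prometheus installed, the external
    # prometheus-operator endpoint is no longer referenced here.
    prometheus:
      install: true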

stefanprodan (Member) commented:

Any news on this?

stefanprodan added the question label on Feb 22, 2020

grzegdl commented Mar 11, 2020

I have the same issue when using prometheus-operator and a recent nginx-ingress. After a quick look, it seems that Flagger uses a different namespace label than the one exposed by prometheus-operator.

i.e., in the case of the podinfo test, Flagger queries:

sum(rate(nginx_ingress_controller_requests{namespace="test",ingress="podinfo",status!~"5.*"}[1m]))/sum(rate(nginx_ingress_controller_requests{namespace="test",ingress="podinfo"}[1m]))*100

instead of:

sum(rate(nginx_ingress_controller_requests{exported_namespace="test",ingress="podinfo",status!~"5.*"}[1m]))/sum(rate(nginx_ingress_controller_requests{exported_namespace="test",ingress="podinfo"}[1m]))*100

In short: namespace -> exported_namespace


stefanprodan commented Mar 12, 2020

I guess prometheus-operator changes that label, since the Flagger e2e tests for NGINX are passing (#489).

The solution is to use a metric template, e.g.:

apiVersion: flagger.app/v1beta1
kind: MetricTemplate
metadata:
  name: error-rate
  namespace: ingress-nginx
spec:
  provider:
    type: prometheus
    address: http://prometheus.monitoring:9090
  query: |
    100 - sum(
      rate(
        nginx_ingress_controller_requests{
          exported_namespace="{{ namespace }}",
          ingress="{{ ingress }}",
          status!~"5.*"
        }[{{ interval }}]
      )
    ) 
    / 
    sum(
      rate(
        nginx_ingress_controller_requests{
          exported_namespace="{{ namespace }}",
          ingress="{{ ingress }}"
        }[{{ interval }}]
      )
    ) 
    * 100

Replace request-success-rate with:

    metrics:
    - name: error-rate
      templateRef:
        name: error-rate
        namespace: ingress-nginx
      thresholdRange:
        max: 1
      interval: 1m


grzegdl commented Mar 13, 2020

Yeah, that did the trick. Running Flagger alongside prometheus-operator is probably a common use case, so maybe this should be documented somewhere.

stefanprodan (Member) commented:

This has been documented here: https://docs.flagger.app/v/master/tutorials/prometheus-operator

davidriskified commented:

Link is broken :-)

L3o-pold (Contributor) commented:

https://docs.flagger.app/tutorials/prometheus-operator
