
0.27.1 breaks integration with Datadog #5022

Closed
juniorz opened this issue Feb 5, 2020 · 9 comments · Fixed by #5023
Labels
kind/bug Categorizes issue or PR as related to a bug.

Comments


juniorz commented Feb 5, 2020

NGINX Ingress controller version:

0.27.1

Kubernetes version (use kubectl version):

v1.14.9-eks-c0eccc


What happened:

After upgrading to 0.27.1, ingress-nginx no longer exposes the individual quantile values of the nginx_ingress_controller_ingress_upstream_latency_seconds metric.

What you expected to happen:

The individual quantile values of the nginx_ingress_controller_ingress_upstream_latency_seconds metric to be exposed at http://127.0.0.1:10254/metrics.
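As a quick check, the presence of per-quantile series can be detected by scanning the exposition text for a quantile label on the metric. A minimal stdlib-only sketch (the sample lines are abbreviated from the outputs later in this issue; the function name is mine):

```python
import re

def summary_quantiles(exposition_text, metric_name):
    """Return the set of quantile label values exposed for a summary metric."""
    # A per-quantile sample line looks like:
    #   metric_name{...,quantile="0.5"} 0.004
    # whereas _sum/_count lines have a suffix after the metric name,
    # so they do not match the "{" immediately following the name.
    pattern = re.compile(
        re.escape(metric_name) + r'\{[^}]*quantile="([^"]+)"[^}]*\} ')
    return set(pattern.findall(exposition_text))

healthy = ('nginx_ingress_controller_ingress_upstream_latency_seconds'
           '{ingress="hello-world",quantile="0.5"} 0\n'
           'nginx_ingress_controller_ingress_upstream_latency_seconds_sum'
           '{ingress="hello-world"} 0.176\n')
broken = ('nginx_ingress_controller_ingress_upstream_latency_seconds_sum'
          '{ingress="hello-world"} 2.448\n'
          'nginx_ingress_controller_ingress_upstream_latency_seconds_count'
          '{ingress="hello-world"} 10742\n')

name = "nginx_ingress_controller_ingress_upstream_latency_seconds"
print(summary_quantiles(healthy, name))  # {'0.5'}
print(summary_quantiles(broken, name))   # set()
```

An empty result for a metric whose TYPE line says "summary" is exactly the symptom reported here.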

How to reproduce it:

See: DataDog/integrations-core#5577

Anything else we need to know:

/kind bug

@juniorz juniorz added the kind/bug Categorizes issue or PR as related to a bug. label Feb 5, 2020

aledbf commented Feb 6, 2020

@juniorz please update to 0.28.0. There is an update to the Prometheus client_golang library.

Edit: also, I do see such metrics in the mentioned version


juniorz commented Feb 6, 2020

I noticed the new version right before reporting the issue and had already tested it, without success.

$ kubectl exec -it -n kube-ingress ingress-nginx-7855754764-lxw8x -c nginx-ingress-controller -- curl http://127.0.0.1:10254/metrics | grep nginx_ingress_controller_ingress_upstream_latency_seconds && sleep 1
# HELP nginx_ingress_controller_ingress_upstream_latency_seconds Upstream service latency per Ingress
# TYPE nginx_ingress_controller_ingress_upstream_latency_seconds summary
nginx_ingress_controller_ingress_upstream_latency_seconds_sum{controller_class="nginx",controller_namespace="kube-ingress",controller_pod="ingress-nginx-7855754764-lxw8x",ingress="hello-world",namespace="default",service="hello-world"} 2.4480000000000017
nginx_ingress_controller_ingress_upstream_latency_seconds_count{controller_class="nginx",controller_namespace="kube-ingress",controller_pod="ingress-nginx-7855754764-lxw8x",ingress="hello-world",namespace="default",service="hello-world"} 10742
$ kubectl exec -it -n kube-ingress ingress-nginx-7855754764-lxw8x -c nginx-ingress-controller -- curl http://127.0.0.1:10254/metrics | grep nginx_ingress_controller_ingress_upstream_latency_seconds && sleep 1
# HELP nginx_ingress_controller_ingress_upstream_latency_seconds Upstream service latency per Ingress
# TYPE nginx_ingress_controller_ingress_upstream_latency_seconds summary
nginx_ingress_controller_ingress_upstream_latency_seconds_sum{controller_class="nginx",controller_namespace="kube-ingress",controller_pod="ingress-nginx-7855754764-lxw8x",ingress="hello-world",namespace="default",service="hello-world"} 2.4680000000000017
nginx_ingress_controller_ingress_upstream_latency_seconds_count{controller_class="nginx",controller_namespace="kube-ingress",controller_pod="ingress-nginx-7855754764-lxw8x",ingress="hello-world",namespace="default",service="hello-world"} 11041
$ kubectl exec -it -n kube-ingress ingress-nginx-7855754764-lxw8x -c nginx-ingress-controller -- curl http://127.0.0.1:10254/metrics | grep nginx_ingress_controller_ingress_upstream_latency_seconds && sleep 1
# HELP nginx_ingress_controller_ingress_upstream_latency_seconds Upstream service latency per Ingress
# TYPE nginx_ingress_controller_ingress_upstream_latency_seconds summary
nginx_ingress_controller_ingress_upstream_latency_seconds_sum{controller_class="nginx",controller_namespace="kube-ingress",controller_pod="ingress-nginx-7855754764-lxw8x",ingress="hello-world",namespace="default",service="hello-world"} 2.4680000000000017
nginx_ingress_controller_ingress_upstream_latency_seconds_count{controller_class="nginx",controller_namespace="kube-ingress",controller_pod="ingress-nginx-7855754764-lxw8x",ingress="hello-world",namespace="default",service="hello-world"} 11171
$ kubectl get pod -n kube-ingress ingress-nginx-7855754764-lxw8x -o json | jq ".spec.containers[1].image"
"quay.io/kubernetes-ingress-controller/nginx-ingress-controller:0.28.0"
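Even without the quantile series, the _sum/_count pair still lets a mean latency be derived between two scrapes. A quick sketch using the first two samples above (this yields an average, not a percentile, so it is no substitute for the missing quantile series):

```python
def mean_latency(sum_a, count_a, sum_b, count_b):
    """Mean upstream latency over the interval between two scrapes."""
    return (sum_b - sum_a) / (count_b - count_a)

# Values copied from the first two scrapes in the session above.
m = mean_latency(2.4480000000000017, 10742, 2.4680000000000017, 11041)
print(f"{m:.6f}s")  # prints 0.000067s
```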


juniorz commented Feb 6, 2020

If I use the same Deployment but change the version to 0.26.1 and set runAsUser: 33, the missing metrics appear (note the multiple additional nginx_ingress_controller_ingress_upstream_latency_seconds series, not only _sum and _count):

$ kubectl exec -it -n kube-ingress ingress-nginx-757cb8c49-fm8gq -c nginx-ingress-controller -- curl http://127.0.0.1:10254/metrics | grep nginx_ingress_controller_ingress_upstream_latency_seconds && sleep 1
# HELP nginx_ingress_controller_ingress_upstream_latency_seconds Upstream service latency per Ingress
# TYPE nginx_ingress_controller_ingress_upstream_latency_seconds summary
nginx_ingress_controller_ingress_upstream_latency_seconds{controller_class="nginx",controller_namespace="kube-ingress",controller_pod="ingress-nginx-757cb8c49-fm8gq",ingress="hello-world",namespace="default",service="hello-world",quantile="0.5"} 0
nginx_ingress_controller_ingress_upstream_latency_seconds{controller_class="nginx",controller_namespace="kube-ingress",controller_pod="ingress-nginx-757cb8c49-fm8gq",ingress="hello-world",namespace="default",service="hello-world",quantile="0.9"} 0
nginx_ingress_controller_ingress_upstream_latency_seconds{controller_class="nginx",controller_namespace="kube-ingress",controller_pod="ingress-nginx-757cb8c49-fm8gq",ingress="hello-world",namespace="default",service="hello-world",quantile="0.99"} 0.004
nginx_ingress_controller_ingress_upstream_latency_seconds_sum{controller_class="nginx",controller_namespace="kube-ingress",controller_pod="ingress-nginx-757cb8c49-fm8gq",ingress="hello-world",namespace="default",service="hello-world"} 0.17600000000000007
nginx_ingress_controller_ingress_upstream_latency_seconds_count{controller_class="nginx",controller_namespace="kube-ingress",controller_pod="ingress-nginx-757cb8c49-fm8gq",ingress="hello-world",namespace="default",service="hello-world"} 2129
$ kubectl get pod -n kube-ingress ingress-nginx-757cb8c49-fm8gq -o json | jq ".spec.containers[1].image"
"quay.io/kubernetes-ingress-controller/nginx-ingress-controller:0.26.1"

My first thought was that 0.27.1+ requires some additional configuration (via a CLI flag or the ConfigMap) to report those metrics, but I wasn't able to find anything related in the changelog.
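One way to confirm what changed between two controller versions is to diff the series identifiers in the two /metrics dumps. A small stdlib-only sketch (the sample lines are abbreviated stand-ins for the real outputs above):

```python
def series(exposition_text):
    """Extract the set of series identifiers (name plus labels) from exposition text."""
    out = set()
    for line in exposition_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        out.add(line.rsplit(" ", 1)[0])  # drop the sample value
    return out

old = 'metric{quantile="0.5"} 0\nmetric_sum{} 0.17\nmetric_count{} 2129\n'
new = 'metric_sum{} 2.44\nmetric_count{} 10742\n'
missing = series(old) - series(new)
print(missing)  # {'metric{quantile="0.5"}'} -- the quantile series disappeared
```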


aledbf commented Feb 6, 2020

@juniorz please check #5023

Until the next release, you can use the image
quay.io/kubernetes-ingress-controller/nginx-ingress-controller-amd64:prometheus


juniorz commented Feb 10, 2020

FYI: we gave this image a try, but it seems to contain more than the metrics fix. It also enables mirroring:

# nginx-ingress-controller-amd64:prometheus
$ kubectl exec -it -n kube-ingress ingress-nginx-596c798c4f-djv25 -c nginx-ingress-controller -- cat /etc/nginx/nginx.conf | grep mirror
			mirror /_mirror-912c49ba-3d2c-11ea-901f-12ee5ca50c5b;
			mirror_request_body on;
# nginx-ingress-controller:0.28.0
$ kubectl exec -it -n kube-ingress ingress-nginx-7855754764-5t469 -c nginx-ingress-controller -- cat /etc/nginx/nginx.conf | grep mirror

We are going to wait until this fix is released. Thank you!


aledbf commented Feb 10, 2020

FYI: we gave this image a try, but it seems to have more than the fix for the metrics. It also enables mirroring:

If you see that in the generated nginx.conf file, it means at least one of your Ingress definitions uses the mirror annotation. This output is not going to change in 0.29. I suggest you check that.
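For context, ingress-nginx's mirror support wraps nginx's ngx_http_mirror_module, and the internal location name appears to be derived from the Ingress UID (the suffix in the output above matches the uid of the hello-world Ingress posted later in this thread). A rough sketch of what such a generated block looks like (the backend URL is a placeholder, not taken from this issue):

```nginx
location / {
    mirror /_mirror-912c49ba-3d2c-11ea-901f-12ee5ca50c5b;
    mirror_request_body on;
    # ... usual proxy_pass to the Ingress backend ...
}

location = /_mirror-912c49ba-3d2c-11ea-901f-12ee5ca50c5b {
    internal;
    proxy_pass http://mirror-target.example;  # placeholder mirror backend
}
```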


juniorz commented Feb 11, 2020

I double-checked that. All the verifications were done in a sandbox environment, and we have never done any work related to mirroring.

We have no idea why this defaults to /_mirror-UUID; we only noticed because we started seeing lots of 404s.

As we could not find any prometheus branch, we were not able to track down what the image tagged as prometheus points to. We assumed it might have been pushed with some other change, and decided to report it in case others give the image a try.

We also noticed a huge increase in custom-metrics usage during the weekend this image was deployed:

[image: graph showing the spike in custom-metrics usage]

We are not sure whether this is related to the mirror config being added or to the way metrics are now reported. For this reason, we don't think this image restores the existing behavior of 0.26.1.

Thank you


juniorz commented Feb 11, 2020

Here is the only Ingress, in case it helps.

kubectl get ingress -o yaml --all-namespaces
apiVersion: v1
items:
- apiVersion: extensions/v1beta1
  kind: Ingress
  metadata:
    annotations:
      kubernetes.io/ingress.class: nginx
      nginx.ingress.kubernetes.io/enable-modsecurity: "true"
      nginx.ingress.kubernetes.io/modsecurity-snippet: |
        Include /etc/nginx/owasp-modsecurity-crs/nginx-modsecurity.conf
        SecRuleEngine On
    creationTimestamp: 2020-01-22T15:33:41Z
    generation: 1
    name: hello-world
    namespace: default
    resourceVersion: "26628933"
    selfLink: /apis/extensions/v1beta1/namespaces/default/ingresses/hello-world
    uid: 912c49ba-3d2c-11ea-901f-12ee5ca50c5b
  spec:
    rules:
    - host: hello-world.sandbox.xxx-nonprod.com
      http:
        paths:
        - backend:
            serviceName: hello-world
            servicePort: 80
  status:
    loadBalancer:
      ingress:
      - hostname: a92cc68dff10411e9ab8112661aa5380-2132736598.us-east-1.elb.amazonaws.com
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""

And here is the Deployment:

kubectl get deployment -n kube-ingress ingress-nginx -o yaml
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  annotations:
    deployment.kubernetes.io/revision: "30"
  creationTimestamp: 2019-10-17T17:35:55Z
  generation: 302
  labels:
    k8s-addon: ingress-nginx.addons.k8s.io
    k8s-app: nginx-ingress-controller
  name: ingress-nginx
  namespace: kube-ingress
  resourceVersion: "26630618"
  selfLink: /apis/extensions/v1beta1/namespaces/kube-ingress/deployments/ingress-nginx
  uid: 92ce9154-f104-11e9-ab81-12661aa53802
spec:
  progressDeadlineSeconds: 2147483647
  replicas: 1
  revisionHistoryLimit: 2147483647
  selector:
    matchLabels:
      app: ingress-nginx
      k8s-addon: ingress-nginx.addons.k8s.io
      k8s-app: nginx-ingress-controller
  strategy:
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 1
    type: RollingUpdate
  template:
    metadata:
      annotations:
        ad.datadoghq.com/modsec-logger.logs: '[{ "source": "modsec", "service": "external-ingress-controller"
          }]'
        ad.datadoghq.com/nginx-ingress-controller.check_names: '["nginx","nginx_ingress_controller"]'
        ad.datadoghq.com/nginx-ingress-controller.init_configs: '[{},{}]'
        ad.datadoghq.com/nginx-ingress-controller.instances: '[{"nginx_status_url":
          "http://%%host%%/nginx_status"},{"prometheus_url": "http://%%host%%:10254/metrics"}]'
        ad.datadoghq.com/nginx-ingress-controller.logs: '[{ "source": "nginx-ingress-controller",
          "service": "external-ingress-controller" }]'
      creationTimestamp: null
      labels:
        app: ingress-nginx
        date: "1576252923"
        k8s-addon: ingress-nginx.addons.k8s.io
        k8s-app: nginx-ingress-controller
    spec:
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - podAffinityTerm:
              labelSelector:
                matchExpressions:
                - key: app
                  operator: In
                  values:
                  - ingress-nginx
              topologyKey: kubernetes.io/hostname
            weight: 100
      containers:
      - args:
        - /bin/sh
        - -c
        - while [ ! -f /var/log/modsec/modsec_audit.log ]; do sleep 1; done; tail
          -f /var/log/modsec/modsec_audit.log
        image: busybox
        imagePullPolicy: Always
        name: modsec-logger
        resources:
          requests:
            cpu: 100m
            memory: 100Mi
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /var/log/modsec
          name: modsec-logs
      - args:
        - /nginx-ingress-controller
        - --configmap=$(POD_NAMESPACE)/ingress-nginx
        - --tcp-services-configmap=$(POD_NAMESPACE)/tcp-services
        - --udp-services-configmap=$(POD_NAMESPACE)/udp-services
        - --publish-service=$(POD_NAMESPACE)/ingress-nginx
        - --annotations-prefix=nginx.ingress.kubernetes.io
        - --default-backend-service=$(POD_NAMESPACE)/nginx-default-backend
        - --v=1
        env:
        - name: POD_NAME
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: metadata.name
        - name: POD_NAMESPACE
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: metadata.namespace
        - name: HOST_IP
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: status.hostIP
        image: quay.io/kubernetes-ingress-controller/nginx-ingress-controller-amd64:prometheus
        imagePullPolicy: IfNotPresent
        lifecycle:
          preStop:
            exec:
              command:
              - /wait-shutdown
        livenessProbe:
          failureThreshold: 3
          httpGet:
            path: /healthz
            port: 10254
            scheme: HTTP
          initialDelaySeconds: 30
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 5
        name: nginx-ingress-controller
        ports:
        - containerPort: 80
          name: http
          protocol: TCP
        - containerPort: 443
          name: https
          protocol: TCP
        readinessProbe:
          failureThreshold: 3
          httpGet:
            path: /healthz
            port: 10254
            scheme: HTTP
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 1
        resources:
          requests:
            cpu: 200m
            memory: 700Mi
        securityContext:
          capabilities:
            add:
            - NET_BIND_SERVICE
            drop:
            - ALL
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /var/log/modsec
          name: modsec-logs
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext:
        runAsUser: 101
      serviceAccount: nginx-ingress-controller
      serviceAccountName: nginx-ingress-controller
      terminationGracePeriodSeconds: 180
      volumes:
      - emptyDir: {}
        name: modsec-logs
status:
  availableReplicas: 1
  conditions:
  - lastTransitionTime: 2020-02-11T15:21:25Z
    lastUpdateTime: 2020-02-11T15:21:25Z
    message: Deployment has minimum availability.
    reason: MinimumReplicasAvailable
    status: "True"
    type: Available
  observedGeneration: 302
  readyReplicas: 1
  replicas: 1
  updatedReplicas: 1

And here is the mirror:

$ kubectl get pods -n kube-ingress
NAME                                    READY   STATUS    RESTARTS   AGE
ingress-nginx-596c798c4f-qfh2k          2/2     Running   0          14m
nginx-default-backend-6cb6858cf-xvdr7   1/1     Running   0          55d
$ kubectl exec -it -n kube-ingress ingress-nginx-596c798c4f-qfh2k -c nginx-ingress-controller -- cat /etc/nginx/nginx.conf | grep mirror
			mirror /_mirror-912c49ba-3d2c-11ea-901f-12ee5ca50c5b;
			mirror_request_body on;


aledbf commented Feb 11, 2020

@juniorz please use quay.io/kubernetes-ingress-controller/nginx-ingress-controller-amd64:fixmirror
This image contains #5055
