
Metrics for EBS CSI Controller is not available #1993

Closed
ujala-singh opened this issue Apr 1, 2024 · 3 comments

Comments


ujala-singh commented Apr 1, 2024

What happened?
I am using the EBS CSI driver as an add-on in an AWS EKS cluster. By providing some custom values I was able to expose port 3301 and the metrics endpoint, but scraping it returns no metrics.

What you expected to happen?
I would expect it to return all of the metrics exposed by the application.

Anything else we need to know?:
Added one arg to the ebs-plugin container, --http-endpoint=0.0.0.0:3301, and exposed the port:

- name: metrics
  containerPort: 3301
  protocol: TCP

Later I created the Service and ServiceMonitor for it, as mentioned [here]:

---
apiVersion: v1
kind: Service
metadata:
  name: ebs-csi-controller
  namespace: kube-system
  labels:
    app: ebs-csi-controller
spec:
  selector:
    app: ebs-csi-controller
  ports:
    - name: metrics
      port: 3301
      targetPort: 3301
  type: ClusterIP
---
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: ebs-csi-controller
  namespace: kube-system
  labels:
    app: ebs-csi-controller
spec:
  selector:
    matchLabels:
      app: ebs-csi-controller
  namespaceSelector:
    matchNames:
      - kube-system
  endpoints:
    - targetPort: 3301
      path: /metrics
      interval: 15s

vmagent is able to discover the target, but no metrics are available.
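One way to check whether the Service is actually selecting the controller pods (a sketch; the `app: ebs-csi-controller` label value below is copied from the manifests above and may differ from the labels the add-on actually applies to the pods):

```shell
# A Service with no endpoints usually means its selector does not
# match the pod labels
kubectl get endpoints ebs-csi-controller -n kube-system

# Compare the Service selector against the controller pods' actual labels
kubectl get pods -n kube-system -l app=ebs-csi-controller --show-labels
```

If the endpoints list is empty, the target vmagent discovers will never serve metrics regardless of the driver's configuration.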

Environment

  • Kubernetes version: v1.26
  • Driver version: v1.26.0-eksbuild.1

Logs

$ k logs -f ebs-csi-controller-647db6b5db-k8lxb -n kube-system                                             
Defaulted container "ebs-plugin" out of: ebs-plugin, csi-provisioner, csi-attacher, csi-snapshotter, csi-resizer, liveness-probe
I0329 08:39:43.740880       1 driver.go:80] "Driver Information" Driver="ebs.csi.aws.com" Version="v1.26.0"
I0329 08:39:43.741381       1 metrics.go:95] "Metric server listening" address="0.0.0.0:3301" path="/metrics"
I0329 08:39:43.746232       1 controller.go:92] "batching" status=true

Debugging

$ k port-forward svc/ebs-csi-controller 3301:3301 -n kube-system                           
Forwarding from 127.0.0.1:3301 -> 3301
Forwarding from [::1]:3301 -> 3301
Handling connection for 3301
Handling connection for 3301

$ curl http://localhost:3301/metrics

The curl request does not return any metrics.
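For what it's worth, a more verbose probe against the same port-forward can distinguish an empty HTTP 200 body from a connection-level failure (these are generic curl flags, not specific to this driver):

```shell
# Show the request/response exchange, including status line and headers;
# an empty 200 body is a different failure mode than a reset connection
curl -v http://localhost:3301/metrics

# Print only the HTTP status code
curl -s -o /dev/null -w "%{http_code}\n" http://localhost:3301/metrics
```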

torredil (Member) commented Apr 1, 2024

Hi @ujala-singh

I was unable to reproduce this in a similar environment. Could you run through the steps below to check if the metrics endpoint is accessible within the container?

  1. Dynamically provision a volume: kubectl apply -f examples/kubernetes/dynamic-provisioning/manifests
  2. Retrieve leader controller pod: export EBS_CSI_CONTROLLER=$(kubectl get lease -n kube-system external-attacher-leader-ebs-csi-aws-com -o jsonpath="{.spec.holderIdentity}")
  3. Start an ephemeral debug container for the controller pod: kubectl debug -it $EBS_CSI_CONTROLLER -n kube-system --image=busybox:1.28 --target=ebs-plugin (this is necessary due to the use of a minimal base image)
  4. Check if the metrics endpoint is accessible: wget -O - http://localhost:3301/metrics
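The steps above can be run as one sequence (a sketch; the manifest path and lease name are taken from the steps above, and step 4 runs inside the debug container's shell):

```shell
# 1. Dynamically provision a volume
kubectl apply -f examples/kubernetes/dynamic-provisioning/manifests

# 2. Retrieve the leader controller pod from the external-attacher lease
export EBS_CSI_CONTROLLER=$(kubectl get lease -n kube-system \
  external-attacher-leader-ebs-csi-aws-com \
  -o jsonpath="{.spec.holderIdentity}")

# 3. Attach an ephemeral debug container targeting the ebs-plugin
#    container (needed because the driver uses a minimal base image
#    with no shell)
kubectl debug -it "$EBS_CSI_CONTROLLER" -n kube-system \
  --image=busybox:1.28 --target=ebs-plugin

# 4. Inside the debug container, fetch the metrics endpoint
wget -O - http://localhost:3301/metrics
```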

Here is an example of the expected output:

Connecting to localhost:3301 (127.0.0.1:3301)
# HELP cloudprovider_aws_api_request_duration_seconds [ALPHA] ebs_csi_aws_com metric
# TYPE cloudprovider_aws_api_request_duration_seconds histogram
cloudprovider_aws_api_request_duration_seconds_bucket{request="AttachVolume",le="0.005"} 0
cloudprovider_aws_api_request_duration_seconds_bucket{request="AttachVolume",le="0.01"} 0
cloudprovider_aws_api_request_duration_seconds_bucket{request="AttachVolume",le="0.025"} 0
cloudprovider_aws_api_request_duration_seconds_bucket{request="AttachVolume",le="0.05"} 0
cloudprovider_aws_api_request_duration_seconds_bucket{request="AttachVolume",le="0.1"} 0
cloudprovider_aws_api_request_duration_seconds_bucket{request="AttachVolume",le="0.25"} 0
cloudprovider_aws_api_request_duration_seconds_bucket{request="AttachVolume",le="0.5"} 1
cloudprovider_aws_api_request_duration_seconds_bucket{request="AttachVolume",le="1"} 1
cloudprovider_aws_api_request_duration_seconds_bucket{request="AttachVolume",le="2.5"} 1
cloudprovider_aws_api_request_duration_seconds_bucket{request="AttachVolume",le="5"} 1
cloudprovider_aws_api_request_duration_seconds_bucket{request="AttachVolume",le="10"} 1
cloudprovider_aws_api_request_duration_seconds_bucket{request="AttachVolume",le="+Inf"} 1
cloudprovider_aws_api_request_duration_seconds_sum{request="AttachVolume"} 0.325255518
cloudprovider_aws_api_request_duration_seconds_count{request="AttachVolume"} 1
cloudprovider_aws_api_request_duration_seconds_bucket{request="DescribeInstances",le="0.005"} 0
cloudprovider_aws_api_request_duration_seconds_bucket{request="DescribeInstances",le="0.01"} 0
cloudprovider_aws_api_request_duration_seconds_bucket{request="DescribeInstances",le="0.025"} 0
cloudprovider_aws_api_request_duration_seconds_bucket{request="DescribeInstances",le="0.05"} 0
cloudprovider_aws_api_request_duration_seconds_bucket{request="DescribeInstances",le="0.1"} 0
cloudprovider_aws_api_request_duration_seconds_bucket{request="DescribeInstances",le="0.25"} 1
cloudprovider_aws_api_request_duration_seconds_bucket{request="DescribeInstances",le="0.5"} 1
cloudprovider_aws_api_request_duration_seconds_bucket{request="DescribeInstances",le="1"} 1
cloudprovider_aws_api_request_duration_seconds_bucket{request="DescribeInstances",le="2.5"} 1
cloudprovider_aws_api_request_duration_seconds_bucket{request="DescribeInstances",le="5"} 1
cloudprovider_aws_api_request_duration_seconds_bucket{request="DescribeInstances",le="10"} 1
cloudprovider_aws_api_request_duration_seconds_bucket{request="DescribeInstances",le="+Inf"} 1
cloudprovider_aws_api_request_duration_seconds_sum{request="DescribeInstances"} 0.129124013
cloudprovider_aws_api_request_duration_seconds_count{request="DescribeInstances"} 1
cloudprovider_aws_api_request_duration_seconds_bucket{request="DescribeVolumes",le="0.005"} 0
cloudprovider_aws_api_request_duration_seconds_bucket{request="DescribeVolumes",le="0.01"} 0
cloudprovider_aws_api_request_duration_seconds_bucket{request="DescribeVolumes",le="0.025"} 0
cloudprovider_aws_api_request_duration_seconds_bucket{request="DescribeVolumes",le="0.05"} 0
cloudprovider_aws_api_request_duration_seconds_bucket{request="DescribeVolumes",le="0.1"} 0
cloudprovider_aws_api_request_duration_seconds_bucket{request="DescribeVolumes",le="0.25"} 1
cloudprovider_aws_api_request_duration_seconds_bucket{request="DescribeVolumes",le="0.5"} 1
cloudprovider_aws_api_request_duration_seconds_bucket{request="DescribeVolumes",le="1"} 1
cloudprovider_aws_api_request_duration_seconds_bucket{request="DescribeVolumes",le="2.5"} 1
cloudprovider_aws_api_request_duration_seconds_bucket{request="DescribeVolumes",le="5"} 1
cloudprovider_aws_api_request_duration_seconds_bucket{request="DescribeVolumes",le="10"} 1
cloudprovider_aws_api_request_duration_seconds_bucket{request="DescribeVolumes",le="+Inf"} 1
cloudprovider_aws_api_request_duration_seconds_sum{request="DescribeVolumes"} 0.106219663
cloudprovider_aws_api_request_duration_seconds_count{request="DescribeVolumes"} 1
-                    100% |************************************************************************************************************************************************************************************************************************|  3972   0:00:00 ETA

ujala-singh (Author) commented

Thanks @torredil !

markretallack commented

I am also seeing this. I have used the debug container to access the pod, and:

/ $ wget -O - http://localhost:3301/metrics
Connecting to localhost:3301 (127.0.0.1:3301)

It does not return anything, and I cannot see any errors in the logs. Not sure where else to look.
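When busybox wget connects but prints nothing, adding a timeout and checking the exit code can distinguish a hang from an empty response (a sketch using standard busybox wget flags, run inside the same debug container):

```shell
# -T 5 fails the request after 5 seconds instead of hanging forever;
# a non-zero exit code points at a transfer problem rather than an
# endpoint that genuinely serves an empty body
wget -T 5 -O - http://localhost:3301/metrics; echo "exit=$?"
```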

Labels: none yet
Projects: none yet
3 participants