KEDA scaler error while init Azure Managed Prometheus client http transport #5006

Closed
rmagalla opened this issue Sep 27, 2023 · 7 comments
Labels
bug Something isn't working

Comments

@rmagalla

Report

The KEDA scaler does not scale when the ScaledObject is defined with a trigger that uses workload identity to authenticate against Prometheus.
I'm following this guide: KEDA Azure Monitor managed service for Prometheus.

KEDA operator error message log:

2023-09-27T00:48:13Z    INFO    Starting workers        {"controller": "clustertriggerauthentication", "controllerGroup": "keda.sh", "controllerKind": "ClusterTriggerAuthentication", "worker count": 1}
2023-09-27T00:48:13Z    ERROR   prometheus_scaler       error while init Azure Managed Prometheus client http transport {"type": "ScaledObject", "namespace": "demo", "name": "azure-managed-prometheus-scaler", "error": "sources must contain at least one TokenCredential"}

My scaler object definitions are below:

apiVersion: keda.sh/v1alpha1
kind: TriggerAuthentication
metadata:
  finalizers:
  - finalizer.keda.sh
  generation: 2
  name: azure-managed-prometheus-trigger-auth
  namespace: demo
spec:
  podIdentity:
    provider: azure-workload
---
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  finalizers:
  - finalizer.keda.sh
  generation: 1
  labels:
    scaledobject.keda.sh/name: azure-managed-prometheus-scaler
  name: azure-managed-prometheus-scaler
  namespace: demo
spec:
  maxReplicaCount: 20
  minReplicaCount: 1
  scaleTargetRef:
    name: podinfo
  triggers:
  - authenticationRef:
      name: azure-managed-prometheus-trigger-auth
    metadata:
      activationThreshold: "5.5"
      metricName: nginx_ingress_controller_requests
      query: round(sum(irate(nginx_ingress_controller_requests{ingress="podinfo"}[2m]))
        by (ingress) , 0.001)
      serverAddress: https://monitorws-xqrv.eastus.prometheus.monitor.azure.com
      threshold: "10"
    type: prometheus

I installed KEDA with the following command:

helm install keda kedacore/keda --namespace keda \
--set podIdentity.azureWorkload.enabled=true \
--set podIdentity.azureWorkload.clientId=xxxxxxxx-xxxx-xxxx-xxxx-e185edfbcd96 \
--set podIdentity.azureWorkload.tenantId=xxxxxxxx-xxxx-xxxx-xxxx-289fc972da1b

Expected Behavior

The KEDA scaler should authenticate with the assigned workload identity, obtain an access token, and perform scaling.

Actual Behavior

The KEDA operator cannot find the assigned Azure identity, and scaling fails.

Steps to Reproduce the Problem

  • Create the Azure identity and bindings for KEDA
  • Install KEDA with the aadpodidentitybinding (workload identity)
  • Create the ScaledObject and TriggerAuthentication using KEDA workload identity
  • The scaler fails to authenticate and scale

Logs from KEDA operator

2023-09-27T00:47:28Z    INFO    controller-runtime.metrics      Metrics server is starting to listen    {"addr": ":8080"}
2023-09-27T00:47:28Z    INFO    setup   Starting manager
2023-09-27T00:47:28Z    INFO    setup   KEDA Version: 2.11.2
2023-09-27T00:47:28Z    INFO    setup   Git Commit: 517ed30cc311b0b81e39112cff0fa8e3251aed69
2023-09-27T00:47:28Z    INFO    setup   Go Version: go1.20.5
2023-09-27T00:47:28Z    INFO    setup   Go OS/Arch: linux/amd64
2023-09-27T00:47:28Z    INFO    setup   Running on Kubernetes 1.26      {"version": "v1.26.6"}
2023-09-27T00:47:28Z    INFO    starting server {"path": "/metrics", "kind": "metrics", "addr": "[::]:8080"}
I0927 00:47:28.335011       1 leaderelection.go:245] attempting to acquire leader lease keda/operator.keda.sh...
2023-09-27T00:47:28Z    INFO    Starting server {"kind": "health probe", "addr": "[::]:8081"}
I0927 00:48:13.003416       1 leaderelection.go:255] successfully acquired lease keda/operator.keda.sh
2023-09-27T00:48:13Z    INFO    Starting EventSource    {"controller": "scaledobject", "controllerGroup": "keda.sh", "controllerKind": "ScaledObject", "source": "kind source: *v1alpha1.ScaledObject"}
2023-09-27T00:48:13Z    INFO    Starting EventSource    {"controller": "scaledobject", "controllerGroup": "keda.sh", "controllerKind": "ScaledObject", "source": "kind source: *v2.HorizontalPodAutoscaler"}
2023-09-27T00:48:13Z    INFO    Starting Controller     {"controller": "scaledobject", "controllerGroup": "keda.sh", "controllerKind": "ScaledObject"}
2023-09-27T00:48:13Z    INFO    Starting EventSource    {"controller": "triggerauthentication", "controllerGroup": "keda.sh", "controllerKind": "TriggerAuthentication", "source": "kind source: *v1alpha1.TriggerAuthentication"}
2023-09-27T00:48:13Z    INFO    Starting Controller     {"controller": "triggerauthentication", "controllerGroup": "keda.sh", "controllerKind": "TriggerAuthentication"}
2023-09-27T00:48:13Z    INFO    Starting EventSource    {"controller": "scaledjob", "controllerGroup": "keda.sh", "controllerKind": "ScaledJob", "source": "kind source: *v1alpha1.ScaledJob"}
2023-09-27T00:48:13Z    INFO    Starting Controller     {"controller": "scaledjob", "controllerGroup": "keda.sh", "controllerKind": "ScaledJob"}
2023-09-27T00:48:13Z    INFO    cert-rotation   starting cert rotator controller
2023-09-27T00:48:13Z    INFO    Starting EventSource    {"controller": "clustertriggerauthentication", "controllerGroup": "keda.sh", "controllerKind": "ClusterTriggerAuthentication", "source": "kind source: *v1alpha1.ClusterTriggerAuthentication"}
2023-09-27T00:48:13Z    INFO    Starting Controller     {"controller": "clustertriggerauthentication", "controllerGroup": "keda.sh", "controllerKind": "ClusterTriggerAuthentication"}
2023-09-27T00:48:13Z    INFO    Starting EventSource    {"controller": "cert-rotator", "source": "kind source: *v1.Secret"}
2023-09-27T00:48:13Z    INFO    Starting EventSource    {"controller": "cert-rotator", "source": "kind source: *unstructured.Unstructured"}
2023-09-27T00:48:13Z    INFO    Starting EventSource    {"controller": "cert-rotator", "source": "kind source: *unstructured.Unstructured"}
2023-09-27T00:48:13Z    INFO    Starting Controller     {"controller": "cert-rotator"}
2023-09-27T00:48:13Z    INFO    cert-rotation   no cert refresh needed
2023-09-27T00:48:13Z    INFO    cert-rotation   certs are ready in /certs
2023-09-27T00:48:13Z    INFO    Starting workers        {"controller": "triggerauthentication", "controllerGroup": "keda.sh", "controllerKind": "TriggerAuthentication", "worker count": 1}
2023-09-27T00:48:13Z    INFO    Starting workers        {"controller": "scaledobject", "controllerGroup": "keda.sh", "controllerKind": "ScaledObject", "worker count": 5}
2023-09-27T00:48:13Z    INFO    Reconciling ScaledObject        {"controller": "scaledobject", "controllerGroup": "keda.sh", "controllerKind": "ScaledObject", "ScaledObject": {"name":"azure-managed-prometheus-scaler","namespace":"demo"}, "namespace": "demo", "name": "azure-managed-prometheus-scaler", "reconcileID": "0a378299-3ebd-43f8-930a-be9db444d9c2"}
2023-09-27T00:48:13Z    INFO    Starting workers        {"controller": "scaledjob", "controllerGroup": "keda.sh", "controllerKind": "ScaledJob", "worker count": 1}
2023-09-27T00:48:13Z    INFO    "metricName" is deprecated and will be removed in v2.12, please do not set it anymore   {"controller": "scaledobject", "controllerGroup": "keda.sh", "controllerKind": "ScaledObject", "ScaledObject": {"name":"azure-managed-prometheus-scaler","namespace":"demo"}, "namespace": "demo", "name": "azure-managed-prometheus-scaler", "reconcileID": "0a378299-3ebd-43f8-930a-be9db444d9c2", "trigger.type": "prometheus"}
2023-09-27T00:48:13Z    INFO    Creating a new HPA      {"controller": "scaledobject", "controllerGroup": "keda.sh", "controllerKind": "ScaledObject", "ScaledObject": {"name":"azure-managed-prometheus-scaler","namespace":"demo"}, "namespace": "demo", "name": "azure-managed-prometheus-scaler", "reconcileID": "0a378299-3ebd-43f8-930a-be9db444d9c2", "HPA.Namespace": "demo", "HPA.Name": "keda-hpa-azure-managed-prometheus-scaler"}
2023-09-27T00:48:13Z    INFO    Starting workers        {"controller": "cert-rotator", "worker count": 1}
2023-09-27T00:48:13Z    INFO    cert-rotation   Ensuring CA cert        {"name": "keda-admission", "gvk": "admissionregistration.k8s.io/v1, Kind=ValidatingWebhookConfiguration", "name": "keda-admission", "gvk": "admissionregistration.k8s.io/v1, Kind=ValidatingWebhookConfiguration"}
2023-09-27T00:48:13Z    INFO    cert-rotation   Ensuring CA cert        {"name": "v1beta1.external.metrics.k8s.io", "gvk": "apiregistration.k8s.io/v1, Kind=APIService", "name": "v1beta1.external.metrics.k8s.io", "gvk": "apiregistration.k8s.io/v1, Kind=APIService"}
2023-09-27T00:48:13Z    INFO    cert-rotation   Ensuring CA cert        {"name": "keda-admission", "gvk": "admissionregistration.k8s.io/v1, Kind=ValidatingWebhookConfiguration", "name": "keda-admission", "gvk": "admissionregistration.k8s.io/v1, Kind=ValidatingWebhookConfiguration"}
2023-09-27T00:48:13Z    INFO    cert-rotation   Ensuring CA cert        {"name": "v1beta1.external.metrics.k8s.io", "gvk": "apiregistration.k8s.io/v1, Kind=APIService", "name": "v1beta1.external.metrics.k8s.io", "gvk": "apiregistration.k8s.io/v1, Kind=APIService"}
2023-09-27T00:48:13Z    INFO    Starting workers        {"controller": "clustertriggerauthentication", "controllerGroup": "keda.sh", "controllerKind": "ClusterTriggerAuthentication", "worker count": 1}
2023-09-27T00:48:13Z    ERROR   prometheus_scaler       error while init Azure Managed Prometheus client http transport {"type": "ScaledObject", "namespace": "demo", "name": "azure-managed-prometheus-scaler", "error": "sources must contain at least one TokenCredential"}
github.com/kedacore/keda/v2/pkg/scalers.NewPrometheusScaler
        /workspace/pkg/scalers/prometheus_scaler.go:116
github.com/kedacore/keda/v2/pkg/scaling.buildScaler
        /workspace/pkg/scaling/scalers_builder.go:207
github.com/kedacore/keda/v2/pkg/scaling.(*scaleHandler).buildScalers.func1
        /workspace/pkg/scaling/scalers_builder.go:74
github.com/kedacore/keda/v2/pkg/scaling.(*scaleHandler).buildScalers
        /workspace/pkg/scaling/scalers_builder.go:78
github.com/kedacore/keda/v2/pkg/scaling.(*scaleHandler).performGetScalersCache
        /workspace/pkg/scaling/scale_handler.go:360
github.com/kedacore/keda/v2/pkg/scaling.(*scaleHandler).GetScalersCache
        /workspace/pkg/scaling/scale_handler.go:281
github.com/kedacore/keda/v2/controllers/keda.(*ScaledObjectReconciler).getScaledObjectMetricSpecs
        /workspace/controllers/keda/hpa.go:209
github.com/kedacore/keda/v2/controllers/keda.(*ScaledObjectReconciler).newHPAForScaledObject
        /workspace/controllers/keda/hpa.go:75
github.com/kedacore/keda/v2/controllers/keda.(*ScaledObjectReconciler).createAndDeployNewHPA
        /workspace/controllers/keda/hpa.go:48
github.com/kedacore/keda/v2/controllers/keda.(*ScaledObjectReconciler).ensureHPAForScaledObjectExists
        /workspace/controllers/keda/scaledobject_controller.go:394
github.com/kedacore/keda/v2/controllers/keda.(*ScaledObjectReconciler).reconcileScaledObject
        /workspace/controllers/keda/scaledobject_controller.go:254
github.com/kedacore/keda/v2/controllers/keda.(*ScaledObjectReconciler).Reconcile
        /workspace/controllers/keda/scaledobject_controller.go:177
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile
        /workspace/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:118
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
        /workspace/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:314
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
        /workspace/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:265
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
        /workspace/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:226
2023-09-27T00:48:13Z    ERROR   scale_handler   error resolving auth params     {"type": "ScaledObject", "namespace": "demo", "name": "azure-managed-prometheus-scaler", "scalerIndex": 0, "error": "sources must contain at least one TokenCredential"}
github.com/kedacore/keda/v2/pkg/scaling.(*scaleHandler).buildScalers

KEDA Version

2.11.2

Kubernetes Version

1.26

Platform

Microsoft Azure

Scaler Details

Prometheus

Anything else?

No response

@rmagalla added the bug label on Sep 27, 2023
@JorTurFer
Member

Hello,
Could you describe the KEDA operator pod? I'd like to see whether it has the workload identity environment variables:

AZURE_CLIENT_ID:             88f878cb................c5cc77
AZURE_TENANT_ID:             d.............................18e0385f4c
AZURE_FEDERATED_TOKEN_FILE:  /var/run/secrets/azure/tokens/azure-identity-token
AZURE_AUTHORITY_HOST:        https://login.microsoftonline.com/

@JorTurFer
Member

Internally, the code looks for those values in the options or in the environment. We don't provide them as options, so it falls back to the environment variables:
[image]

If the values aren't provided, we don't initialize the workload identity, which can trigger the error you see (and maybe we can improve the errors here).
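
One way to check for these variables without reading the whole `kubectl describe` output is to filter it. The snippet below is only a sketch and is not part of KEDA: `missing_wi_env` is a hypothetical helper name, and it assumes the four variable names listed earlier in this thread.

```shell
# Hypothetical helper (not part of KEDA): reads `kubectl describe pod` output
# on stdin and prints each workload identity variable that does not appear.
# The ( ... ) function body runs in a subshell, so its variables stay local.
missing_wi_env() (
  input="$(cat)"
  for var in AZURE_CLIENT_ID AZURE_TENANT_ID AZURE_FEDERATED_TOKEN_FILE AZURE_AUTHORITY_HOST; do
    # `kubectl describe` prints environment entries as "NAME:  value".
    if ! printf '%s\n' "$input" | grep -q "${var}:"; then
      echo "$var"
    fi
  done
)
```

Assuming the operator runs in the `keda` namespace with the `app=keda-operator` label, as in this issue, it could be used as `kubectl describe pod -n keda -l app=keda-operator | missing_wi_env`. Empty output would mean all four variables are present.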

@rmagalla
Author

Hello,
The values aren't provided:

Name:             keda-operator-5b689687cb-vljrx
Namespace:        keda
Priority:         0
Service Account:  keda-operator
Node:             aks-nodepool1-20862348-vmss000005/10.100.0.33
Start Time:       Tue, 26 Sep 2023 19:47:21 -0500
Labels:           app=keda-operator
                  app.kubernetes.io/component=operator
                  app.kubernetes.io/instance=keda
                  app.kubernetes.io/managed-by=Helm
                  app.kubernetes.io/name=keda-operator
                  app.kubernetes.io/part-of=keda-operator
                  app.kubernetes.io/version=2.11.2
                  azure.workload.identity/use=true
                  helm.sh/chart=keda-2.11.2
                  name=keda-operator
                  pod-template-hash=5b689687cb
Annotations:      kubectl.kubernetes.io/restartedAt: 2023-09-26T19:47:20-05:00
Status:           Running
IP:               10.100.0.56
IPs:
  IP:           10.100.0.56
Controlled By:  ReplicaSet/keda-operator-5b689687cb
Containers:
  keda-operator:
    Container ID:    containerd://d63cee37e7004a369f9f9a502cd0f3e79c8678b2212c06d3799f6b9b15db9490
    Image:           ghcr.io/kedacore/keda:2.11.2
    Image ID:        ghcr.io/kedacore/keda@sha256:aa3946caf4254a2aec0090a4bef8cf85ea9001e145ca6567f472f293f9f26b3b
    Port:            8080/TCP
    Host Port:       0/TCP
    SeccompProfile:  RuntimeDefault
    Command:
      /keda
    Args:
      --leader-elect
      --zap-log-level=info
      --zap-encoder=console
      --zap-time-encoding=rfc3339
      --cert-dir=/certs
      --enable-cert-rotation=true
      --cert-secret-name=kedaorg-certs
      --operator-service-name=keda-operator
      --metrics-server-service-name=keda-operator-metrics-apiserver
      --webhooks-service-name=keda-admission-webhooks
    State:          Running
      Started:      Tue, 26 Sep 2023 19:47:28 -0500
    Ready:          True
    Restart Count:  0
    Limits:
      cpu:     1
      memory:  1000Mi
    Requests:
      cpu:      100m
      memory:   100Mi
    Liveness:   http-get http://:8081/healthz delay=25s timeout=1s period=10s #success=1 #failure=3
    Readiness:  http-get http://:8081/readyz delay=20s timeout=1s period=10s #success=1 #failure=3
    Environment:
      WATCH_NAMESPACE:
      POD_NAME:                   keda-operator-5b689687cb-vljrx (v1:metadata.name)
      POD_NAMESPACE:              keda (v1:metadata.namespace)
      OPERATOR_NAME:              keda-operator
      KEDA_HTTP_DEFAULT_TIMEOUT:  3000
      KEDA_HTTP_MIN_TLS_VERSION:  TLS12
    Mounts:
      /certs from certificates (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-56fg4 (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             True
  ContainersReady   True
  PodScheduled      True
Volumes:
  certificates:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  kedaorg-certs
    Optional:    true
  kube-api-access-56fg4:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   Burstable
Node-Selectors:              kubernetes.io/os=linux
Tolerations:                 node.kubernetes.io/memory-pressure:NoSchedule op=Exists
                             node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:                      <none>

@JorTurFer
Member

JorTurFer commented Sep 27, 2023

Have you installed the workload identity webhook too? If you restart the KEDA pods, do they have the environment variables?
KEDA doesn't restart its pods when the service account changes, so even though everything is ready, the pods may still need a restart before the webhook can mutate them.
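
The restart suggested here could be done along these lines. This is a sketch under assumptions, not an official KEDA procedure: `restart_keda` and the `KUBECTL` override are hypothetical names introduced for illustration, and the `keda` namespace and `keda-operator` Deployment name are taken from the pod description in this issue.

```shell
# Hypothetical helper (not part of KEDA). Restarts every Deployment in the
# namespace so that new pods pass through the workload identity webhook,
# then waits for the operator rollout to finish.
restart_keda() (
  ns="${1:-keda}"
  # KUBECTL may be overridden, for example with `echo`, to preview the commands.
  kubectl_bin="${KUBECTL:-kubectl}"
  "$kubectl_bin" rollout restart deployment -n "$ns" &&
    "$kubectl_bin" rollout status deployment/keda-operator -n "$ns"
)
```

Running `KUBECTL=echo restart_keda` prints the two commands instead of executing them, which is a cheap way to check the namespace before touching the cluster.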

@rmagalla
Author

Great, thanks for your support. After restarting the pods, the variables were loaded and KEDA started working.
Better log messages would help in similar cases.

Environment:
      WATCH_NAMESPACE:
      POD_NAME:                    keda-operator-74cd8b977c-7m2q8 (v1:metadata.name)
      POD_NAMESPACE:               keda (v1:metadata.namespace)
      OPERATOR_NAME:               keda-operator
      KEDA_HTTP_DEFAULT_TIMEOUT:   3000
      KEDA_HTTP_MIN_TLS_VERSION:   TLS12
      AZURE_CLIENT_ID:             xxxxxxxx-xxxx-xxxx-xxxx-e185edfbcd96
      AZURE_TENANT_ID:             xxxxxxxx-xxxx-xxxx-xxxx-289fc972da1b
      AZURE_FEDERATED_TOKEN_FILE:  /var/run/secrets/azure/tokens/azure-identity-token
      AZURE_AUTHORITY_HOST:        https://login.microsoftonline.com

@JorTurFer
Member

JorTurFer commented Sep 27, 2023

Yeah, you're right. We merged this commit today to improve the logs: 089f22f
In the future, you'll see an error saying that the Azure workload identity provider hasn't been initialized, along with the SDK error.

@JorTurFer
Member

BTW, I'm closing the issue as it's already solved.

@JorTurFer closed this as not planned on Sep 27, 2023