Kafka Scaler is not scaling up #5043

Closed
chubutin opened this issue Oct 3, 2023 · 2 comments
Labels
bug Something isn't working

chubutin commented Oct 3, 2023

Report

I have created a ScaledObject that scales up correctly under certain configurations, but not consistently under all circumstances.

Expected Behavior

I expect KEDA to scale my pods up to the configured maximum. If that expectation is wrong, I would appreciate guidance on where my ScaledObject configuration might be incorrect.

Actual Behavior

When minReplicaCount is set to 1 and maxReplicaCount is set to 10, the service starts with 1 pod, and once messages are enqueued to the Kafka topic, the Horizontal Pod Autoscaler (HPA) scales to 3 pods. However, when minReplicaCount is set to 3 and maxReplicaCount is set to 10, the service starts with 3 pods and the HPA does not scale beyond that.

Steps to Reproduce the Problem

  1. Create a ScaledObject with a Kafka trigger, minReplicaCount: 3 and maxReplicaCount: 10, using the configuration shared below
  2. Send 1500 messages to the Kafka topic
  3. Check how many pods get created

Logs from KEDA operator

2023-10-03T00:25:04Z	INFO	Reconciling ScaledObject	{"controller": "scaledobject", "controllerGroup": "keda.sh", "controllerKind": "ScaledObject", "ScaledObject": {"name":"base-application-staging-hpa","namespace":"staging"}, "namespace": "staging", "name": "base-application-staging-hpa", "reconcileID": "4dc3489f-0e40-41c1-82b9-d3044204b196"}
2023-10-03T00:25:19Z	INFO	Reconciling ScaledObject	{"controller": "scaledobject", "controllerGroup": "keda.sh", "controllerKind": "ScaledObject", "ScaledObject": {"name":"base-application-staging-hpa","namespace":"staging"}, "namespace": "staging", "name": "base-application-staging-hpa", "reconcileID": "ee1d402e-6139-45ae-b3cf-4c5795bab81f"}

KEDA Version

2.12.0

Kubernetes Version

1.25

Platform

Amazon Web Services

Scaler Details

Kafka

Anything else?

Scenario:

I have tested this process with a script that injects 1500 messages into the target_topic. I also have a consumer with the consumer group consumer_group that reads from this topic, and after 5 minutes, all messages get consumed. I have verified that the messages are enqueued using Kafka-UI.

Issue with Metrics Visibility

One peculiar issue I've encountered is the inability to check the metric that triggers the scaling. I don't have a way to access this value, which might be part of the problem.

Configurations on my Server

  • EKS server version: 1.25
  • KEDA Version: 2.12
  • Kubecost installed

External metrics config

kubectl get apiservices v1beta1.external.metrics.k8s.io
NAME                              SERVICE                                      AVAILABLE   AGE
v1beta1.external.metrics.k8s.io   monitoring/keda-operator-metrics-apiserver   True        6h45m

Issue with metrics not visible

When I attempt to retrieve the metrics that trigger scaling, I don't see any metrics. I should see the KEDA ScaledObject, but it's not there. However, the HPA works correctly for scaling based on these metrics.

kubectl get --raw "/apis/external.metrics.k8s.io/v1beta1" | jq .

{
  "kind": "APIResourceList",
  "apiVersion": "v1",
  "groupVersion": "external.metrics.k8s.io/v1beta1",
  "resources": [
    {
      "name": "externalmetrics",
      "singularName": "",
      "namespaced": true,
      "kind": "ExternalMetricValueList",
      "verbs": [
        "get"
      ]
    }
  ]
}

If I attempt to retrieve the value for the HPA metrics:

kubectl get --raw "/apis/external.metrics.k8s.io/v1beta1/namespaces/*/s0-kafka-ingress_listing_operations" | jq .    
Error from server: scaledObject name is not specified

In my HPA configuration, I can confirm that the metrics are being read and are triggering scaling.

[Image: hpa]

Current objects

ScaledObject Configuration:

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: base-application-staging-hpa
  namespace: staging
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: base-application-staging
  pollingInterval:  5
  cooldownPeriod:   30
  fallback:
    failureThreshold: 2
    replicas: 2
  minReplicaCount: 1
  maxReplicaCount: 10
  triggers:
    - type: kafka
      metadata:
        tls: enable
        bootstrapServers: {{api_brokers}}
        consumerGroup: consumer_group
        topic: target_topic
        offsetResetPolicy: latest
        allowIdleConsumers: 'false'
        scaleToZeroOnInvalidOffset: 'false'
        excludePersistentLag: 'false'
        activationLagThreshold: '2'
        lagThreshold: '1'

HPA Configuration

kubectl get hpa keda-hpa-base-application-staging-hpa -n staging -o yaml

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  annotations:
    meta.helm.sh/release-name: base-application-staging
    meta.helm.sh/release-namespace: staging
  creationTimestamp: "2023-10-02T18:12:05Z"
  labels:
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: keda-hpa-base-application-staging-hpa
    app.kubernetes.io/part-of: base-application-staging-hpa
    app.kubernetes.io/version: 2.12.0
    scaledobject.keda.sh/name: base-application-staging-hpa
  name: keda-hpa-base-application-staging-hpa
  namespace: staging
  ownerReferences:
  - apiVersion: keda.sh/v1alpha1
    blockOwnerDeletion: true
    controller: true
    kind: ScaledObject
    name: base-application-staging-hpa
    uid: 992355b0-7230-4546-adaa-72125bb65269
  resourceVersion: "342201543"
  uid: 0f5cec75-af01-4133-be40-deccf9dddab6
spec:
  maxReplicas: 10
  metrics:
  - external:
      metric:
        name: s0-kafka-target_topic
        selector:
          matchLabels:
            scaledobject.keda.sh/name: base-application-staging-hpa
      target:
        averageValue: "1"
        type: AverageValue
    type: External
  minReplicas: 1
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: base-application-staging
status:
  conditions:
  - lastTransitionTime: "2023-10-02T18:12:20Z"
    message: recommended size matches current size
    reason: ReadyForNewScale
    status: "True"
    type: AbleToScale
  - lastTransitionTime: "2023-10-02T18:12:20Z"
    message: 'the HPA was able to successfully calculate a replica count from external
      metric s0-kafka-target_topic(&LabelSelector{MatchLabels:map[string]string{scaledobject.keda.sh/name:
     base-application-staging-hpa,},MatchExpressions:[]LabelSelectorRequirement{},})'
    reason: ValidMetricFound
    status: "True"
    type: ScalingActive
  - lastTransitionTime: "2023-10-03T00:10:32Z"
    message: the desired replica count is less than the minimum replica count
    reason: TooFewReplicas
    status: "True"
    type: ScalingLimited
  currentMetrics:
  - external:
      current:
        averageValue: "0"
        value: "0"
      metric:
        name: s0-kafka_listing_operations
        selector:
          matchLabels:
            scaledobject.keda.sh/name: base-application-staging-hpa
    type: External
  currentReplicas: 1
  desiredReplicas: 1
  lastScaleTime: "2023-10-03T00:10:32Z"

Any insights or assistance in resolving this issue would be greatly appreciated.

@chubutin chubutin added the bug Something isn't working label Oct 3, 2023
@JorTurFer
Member

Hello,
Could you share KEDA operator logs?

In the meantime, some answers to your questions xD:

> If I attempt to retrieve the value for the HPA metrics:

You need to include the ScaledObject name in the request. This was different in the past, but when we moved the logic from the metrics server to the operator, this changed: https://keda.sh/docs/2.12/operate/metrics-server/
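
As a sketch of what that request could look like for the objects in this issue: the path needs a concrete namespace and a URL-encoded `labelSelector` that names the ScaledObject. The helper name `keda_metric_path` is made up for illustration; the namespace, metric name, and ScaledObject name are taken from the HPA spec shared above.

```shell
# Build the external-metrics path for one KEDA metric, scoped to a ScaledObject.
# The labelSelector value is URL-encoded: "/" becomes %2F and "=" becomes %3D.
# keda_metric_path is a hypothetical helper, not part of KEDA or kubectl.
keda_metric_path() {
  namespace="$1"; metric="$2"; scaled_object="$3"
  printf '/apis/external.metrics.k8s.io/v1beta1/namespaces/%s/%s?labelSelector=scaledobject.keda.sh%%2Fname%%3D%s\n' \
    "$namespace" "$metric" "$scaled_object"
}

# Usage against a live cluster (needs kubectl and jq):
#   kubectl get --raw "$(keda_metric_path staging s0-kafka-target_topic base-application-staging-hpa)" | jq .
```

Without the `labelSelector` part, the server answers with the "scaledObject name is not specified" error shown in the report.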

> the Horizontal Pod Autoscaler (HPA) scales to 3 pods

I guess that you have 3 partitions in your Kafka topic, right? If so, this is the default behavior, as explained in the docs: https://keda.sh/docs/2.12/scalers/apache-kafka/

If you have more than 3 partitions, we can check the logs to go deeper.
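
To make the partition cap concrete, here is a simplified model of the replica count, not KEDA's actual code. It assumes `allowIdleConsumers` is 'false' (as in the ScaledObject above), that the HPA asks for ceil(lag / lagThreshold) replicas, and that this number is capped at the partition count before the min/max bounds are applied. The function name `expected_replicas` is hypothetical.

```shell
# Simplified model of the replica count for a Kafka trigger.
# Assumption: allowIdleConsumers is 'false', so replicas never exceed partitions.
# Arguments: total lag, lagThreshold, partition count, minReplicaCount, maxReplicaCount.
expected_replicas() {
  lag="$1"; lag_threshold="$2"; partitions="$3"; min="$4"; max="$5"
  # ceil(lag / lag_threshold) using integer arithmetic
  desired=$(( (lag + lag_threshold - 1) / lag_threshold ))
  # Cap at the number of partitions (extra consumers would sit idle)
  if [ "$desired" -gt "$partitions" ]; then desired="$partitions"; fi
  # Apply the HPA bounds
  if [ "$desired" -lt "$min" ]; then desired="$min"; fi
  if [ "$desired" -gt "$max" ]; then desired="$max"; fi
  echo "$desired"
}

expected_replicas 1500 1 3 1 10    # 3 partitions, min 1  -> 3
expected_replicas 1500 1 3 3 10    # 3 partitions, min 3  -> 3
expected_replicas 1500 1 12 3 10   # 12 partitions, min 3 -> 10
```

With 3 partitions the result is 3 whether minReplicaCount is 1 or 3, which matches both observations in the report: scaling from 1 to 3, and no scaling when the deployment already starts at 3.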

@chubutin
Author

chubutin commented Oct 3, 2023

Hello @JorTurFer. Thanks for the quick response. You are absolutely right: it was not scaling beyond 3 pods because the topic only had 3 partitions. I resized the topic and it works perfectly.
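
For anyone landing here with the same symptom, a hedged sketch of how the topic could be resized with the stock Kafka CLI. The thread does not say which tool was actually used; `grow_partitions_cmd` is a hypothetical helper that only prints the command, and `$BROKERS` is a placeholder for the bootstrap servers. On a TLS-enabled cluster the CLI will likely also need `--command-config` pointing at a client properties file.

```shell
# Print (do not run) the kafka-topics.sh command that raises a topic's partition count.
# Kafka only allows increasing partitions, so anything that is not larger is refused.
# grow_partitions_cmd is a hypothetical helper for illustration.
grow_partitions_cmd() {
  brokers="$1"; topic="$2"; current="$3"; wanted="$4"
  if [ "$wanted" -le "$current" ]; then
    echo "refusing: partition count can only be increased ($current -> $wanted)" >&2
    return 1
  fi
  echo "kafka-topics.sh --bootstrap-server $brokers --alter --topic $topic --partitions $wanted"
}

# Inspect the current partition count first (needs access to the brokers):
#   kafka-topics.sh --bootstrap-server "$BROKERS" --describe --topic target_topic
# Then review and run the generated command:
#   grow_partitions_cmd "$BROKERS" target_topic 3 10
```

Keep in mind that adding partitions changes which partition a given key maps to, so per-key ordering is not preserved across the resize.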

> You need to include the ScaledObject name in the request. This was different in the past, but when we moved the logic from the metrics server to the operator, this changed: https://keda.sh/docs/2.12/operate/metrics-server/

That's right. I missed that part too. Now I see those metrics. Thank you very much!

@chubutin chubutin closed this as completed Oct 3, 2023

2 participants