
When using multiple triggers seems like scaling is not working as expected #550

Closed
schmaldaniel1 opened this issue Jan 19, 2020 · 11 comments
Labels
bug Something isn't working

Comments

@schmaldaniel1

schmaldaniel1 commented Jan 19, 2020

We are testing KEDA as one of our scaler options. During our tests we noticed that, when using multiple Prometheus triggers, scaling does not work as expected.

The scenario is: we have a deployment which exposes two metrics to Prometheus, and we would like to scale based on those metrics.

This is the ScaledObject YAML we are using:

```yaml
kind: ScaledObject
metadata:
  name: test-scaledobject
  namespace: xdr-st
  labels:
    deploymentName: xdr-st-1239495966661-agent-api
spec:
  scaleTargetRef:
    deploymentName: xdr-st-1239495966661-agent-api
  pollingInterval: 5
  cooldownPeriod: 30
  minReplicaCount: 3
  triggers:
  - type: prometheus
    metadata:
      serverAddress: http://monitoring-1239495966661-prometheus.monitoring.svc:8080
      metricName: occupied
      threshold: "100"
      query: sum(avg_over_time(xdr_agent_api_waitress_occupied_threads{app="xdr-st-1239495966661-agent-api"}[5m]))
  - type: prometheus
    metadata:
      serverAddress: http://monitoring-1239495966661-prometheus.monitoring.svc:8080
      metricName: queued
      threshold: '2'
      query: sum(avg_over_time(xdr_agent_api_waitress_queued_requests{app="xdr-st-1239495966661-agent-api"}[5m]))
```

As you can see, I set two metrics: the occupied threshold is 100 and its value is 94, while the queued metric's threshold is 2 and its value is 0.

For some reason the HPA reports 94 for the queued metric as well, and as a result it starts scaling up the replicas although neither metric is above its threshold, as you can see here:

NAME                                      REFERENCE                                   TARGETS                    MINPODS   MAXPODS   REPLICAS   AGE
keda-hpa-xdr-st-1239495966661-agent-api   Deployment/xdr-st-1239495966661-agent-api   94/100 (avg), 94/2 (avg)   3         100       3          5d21h

This is the HPA describe after the scale:

Name:               keda-hpa-xdr-st-1239495966661-agent-api
Namespace:          xdr-st
Labels:             <none>
Annotations:        autoscaling.alpha.kubernetes.io/conditions:
                      [{"type":"AbleToScale","status":"True","lastTransitionTime":"2020-01-19T12:20:45Z","reason":"SucceededRescale","message":"the HPA controll...
                    autoscaling.alpha.kubernetes.io/current-metrics:
                      [{"type":"External","external":{"metricName":"occupied","metricSelector":{"matchLabels":{"deploymentName":"xdr-st-1239495966661-agent-api"...
                    autoscaling.alpha.kubernetes.io/metrics:
                      [{"type":"External","external":{"metricName":"occupied","metricSelector":{"matchLabels":{"deploymentName":"xdr-st-1239495966661-agent-api"...
CreationTimestamp:  Mon, 13 Jan 2020 16:57:41 +0200
Reference:          Deployment/xdr-st-1239495966661-agent-api
Min replicas:       3
Max replicas:       100
Deployment pods:    30 current / 60 desired
Events:
  Type     Reason             Age                    From                       Message
  ----     ------             ----                   ----                       -------
  Normal   SuccessfulRescale  20m                    horizontal-pod-autoscaler  New size: 3; reason: All metrics below target
  Normal   SuccessfulRescale  17m (x110 over 5d21h)  horizontal-pod-autoscaler  New size: 6; reason: external metric queued(&LabelSelector{MatchLabels:map[string]string{deploymentName: xdr-st-1239495966661-agent-api,},MatchExpressions:[],}) above target
  Normal   SuccessfulRescale  17m                    horizontal-pod-autoscaler  New size: 12; reason: external metric queued(&LabelSelector{MatchLabels:map[string]string{deploymentName: xdr-st-1239495966661-agent-api,},MatchExpressions:[],}) above target
  Normal   SuccessfulRescale  16m                    horizontal-pod-autoscaler  New size: 24; reason: external metric queued(&LabelSelector{MatchLabels:map[string]string{deploymentName: xdr-st-1239495966661-agent-api,},MatchExpressions:[],}) above target
  Normal   SuccessfulRescale  16m                    horizontal-pod-autoscaler  New size: 48; reason: external metric queued(&LabelSelector{MatchLabels:map[string]string{deploymentName: xdr-st-1239495966661-agent-api,},MatchExpressions:[],}) above target
  Normal   SuccessfulRescale  15m                    horizontal-pod-autoscaler  New size: 100; reason: external metric queued(&LabelSelector{MatchLabels:map[string]string{deploymentName: xdr-st-1239495966661-agent-api,},MatchExpressions:[],}) above target
  Normal   SuccessfulRescale  6m53s (x34 over 16m)   horizontal-pod-autoscaler  New size: 60; reason: external metric queued(&LabelSelector{MatchLabels:map[string]string{deploymentName: xdr-st-1239495966661-agent-api,},MatchExpressions:[],}) above target
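For context, this run-away behaviour matches the standard HPA replica calculation from the Kubernetes docs. A rough back-of-the-envelope check, using the values reported above (94 reported for the queued metric against its target of 2):

```
desiredReplicas = ceil[ currentReplicas * ( currentMetricValue / desiredMetricValue ) ]
                = ceil[ 3 * ( 94 / 2 ) ]
                = 141   (well above maxReplicas)
```

Because the HPA caps how far it scales up in a single iteration (roughly doubling per step on the Kubernetes versions of that era), this shows up as the repeated 6 → 12 → 24 → 48 → 100 pattern in the events above.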

This is what I get from `kubectl get hpa`; you can see that both metrics report 94, while only the first one should be 94 and the second should be 0.

✗ kubectl get hpa -n xdr-st keda-hpa-xdr-st-1239495966661-agent-api
NAME                                      REFERENCE                                   TARGETS                    MINPODS   MAXPODS   REPLICAS   AGE
keda-hpa-xdr-st-1239495966661-agent-api   Deployment/xdr-st-1239495966661-agent-api   94/100 (avg), 94/2 (avg)   3         100       3          5d21h
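One way to see what the KEDA metrics adapter itself returns for each metric, independently of the HPA, is to query the external metrics API directly. A sketch, assuming the KEDA adapter is registered for external.metrics.k8s.io and using the metric names and namespace from this issue:

```
# list the external metrics the adapter currently exposes
kubectl get --raw "/apis/external.metrics.k8s.io/v1beta1"

# query each metric with the same label selector the HPA uses
kubectl get --raw "/apis/external.metrics.k8s.io/v1beta1/namespaces/xdr-st/occupied?labelSelector=deploymentName%3Dxdr-st-1239495966661-agent-api"
kubectl get --raw "/apis/external.metrics.k8s.io/v1beta1/namespaces/xdr-st/queued?labelSelector=deploymentName%3Dxdr-st-1239495966661-agent-api"
```

If both queries return the same value, the problem is on the adapter side rather than in the HPA itself.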

This is the occupied metric graph; as you can see, it is always under 100:
(screenshot: Screen Shot 2020-01-19 at 14.51.17.png)

This is the queued metric graph, stable at 0:
(screenshot: Screen Shot 2020-01-19 at 14.51.27.png)

Expected Behavior

No scaling should occur, since no metric is above its threshold.

Actual Behavior

KEDA scales up my replicas.

Steps to Reproduce the Problem

  1. Configure a ScaledObject with multiple Prometheus triggers.
  2. Set one trigger's threshold to be lower than the other's.
  3. Make sure the value of the metric with the higher threshold is above the lower threshold but below its own (higher) threshold.

Specifications

  • Version: keda:1.0.0, keda-metrics-adapter:1.0.0
  • Platform: GKE
  • Scaler(s): Prometheus
@schmaldaniel1 schmaldaniel1 added the bug Something isn't working label Jan 19, 2020
@schmaldaniel1 schmaldaniel1 changed the title When using multiple triggers seems like scaling is not working as ecpected When using multiple triggers seems like scaling is not working as expected Jan 19, 2020
@tomkerkhove
Member

While the spec is already open for it, we don't support multiple triggers yet; this is being tracked via #476.

Feel free to head over there and mention you'd like to have this as well.

@tomkerkhove
Member

Unless @ahmelsayed corrects me, of course.

@schmaldaniel1
Author

@tomkerkhove thanks for the response. Do you have any estimate of when multiple triggers will be supported?

@zroubalik
Member

@schmaldaniel1 could you please post the output of `kubectl get hpa KEDA_HPA_NAME -oyaml` here? The annotations section in the describe output is cropped, and I'd be interested in what is specified there.
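(For the HPA shown earlier in this issue, that command would be, for example:)

```
kubectl get hpa keda-hpa-xdr-st-1239495966661-agent-api -n xdr-st -o yaml
```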

@schmaldaniel1
Author

@zroubalik I don't have the above setup anymore, but this is from a new one with the same setup.

```yaml
kind: HorizontalPodAutoscaler
metadata:
  annotations:
    autoscaling.alpha.kubernetes.io/conditions: '[{"type":"AbleToScale","status":"True","lastTransitionTime":"2020-01-30T15:46:43Z","reason":"SucceededGetScale","message":"the
      HPA controller was able to get the target''s current scale"},{"type":"ScalingActive","status":"False","lastTransitionTime":"2020-01-30T15:46:43Z","reason":"ScalingDisabled","message":"scaling
      is disabled since the replica count of the target is zero"}]'
    autoscaling.alpha.kubernetes.io/metrics: '[{"type":"External","external":{"metricName":"GCPPubSubSubscriptionSize","metricSelector":{"matchLabels":{"deploymentName":"xdr-st-3008160-pipeline-helper"}},"targetAverageValue":"5"}},{"type":"External","external":{"metricName":"GCPPubSubSubscriptionSize","metricSelector":{"matchLabels":{"deploymentName":"xdr-st-3008160-pipeline-helper"}},"targetAverageValue":"5"}}]'
  creationTimestamp: "2020-01-30T15:46:28Z"
  name: keda-hpa-xdr-st-3008160-pipeline-helper
  namespace: xdr-st
  ownerReferences:
  - apiVersion: keda.k8s.io/v1alpha1
    blockOwnerDeletion: true
    controller: true
    kind: ScaledObject
    name: pipeline-helper-scaledobject
    uid: ad933214-4377-11ea-a901-42010ab40004
  resourceVersion: "25069"
  selfLink: /apis/autoscaling/v1/namespaces/xdr-st/horizontalpodautoscalers/keda-hpa-xdr-st-3008160-pipeline-helper
  uid: ad97a2d6-4377-11ea-a901-42010ab40004
spec:
  maxReplicas: 10
  minReplicas: 1
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: xdr-st-3008160-pipeline-helper
status:
  currentReplicas: 0
  desiredReplicas: 0
```

@zroubalik
Member

zroubalik commented Jan 30, 2020

@schmaldaniel1 I can see two metrics with the same name: `"metricName":"GCPPubSubSubscriptionSize"`. This is not the same as the example above, where you are using occupied and queued. It would really help if you could provide the same setup once you have it. Thanks.

@schmaldaniel1
Author

@zroubalik I know, it's not the same setup. The previous setup was deleted and I can't reproduce it now.
In this setup I am using two Pub/Sub triggers rather than Prometheus, but the issue is the same.

If it is crucial, I can try using Prometheus next week.

@zroubalik
Member

The problem is that this setup uses two metrics which both have the same name, so I can't see what is wrong with the previous setup where you are using different metrics. It would be great if you could provide the original setup once you have some time.

@schmaldaniel

schmaldaniel commented Jan 30, 2020

@zroubalik first thanks for the responses.

I guess the metric names are generated by the ScaledObject when it creates the HPA. For instance, this is the ScaledObject YAML for this setup: you can see that I don't set the names anywhere, and yet the status shows the same name twice.

```yaml
kind: ScaledObject
metadata:
  creationTimestamp: "2020-01-30T15:46:28Z"
  finalizers:
  - finalizer.keda.k8s.io
  generation: 1
  labels:
    deploymentName: xdr-st-3008160-pipeline-helper
  name: pipeline-helper-scaledobject
  namespace: xdr-st
  resourceVersion: "24993"
  selfLink: /apis/keda.k8s.io/v1alpha1/namespaces/xdr-st/scaledobjects/pipeline-helper-scaledobject
  uid: ad933214-4377-11ea-a901-42010ab40004
spec:
  cooldownPeriod: 30
  maxReplicaCount: 10
  minReplicaCount: 0
  pollingInterval: 15
  scaleTargetRef:
    deploymentName: xdr-st-3008160-pipeline-helper
  triggers:
  - metadata:
      credentials: GENERIC_GOOGLE_CREDENTIALS
      subscriptionName: edr-sos-pipeline-3008160-sub
      subscriptionSize: "5"
    type: gcp-pubsub
  - metadata:
      credentials: GENERIC_GOOGLE_CREDENTIALS
      subscriptionName: edr-replay-pipeline-3008160-sub
      subscriptionSize: "5"
    type: gcp-pubsub
status:
  externalMetricNames:
  - GCPPubSubSubscriptionSize
  - GCPPubSubSubscriptionSize
```

@schmaldaniel

By the way, here too we are using different subscriptions for the Pub/Sub scalers, and yet it gives the external metrics the same name.
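For reference, unpacking the autoscaling.alpha.kubernetes.io/metrics annotation from the HPA YAML above, the generated metrics spec looks roughly like this. Because both entries share the same metricName and metricSelector, neither the HPA nor the metrics adapter can tell the two triggers apart:

```yaml
metrics:
- type: External
  external:
    metricName: GCPPubSubSubscriptionSize
    metricSelector:
      matchLabels:
        deploymentName: xdr-st-3008160-pipeline-helper
    targetAverageValue: "5"
- type: External
  external:
    metricName: GCPPubSubSubscriptionSize   # identical name and selector as above
    metricSelector:
      matchLabels:
        deploymentName: xdr-st-3008160-pipeline-helper
    targetAverageValue: "5"
```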

@zroubalik
Member

@schmaldaniel this problem is solved by PR #732.

The fix will be released as part of v1.4

Closing this issue; please reopen if you still see this problem with v1.4.
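A quick way to check the fix after upgrading would be to confirm that the generated external metric names are now unique per trigger, e.g.:

```
kubectl get scaledobject pipeline-helper-scaledobject -n xdr-st -o jsonpath='{.status.externalMetricNames}'
```

The expectation is that each trigger then reports a distinct name instead of GCPPubSubSubscriptionSize twice.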
