Multiple Prometheus Scalers in same scaled object return same metric values #702

Closed
ycabrer opened this issue Mar 26, 2020 · 8 comments · Fixed by #727 or #732
Labels
bug Something isn't working

Comments

@ycabrer
Contributor

ycabrer commented Mar 26, 2020

Hi,

I am using two Prometheus scalers on the same scaled object to scale a single deployment based on different queries.

Each metric has a distinct name and query, yet both query results are returned for each metric. This causes strange metric values in the resulting HPA.

Expected Behavior

Each distinct metric / query in a scaled object should return only the data it requests.

Actual Behavior

Each distinct metric / query is returning the metrics for all queries.

Steps to Reproduce the Problem

  1. Create a scaled object with two Prometheus scalers.
apiVersion: keda.k8s.io/v1alpha1
kind: ScaledObject
metadata:
  name: multiple-queue-consumer
  namespace: qa
  labels:
    deploymentName: multiple-queue-consumer
spec:
  scaleTargetRef:
    deploymentName: multiple-queue-consumer
  pollingInterval: 30
  cooldownPeriod:  150
  minReplicaCount: 1
  maxReplicaCount: 16
  triggers:
  - type: prometheus
    metadata:
      serverAddress: http://prometheus-server.monitoring.svc.cluster.local
      metricName:  mq_messages_per_consumer_queue_1
      threshold: '6'
      query: >
        rabbitmq_queue_messages{queue="queue-1"}
        / 
        rabbitmq_queue_consumers{queue="queue-1"}
  - type: prometheus
    metadata:
      serverAddress: http://prometheus-server.monitoring.svc.cluster.local
      metricName: mq_messages_per_consumer_queue_2
      threshold: '6'
      query: >
        rabbitmq_queue_messages{queue="queue-2"} 
        / 
        rabbitmq_queue_consumers{queue="queue-2""}

  2. Use kubectl to verify the external metric values:
    kubectl get --raw "/apis/external.metrics.k8s.io/v1beta1/namespaces/qa/mq_messages_per_consumer_queue_1?labelSelector=deploymentName%3Dmultiple-queue-consumer" | jq

result for queue 1:

{
  "kind": "ExternalMetricValueList",
  "apiVersion": "external.metrics.k8s.io/v1beta1",
  "metadata": {
    "selfLink": "/apis/external.metrics.k8s.io/v1beta1/namespaces/qa/mq_messages_per_consumer_queue_1"
  },
  "items": [
    {
      "metricName": "mq_messages_per_consumer_queue_1",
      "metricLabels": null,
      "timestamp": "2020-03-25T22:17:24Z",
      "value": "21"
    },
    {
      "metricName": "mq_messages_per_consumer_queue_1",
      "metricLabels": null,
      "timestamp": "2020-03-25T22:17:24Z",
      "value": "8"
    }
  ]
}

kubectl get --raw "/apis/external.metrics.k8s.io/v1beta1/namespaces/qa/mq_messages_per_consumer_queue_2?labelSelector=deploymentName%3Dmultiple-queue-consumer" | jq

result for queue 2:

{
  "kind": "ExternalMetricValueList",
  "apiVersion": "external.metrics.k8s.io/v1beta1",
  "metadata": {
    "selfLink": "/apis/external.metrics.k8s.io/v1beta1/namespaces/qa/mq_messages_per_consumer_queue_2"
  },
  "items": [
    {
      "metricName": "mq_messages_per_consumer_queue_2",
      "metricLabels": null,
      "timestamp": "2020-03-25T22:17:24Z",
      "value": "21"
    },
    {
      "metricName": "mq_messages_per_consumer_queue_2",
      "metricLabels": null,
      "timestamp": "2020-03-25T22:17:24Z",
      "value": "8"
    }
  ]
}

In each case the first value (21) is the metric for queue-1 and the second (8) is the metric for queue-2.

HPA

Name:                                            keda-hpa-multiple-queue-consumer
Namespace:                                       qa
Labels:                                          app.kubernetes.io/managed-by=keda-operator
                                                 app.kubernetes.io/name=keda-hpa-multiple-queue-consumer
                                                 app.kubernetes.io/part-of=multiple-queue-consumer
                                                 app.kubernetes.io/version=1.3.0
Annotations:                                     <none>
CreationTimestamp:                               Wed, 25 Mar 2020 14:51:46 -0600
Reference:                                       Deployment/multiple-queue-consumer
Metrics:                                         ( current / target )
  "mq_messages_per_consumer_queue_1" (target average value):  14500m / 6
  "mq_messages_per_consumer_queue_2" (target average value):   14500m / 6
Min replicas:                                    1
Max replicas:                                    16
Deployment pods:                                 4 current / 4 desired
Conditions:
  Type            Status  Reason              Message
  ----            ------  ------              -------
  AbleToScale     True    ReadyForNewScale    recommended size matches current size
  ScalingActive   True    ValidMetricFound    the HPA was able to successfully calculate a replica count from external metric mq_messages_per_consumer_queue_1(&LabelSelector{MatchLabels:map[string]string{deploymentName: multiple-queue-consumer,},MatchExpressions:[],})
  ScalingLimited  False   DesiredWithinRange  the desired count is within the acceptable range
Events:
  Type     Reason                        Age                  From                       Message
  ----     ------                        ----                 ----                       -------
  Normal   SuccessfulRescale             46m                  horizontal-pod-autoscaler  New size: 2; reason: All metrics below target
  Normal   SuccessfulRescale             30m                  horizontal-pod-autoscaler  New size: 3; reason: external metric mq_messages_per_consumer_queue_1(&LabelSelector{MatchLabels:map[string]string{deploymentName: multiple-queue-consumer,},MatchExpressions:[],}) above target
  Normal   SuccessfulRescale             13m (x7 over 109m)   horizontal-pod-autoscaler  New size: 3; reason: All metrics below target
  Normal   SuccessfulRescale             85s (x7 over 87m)    horizontal-pod-autoscaler  New size: 4; reason: external metric mq_messages_per_consumer_queue_1(&LabelSelector{MatchLabels:map[string]string{deploymentName: multiple-queue-consumer,},MatchExpressions:[],}) above target

Scaled Object

Name:         multiple-queue-consumer
Namespace:    qa
Labels:       deploymentName=multiple-queue-consumer
Annotations:  
API Version:  keda.k8s.io/v1alpha1
Kind:         ScaledObject
Metadata:
  Creation Timestamp:  2020-03-25T15:39:31Z
  Finalizers:
    finalizer.keda.k8s.io
  Generation:        8
  Resource Version:  18647747
  Self Link:         /apis/keda.k8s.io/v1alpha1/namespaces/qa/scaledobjects/multiple-queue-consumer
  UID:               d1e01b0d-6eae-11ea-bf96-0ed83b31ac69
Spec:
  Cooldown Period:    150
  Max Replica Count:  16
  Min Replica Count:  1
  Polling Interval:   30
  Scale Target Ref:
    Deployment Name:  multiple-queue-consumer
  Triggers:
    Metadata:
      Metric Name:  mq_messages_per_consumer_queue_1
      Query:        rabbitmq_queue_messages{queue="queue-1"} /  rabbitmq_queue_consumers{queue="queue-1"}

      Server Address:  http://prometheus-server.monitoring.svc.cluster.local
      Threshold:       6
    Type:              prometheus
    Metadata:
      Metric Name:  mq_messages_per_consumer_queue_2
      Query:        rabbitmq_queue_messages{queue="queue-2"} /  rabbitmq_queue_consumers{queue="queue-2"}

      Server Address:  http://prometheus-server.monitoring.svc.cluster.local
      Threshold:       6
    Type:              prometheus
Status:
  External Metric Names:
    mq_messages_per_consumer_queue_1
    mq_messages_per_consumer_queue_2
  Last Active Time:  2020-03-26T02:54:32Z
Events:              <none>

Specifications

  • KEDA Version: 1.3.0
  • Platform & Version: EKS 1.14
  • Kubernetes Version: 1.14
  • Scaler(s): Prometheus
@ycabrer ycabrer added the bug Something isn't working label Mar 26, 2020
@ycabrer ycabrer changed the title Multiple Prometheus Scalers return duplicate metrics Multiple Prometheus Scalers in same scaled object return same metric values Mar 27, 2020
@tomkerkhove
Member

Thanks for reporting!

@zroubalik
Member

zroubalik commented Mar 31, 2020

@ycabrer thanks for reporting. I have found the potential problem in the Metrics Server implementation. This code is returning all metrics for each ScaledObject.

We have two options for solving it:

  1. Quick fix: In the Metrics Server, in the for loop mentioned above, after getting all metrics for a ScaledObject, filter and select only those we are interested in. It would be a simple change on the Metrics Server side. I don't like this one, because we would still be querying scalers we are not interested in.
  2. Proper fix: Because we are not able to tell from the code which metric a scaler serves, we will need to add GetMetricName() to the Scaler interface, implement this method across all scalers, and then use this method in the Metrics Server to distinguish the scaler we want to query for metrics. This will result in querying only the scaler whose metrics we want to know (see the sketch below).

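As a rough illustration of option 2, here is a minimal, self-contained Go sketch. The types, signatures, and values below are simplified stand-ins, not the actual KEDA code; only GetMetricName() corresponds to the method proposed above.

package main

import "fmt"

// Hypothetical, simplified stand-in for the real Scaler interface; the actual
// one lives in pkg/scalers and uses Kubernetes API types.
type Scaler interface {
    GetMetricName() string // proposed addition: report the metric this scaler serves
    GetMetrics(metricName string) ([]float64, error)
}

// fakePrometheusScaler stands in for a configured prometheus trigger.
type fakePrometheusScaler struct {
    metricName string
    value      float64
}

func (s fakePrometheusScaler) GetMetricName() string { return s.metricName }

func (s fakePrometheusScaler) GetMetrics(metricName string) ([]float64, error) {
    return []float64{s.value}, nil
}

// getExternalMetric mimics the Metrics Server loop: instead of collecting
// values from every scaler of the ScaledObject, it queries only the scaler
// whose metric name matches the requested one.
func getExternalMetric(scalers []Scaler, requested string) ([]float64, error) {
    var matching []float64
    for _, scaler := range scalers {
        if scaler.GetMetricName() != requested {
            continue // skip scalers serving other metrics
        }
        values, err := scaler.GetMetrics(requested)
        if err != nil {
            return nil, err
        }
        matching = append(matching, values...)
    }
    if len(matching) == 0 {
        return nil, fmt.Errorf("no matching metrics found for %q", requested)
    }
    return matching, nil
}

func main() {
    scalers := []Scaler{
        fakePrometheusScaler{"mq_messages_per_consumer_queue_1", 21},
        fakePrometheusScaler{"mq_messages_per_consumer_queue_2", 8},
    }
    values, _ := getExternalMetric(scalers, "mq_messages_per_consumer_queue_1")
    fmt.Println(values) // prints [21]: only queue-1's value, not both
}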
@ycabrer
Contributor Author

ycabrer commented Apr 1, 2020

@zroubalik I see GetMetricSpecForScaling is used to construct the HPA spec. Couldn't the metric name be extracted from there?

@ycabrer
Contributor Author

ycabrer commented Apr 4, 2020

@zroubalik

This seems to be working:

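// In the Metrics Server loop (provider.go): query only the scaler whose metric
// spec name matches the requested metric, instead of appending results from
// every scaler of the ScaledObject.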
for _, scaler := range scalers {
	metricSpecs := scaler.GetMetricSpecForScaling()

	for _, metricSpec := range metricSpecs {
		// Filter only the desired metric
		if metricSpec.External.MetricName == info.Metric {
			metrics, err := scaler.GetMetrics(context.TODO(), info.Metric, metricSelector)
			if err != nil {
				logger.Error(err, "error getting metric for scaler", "ScaledObject.Namespace", scaledObject.Namespace, "ScaledObject.Name", scaledObject.Name, "Scaler", scaler)
			} else {
				matchingMetrics = append(matchingMetrics, metrics...)
			}
		}
	}

	scaler.Close()
}

if len(matchingMetrics) <= 0 {
	return nil, fmt.Errorf("Error no matching metrics found")
}

provider.go

This should be fine, I think, since all metrics seem to be external metrics.

I'm not sure if iterating through metricSpecs is needed, but the return type of GetMetricSpecForScaling seems to be an array.

Let me know what you think. I can open a pull request.

zroubalik pushed a commit that referenced this issue Apr 6, 2020
zroubalik added a commit that referenced this issue Apr 7, 2020
@zroubalik
Member

zroubalik commented Apr 7, 2020

@ycabrer unfortunately I was forced to revert your commit. There are two problems:

  1. The metrics adapter doesn't work as expected. I have tested it with the azure-queue and kafka scalers, and for these scalers the Metrics Server doesn't return any metric; the error message is: No matching metrics found. We need to double-check whether the problem is in the scalers' implementation (i.e. GetMetricSpecForScaling() is not implemented correctly) or in your change.
  2. The commit wasn't signed properly, so it breaks the DCO check for any other PRs applied after your commit was merged into master. See the message:

DCO fail

//EDIT: added Kafka scaler note

@zroubalik zroubalik reopened this Apr 7, 2020
@ycabrer
Contributor Author

ycabrer commented Apr 7, 2020

@zroubalik I was able to recreate the issue.

It looks like it is an issue with casing when comparing the metric name from the metric spec with the requested metric.

We are getting lagThreshold (camelCase) from the metric spec and lagthreshold (lowercase) from info.Metric.

It looks like the Kubernetes client changes it to lowercase, as explained in Kubernetes issue #72996.
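A small, self-contained Go illustration of the mismatch, and of one possible workaround (comparing the names case-insensitively with strings.EqualFold); whether that is how the fix should actually be implemented is an open question.

package main

import (
    "fmt"
    "strings"
)

func main() {
    // Hypothetical values illustrating the mismatch described above.
    specMetricName := "lagThreshold" // camelCase name from GetMetricSpecForScaling()
    requested := "lagthreshold"      // lowercased name arriving as info.Metric

    // An exact comparison fails because the name is lowercased on its way
    // through the Kubernetes API (see Kubernetes issue #72996).
    fmt.Println(specMetricName == requested) // false

    // A case-insensitive comparison would match both forms.
    fmt.Println(strings.EqualFold(specMetricName, requested)) // true
}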

@ycabrer
Contributor Author

ycabrer commented Apr 7, 2020

camelCase:

~/ops !1 ❯ kubectl get --raw "/apis/external.metrics.k8s.io/v1beta1/namespaces/qa/lagThreshold/?labelSelector=deploymentName%3Ddashboard" | jq 
{
  "kind": "ExternalMetricValueList",
  "apiVersion": "external.metrics.k8s.io/v1beta1",
  "metadata": {
    "selfLink": "/apis/external.metrics.k8s.io/v1beta1/namespaces/qa/lagThreshold/"
  },
  "items": [
    {
      "metricName": "lagThreshold",
      "metricLabels": null,
      "timestamp": "2020-04-07T19:26:21Z",
      "value": "5"
    }
  ]
}

lowercase:

~/ops !1 ❯ kubectl get --raw "/apis/external.metrics.k8s.io/v1beta1/namespaces/qa/lagthreshold/?labelSelector=deploymentName%3Ddashboard" | jq
Error from server: No matching metrics found

I think another issue is that the metric name is hardcoded to lagThreshold. This would cause the same issue with multiple metrics being retrieved. Ideally it would have a unique name, maybe based on the Kafka topic or the broker.

@zroubalik
Member

zroubalik commented Apr 8, 2020

@ycabrer excellent investigation. Thanks!

I think another issue is that the metric name is hard coded to lagThreshold. This would cause the same issue with multiple metrics being retrieved. Ideally it would have a unique name. Maybe based on the kafka topic or the broker.

Yeah, that is true: each scaler has a hardcoded metric name, so multiple scalers with the same metric in one ScaledObject will not work; we are aware of that. We can try to find a solution for this, as it might be a good fit for the upcoming v2.0 release (a rough sketch of one possible approach is below).
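As a purely hypothetical illustration, a scaler could derive its metric name from trigger-specific metadata (here, the Kafka topic) instead of using a hardcoded constant like lagThreshold. The helper name and naming scheme below are made up for this sketch; whatever KEDA actually adopts may look different.

package main

import "fmt"

// kafkaMetricName is a hypothetical helper: it builds a metric name that is
// unique per trigger by embedding the Kafka topic, so two Kafka triggers on
// one ScaledObject no longer collide on the same name.
func kafkaMetricName(topic string) string {
    return fmt.Sprintf("kafka-%s-lag", topic)
}

func main() {
    fmt.Println(kafkaMetricName("orders"))   // kafka-orders-lag
    fmt.Println(kafkaMetricName("payments")) // kafka-payments-lag
}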
