
Continuous update of HPA objects #1531

Closed
surki opened this issue Jan 25, 2021 · 6 comments · Fixed by #1541
Labels: bug Something isn't working

@surki
Contributor

surki commented Jan 25, 2021

Even when there are no changes to the ScaledObject or HPA, keda-operator continuously updates the HPA whenever it "finds" a difference between the current HPA in the apiserver and the new HPA computed from the ScaledObject (in here). We have hundreds of ScaledObjects, which leads to far too many unnecessary updates hitting the apiserver.

With debug logs enabled, the difference seems to boil down to the order of the metrics listed in the metrics array (presumably equality.Semantic.DeepDerivative expects slices to be in the same order? see the sketch after the diff below).

(Note that the Resource type comes first in new whereas it comes last in current. If we rearrange the triggers in the ScaledObject YAML, the issue goes away; see Steps to Reproduce the Problem.)

--- current.json        2021-01-25 15:06:19.664240010 +0530
+++ new.json    2021-01-25 15:16:54.423692540 +0530
@@ -8,6 +8,16 @@
   "maxReplicas": 5,
   "metrics": [
     {
+      "type": "Resource",
+      "resource": {
+        "name": "cpu",
+        "target": {
+          "type": "Utilization",
+          "averageUtilization": 60
+        }
+      }
+    },
+    {
       "type": "External",
       "external": {
         "metric": {
@@ -23,16 +33,6 @@
           "averageValue": "70"
         }
       }
-    },
-    {
-      "type": "Resource",
-      "resource": {
-        "name": "cpu",
-        "target": {
-          "type": "Utilization",
-          "averageUtilization": 60
-        }
-      }
     }
   ],
   "behavior": {

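For illustration only, here is a minimal standalone sketch of the suspected cause (the package paths and metric values are assumptions for the example, not taken from the KEDA codebase): equality.Semantic.DeepDerivative from k8s.io/apimachinery compares slices element by element, so the same two metrics in a different order are reported as unequal.

// Hypothetical example, not KEDA code: demonstrates that DeepDerivative is
// order-sensitive for slices of MetricSpec.
package main

import (
	"fmt"

	autoscalingv2beta2 "k8s.io/api/autoscaling/v2beta2"
	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/equality"
)

func main() {
	util := int32(60)
	cpu := autoscalingv2beta2.MetricSpec{
		Type: autoscalingv2beta2.ResourceMetricSourceType,
		Resource: &autoscalingv2beta2.ResourceMetricSource{
			Name: corev1.ResourceCPU,
			Target: autoscalingv2beta2.MetricTarget{
				Type:               autoscalingv2beta2.UtilizationMetricType,
				AverageUtilization: &util,
			},
		},
	}
	external := autoscalingv2beta2.MetricSpec{
		Type: autoscalingv2beta2.ExternalMetricSourceType,
		External: &autoscalingv2beta2.ExternalMetricSource{
			Metric: autoscalingv2beta2.MetricIdentifier{Name: "prometheus-foo"},
		},
	}

	currentMetrics := []autoscalingv2beta2.MetricSpec{external, cpu} // order stored in the apiserver
	newMetrics := []autoscalingv2beta2.MetricSpec{cpu, external}     // order computed from the ScaledObject

	// Same metrics, different order: DeepDerivative reports them as different,
	// so the operator keeps "fixing" the HPA on every reconcile.
	fmt.Println(equality.Semantic.DeepDerivative(newMetrics, currentMetrics)) // false
}
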
Expected Behavior

No unnecessary updates

Actual Behavior

Unnecessary updates

Steps to Reproduce the Problem

  1. Have a ScaledObject like the one below (note the trigger order: the Resource (cpu) scaler first, then the Prometheus scaler):
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: envoy-blue
  namespace: foo
spec:
  cooldownPeriod: 180
  maxReplicaCount: 5
  minReplicaCount: 2
  pollingInterval: 30
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: envoy-blue
  triggers:
  - metadata:
      type: Utilization
      value: "60"
    type: cpu
  - metadata:
      metricName: container_cpu_usage_seconds_total
      query: sum((sum(rate(container_cpu_usage_seconds_total{region="us-east-1", namespace="foo", pod=~"envoy-blue.*", container="envoy"}[3m])) by (pod,container))/(sum(kube_pod_container_resource_requests_cpu_cores{region="us-east-1", namespace="foo", pod=~"envoy-blue.*", container="envoy"}) by (pod,container))) * 100.0
      serverAddress: http://foo/query
      threshold: "70"
    type: prometheus
  2. Apply this YAML and verify that the ScaledObject and HPA are created
  3. Tail the keda-operator logs and notice that it continuously updates the HPA object, logging Updated HPA according to ScaledObject
  4. In the YAML from step 1, rearrange the triggers so that the Prometheus scaler comes first and the Resource scaler second, then apply the change
  5. Tail the keda-operator logs again; the issue should be resolved now

Logs from KEDA operator

2021-01-25T11:00:34.457Z        INFO    controllers.ScaledObject        Reconciling ScaledObject        {"ScaledObject.Namespace": "foo", "ScaledObject.Name": "envoy-green"}
2021-01-25T11:00:34.457Z        DEBUG   controllers.ScaledObject        Parsed Group, Version, Kind, Resource   {"ScaledObject.Namespace": "foo", "ScaledObject.Name": "envoy-green", "GVK": "apps/v1.Deployment", "Resource": "deployments"}
2021-01-25T11:00:34.483Z        DEBUG   controllers.ScaledObject        Found difference in the HPA spec accordint to ScaledObject      {"ScaledObject.Namespace": "foo", "ScaledObject.Name": "envoy-green", "currentHPA": {"scaleTargetRef":{"kind":"Deployment","name":"envoy-green","apiVersion":"apps/v1"},"minReplicas":1,"maxReplicas":5,"metrics":[{"type":"External","external":{"metric":{"name":"prometheus-http---foo-container_cpu_usage_seconds_total","selector":{"matchLabels":{"scaledObjectName":"envoy-green"}}},"target":{"type":"AverageValue","averageValue":"70"}}},{"type":"Resource","resource":{"name":"cpu","target":{"type":"Utilization","averageUtilization":60}}}],"behavior":{"scaleUp":{"stabilizationWindowSeconds":0,"selectPolicy":"Max","policies":[{"type":"Pods","value":4,"periodSeconds":15},{"type":"Percent","value":100,"periodSeconds":15}]},"scaleDown":{"stabilizationWindowSeconds":180,"selectPolicy":"Max","policies":[{"type":"Percent","value":100,"periodSeconds":15}]}}}, "newHPA": {"scaleTargetRef":{"kind":"Deployment","name":"envoy-green","apiVersion":"apps/v1"},"minReplicas":1,"maxReplicas":5,"metrics":[{"type":"Resource","resource":{"name":"cpu","target":{"type":"Utilization","averageUtilization":60}}},{"type":"External","external":{"metric":{"name":"prometheus-http---foo-query-staging-freshworks-edge-container_cpu_usage_seconds_total","selector":{"matchLabels":{"scaledObjectName":"envoy-green"}}},"target":{"type":"AverageValue","averageValue":"70"}}}],"behavior":{"scaleDown":{"stabilizationWindowSeconds":180,"policies":[{"type":"Percent","value":100,"periodSeconds":15}]}}}}
2021-01-25T11:00:34.492Z        INFO    controllers.ScaledObject        Updated HPA according to ScaledObject   {"ScaledObject.Namespace": "foo", "ScaledObject.Name": "envoy-green", "HPA.Namespace": "foo", "HPA.Name": "keda-hpa-envoy-green"}
2021-01-25T11:00:34.492Z        DEBUG   controllers.ScaledObject        ScaledObject is defined correctly and is ready for scaling      {"ScaledObject.Namespace": "foo", "ScaledObject.Name": "envoy-green"}
2021-01-25T11:00:34.505Z        DEBUG   controller      Successfully Reconciled {"reconcilerGroup": "keda.sh", "reconcilerKind": "ScaledObject", "controller": "scaledobject", "name": "envoy-green", "namespace": "foo"}

Specifications

  • KEDA Version: 2.0.0
  • Platform & Version: Linux x64
  • Kubernetes Version: AWS EKS 1.18.9
  • Scaler(s): CPU and Prometheus
surki added the bug (Something isn't working) label on Jan 25, 2021
@zroubalik
Member

zroubalik commented Jan 25, 2021

Thanks a lot @surki for the detailed analysis! Agree, the current behavior is not good :(

To fix this, we can probably always append the Resource metrics at the end of the generated metrics list, in here:

func (r *ScaledObjectReconciler) getScaledObjectMetricSpecs(logger logr.Logger, scaledObject *kedav1alpha1.ScaledObject) ([]autoscalingv2beta2.MetricSpec, error) {
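
A minimal sketch of that idea (the helper name is hypothetical, not actual KEDA code): partition the generated specs so that Resource metrics always land at the end, regardless of the trigger order in the ScaledObject:

// Hypothetical helper: keep non-Resource (i.e. External) metrics first and
// move Resource metrics to the end, preserving the relative order within
// each group so the generated list has a stable shape.
func moveResourceMetricsToEnd(specs []autoscalingv2beta2.MetricSpec) []autoscalingv2beta2.MetricSpec {
	others := make([]autoscalingv2beta2.MetricSpec, 0, len(specs))
	resources := make([]autoscalingv2beta2.MetricSpec, 0, len(specs))
	for _, spec := range specs {
		if spec.Type == autoscalingv2beta2.ResourceMetricSourceType {
			resources = append(resources, spec)
		} else {
			others = append(others, spec)
		}
	}
	return append(others, resources...)
}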

@surki
Contributor Author

surki commented Jan 25, 2021

Hmm, do we not support other HPA metric types? Wondering if we should do something like this:

diff --git a/controllers/hpa.go b/controllers/hpa.go
index 3354889..beaf570 100644
--- a/controllers/hpa.go
+++ b/controllers/hpa.go
@@ -3,6 +3,7 @@ package controllers
 import (
        "context"
        "fmt"
+       "sort"

        "github.com/go-logr/logr"
        version "github.com/kedacore/keda/v2/version"
@@ -161,6 +162,10 @@ func (r *ScaledObjectReconciler) getScaledObjectMetricSpecs(logger logr.Logger,
                scaler.Close()
        }

+       sort.Slice(scaledObjectMetricSpecs, func(i, j int) bool {
+               return scaledObjectMetricSpecs[i].Type < scaledObjectMetricSpecs[j].Type
+       })
+
        // store External.MetricNames,Resource.MetricsNames used by scalers defined in the ScaledObject
        status := scaledObject.Status.DeepCopy()
        status.ExternalMetricNames = externalMetricNames
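
(For context: since KEDA only generates the External and Resource metric types, sorting by Type gives a stable order with External before Resource, so the spec computed on each reconcile should always match the one previously written to the apiserver.)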

@zroubalik
Member

@surki in KEDA we use the Resource metric type for the CPU/Memory scalers and External for all the other scalers; no other types are used. Yeah, this could probably do the job :)

Would you mind sending a PR for this?

@zroubalik
Member

@surki we plan to release KEDA 2.1 tomorrow, so please let me know if you are willing to send a PR for this; if not, I'll do it :)

@surki
Contributor Author

surki commented Jan 26, 2021 via email

@zroubalik
Member

@surki no worries, thanks for the heads up!
