DataDog scaler makes KEDA operator crashing loop back #2625

Closed
toan-hf opened this issue Feb 11, 2022 · 6 comments · Fixed by #2629
Labels
bug Something isn't working

Comments

@toan-hf

toan-hf commented Feb 11, 2022

Report

I recently tried to use the DataDog scaler to scale my deployment. However, every time I apply a ScaledObject like the one below, the keda-operator pod goes into a crash loop with a stack trace error.

KEDA version: 2.6.0

I am applying a simple manifest like this:

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: keda-demo-4
  namespace: development
spec:
  minReplicaCount: 1
  maxReplicaCount: 20
  scaleTargetRef:
    name: supper-app
  triggers:
  - type: datadog
    metadata:
      # Required: datadog metric query
      query: "max:aws.sqs.approximate_number_of_messages_visible"
      # Required: target value for the query result, used to scale the scaleTargetRef
      queryValue: "10"
      # Optional: (Global or Average). Whether the target value is global or average per pod. Default: Average
      type: "Global"
      # Optional: The time window (in seconds) to retrieve metrics from Datadog. Default: 90
      age: "60"
    authenticationRef:
      name: keda-trigger-auth-datadog-secret

The KEDA operator pod then throws an error and keeps crash-looping (see the logs below).


Querying the DataDog API directly shows that the metric exists.

Expected Behavior

The DataDog scaler should work properly.

Actual Behavior

The KEDA operator crashes and keeps restarting until the ScaledObject (keda-demo-4) is deleted.

Steps to Reproduce the Problem

  1. Deploy KEDA as usual (version 2.6.0).
  2. Apply a ScaledObject like the one above.

Logs from KEDA operator

panic: runtime error: slice bounds out of range [:-1]

goroutine 476 [running]:
github.com/kedacore/keda/v2/pkg/scalers.parseDatadogMetadata(0xc0010d71e0)
        /workspace/pkg/scalers/datadog_scaler.go:130 +0x645
github.com/kedacore/keda/v2/pkg/scalers.NewDatadogScaler({0x3781978, 0xc000669800}, 0x37ca550)
        /workspace/pkg/scalers/datadog_scaler.go:48 +0x2f
github.com/kedacore/keda/v2/pkg/scaling.buildScaler({0x3781978, 0xc000669800}, {0x37ca550, 0xc000e27770}, {0xc000922ba4, 0x7}, 0x37a7860)
        /workspace/pkg/scaling/scale_handler.go:383 +0x258
github.com/kedacore/keda/v2/pkg/scaling.(*scaleHandler).buildScalers.func1()
        /workspace/pkg/scaling/scale_handler.go:322 +0x365
github.com/kedacore/keda/v2/pkg/scaling.(*scaleHandler).buildScalers(0xc0006217a0, {0x3781978, 0xc000669800}, 0xc0000e0500, 0xc00061b200, {0x0, 0x0})
        /workspace/pkg/scaling/scale_handler.go:325 +0x5dc
github.com/kedacore/keda/v2/pkg/scaling.(*scaleHandler).GetScalersCache(0xc0006217a0, {0x3781978, 0xc000669800}, {0x307ede0, 0xc000dce000})
        /workspace/pkg/scaling/scale_handler.go:194 +0x2fa
github.com/kedacore/keda/v2/controllers/keda.(*ScaledObjectReconciler).getScaledObjectMetricSpecs(0xc001247d40, {0x3781978, 0xc000669800}, {{0x37a7860, 0xc000669830}, 0xc0010d77e0}, 0xc000dce000)
        /workspace/controllers/keda/hpa.go:163 +0x8c
github.com/kedacore/keda/v2/controllers/keda.(*ScaledObjectReconciler).newHPAForScaledObject(0xc001247d40, {0x3781978, 0xc000669800}, {{0x37a7860, 0xc000669830}, 0xc000f3b580}, 0xc000dce000, 0xc0010d7a48)
        /workspace/controllers/keda/hpa.go:63 +0x66
github.com/kedacore/keda/v2/controllers/keda.(*ScaledObjectReconciler).createAndDeployNewHPA(0xc001247d40, {0x3781978, 0xc000669800}, {{0x37a7860, 0xc000669830}, 0xc0005ef740}, 0xc000dce000, 0x3810b28)
        /workspace/controllers/keda/hpa.go:46 +0x216
github.com/kedacore/keda/v2/controllers/keda.(*ScaledObjectReconciler).ensureHPAForScaledObjectExists(0xc001247d40, {0x3781978, 0xc000669800}, {{0x37a7860, 0xc000669830}, 0x37a7860}, 0xc000dce000, 0x0)
        /workspace/controllers/keda/scaledobject_controller.go:361 +0x2d0
github.com/kedacore/keda/v2/controllers/keda.(*ScaledObjectReconciler).reconcileScaledObject(0xc001247d40, {0x3781978, 0xc000669800}, {{0x37a7860, 0xc000669830}, 0x0}, 0xc000dce000)
        /workspace/controllers/keda/scaledobject_controller.go:229 +0x1ae
github.com/kedacore/keda/v2/controllers/keda.(*ScaledObjectReconciler).Reconcile(0xc001247d40, {0x3781978, 0xc000669800}, {{{0xc000922b40, 0x2f4b9a0}, {0xc000922b30, 0x30}}})
        /workspace/controllers/keda/scaledobject_controller.go:180 +0x38a
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile(0xc000fa8dc0, {0x3781978, 0xc000669740}, {{{0xc000922b40, 0x2f4b9a0}, {0xc000922b30, 0x413974}}})
        /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.11.0/pkg/internal/controller/controller.go:114 +0x26f
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler(0xc000fa8dc0, {0x37818d0, 0xc0012bd180}, {0x2d15e20, 0xc00007ee60})
        /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.11.0/pkg/internal/controller/controller.go:311 +0x33e
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem(0xc000fa8dc0, {0x37818d0, 0xc0012bd180})
        /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.11.0/pkg/internal/controller/controller.go:266 +0x205
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2()
        /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.11.0/pkg/internal/controller/controller.go:227 +0x85
created by sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2
        /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.11.0/pkg/internal/controller/controller.go:223 +0x357

KEDA Version

2.6.0

Kubernetes Version

1.21

Platform

Other

Scaler Details

DataDog.

Anything else?

If I change the DataDog query to include a tag filter, such as
max:aws.sqs.approximate_number_of_messages_visible{app_name:supper-app}

then keda-operator runs properly. However, no value is returned for that metric, and the KEDA metrics server logs the following error:

E0211 16:12:52.946301       1 provider.go:124] keda_metrics_adapter/provider "msg"="error getting metric for scaler" "error"="error getting metrics from Datadog: no Datadog metrics returned"  "scaledObject.Name"="keda-demo-4" "scaledObject.Namespace"="development" "scaler"={}
@toan-hf toan-hf added the bug Something isn't working label Feb 11, 2022
@tomkerkhove tomkerkhove moved this to Proposed in Roadmap - KEDA Core Feb 11, 2022
@zroubalik
Member

zroubalik commented Feb 14, 2022

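Thanks for reporting this. I can see that your DataDog query is: query: "max:aws.sqs.approximate_number_of_messages_visible", but the code expects a curly bracket in the query. If there is none, strings.Index returns -1 and the slice expression below panics with "slice bounds out of range [:-1]":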

metricName := meta.query[0:strings.Index(meta.query, "{")]
meta.metricName = GenerateMetricNameWithIndex(config.ScalerIndex, kedautil.NormalizeString(fmt.Sprintf("datadog-%s", metricName)))

I am by no means an expert on DataDog queries, but if a valid query doesn't have to contain a curly bracket, we shouldn't build the metric name on that assumption. We should probably just use the beginning of the query in the metric name, or something like that.

@arapulido FYI^

@arapulido
Contributor

Thanks a lot @toan-hf for the issue!

Yes, a Datadog query needs to contain curly brackets. If you want to match all tags, you can use {*}.
I agree that this should be validated up front; there is currently a PR for that: #2629
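A minimal sketch of such an up-front check, assuming the rule is simply "the query must contain a {...} tag scope". validateDatadogQuery is a hypothetical name and this is not the code from #2629; it returns an error instead of panicking, and on success returns the part of the query before the tag scope.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// validateDatadogQuery is a hypothetical validation step run while parsing
// trigger metadata. It rejects queries without a "{...}" tag scope so the
// later metric-name extraction can safely slice up to the "{".
func validateDatadogQuery(query string) (string, error) {
	open := strings.Index(query, "{")
	if open < 0 {
		return "", errors.New(`datadog query must contain a tag scope in curly brackets, e.g. "{*}"`)
	}
	if !strings.Contains(query[open:], "}") {
		return "", errors.New("datadog query has an unclosed curly bracket")
	}
	return query[:open], nil
}

func main() {
	for _, q := range []string{
		"max:aws.sqs.approximate_number_of_messages_visible",
		"max:aws.sqs.approximate_number_of_messages_visible{*}",
	} {
		name, err := validateDatadogQuery(q)
		if err != nil {
			fmt.Printf("%q: invalid: %v\n", q, err)
			continue
		}
		fmt.Printf("%q: metric %q\n", q, name)
	}
}
```

Rejecting the ScaledObject with an error surfaces the problem to the user, whereas the fallback approach silently accepts queries without a tag scope.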

Depending on the metric, it may take a while before a value can be retrieved from Datadog, hence the error (I will work on changing it to a warning, depending on what is returned).

If you wait a while, does it eventually get a metric value?
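A rough sketch of why a fresh or short time window can lead to this error: if no point in the returned series has a value yet, there is nothing to report. The point type, latestValue, and the assumption that missing values arrive as nulls are all hypothetical; the error text only reuses the message from the logs above and is not taken from KEDA's source.

```go
package main

import (
	"errors"
	"fmt"
)

// point is a hypothetical [timestamp, value] pair from a query response;
// Value is nil when there is no data for that timestamp yet.
type point struct {
	TimestampMs float64
	Value       *float64
}

// errNoMetrics reuses the message seen in the KEDA logs, for illustration.
var errNoMetrics = errors.New("no Datadog metrics returned")

// latestValue walks the series from newest to oldest and returns the first
// point that has a value; an empty or all-nil series yields errNoMetrics.
func latestValue(points []point) (float64, error) {
	for i := len(points) - 1; i >= 0; i-- {
		if points[i].Value != nil {
			return *points[i].Value, nil
		}
	}
	return 0, errNoMetrics
}

func main() {
	v1, v2 := 4.0, 7.0
	// Hypothetical series whose newest point has not been filled in yet.
	series := []point{{1644832400000, &v1}, {1644832460000, &v2}, {1644832520000, nil}}
	if v, err := latestValue(series); err == nil {
		fmt.Println("latest value:", v)
	}
	if _, err := latestValue(nil); err != nil {
		fmt.Println("empty window:", err)
	}
}
```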

@zroubalik
Member

@arapulido thanks for the confirmation :)

@zroubalik zroubalik added this to the v2.7.0 milestone Feb 14, 2022
@zroubalik zroubalik moved this from Proposed to In Review in Roadmap - KEDA Core Feb 14, 2022
@toan-hf
Author

toan-hf commented Feb 14, 2022

@arapulido I am aware of the delay in metrics being returned, since they are crawled from AWS CloudWatch (this has been mentioned here).
But even after changing the query time window to 15 minutes, I still see the same error:

E0214 10:07:46.713683 1 logr.go:270] keda_metrics_adapter/datadog_scaler "msg"="error getting metrics from Datadog" "error"="no Datadog metrics returned"

Even after changing to

      age: "900"  # 15 minutes

and setting pollingInterval to 900 seconds (15 minutes), with the query

query: "avg:aws.sqs.approximate_number_of_messages_visible{app_name:REDACTED}"

the error persists, even though querying the metric directly in the DataDog UI shows it properly.

So what I am worried about now is the query that KEDA sends to api.datadoghq.com. Is there anything to look at there?

@toan-hf
Author

toan-hf commented Feb 14, 2022

To summarise:

  • The problem raised here is that we did not put a { in our query. To prevent the KEDA operator pod from crash-looping, we need to make sure the query contains that character. Thanks for this patch, Ali.
  • Regarding the "no Datadog metrics returned" error that happens sometimes: I have adjusted the polling interval and the query time window to reduce it, but it still happens occasionally. It would be better to find the root cause.

Thanks a lot for your help @zroubalik & @arapulido

Let me know if you have other ideas; otherwise, we can close this issue and open another one to investigate the second problem.

@zroubalik
Member

@toan-hf opening another issue for the second problem would be great. This one will be closed once the related PR is merged.

Repository owner moved this from In Review to Ready To Ship in Roadmap - KEDA Core Feb 15, 2022
@tomkerkhove tomkerkhove moved this from Ready To Ship to Done in Roadmap - KEDA Core Aug 3, 2022
Projects
Archived in project

3 participants