Getting empty metric data with multiple dimensions in aws-cloudwatch trigger #2760

Closed
dekelev opened this issue Mar 15, 2022 · 12 comments
Labels
bug Something isn't working help wanted Looking for support from community stale All issues that are marked as stale due to inactivity

Comments

@dekelev
Contributor

dekelev commented Mar 15, 2022

Report

Using the following ScaledObject config, I always get empty metric data:

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: worker
spec:
  scaleTargetRef:
    name: worker
  minReplicaCount: 1
  maxReplicaCount: 10
  pollingInterval: 30
  cooldownPeriod:  300
  triggers:
    - type: aws-cloudwatch
      authenticationRef:
        name: keda-auth
      metadata:
        namespace: AWS/AmazonMQ
        dimensionName: Broker;Queue
        dimensionValue: aws-production;aws-worker
        metricName: MessageCount
        metricStat: Average
        metricUnit: Count
        metricStatPeriod: '60'
        metricCollectionTime: '60'
        targetMetricValue: '10'
        minMetricValue: '0'
        awsRegion: us-east-1
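The trigger encodes two dimensions as parallel, semicolon-separated lists in `dimensionName` and `dimensionValue`; the debug log later in this issue shows them paired as `Broker=aws-production` and `Queue=aws-worker`. A minimal Go sketch of that pairing, for illustration only: `Dimension` and `parseDimensions` are hypothetical names, not KEDA's parser, and rejecting mismatched list lengths is an assumed validation rule.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// Dimension is a hypothetical name/value pair mirroring a CloudWatch dimension.
type Dimension struct {
	Name  string
	Value string
}

// parseDimensions pairs the i-th entry of a ";"-separated name list with the
// i-th entry of a ";"-separated value list. Illustrative only, not KEDA's code.
func parseDimensions(names, values string) ([]Dimension, error) {
	n := strings.Split(names, ";")
	v := strings.Split(values, ";")
	if len(n) != len(v) {
		return nil, errors.New("dimensionName and dimensionValue must have the same number of entries")
	}
	dims := make([]Dimension, 0, len(n))
	for i := range n {
		name, value := strings.TrimSpace(n[i]), strings.TrimSpace(v[i])
		if name == "" || value == "" {
			return nil, fmt.Errorf("empty dimension name or value at position %d", i)
		}
		dims = append(dims, Dimension{Name: name, Value: value})
	}
	return dims, nil
}

func main() {
	dims, err := parseDimensions("Broker;Queue", "aws-production;aws-worker")
	if err != nil {
		panic(err)
	}
	fmt.Println(dims) // [{Broker aws-production} {Queue aws-worker}]
}
```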

Running the following query in the CloudWatch console with a 1-minute period returns a value above 0:
SELECT AVG(MessageCount) FROM "AWS/AmazonMQ" WHERE Broker = 'aws-production' and Queue = 'aws-worker'

Meanwhile, the keda-operator log repeatedly shows:
aws_cloudwatch_scaler empty metric data received, returning minMetricValue
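For comparison, the console query can be assembled from the same fields the trigger uses: `metricStat` (here `Average`, which is `AVG` in SQL form), `metricName`, `namespace`, and the dimension pairs. A minimal Go sketch, assuming a simple `STAT(metric) FROM "namespace" WHERE ...` template; `buildInsightsQuery` is hypothetical and does not escape quotes inside values.

```go
package main

import (
	"fmt"
	"strings"
)

// Dimension is a hypothetical name/value pair used only for this sketch.
type Dimension struct {
	Name  string
	Value string
}

// buildInsightsQuery renders a CloudWatch Metrics Insights statement in the
// same shape as the console query. Values are not escaped (illustrative only).
func buildInsightsQuery(stat, metric, namespace string, dims []Dimension) string {
	var b strings.Builder
	fmt.Fprintf(&b, "SELECT %s(%s) FROM \"%s\"", stat, metric, namespace)
	for i, d := range dims {
		if i == 0 {
			b.WriteString(" WHERE ")
		} else {
			b.WriteString(" AND ")
		}
		fmt.Fprintf(&b, "%s = '%s'", d.Name, d.Value)
	}
	return b.String()
}

func main() {
	q := buildInsightsQuery("AVG", "MessageCount", "AWS/AmazonMQ", []Dimension{
		{Name: "Broker", Value: "aws-production"},
		{Name: "Queue", Value: "aws-worker"},
	})
	fmt.Println(q)
}
```

A string of this shape is presumably what an expression-based query (such as the workaround discussed later in this thread) would carry instead of a `MetricStat`.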

Expected Behavior

Getting a number above 0 for the requested metric.

Actual Behavior

Getting 0 for the requested metric.

Steps to Reproduce the Problem

  1. Deploy a managed RabbitMQ broker on Amazon MQ
  2. Deploy the ScaledObject config
  3. Check keda-operator log

Logs from KEDA operator

1.6473359324006155e+09	INFO	controller-runtime.metrics	Metrics server is starting to listen	{"addr": ":8080"}
1.6473359324026015e+09	INFO	setup	Running on Kubernetes 1.21+	{"version": "v1.21.5-eks-bc4871b"}
1.6473359324028835e+09	INFO	setup	Starting manager
1.6473359324029562e+09	INFO	setup	KEDA Version: 2.6.1
1.64733593240299e+09	INFO	setup	Git Commit: efca71d6bc770408468a9e1a4b3984f7136c0967
1.6473359324030223e+09	INFO	setup	Go Version: go1.17.3
1.6473359324030423e+09	INFO	setup	Go OS/Arch: linux/amd64
1.6473359324034576e+09	INFO	Starting server	{"path": "/metrics", "kind": "metrics", "addr": "[::]:8080"}
1.6473359324034605e+09	INFO	Starting server	{"kind": "health probe", "addr": "[::]:8081"}
I0315 09:18:52.403596       1 leaderelection.go:248] attempting to acquire leader lease keda/operator.keda.sh...
I0315 09:19:36.449460       1 leaderelection.go:258] successfully acquired lease keda/operator.keda.sh
1.6473359764499505e+09	INFO	controller.scaledobject	Starting EventSource	{"reconciler group": "keda.sh", "reconciler kind": "ScaledObject", "source": "kind source: *v1alpha1.ScaledObject"}
1.647335976450036e+09	INFO	controller.scaledobject	Starting EventSource	{"reconciler group": "keda.sh", "reconciler kind": "ScaledObject", "source": "kind source: *v2beta2.HorizontalPodAutoscaler"}
1.6473359764500802e+09	INFO	controller.scaledobject	Starting Controller	{"reconciler group": "keda.sh", "reconciler kind": "ScaledObject"}
1.6473359764505255e+09	INFO	controller.triggerauthentication	Starting EventSource	{"reconciler group": "keda.sh", "reconciler kind": "TriggerAuthentication", "source": "kind source: *v1alpha1.TriggerAuthentication"}
1.6473359764505723e+09	INFO	controller.triggerauthentication	Starting Controller	{"reconciler group": "keda.sh", "reconciler kind": "TriggerAuthentication"}
1.6473359764506705e+09	INFO	controller.scaledjob	Starting EventSource	{"reconciler group": "keda.sh", "reconciler kind": "ScaledJob", "source": "kind source: *v1alpha1.ScaledJob"}
1.6473359764507174e+09	INFO	controller.scaledjob	Starting Controller	{"reconciler group": "keda.sh", "reconciler kind": "ScaledJob"}
1.6473359764508784e+09	INFO	controller.clustertriggerauthentication	Starting EventSource	{"reconciler group": "keda.sh", "reconciler kind": "ClusterTriggerAuthentication", "source": "kind source: *v1alpha1.ClusterTriggerAuthentication"}
1.647335976450922e+09	INFO	controller.clustertriggerauthentication	Starting Controller	{"reconciler group": "keda.sh", "reconciler kind": "ClusterTriggerAuthentication"}
1.6473359765509717e+09	INFO	controller.triggerauthentication	Starting workers	{"reconciler group": "keda.sh", "reconciler kind": "TriggerAuthentication", "worker count": 1}
1.647335976551054e+09	INFO	controller.scaledobject	Starting workers	{"reconciler group": "keda.sh", "reconciler kind": "ScaledObject", "worker count": 5}
1.6473359765512447e+09	INFO	controller.scaledobject	Reconciling ScaledObject	{"reconciler group": "keda.sh", "reconciler kind": "ScaledObject", "name": "worker", "namespace": "default"}
1.6473359765512555e+09	INFO	controller.clustertriggerauthentication	Starting workers	{"reconciler group": "keda.sh", "reconciler kind": "ClusterTriggerAuthentication", "worker count": 1}
1.6473360069127352e+09	INFO	aws_cloudwatch_scaler	empty metric data received, returning minMetricValue
1.6473360368981402e+09	INFO	aws_cloudwatch_scaler	empty metric data received, returning minMetricValue
1.647336066911557e+09	INFO	aws_cloudwatch_scaler	empty metric data received, returning minMetricValue

KEDA Version

2.6.1

Kubernetes Version

1.21

Platform

Amazon Web Services

Scaler Details

AWS CloudWatch

More Info

I'm debugging this locally, and this is what I see when I add the input variable to the log:

{"level":"debug","ts":1647427653.20592,"logger":"aws_cloudwatch_scaler","msg":"empty metric data received, returning minMetricValue","data":"{\n  EndTime: 2022-03-16 12:47:00 +0200 IST,\n  MetricDataQueries: [{\n      Id: \"c1\",\n      MetricStat: {\n        Metric: {\n          Dimensions: [{\n              Name: \"Broker\",\n              Value: \"aws-production\"\n            },{\n              Name: \"Queue\",\n              Value: \"aws-worker\"\n            }],\n          MetricName: \"MessageCount\",\n          Namespace: \"AWS/AmazonMQ\"\n        },\n        Period: 60,\n        Stat: \"Average\",\n        Unit: \"Count\"\n      },\n      ReturnData: true\n    }],\n  ScanBy: \"TimestampDescending\",\n  StartTime: 2022-03-16 12:46:00 +0200 IST\n}"}

This second log shows that the response from CloudWatch is missing the Values field:

{"level":"debug","ts":1647427772.906795,"logger":"aws_cloudwatch_scaler","msg":"Received Metric Data","data":"{\n  MetricDataResults: [{\n      Id: \"c1\",\n      Label: \"MessageCount\",\n      StatusCode: \"Complete\"\n    }]\n}"}
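The logged response has `StatusCode: "Complete"` but no `Values`, so the scaler logs "empty metric data received" and returns `minMetricValue`. A minimal sketch of that fallback, using a trimmed-down hypothetical struct rather than the SDK's result type, and assuming the newest value comes first because the request sets `ScanBy: "TimestampDescending"`.

```go
package main

import "fmt"

// MetricDataResult is a hypothetical stand-in for the SDK result: the fields
// visible in the logged response, plus Values.
type MetricDataResult struct {
	ID         string
	Label      string
	StatusCode string
	Values     []float64
}

// valueOrMin returns the first (newest, given TimestampDescending) value of the
// first result, or minMetricValue when no datapoints came back.
func valueOrMin(results []MetricDataResult, minMetricValue float64) (float64, bool) {
	if len(results) == 0 || len(results[0].Values) == 0 {
		return minMetricValue, false
	}
	return results[0].Values[0], true
}

func main() {
	// Same shape as the logged response: "Complete" but no Values.
	empty := []MetricDataResult{{ID: "c1", Label: "MessageCount", StatusCode: "Complete"}}
	v, ok := valueOrMin(empty, 0)
	fmt.Println(v, ok) // 0 false
}
```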
@dekelev dekelev added the bug Something isn't working label Mar 15, 2022
@JorTurFer
Member

Hi @dekelev
I'm not an expert on AWS services, but the query looks good, doesn't it?
Maybe there has been a change in their API?

@JorTurFer JorTurFer added the help wanted Looking for support from community label Mar 16, 2022
@dekelev
Contributor Author

dekelev commented Mar 16, 2022

Hi @JorTurFer, thanks for responding!

After debugging the scaler and the CloudWatch SDK, I concluded that it is probably an issue with the CloudWatch SDK, because the request parameters sent to CloudWatch look fine when using multiple dimensions.
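One difference between the two query styles may be relevant here. According to the CloudWatch documentation, a `MetricStat` query retrieves a metric only when its dimension set matches exactly, whereas a Metrics Insights `WHERE` clause (like the console query above) filters on the listed dimensions and aggregates over any others. The Go sketch below models that distinction with plain maps; the extra `VirtualHost` dimension on the published metric is an illustrative assumption, not something confirmed in this thread.

```go
package main

import "fmt"

// Dims maps dimension names to values.
type Dims map[string]string

// subsetMatch models a Metrics Insights WHERE clause: every requested pair must
// be present, but the metric may carry extra dimensions.
func subsetMatch(metric, query Dims) bool {
	for k, v := range query {
		got, ok := metric[k]
		if !ok || got != v {
			return false
		}
	}
	return true
}

// exactMatch models a MetricStat query: the dimension sets must be identical.
func exactMatch(metric, query Dims) bool {
	return len(metric) == len(query) && subsetMatch(metric, query)
}

func main() {
	// Hypothetical published metric with one extra dimension.
	published := Dims{"Broker": "aws-production", "VirtualHost": "/", "Queue": "aws-worker"}
	requested := Dims{"Broker": "aws-production", "Queue": "aws-worker"}
	fmt.Println(exactMatch(published, requested), subsetMatch(published, requested)) // false true
}
```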

As a workaround, I forked the repo and replaced the metric data query with an expression query:

Expression: aws.String(c.metadata.expression),
Id:         aws.String("q1"),
Period:     aws.Int64(c.metadata.metricStatPeriod),
Label:      aws.String(c.metadata.label),

dekelev@d92d23c

@zroubalik
Member

@dekelev would you mind opening a PR to fix this issue?

@dekelev
Contributor Author

dekelev commented Mar 16, 2022

My current fix simply replaces the aws-cloudwatch scaler's internals with expression logic, which wasn't the original author's vision for this scaler.

I don't have enough time right now to create a PR, but I'll consider extending this scaler in the future if no one else does.
I know another issue was opened about the missing expression feature in this scaler, so I'm not the only one running into this.

It doesn't seem like much effort to extend the scaler so that it validates and executes the expression only if one is defined in the user's config.
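The opt-in behaviour described above (use the expression only when the user sets one, otherwise keep the existing `MetricStat` path) can be sketched as a small validation step. `scalerMetadata` and `queryKind` are hypothetical names, and requiring `namespace` and `metricName` only on the `MetricStat` path is an assumption; the real scaler would build SDK query structs rather than return a label.

```go
package main

import (
	"errors"
	"fmt"
)

// scalerMetadata is a hypothetical, trimmed-down view of the trigger config.
type scalerMetadata struct {
	expression string
	namespace  string
	metricName string
}

// queryKind picks the expression path when an expression is configured and
// otherwise validates the fields a MetricStat query needs.
func queryKind(m scalerMetadata) (string, error) {
	if m.expression != "" {
		return "expression", nil
	}
	if m.namespace == "" || m.metricName == "" {
		return "", errors.New("namespace and metricName are required when no expression is set")
	}
	return "metricStat", nil
}

func main() {
	withExpr := scalerMetadata{expression: `SELECT AVG(MessageCount) FROM "AWS/AmazonMQ"`}
	classic := scalerMetadata{namespace: "AWS/AmazonMQ", metricName: "MessageCount"}
	k1, _ := queryKind(withExpr)
	k2, _ := queryKind(classic)
	fmt.Println(k1, k2) // expression metricStat
}
```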

@dekelev
Contributor Author

dekelev commented Mar 19, 2022

For some reason, I also had to multiply the metric value returned from CloudWatch by 100 so that the HPA would consume and display the external metric's quantity correctly.

Any idea why the HPA doesn't consume the metric as is?

I'm not sure whether the original scaler (without an expression) has the same issue, so if I create a PR, this might be a breaking change.

@zroubalik
Member

The HPA doesn't work with floating-point numbers, only integers.
It seems this scaler is not correct in that regard:

targetMetricValue float64
minMetricValue float64

@JorTurFer FYI

@JorTurFer
Member

Interesting! So we are losing information during the cast to int64 🤔
Maybe we should update all scalers to always use int64 and explain it in the docs as clearly as possible.
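The information loss mentioned above is easy to reproduce: converting a `float64` to `int64` in Go truncates toward zero, so a fractional average such as 0.75 becomes 0. Multiplying before the conversion (the ×100 workaround mentioned earlier in the thread) keeps two decimal places. A minimal sketch; the helper names and sample values are hypothetical, and rounding instead of truncating after scaling is a design choice of this sketch.

```go
package main

import (
	"fmt"
	"math"
)

// truncateToInt64 is a plain conversion: the fractional part is discarded.
func truncateToInt64(v float64) int64 { return int64(v) }

// scaleToInt64 multiplies by factor and rounds to the nearest integer, so a
// product like 0.29*100 (28.999999999999996 in float64) becomes 29, not 28.
func scaleToInt64(v, factor float64) int64 { return int64(math.Round(v * factor)) }

func main() {
	for _, avg := range []float64{0.75, 0.29} { // hypothetical CloudWatch averages
		fmt.Println(avg, truncateToInt64(avg), scaleToInt64(avg, 100))
	}
}
```

If the metric is scaled this way, `targetMetricValue` has to be expressed in the same scaled units, since the HPA derives the replica count from the ratio of the metric to its target.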

@zroubalik
Member

yeah, we should fix that

@dekelev
Copy link
Contributor Author

dekelev commented Apr 15, 2022

I've opened this PR to add an alternative expression field; it also changes the scaler to use int64 instead of float64.

@dekelev
Contributor Author

dekelev commented Apr 15, 2022

HPA doesn't work with float numbers, integers only. Seems like this scaler is not correct with that regard,

targetMetricValue float64
minMetricValue float64

@JorTurFer FYI

@zroubalik I still had to multiply the metric by 100, even with int64. I have no idea why, but it only works that way.
I'm running it in a production environment; it runs well, and I can see the expected metric values in the HPA stats.

https://github.com/kedacore/keda/pull/2911/files#diff-79307abe3419ddfb44131b054ea48e8b03c304201d38ad08b386b9d64e40875fR284

@stale

stale bot commented Jun 14, 2022

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 7 days if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale All issues that are marked as stale due to inactivity label Jun 14, 2022
@JorTurFer
Copy link
Member

JorTurFer commented Jun 14, 2022

I think this is already covered in the PR, feel free to reopen it if it's not done 😄

@JorTurFer JorTurFer closed this as not planned Jun 14, 2022
Projects
Archived in project
Development

No branches or pull requests

3 participants