gcp-pubsub throwing "could not find stackdriver metric" #5429
Comments
Hello, Could you share KEDA operator logs as well?
This case is already covered by e2e tests, and it works. One thing that can happen is that if you don't have messages, you won't get a metric, because the API itself responds with an error (which, as far as I know about Pub/Sub monitoring, is normal when there isn't any activity related to the queue). There is a change that could be related, but I don't think it is the cause, since the e2e tests still passed and the change kept the default behavior (that's why I'm asking for more info xD).
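To make the "no messages, no metric" case concrete, here is a minimal sketch of the two ways a caller could treat an empty query result: raise an error, or fall back to a fixed value. This is not KEDA's code; `MetricNotFound`, `latest_value` and `empty_as` are hypothetical names used only for illustration.

```python
from typing import Optional, Sequence


class MetricNotFound(Exception):
    """Raised when the monitoring query returns no data points."""


def latest_value(points: Sequence[float], empty_as: Optional[float] = None) -> float:
    """Return the most recent point, or handle an empty result.

    `points` stands in for the values returned by a monitoring query,
    ordered oldest to newest (an assumption of this sketch).
    """
    if points:
        return points[-1]
    if empty_as is None:
        # Strict mode: surface the empty result as an error.
        raise MetricNotFound("query returned no data points")
    # Lenient mode: report an idle subscription as `empty_as`.
    return empty_as
```

In strict mode an idle subscription looks the same as a broken query, which is why an empty result on its own is not enough to tell the two apart.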
I am having the exact same issue. I had to roll back to 2.12.1 because some workloads were not scaling up.
So, does it work in KEDA v2.12.1 and not in KEDA v2.13.0?
Hi, first of all, thanks for this great project! I'm facing the same issue here. I'm using:
My ScaledObject:

    apiVersion: keda.sh/v1alpha1
    kind: ScaledObject
    metadata:
      name: some-name
    spec:
      scaleTargetRef:
        name: some-name
      triggers:
        - type: gcp-pubsub
          authenticationRef:
            name: trigger-authentication-dev
            kind: ClusterTriggerAuthentication
          metadata:
            mode: NumUndeliveredMessages
            value: "5"
            activationValue: "0"
            subscriptionName: projects/my-project/subscriptions/my-sub

The error:

    2024-01-31T04:53:00Z ERROR gcp_pub_sub_scaler error getting metric {"type": "ScaledObject", "namespace": "default", "name": "some-name", "metricType": "pubsub.googleapis.com/subscription/num_undelivered_messages", "error": "could not find stackdriver metric with query fetch pubsub_subscription | metric 'pubsub.googleapis.com/subscription/num_undelivered_messages' | filter (resource.project_id == 'my-project' && resource.subscription_id == 'my-sub') | within 1m"}

I copy-pasted the query I found in the error message and ran it in GCP's Metrics Explorer:

Things that I noticed are:
Based on its documentation, it says:
These might not be related, but I'm trying to provide as much data as possible, and hopefully it helps to debug the situation. Based on the scaler code, it looks like we mark it as an error if Stackdriver doesn't return a value.
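For anyone who wants to rerun the same query in Metrics Explorer for their own subscription, the sketch below rebuilds the query string in the shape shown in the error message above. It is an illustration only, not the scaler's implementation: `split_subscription` and `build_query` are hypothetical helper names, and the handling of short subscription names through `default_project` is an assumption of this sketch.

```python
import re
from typing import Tuple

# Matches the fully qualified form: projects/<project>/subscriptions/<name>
_SUBSCRIPTION_RE = re.compile(r"^projects/([^/]+)/subscriptions/([^/]+)$")


def split_subscription(name: str, default_project: str = "") -> Tuple[str, str]:
    """Split a subscription name into (project_id, subscription_id)."""
    match = _SUBSCRIPTION_RE.match(name)
    if match:
        return match.group(1), match.group(2)
    if "/" in name or not default_project:
        raise ValueError(f"cannot determine project for subscription {name!r}")
    # Short form: the project has to come from somewhere else.
    return default_project, name


def build_query(project_id: str, subscription_id: str, metric: str,
                window: str = "1m") -> str:
    """Build a query string in the shape seen in the error log."""
    return (
        f"fetch pubsub_subscription | metric '{metric}' | "
        f"filter (resource.project_id == '{project_id}' && "
        f"resource.subscription_id == '{subscription_id}') | within {window}"
    )
```

Calling `build_query(*split_subscription("projects/my-project/subscriptions/my-sub"), "pubsub.googleapis.com/subscription/num_undelivered_messages")` reproduces the query from the log, and changing `window` makes it easy to check whether a longer window returns data.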
Some more context about what happened. I have 8 gcp-pubsub ScaledObjects. When we upgraded KEDA to 2.13.0, all 5 deployments targeted by a ScaledObject with Those with I am trying to get a reproduction scenario. Here is a basic manifest:

    apiVersion: keda.sh/v1alpha1
    kind: ScaledObject
    metadata:
      name: test-francois
      namespace: test-francois
    spec:
      maxReplicaCount: 5
      minReplicaCount: 0
      scaleTargetRef:
        apiVersion: apps/v1
        kind: Deployment
        name: test-francois
      triggers:
        - authenticationRef:
            kind: ClusterTriggerAuthentication
            name: keda-clustertrigger-auth-gcp-credentials
          metadata:
            mode: SubscriptionSize
            subscriptionName: test-francois-sub
            value: "4"
          type: gcp-pubsub

You need to publish manually to the topic in question. There is no need to ack any message. Just use the value to scale any deployment up or down. Using this, I can reproduce the error log in keda-operator.
Roughly 1/3 of the entries show the target with However, this setup is not enough to reproduce the issue above. The deployment is not being scaled down to 0.
Looking at https://github.com/kedacore/keda/pull/5246/files, which has been merged for 2.13.0, I see And
Nice research! I was thinking that maybe we changed a default behavior by mistake, and it looks like that is the case (and we have to fix it). I'm thinking of adding the aggregation window as an optional parameter too (for the next version).
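As an illustration of how an optional window parameter can be added without changing existing behavior, the sketch below falls back to a caller-supplied default whenever the parameter is absent. It is a hedged example, not the KEDA implementation: `resolve_window` is a hypothetical helper, the key name `timeHorizon` and the accepted duration format are assumptions, and the default is deliberately passed in by the caller rather than stated here.

```python
import re
from typing import Mapping

# Assumed format for this sketch: a positive integer followed by s, m or h.
_DURATION_RE = re.compile(r"^[1-9][0-9]*[smh]$")


def resolve_window(metadata: Mapping[str, str], default: str,
                   key: str = "timeHorizon") -> str:
    """Return the configured window, or `default` when none is set."""
    value = metadata.get(key, "").strip()
    if not value:
        # Parameter omitted: keep the previous behavior unchanged.
        return default
    if not _DURATION_RE.match(value):
        raise ValueError(f"invalid {key}: {value!r}")
    return value
```

Keeping the default in one place and only overriding it on explicit input is what lets existing ScaledObjects behave the same after an upgrade.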
@FrancoisPoinsot, I've reverted the change in the default time horizon in this PR. The generated image with that change is
I confirm that with
Do you see any increase in the goroutines now?
The goroutine count looks stable too.
Thanks for the feedback ❤️ Probably I was right, and the issue with the goroutines was that the connection was not closed properly. As the scaler is no longer being regenerated on each check, the issue is mitigated. I've also included the proper closing of the connection as part of the PR: 4084ee0
For anyone still encountering this error, ensure that your service account is granted the role
Report
After updating to 2.13.0, gcp_pub_sub_scaler repeatedly throws "error getting metric" and scale_handler throws "error getting scale decision" with "could not find stackdriver metric with query fetch pubsub_subscription" with unacked messages and failure to scale.
Expected Behavior
keda scales application from zero
Actual Behavior
keda fails to scale application from zero
Steps to Reproduce the Problem
Logs from KEDA operator
No response
KEDA Version
2.13.0
Kubernetes Version
Other
Platform
Google Cloud
Scaler Details
gcp pubsub
Anything else?
kubernetes version 1.28.3-gke.1203001