Scaledobject fallback not working as expected when prometheus trigger is failing #4249
Comments
Hi Jorge,
This is what the hpa looks like
And the metric being returned
Could you share operator logs?
Sure, here are the operator logs:
This is weird. It looks like something is scaling your deployment in parallel. Do you have any other HPA or ScaledObject scaling the workload? Maybe one with CPU?
Which line are you looking at? There are some other ScaledObjects which the operator is picking up but the ScaledObject
Yes, I thought so, but I see this line every 10 seconds:
Basically, every 10 seconds I can see the operator scaling out to the fallback, and I can't see any row saying that it has scaled in. I'll try to replicate the issue in my own environment.
Ah, I get you, thanks.
I have been able to reproduce it. Thanks for reporting the issue.
Amazing! Any ideas what's causing the issue?
We have an idea, but it's complex; we are debugging the code to find the root cause.
We have found the problem. It should be fixed in the next release. Thanks for reporting!
Thanks for solving the problem! Do you know when the next release will be shipped?
It'll be in about 2 weeks.
Report
When using Prometheus as a trigger for a ScaledObject, I've run into unexpected behaviour: the number of replicas of a deployment oscillates between minReplicaCount and fallback.replicas even though the trigger is consistently in the failing state. Furthermore, keda_scaler_metrics_value consistently returns 0 instead of fallback.replicas while the trigger is failing. This differs from the cron scaler, for example, which falls back to the number of replicas specified by fallback.replicas when given an invalid timezone.
Here are the manifests I've been using to test this. Note that spec.triggers.metadata.query is invalid: the PromQL rate function requires an argument, and in this example I haven't provided one. This consistently produces a 400 Bad Request, which gets the scaler to fail.
Expected Behavior
Deployment scales to fallback.replicas, i.e. 5 pods, and then stays at 5 replicas.
Actual Behavior
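The replica count oscillates between minReplicaCount and fallback.replicas instead of staying at fallback.replicas.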
Steps to Reproduce the Problem
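The original manifests did not survive in this copy of the issue. As a rough sketch only, a ScaledObject along the following lines matches the setup described in the report: a Prometheus trigger whose query calls rate with no argument, and fallback.replicas set to 5. Everything else here is an assumption made for illustration, not taken from the report: the resource names, the Prometheus server address, metricName, threshold, failureThreshold, and the minReplicaCount and maxReplicaCount values.

```yaml
# Illustrative sketch, not the reporter's actual manifest.
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: example-scaledobject          # hypothetical name
spec:
  scaleTargetRef:
    name: example-deployment          # hypothetical Deployment to scale
  minReplicaCount: 1                  # assumed value
  maxReplicaCount: 10                 # assumed value
  fallback:
    failureThreshold: 3               # assumed value
    replicas: 5                       # from the report: fall back to 5 pods
  triggers:
    - type: prometheus
      metadata:
        serverAddress: http://prometheus.example.svc:9090   # hypothetical address
        metricName: example_metric    # hypothetical metric name
        threshold: "100"              # assumed value
        # Deliberately invalid: rate() needs a range-vector argument,
        # so Prometheus rejects the query and the scaler fails.
        query: rate()
```

Applying a manifest like this and watching the target Deployment's replica count over a few minutes should show whether it settles at fallback.replicas or keeps changing.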
Logs from KEDA operator
Deployment state 1:
Deployment state 2:
KEDAScalerFailed event which is as expected because I've used a bad query
This metric returns 0 instead of fallback.replicas (5)
Can see that the status of the scaler is Failing
Desired HPA metric is 1 instead of 5
KEDA Version
2.8.2
Kubernetes Version
< 1.23
Platform
Amazon Web Services
Scaler Details
Prometheus
Anything else?
No response