Kafka scaler reports -2 and scales to maxReplicas #2612
Comments
So I checked the KEDA log and it's indeed what I thought; I saw the following line in the log file:
I think that in this case the scaler should report 0 instead of -2.
Are you open to contributing a fix?
When getting the consumer offset of a Kafka topic for which no offset was committed yet, the scaler returns -1 instead of 0, which causes scaling to the maximum number of replicas. Also fixed some typos and used interim variables for error strings. Fixes kedacore#2612 Signed-off-by: Ram Cohen <ram.cohen@gmail.com>
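For illustration only, here is a minimal sketch of the behaviour that change describes (the constant and function names below are made up for this example and are not KEDA's actual code): when the consumer group has never committed an offset, report zero lag instead of letting a negative sentinel reach the HPA.

```go
package main

import "fmt"

// noCommittedOffset is the sentinel value Kafka returns for a consumer group
// that has never committed an offset for a partition.
const noCommittedOffset int64 = -1

// partitionLag is an illustrative helper, not KEDA's actual function: it
// computes the lag that would be exposed as the external metric.
func partitionLag(latestOffset, consumerOffset int64) int64 {
	if consumerOffset == noCommittedOffset {
		// Nothing was ever committed for this partition. Reporting 0 keeps
		// the metric non-negative, so the HPA does not scale to maxReplicas.
		return 0
	}
	return latestOffset - consumerOffset
}

func main() {
	fmt.Println(partitionLag(100, noCommittedOffset)) // 0
	fmt.Println(partitionLag(100, 97))                // 3
}
```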
Already working on it
@tomkerkhove next time I'll try to contribute; this time @RamCohen beat me to it ;)
@RamCohen @loicmathieu could you please check this related PR #2033?
@zroubalik as far as I understand it, this seems to be the same issue, so #2621 will also fix #2033.
@loicmathieu what do you think about this specific comment? #2033 (comment) Specifically about this:
There is a risk of a chicken-and-egg issue here: when there is no offset, if we return 0 and a minReplicas of 0 is configured, we will never scale to 1, so there will never be any consumption, so there will never be any offset, ... When there is no offset, maybe the best is to return 1 so at least we scale to 1 and wait for the first message. Returning 0 as in the current PR seems more logical, but in that case a big WARNING needs to be written in the documentation stating that when no offset exists the scaler will return 0, so minReplicas must not be configured to 0 as long as no offset has been committed for this topic. This could be error prone though.
Anyway, the behaviour when there is no offset must be described in the documentation, as returning 1 can also surprise people.
+1 That's why I am more inclined towards the solution proposed in that comment. Not sure if we should scale to 0; we should try to make it as easy as possible for users, and scaling to at least 1 replica could help mitigate the problem. Solid documentation is a must. That's why I don't think we should release this in 2.6.1, but should think about this properly and prepare the feature for the next release. (CC @tomkerkhove, that's why I am removing the milestone.)
Perhaps having a parameter that specifies what to return in case of a missing offset (0/1), with a default of '1' as this more closely resembles what happens today?
Yes, I was thinking about that also; the best is to let the user choose how many replicas should be available waiting for a message when no offsets exist. What can be done is to fix the current issue, as returning -1 is a bug (the HPA doesn't understand it), so the current PR is better and could be shipped as a workaround while waiting for a better solution: a fixed number of replicas when there is no offset.
Having an optional parameter on 0/1 with a sane default seems like a great idea. I prefer not to create an intermediate step in the functionality, so I will keep #2621 open, and it would be better if we can come up with a proper fix for the next release @RamCohen @loicmathieu
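To make that proposal concrete, here is a minimal sketch of how such an optional 0/1 parameter could be applied, assuming a hypothetical metadata field named lagIfNoOffset with a default of 1 (the actual parameter name and default would be decided in the follow-up PR):

```go
package main

import "fmt"

// noCommittedOffset is the sentinel Kafka returns when a consumer group has
// never committed an offset for a partition.
const noCommittedOffset int64 = -1

// kafkaMetadata keeps only the field relevant here; "lagIfNoOffset" is a
// hypothetical name used for this sketch, not an actual KEDA setting.
type kafkaMetadata struct {
	lagIfNoOffset int64 // lag to report when no offset was committed yet (0 or 1)
}

// partitionLag applies the optional parameter when no offset exists.
func partitionLag(meta kafkaMetadata, latestOffset, consumerOffset int64) int64 {
	if consumerOffset == noCommittedOffset {
		return meta.lagIfNoOffset
	}
	return latestOffset - consumerOffset
}

func main() {
	// Default of 1: keep at least one replica waiting for the first message.
	fmt.Println(partitionLag(kafkaMetadata{lagIfNoOffset: 1}, 0, noCommittedOffset)) // 1
	// Explicit 0: allow scale-to-zero, with the chicken-and-egg caveat documented.
	fmt.Println(partitionLag(kafkaMetadata{lagIfNoOffset: 0}, 0, noCommittedOffset)) // 0
}
```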
@zroubalik I still think that it's a bug in the current implementation and 0 should be returned instead, as the HPA has incorrect behaviour when metrics are negative (it goes to the max). This bug is easily fixable, so why not fix it ASAP? The ability to choose how many replicas to maintain when there is no offset is a new feature that may be more complex to solve, but I agree it's the best way to handle the no-offset case.
@loicmathieu that might be true, but the current behavior has been present from the beginning, so some users might expect it. I don't want to break existing behavior twice in a short period of time; let's do it properly in one attempt.
Reading the description from #996, I agree with @loicmathieu: it does seem more reasonable to let the user know to set minReplicas >= 1 in such a case, plus a warning, instead.
Report
We have a situation where the Kafka scaler reports -2 and the HPA scales to maxReplicas.
A CPU scaler is also defined, but it reports 1% usage, so the desired replica count should be 1, not 3.
The following information comes from a kubectl describe on the HPA
Other scalers defined the same way and connected to the same Kafka broker work as expected.
Expected Behavior
As there is no lag and no CPU usage, the HPA should scale to minReplicas, which is 1.
Actual Behavior
The HPA scales to maxReplicas, which is 3.
Steps to Reproduce the Problem
Here is the ScaledObject definition we use:
Logs from KEDA operator
KEDA Version
No response
Kubernetes Version
No response
Platform
Google Cloud
Scaler Details
cpu, kafka
Anything else?
Maybe the issue is that the lag is not reported and KEDA returns -2 instead of 0 when there is no lag?