
Kafka scaler not scaling to zero when offset is not properly initialized #2033

Closed
raffis opened this issue Aug 13, 2021 · 15 comments
Labels: bug, stale

@raffis

raffis commented Aug 13, 2021

Report

How is scaling to zero supposed to work? I'd like it to scale to 0 when the topic has no messages at all, or when everything produced so far has been consumed.
However, it always scales to 1.

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: svc-webhook-processor
spec:
  cooldownPeriod: 300
  fallback:
    failureThreshold: 3
    replicas: 6
  idleReplicaCount: 0
  maxReplicaCount: 100
  minReplicaCount: 0
  pollingInterval: 30
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: svc-webhook-processor
  triggers:
  - metadata:
      allowIdleConsumers: "false"
      bootstrapServers: kafka-client.devbox-raffis-1:9092
      consumerGroup: svc-webhook-processor
      lagThreshold: "5"
      offsetResetPolicy: latest
      topic: webhook-request
      version: 1.0.0
    type: kafka
status:
  conditions:
  - message: ScaledObject is defined correctly and is ready for scaling
    reason: ScaledObjectReady
    status: "True"
    type: Ready
  - message: Scaling is performed because triggers are active
    reason: ScalerActive
    status: "True"
    type: Active
  - message: No fallbacks are active on this scaled object
    reason: NoFallbackFound
    status: "False"
    type: Fallback
  externalMetricNames:
  - kafka-webhook-request-svc-webhook-processor
  health:
    kafka-webhook-request-svc-webhook-processor:
      numberOfFailures: 0
      status: Happy
  lastActiveTime: "2021-08-13T09:45:24Z"
  originalReplicaCount: 3
  scaleTargetGVKR:
    group: apps
    kind: Deployment
    resource: deployments
    version: v1
  scaleTargetKind: apps/v1.Deployment

Expected Behavior

Scale deployment to 0.

Actual Behavior

Scaled to 1 replica.

Steps to Reproduce the Problem

Logs from KEDA operator

2021-08-13T09:49:46.505Z	INFO	kafka_scaler	invalid offset found for topic webhook-request in group svc-webhook-processor and partition 0, probably no offset is committed yet
2021-08-13T09:50:17.607Z	INFO	kafka_scaler	invalid offset found for topic webhook-request in group svc-webhook-processor and partition 0, probably no offset is committed yet
2021-08-13T09:50:48.211Z	INFO	kafka_scaler	invalid offset found for topic webhook-request in group svc-webhook-processor and partition 0, probably no offset is committed yet
2021-08-13T09:51:19.244Z	INFO	kafka_scaler	invalid offset found for topic webhook-request in group svc-webhook-processor and partition 0, probably no offset is committed yet
2021-08-13T09:51:49.732Z	INFO	kafka_scaler	invalid offset found for topic webhook-request in group svc-webhook-processor and partition 0, probably no offset is committed yet

KEDA Version

2.4.0

Kubernetes Version

1.18

Platform

Amazon Web Services

Scaler Details

Kafka

Anything else?

No response

raffis added the bug label Aug 13, 2021
@zroubalik
Member

@raffis seems like the offset is not properly committed in the topic. Please try to fix that on the Kafka side (or try a new topic) and scaling should work.
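
For anyone hitting this, a minimal check from the Kafka side, assuming the stock Kafka CLI tooling is available (the broker address is taken from the manifest above; partitions with no committed offset show "-" in the CURRENT-OFFSET column):

kafka-consumer-groups.sh --bootstrap-server kafka-client.devbox-raffis-1:9092 \
  --describe --group svc-webhook-processor

And, provided the group currently has no active members, the offsets can be initialized explicitly:

kafka-consumer-groups.sh --bootstrap-server kafka-client.devbox-raffis-1:9092 \
  --group svc-webhook-processor --topic webhook-request \
  --reset-offsets --to-latest --execute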

@raffis
Author

raffis commented Aug 17, 2021

> @raffis seems like the offset is not properly committed in the topic. Please try to fix that on the Kafka side (or try a new topic) and scaling should work.

Yes, I am aware of that, but I expected it to still scale down since there is no offset yet.

@tpiperatgod

> @raffis seems like the offset is not properly committed in the topic. Please try to fix that on the Kafka side (or try a new topic) and scaling should work.
>
> Yes, I am aware of that, but I expected it to still scale down since there is no offset yet.

Same issue here; would it be better to scale to 0 in this case?

@gcaracuel

Same problem here. Has anybody solved this?
It is a very common case to start with an oversized topic of, say, 20 partitions, so the consumer lag for most of those partitions is -1, which triggers this problem; yet the summed lag for the topic is 0, not -1, so it is a valid metric.

In this case, if you scale the deployment to 0, the KEDA operator will scale it back to 1.

@zroubalik
Member

I see. My only concern is that we might break existing behaviour if we make this change. I am curious whether there is a use case where we would want to keep the current behaviour in this scenario?

@messense @grassiale @matzew @lionelvillard thoughts?

zroubalik changed the title from "Kafka scaler not scaling to zero" to "Kafka scaler not scaling to zero when offset is not properly initialized" Nov 3, 2021
@lionelvillard
Contributor

For our use cases, the lag is the only thing that really matters.

@pierDipi

What I would expect from KEDA (see the sketch after this list) is:

  1. When there are no messages at all, the lag is 0, so scale to minReplicaCount.
  2. When there are messages but no committed offset, the lag depends on offsetResetPolicy; in any case, it should scale to at least max(minReplicaCount, 1).
  3. When there is a committed offset, use the consumer group lag; if the lag is 0, scale to minReplicaCount.
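
A minimal Go sketch of the three cases above (illustrative only, not KEDA's actual code; every name is made up, and the built-in min/max need Go 1.21+):

package main

import "fmt"

// decideReplicas sketches the proposed rule. totalMessages is the number of
// messages in the topic, anyOffsetCommitted reports whether the consumer
// group has committed any offset yet, and totalLag is the summed
// per-partition lag (only meaningful once an offset exists).
func decideReplicas(totalMessages, totalLag, lagThreshold, minReplicas, maxReplicas int64, anyOffsetCommitted bool) int64 {
	switch {
	case totalMessages == 0:
		return minReplicas // case 1: empty topic, nothing to consume
	case !anyOffsetCommitted:
		return max(minReplicas, 1) // case 2: need a consumer so a first offset gets committed
	case totalLag == 0:
		return minReplicas // case 3: fully caught up
	default:
		// otherwise scale on lag, rounded up and capped at maxReplicas
		return min((totalLag+lagThreshold-1)/lagThreshold, maxReplicas)
	}
}

func main() {
	fmt.Println(decideReplicas(0, 0, 5, 0, 100, false))  // 0: empty topic scales to zero
	fmt.Println(decideReplicas(42, 0, 5, 0, 100, false)) // 1: messages but no offset yet
	fmt.Println(decideReplicas(42, 12, 5, 0, 100, true)) // 3: lag 12 with threshold 5
}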

@stale

stale bot commented Jan 11, 2022

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 7 days if no further activity occurs. Thank you for your contributions.

stale bot added the stale label Jan 11, 2022
@zroubalik
Member

@bpinske @PaulLiang1 opinion on this?

stale bot removed the stale label Jan 11, 2022
@bpinske
Contributor

bpinske commented Jan 13, 2022

Okay, after reading through this and reminding myself what offsetResetPolicy really does:

There might be people who depend on the existing behaviour, but it is going to be a small group whose misconfiguration works by accident.

If people do solely depend on the Kafka lag and want to guarantee that at least one pod is always available, regardless of lag being 0, then they should be setting minReplicas=1. They should not be relying on this particular quirk where 0 gets interpreted as a valid metric, preventing them from scaling to 0 even if they set minReplicas=0.
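
For example, pinning one always-on replica looks like this (same shape as the manifest in the report above; the spec field is minReplicaCount):

spec:
  minReplicaCount: 1    # guarantee one pod even when lag is 0
  maxReplicaCount: 100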

I'd suggest a highlighted note in a change log would be sufficient for this. I personally don't think this is a behaviour worth preserving when people can simply set minReplicas appropriately for what they really want.

I have similar use cases with SQS: if there are simply no messages that need to be processed, just scale to zero; I basically treat it like AWS Lambda. I'd expect the Kafka scaler to behave the same way.

@pierDipi has the right idea.

@zroubalik
Member

OK cool, anybody willing to implement this for the upcoming release?

@stale

stale bot commented Apr 11, 2022

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 7 days if no further activity occurs. Thank you for your contributions.

stale bot added the stale label Apr 11, 2022
@stale

stale bot commented Apr 18, 2022

This issue has been automatically closed due to inactivity.

stale bot closed this as completed Apr 18, 2022
Repository owner moved this from In Progress to Ready To Ship in Roadmap - KEDA Core Apr 18, 2022
@gcaracuel

I think this should not be closed, as the issue is still ongoing and affects a large group of users, including myself :)

@zroubalik
Member

@gcaracuel the fix in PR #2621 will be included in the next release. It should resolve this issue; is there anything you find missing there?
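
(For later readers: assuming that PR is the one that introduced the opt-in scaleToZeroOnInvalidOffset trigger parameter, enabling the behaviour discussed in this thread would look like this in the trigger metadata from the original report:)

triggers:
- type: kafka
  metadata:
    bootstrapServers: kafka-client.devbox-raffis-1:9092
    consumerGroup: svc-webhook-processor
    topic: webhook-request
    lagThreshold: "5"
    offsetResetPolicy: latest
    scaleToZeroOnInvalidOffset: "true"   # opt in to scale-to-zero when no offset is committed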

tomkerkhove moved this from Ready To Ship to Done in Roadmap - KEDA Core Aug 3, 2022