EventHub plugin reports strange values periodically #4250

ilya-scale · 2023-02-17T12:05:55Z

Report

I have connected KEDA to an Event Hub:

      spec:
        minReplicaCount: 2
        maxReplicaCount: 8
        # triggers belongs under spec, next to the replica counts
        triggers:
        - type: azure-eventhub
          metadata:
            storageAccountName: myaccount
            blobContainer: checkpoints
            eventHubNamespace: mynamespace
            eventHubName: myhub
            consumerGroup: myconsumer
            checkpointStrategy: blobMetadata
            unprocessedEventThreshold: "5000"

There seem to be a number of issues with the connection:

1. The connection seems to work, but I often get very strange values. I monitor them via Grafana and the Prometheus monitor. The value is negative and extremely low:

2. I also get these logs every 10 seconds or even more often:

2023-02-17T11:58:41Z INFO Reconciling ScaledObject {"controller": "scaledobject", "controllerGroup": "keda.sh", "controllerKind": "ScaledObject", "ScaledObject": {"name":"xxx","namespace":"xxx"}, "namespace": "xxx", "name": "data-ingest-iot", "reconcileID": "e272cdaa-8474-421e-9ca4-9fae7bda42f4"}

This seems strange, since the default check period is 30 seconds. I have a Service Bus trigger on another ScaledObject and neither of these things happens there (no reconciling logs, no "off" values either).

3. It seems that the value of the metric is capped at unprocessedEventThreshold * maxReplicaCount, as I very often get exactly 40000, which is quite strange. I think the value should not be capped. The value itself seems to be off as well, somewhat like in point 1, but I cannot see it since it is capped.
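
The 40000 figure matches simple arithmetic on the posted configuration. Below is a minimal sketch, assuming the cap is the threshold times some multiplier; `capped_metric` is a hypothetical helper, not KEDA's code. Since the hub has 8 partitions (stated later in this thread), a cap based on the partition count would give the same number as one based on maxReplicaCount, so the observed value alone cannot tell the two apart.

```python
def capped_metric(raw_lag: int, threshold: int, multiplier: int) -> int:
    """Clamp a raw lag value to threshold * multiplier (hypothetical helper)."""
    return min(raw_lag, threshold * multiplier)


UNPROCESSED_EVENT_THRESHOLD = 5000  # from the trigger metadata above
MAX_REPLICA_COUNT = 8               # from the ScaledObject spec above
PARTITION_COUNT = 8                 # stated later in the thread

# A large backlog is reported as the cap, whichever multiplier is used.
print(capped_metric(123_456, UNPROCESSED_EVENT_THRESHOLD, MAX_REPLICA_COUNT))
print(capped_metric(123_456, UNPROCESSED_EVENT_THRESHOLD, PARTITION_COUNT))

# A small backlog passes through unchanged.
print(capped_metric(200, UNPROCESSED_EVENT_THRESHOLD, MAX_REPLICA_COUNT))
```

Under this assumption, any real backlog above 40000 would be invisible in the exported metric.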

Expected Behavior

There should not be any values that are "off"
There probably should not be reconciling every 10 seconds in the log (not sure about that)
There should not be any "cap" on the metric value

Actual Behavior

Strange values appear periodically
Logs indicate that probably something is wrong
The value of the metric is capped

Steps to Reproduce the Problem

Create an eventhub with some messages coming through
Register trigger with the details posted above
Run for some time and observe this behaviour

P.S. The negative value does not seem to happen in every environment where I run k8s.

Logs from KEDA operator

example

KEDA Version

2.9.3

Kubernetes Version

1.24

Platform

Microsoft Azure

Scaler Details

EventHubs

Anything else?

No response


stale · 2023-04-18T20:45:48Z

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 7 days if no further activity occurs. Thank you for your contributions.

JorTurFer · 2023-04-23T21:49:04Z

Does this behavior still happen in v2.10?

ilya-scale · 2023-04-24T12:21:38Z

I am running the upgraded version (2.10.1) in a development environment, and it actually does not seem to happen anymore. I need some more time to verify that this is indeed the case, and to upgrade production, before I can tell for sure.

Was there something that was fixed in 2.10 to make that happen?

JorTurFer · 2023-04-24T17:48:53Z

If I'm not wrong (and I can be), we upgraded the dependency azure-amqp-common-go from v3 to v4.

ilya-scale · 2023-04-25T07:23:18Z

It seems it did not help after all:

What looks like 0 on the chart is not actually 0; it is a real value (around 200). But as you can see, it sometimes drops to a completely weird value (-9223372036854776).
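
One observation about the number itself: -9223372036854776 is what the smallest signed 64-bit integer looks like after being divided by 1000 in floating point. The sketch below only checks that numeric coincidence. It is not a confirmed root cause, and whether anything in the metrics path actually applies a 1000x scaling is an assumption.

```python
INT64_MIN = -2**63  # smallest value of a signed 64-bit integer

# True division produces a float, which cannot hold all the digits exactly.
scaled = INT64_MIN / 1000

print(INT64_MIN)
print(f"{scaled:.0f}")
```

If the coincidence is meaningful, it would point at an integer overflow somewhere upstream of the exported metric rather than at a rendering problem.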

JorTurFer · 2023-04-25T07:40:55Z

If you check the values using Prometheus directly, do you see the same? I have sometimes had rendering issues with Grafana.

ilya-scale · 2023-04-25T07:47:40Z

I am not using a dashboard, but a raw Metric from explore:

Would that be enough, or do you think it can still render wrong? (The picture here does not show the error, as it does not happen every time.)

JorTurFer · 2023-04-25T08:06:42Z

I still think it could be a rendering error; I have run into them sometimes in Grafana :/
The problem is that the value doesn't make sense :( Maybe you could try different time windows when you see the error, to check that the negative peaks occur at the same time. I mean, if you are looking at the last 3 hours and you see an error 20 minutes ago, try again with the last hour and check whether the error is at the same time (if the peak is at the same time in 2 different visualizations, it looks like a metric problem).

ilya-scale · 2023-04-25T08:13:47Z

I have some issues with persistence, so unfortunately I cannot go back in time to check. Let me wait until it pops up again, and then I can check the raw Prometheus data from the Prometheus pod to double-check.

ilya-scale · 2023-04-25T11:16:52Z

Here is the result from raw prometheus data (grafana is not involved):

So I guess that proves the issue is with the raw data?

JorTurFer · 2023-04-25T11:58:05Z

So it is definitely KEDA that is exposing that value :/
Do you have any idea about why this could be happening @v-shenoy @tomkerkhove ?

ilya-scale · 2023-04-25T12:12:10Z

I can also add that my setup is 2 - 8 replicas, since I have 8 partitions in the hub. Usually it runs with 2 pods, each listening to 4 partitions.

Azure.Messaging.EventHubs.Processor 5.7.5, Azure Premium Storage Account.

stale · 2023-06-24T12:46:56Z

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 7 days if no further activity occurs. Thank you for your contributions.

stale · 2023-07-01T13:18:45Z

This issue has been automatically closed due to inactivity.

ilya-scale · 2023-07-03T14:43:59Z

I suppose the issue was not really fixed, just closed? Can we reopen it?

JorTurFer · 2023-07-03T14:52:26Z

sure!
One question: how often does it happen? We have recently released a product that uses EventHub, so I can check it for these weird values.

ilya-scale · 2023-07-03T14:54:58Z

I have now turned the whole setup off since it is not possible to use with this issue, but as far as I remember it happened relatively often, more than once a day.

JorTurFer · 2023-07-03T15:01:36Z

I'll check it tomorrow morning

JorTurFer · 2023-07-04T07:11:55Z

It still happens in KEDA v2.10.1:

Do you have any idea about the root cause, @tomkerkhove @v-shenoy? I guess that it's related to the circular buffer of the EventHub, but I'm not sure.
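
The circular-buffer suspicion can be illustrated with a hypothetical lag function. This is a sketch under stated assumptions, not KEDA's implementation: `partition_lag`, `to_int64`, and all sequence numbers are invented for illustration. It shows how a wraparound branch, if it is hit only because a checkpoint appears slightly ahead of the last enqueued sequence number, yields a value near the int64 maximum, and how adding that to another partition's lag in 64-bit arithmetic wraps around to a huge negative number.

```python
INT64_MAX = 2**63 - 1


def to_int64(value: int) -> int:
    """Wrap an arbitrary Python int the way a signed 64-bit integer would."""
    return (value + 2**63) % 2**64 - 2**63


def partition_lag(last_enqueued_seq: int, checkpoint_seq: int) -> int:
    """Hypothetical per-partition lag with a wraparound branch."""
    if last_enqueued_seq >= checkpoint_seq:
        return last_enqueued_seq - checkpoint_seq
    # Assumed wraparound handling: treat the sequence space as a ring.
    return (INT64_MAX - checkpoint_seq) + last_enqueued_seq


# Normal case: the consumer is 200 events behind.
normal = partition_lag(last_enqueued_seq=1_000_200, checkpoint_seq=1_000_000)

# Skewed case: the checkpoint was read a moment after the partition info,
# so it appears 5 events ahead and the wraparound branch is taken.
skewed = partition_lag(last_enqueued_seq=1_000_000, checkpoint_seq=1_000_005)

# Summing both partitions in 64-bit arithmetic overflows.
total = to_int64(normal + skewed)

print(normal, skewed, total)
```

In this sketch a five-event timing skew on a single partition is enough to turn a total lag of about 200 into a value close to the int64 minimum.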

stale · 2023-09-03T09:15:22Z

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 7 days if no further activity occurs. Thank you for your contributions.

ilya-scale · 2023-09-05T09:40:05Z

I think this is still an active issue that should not be closed

JorTurFer · 2023-09-05T18:44:32Z

@tomkerkhove, do you think we could get a review from someone with knowledge of the EventHub SDK?
Maybe we are doing something wrong, or something is going wrong at some point. I can confirm that this happens (I also see it in my own products).

tomkerkhove · 2023-09-08T06:22:55Z

Can you summarize the issue please?

JorTurFer · 2023-09-10T17:33:08Z

The OP is using the eventhub scaler and monitors the exposed value over time through KEDA's Prometheus metrics. The problem is that, periodically, KEDA reports a weird value:

I have seen the same weird behavior in our own clusters (SCRM), and I suspect it's because we are doing something wrong when calculating the lag.
I'm not an expert in how Event Hub works, so maybe the issue is somewhere else, but if someone who knows Event Hub and the Golang SDK could take a look and give us some feedback, it'd be nice :)

ilya-scale added the bug Something isn't working label Feb 17, 2023

JorTurFer added this to Roadmap - KEDA Core Feb 17, 2023

github-project-automation bot moved this to To Triage in Roadmap - KEDA Core Feb 17, 2023

JorTurFer moved this from To Triage to To Do in Roadmap - KEDA Core Mar 13, 2023

stale bot added the stale All issues that are marked as stale due to inactivity label Apr 18, 2023

stale bot removed the stale All issues that are marked as stale due to inactivity label Apr 23, 2023

stale bot added the stale All issues that are marked as stale due to inactivity label Jun 24, 2023

stale bot closed this as completed Jul 1, 2023

github-project-automation bot moved this from To Do to Ready To Ship in Roadmap - KEDA Core Jul 1, 2023

JorTurFer reopened this Jul 3, 2023

github-project-automation bot moved this from Ready To Ship to Proposed in Roadmap - KEDA Core Jul 3, 2023

stale bot removed the stale All issues that are marked as stale due to inactivity label Jul 3, 2023

stale bot added the stale All issues that are marked as stale due to inactivity label Sep 3, 2023

stale bot removed the stale All issues that are marked as stale due to inactivity label Sep 5, 2023

JorTurFer added the stale-bot-ignore All issues that should not be automatically closed by our stale bot label Sep 5, 2023

troydn mentioned this issue Oct 24, 2023 in "Improve Azure Event Hub scaling" #5125 (merged)

JorTurFer closed this as completed in #5125 Jan 4, 2024

github-project-automation bot moved this from Proposed to Ready To Ship in Roadmap - KEDA Core Jan 4, 2024
