
feat: Add a Prometheus metric for measuring the scalable object loop processing deviation #4703

Merged
merged 7 commits into from
Jun 22, 2023

Conversation

JorTurFer
Member

@JorTurFer JorTurFer commented Jun 16, 2023

Checklist

Fixes #4702

Signed-off-by: Jorge Turrado <jorge_turrado@hotmail.es>
@JorTurFer JorTurFer requested a review from a team as a code owner June 16, 2023 19:29
@github-actions

Thank you for your contribution! 🙏 We will review your PR as soon as possible.


@JorTurFer JorTurFer changed the title feat: Add a Promethean metric for measuring the scalable object loop processing deviation feat: Add a Prometheus metric for measuring the scalable object loop processing deviation Jun 16, 2023
Signed-off-by: Jorge Turrado <jorge_turrado@hotmail.es>
@JorTurFer
Member Author

JorTurFer commented Jun 16, 2023

/run-e2e sequential
Update: You can check the progress here

Signed-off-by: Jorge Turrado <jorge_turrado@hotmail.es>
Member

@zroubalik zroubalik left a comment


I wonder, do you see any actual latency that could be measured and exposed here? I mean, one in the loop itself? We are already reporting latency per scaler. I am not sure whether this kind of metric won't be confusing for users?

@JorTurFer
Member Author

I wonder, do you see any actual latency that could be measured and exposed here? I mean, one in the loop itself? We are already reporting latency per scaler. I am not sure whether this kind of metric won't be confusing for users?

The difference here is what we are measuring. In one case we measure the trigger latency; in the other, the deviation between loops. They are similar, but not the same. For example:
If I have a single ScaledObject with pollingInterval: 1 and 12 triggers, each with 100 ms of latency, the per-scaler latency looks fine, but the ScaledObject will run with a deviation of 200 ms all the time.

These new metrics measure the difference between the expected execution time and the real execution time, so we get a real picture of the overload we have. If the loop should have executed at time X and we execute it later, something is slowing us down; it could be the triggers' latency, but also an overload.

If we have throttling, both metrics will increase, but an increase of keda_scaler_metrics_latency doesn't necessarily mean an overload in KEDA (because upstream can simply be responding slowly), whereas an increase of these new metrics does.

@JorTurFer
Member Author

For example, this is how it looks when there is an overload (I have forced it using 1000 ScaledObjects with pollingInterval: 1 and cpu: 120m) and upstream without any problem:
image

This is how it looks when KEDA isn't overloaded (cpu: 1) and it's the upstream that responds slowly:
image

Member

@zroubalik zroubalik left a comment


The numbers are interesting, great stuff.

Let's proceed with keda_internal_scale_loop_latency as we discussed offline.

JorTurFer and others added 2 commits June 21, 2023 12:39
Signed-off-by: Jorge Turrado <jorge.turrado@scrm.lidl>
Signed-off-by: Jorge Turrado Ferrero <Jorge_turrado@hotmail.es>
@JorTurFer
Member Author

JorTurFer commented Jun 21, 2023

/run-e2e sequential
Update: You can check the progress here

Signed-off-by: Jorge Turrado <jorge.turrado@scrm.lidl>
Member

@zroubalik zroubalik left a comment


LGTM, great stuff!

Signed-off-by: Jorge Turrado <jorge.turrado@scrm.lidl>
@zroubalik
Member

zroubalik commented Jun 21, 2023

/run-e2e sequential
Update: You can check the progress here

@tomkerkhove
Member

Let's make sure we also update our documentation to ensure everyone knows what it represents and how to interpret it

@JorTurFer
Member Author

Let's make sure we also update our documentation to ensure everyone knows what it represents and how to interpret it

Yeah, I plan to do it later on. I opened this PR first to make sure it gets included in the release (in case the release had been cut today).
Docs update incoming

@zroubalik zroubalik merged commit a634b66 into kedacore:main Jun 22, 2023
@JorTurFer JorTurFer deleted the cycle-delay branch July 27, 2023 07:16
Successfully merging this pull request may close these issues.

Add a Prometheus metric for measuring the scalable object loop processing deviation
3 participants