Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add pause metric in prometheus for scaledobject #4446

Closed
wants to merge 4 commits into from

Conversation

eladmotola
Copy link

@eladmotola eladmotola commented Apr 12, 2023

Provide a description of what has been changed

Checklist

Fixes #

Relates to #4430
Relates to kedacore/keda-docs#1111

@eladmotola eladmotola requested a review from a team as a code owner April 12, 2023 09:41
@tomkerkhove
Copy link
Member

Can you open a PR for our docs on kedacore/keda-docs please?

@eladmotola
Copy link
Author

@JorTurFer Hey, can you please review again?
I will fix the ci(dco) on the weekend

Copy link
Member

@tomkerkhove tomkerkhove left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@JorTurFer
Copy link
Member

@JorTurFer
Copy link
Member

@JorTurFer
Copy link
Member

We still need the DCO fixed 🙏
https://github.com/kedacore/keda/blob/main/CONTRIBUTING.md?#i-didnt-sign-my-commit-now-what

Signed-off-by: Elad Motola <eladmotola95@gmail.com>
@eladmotola
Copy link
Author

@JorTurFer I fixed everything 😁

@JorTurFer JorTurFer requested a review from zroubalik April 24, 2023 20:40
@JorTurFer
Copy link
Member

JorTurFer commented Apr 24, 2023

/run-e2e prometheus
Update: You can check the progress here

@tomkerkhove
Copy link
Member

tomkerkhove commented Apr 25, 2023

/run-e2e prometheus
Update: You can check the progress here

Signed-off-by: Elad Motola <eladmotola95@gmail.com>

Signed-off-by: eladmotola <43670376+eladmotola@users.noreply.github.com>
@JorTurFer
Copy link
Member

JorTurFer commented Apr 25, 2023

/run-e2e prometheus
Update: You can check the progress here

Copy link
Member

@zroubalik zroubalik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can see some inconsistencies in the code:

	scaledObjectPausedTotal = prometheus.NewGaugeVec(
		prometheus.GaugeOpts{
			Namespace: DefaultPromMetricsNamespace,
			Subsystem: "scaled_object",
			Name:      "paused",
			Help:      "Total number of paused scaled_objects",
		},
		[]string{"namespace", "scaledObject"},

vs

	if _, ok := family["keda_scaleobject_pause"]; ok {
		t.Errorf("metric should not be available")
	}

etc.

So what is the intention behind this PR? It reports total number of ScaledObjects that are in a paused state?

I think that it would be great if we add a metric, that acutally shows whether a specific ScaledObject is paused or not? At least that's what was the idea behind #4430

I think, that similar metric to keda_scaler_activity, but in this case on ScaledObject level and not Scaler level. ie. keda_scaled_object_paused should tell us whether a specific SO is paused or not.

@eladmotola
Copy link
Author

here

oh wow i missed that...
and yes this is the purpose of this pr. do you think i implement it the wrong way?

Signed-off-by: Elad Motola <eladmotola95@gmail.com>


Signed-off-by: eladmotola <43670376+eladmotola@users.noreply.github.com>
@JorTurFer
Copy link
Member

JorTurFer commented Apr 25, 2023

I think that it would be great if we add a metric, that acutally shows whether a specific ScaledObject is paused or not? At least that's what was the idea behind #4430

Why do you think that this PR doesn't do it? I mean, the metric is updated on each reconciliation based on if it's paused or not
image

@zroubalik
Copy link
Member

zroubalik commented Apr 26, 2023

I am not saying it doesn't do that, it was just super confusing given the metric name mismatch and also this:

Help: "Total number of paused scaled_objects",

I just briefly went through the code, I will do a proper review later :)

@eladmotola
Copy link
Author

Hey, can you trigger the e2e tests?

@JorTurFer
Copy link
Member

JorTurFer commented May 1, 2023

/run-e2e prometheus
Update: You can check the progress here

@eladmotola
Copy link
Author

The branch was merged from main

@JorTurFer
Copy link
Member

JorTurFer commented May 7, 2023

/run-e2e prometheus
Update: You can check the progress here

Copy link
Member

@zroubalik zroubalik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR should be rebased once this: #4470 is merged.

@@ -204,3 +215,9 @@ func DecrementCRDTotal(crdType, namespace string) {

crdTotalsGaugeVec.WithLabelValues(crdType, namespace).Dec()
}

func RecordScalerPaused(namespace string, scaledObject string, value float64) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could you please rewrite this function to accept bool instead of float64? the same way RecordScalerActive() is written.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't mind doing it. I just can't figure out why the tests fails

Namespace: DefaultPromMetricsNamespace,
Subsystem: "scaled_object",
Name: "paused",
Help: "Total number of paused scaled_objects",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The help message is wrong.

@eladmotola
Copy link
Author

Any chance that one of you guys can see why what I did is wrong?

@JorTurFer
Copy link
Member

JorTurFer commented Jul 5, 2023

/run-e2e prometheus
Update: You can check the progress here

@JorTurFer
Copy link
Member

It looks like the expected metric is 0:
image

Comment on lines +766 to +769
KubectlApplyWithTemplate(t, data, "deploymentForPausedTemplate", deploymentForPausedTemplate)
KubectlApplyWithTemplate(t, data, "scaledObjectPausedTemplate", scaledObjectPausedTemplate)

family = fetchAndParsePrometheusMetrics(t, fmt.Sprintf("curl --insecure %s", kedaOperatorPrometheusURL))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that the metric is checked too fast and KEDA doesn't have enough time to process the ScaledObject. I guess that you can introduce a sleep here (5 seconds should be enough) or maybe deploying this resources in the beginning of the test file instead of inside the test case

@JorTurFer
Copy link
Member

Any update? We aim to cut a release next week

@eladmotola
Copy link
Author

Any update? We aim to cut a release next week

Hey,
I'm on vacation at the moment. Feel free to take it from here

@geoffrey1330
Copy link
Contributor

geoffrey1330 commented Oct 3, 2023

Hi @zroubalik I want to finish up this PR.
could you please assign the issue to me

@zroubalik
Copy link
Member

Any update? We aim to cut a release next week

Hey, I'm on vacation at the moment. Feel free to take it from here

@eladmotola we will continue in this PR - @geoffrey1330 volunteered to do so.

@zroubalik
Copy link
Member

Done in #5045

@zroubalik zroubalik closed this Nov 13, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants