
Implements PendingJobCount to fix kedacore/keda#1323 #1391

Merged

merged 1 commit on Dec 11, 2020

Conversation

thomas-lamure
Contributor

@thomas-lamure thomas-lamure commented Dec 2, 2020

Signed-off-by: Thomas Lamure thomas.lamure@eridanis.com

Description of what has been changed

When using the accurate scaling strategy, if a job's pod takes time to be created or started, this PR prevents multiple jobs from being requested where only one is needed.
The pod start delay can have several causes: pulling a new image, scaling the cluster, and so on.
This PR is a port of the code from #639 that I have been using in Keda V1 since the mentioned PR.
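The idea of counting pending jobs can be sketched roughly as follows. This is a hedged illustration with made-up types (`job`, `pendingJobCount`), not KEDA's actual implementation, which inspects `batchv1.Job` and `corev1.Pod` objects from the Kubernetes API:

```go
package main

import "fmt"

// Illustrative stand-in for Kubernetes Job/Pod status; the real code
// looks at batchv1.Job and the phases of its pods.
type job struct {
	active      int // pods the Job controller considers active
	runningPods int // pods actually in the Running phase
}

// pendingJobCount counts jobs that were created but whose pods are not
// running yet (image pull, cluster scale-up, scheduling delay, ...).
// These jobs already "own" a queue message, so no new job is needed.
func pendingJobCount(jobs []job) int64 {
	var pending int64
	for _, j := range jobs {
		if j.active > 0 && j.runningPods == 0 {
			pending++
		}
	}
	return pending
}

func main() {
	jobs := []job{
		{active: 1, runningPods: 1}, // pod already running
		{active: 1, runningPods: 0}, // pod still starting
		{active: 1, runningPods: 0}, // pod still starting
	}
	fmt.Println(pendingJobCount(jobs)) // 2
}
```

With this count available, the scaler can subtract the pending jobs from the number of new jobs it would otherwise request.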

Checklist

Fixes #1323

Signed-off-by: Thomas Lamure <thomas.lamure@eridanis.com>
@zroubalik
Member

@TsuyoshiUshio PTAL^

Contributor

@TsuyoshiUshio left a comment


Hi @thomas-lamure
Thank you for the contribution! This feature is awesome. I reviewed your code and have only one thing to ask: could you keep the current behavior as the default? For example, we could add an optional property to the type in https://github.com/kedacore/keda/blob/main/api/v1alpha1/scaledjob_types.go#L64, such as AccurateEnablePendingJobCount (bool, default false). If there is another good way to do it, I'm fine with that as well. I just want to keep the current behavior as the default to prevent a breaking change for users. I believe your fix will be great for most cases.

@thomas-lamure
Contributor Author

Hi @TsuyoshiUshio,
Thanks for the review. The current behavior is quite erratic: it is a bug that forces multiple jobs for one message whenever the pod takes longer than the poll interval to mark the message as being handled, so I am not sure the request to keep it is really needed.
I am ready to code it that way, however, if you confirm your request and/or if @zroubalik or @tomkerkhove come to the same conclusion as you.
My intention here is to start a discussion on the subject, to make sure we make the best improvement possible.
What do you think?

@TsuyoshiUshio
Contributor

Hi @thomas-lamure

What you said is totally true. However, I'd like to share why I said it. I'm flexible, since your code is good; I think it depends on the current policy of this repo.

  1. The accurate strategy was developed with a customer. They are using it in production in a very important cloud backend system that deploys a cloud platform, and it works perfectly for now.
  2. I work on a cloud provider product team, and I have learned how differently people use things. Even a feature that looks harmless can cause an unexpected outage once deployed. For this repo I might be too careful, though.
  3. That is why I introduced the Strategy concept: to accommodate various requirements.
  4. So I thought keeping the current behavior, behind a feature-flag-like switch, is the safer option since we are already GA. Your change is an essential one; in the next major version release we can consider the breaking change and make it the default.

As I said, I might be too careful. Asking @zroubalik and @tomkerkhove, as you did, might be a very good idea. Speed vs. consistency.

@zroubalik
Member

zroubalik commented Dec 7, 2020

Even though I am generally all in for consistency, in this particular case I am more inclined towards what @thomas-lamure is suggesting.

@TsuyoshiUshio I see your points, but is the current solution really the one that users would expect?

I am not sure that spinning up multiple jobs for one incoming message is ideal.

@tomkerkhove WDYT?

@tomkerkhove
Member

I am not sure if spinning mutliple jobs for one incoming message is ideal.

@tomkerkhove WDYT?

I don't see this happening either, since the jobs would compete for the message.

@TsuyoshiUshio
Contributor

Ok. Then Let's merge it as is!

@zroubalik
Member

Ok. Then Let's merge it as is!

We can add more detailed info and a warning to the changelog about this particular change; would that be better for you, @TsuyoshiUshio?

@zroubalik zroubalik merged commit 85e90e8 into kedacore:main Dec 11, 2020
Comment on lines +332 to +336

func (s accurateScalingStrategy) GetEffectiveMaxScale(maxScale, runningJobCount, pendingJobCount, maxReplicaCount int64) int64 {
	if (maxScale + runningJobCount) > maxReplicaCount {
		return maxReplicaCount - runningJobCount
	}
-	return maxScale
+	return maxScale - pendingJobCount
}
@fjmacagno fjmacagno Dec 11, 2020


@thomas-lamure I don't think this is quite correct. I think pendingJobCount needs to be taken into account in the comparison to maxReplicaCount.

If we say

maxScale = 10
runningJobCount = 5
pendingJobCount = 5
maxReplicaCount = 20

we get 5, which is correct:
(10 + 5 = 15) < 20 => 10 - 5 = 5

However, if we say:

maxScale = 10
runningJobCount = 5
pendingJobCount = 5
maxReplicaCount = 10

then we still get 5, even though runningJobCount + pendingJobCount = 10, so maxReplicaCount is already hit:
(10 + 5 = 15) > 10 => 10 - 5 = 5

So this should instead be

if (maxScale + runningJobCount) > maxReplicaCount {
	return maxReplicaCount - runningJobCount - pendingJobCount
}
return maxScale - pendingJobCount

or to simplify

return min(maxReplicaCount - runningJobCount, maxScale) - pendingJobCount

Reference: #1227 (comment)

Contributor Author


Hi @fjmacagno,

I think the problem here is terminology: runningJobCount is in fact "jobs that are not finished", so in your example the 5 running jobs are the 5 pending jobs. The result of 5 is therefore correct; the runner can accept 5 more jobs.
If you subtract runningJobCount again, you will never scale to maxReplicaCount.
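The disagreement can be checked with a quick sketch. The function names below are mine: `mergedFormula` mirrors the code merged in this PR, and `proposedFormula` is the variant suggested by @fjmacagno. Under the reading that runningJobCount already includes pending jobs, the second formula subtracts the pending jobs twice:

```go
package main

import "fmt"

// mergedFormula mirrors the code merged in this PR.
func mergedFormula(maxScale, running, pending, maxReplica int64) int64 {
	if (maxScale + running) > maxReplica {
		return maxReplica - running
	}
	return maxScale - pending
}

// proposedFormula also subtracts pendingJobCount in the capped branch,
// as suggested in the review comment above.
func proposedFormula(maxScale, running, pending, maxReplica int64) int64 {
	if (maxScale + running) > maxReplica {
		return maxReplica - running - pending
	}
	return maxScale - pending
}

func main() {
	// 5 unfinished jobs, all of them pending, maxReplicaCount = 10.
	// If pending jobs are a subset of running jobs, there is room for
	// 5 more; subtracting pending again would allow none.
	fmt.Println(mergedFormula(10, 5, 5, 10))   // 5
	fmt.Println(proposedFormula(10, 5, 5, 10)) // 0
}
```

So whether the extra subtraction is a fix or a bug hinges entirely on whether pendingJobCount is disjoint from runningJobCount, which is the terminology point made above.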

Member


Hi @fjmacagno, I tend to share @thomas-lamure's view on this. But I am happy to hear more from you on whether we need to change something, or update the docs in some area. Thanks!

ycabrer pushed a commit to ycabrer/keda that referenced this pull request Mar 1, 2021
Signed-off-by: Thomas Lamure <thomas.lamure@eridanis.com>
Successfully merging this pull request may close these issues.

KEDA scaledjob with accurate strategy doesn't consider messages in flight when calculating Scale.
5 participants