Implements PendingJobCount to fix kedacore/keda#1323 #1391
Conversation
Signed-off-by: Thomas Lamure <thomas.lamure@eridanis.com>
@TsuyoshiUshio PTAL ^
Hi @thomas-lamure
Thank you for the contribution! This feature is awesome. I reviewed your code and have only one thing to ask: could you keep the current behavior as the default? For example, we could add an optional property on the type in https://github.com/kedacore/keda/blob/main/api/v1alpha1/scaledjob_types.go#L64, such as AccurateEnablePendingJobCount as a bool defaulting to false. If there is any other cool way to do it, I'm OK with that as well. I just want to keep the current behavior as the default to prevent a breaking change for users. I believe your fix will be great for most cases.
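For illustration, a minimal sketch of what such an optional field could look like on the ScaledJob spec (the field name follows the suggestion above; the JSON tag and exact placement are assumptions, not the final API):

```go
type ScaledJobSpec struct {
	// ... existing fields elided ...

	// AccurateEnablePendingJobCount, when true, subtracts pending jobs from
	// the effective max scale; false (the default) keeps the current behavior.
	// +optional
	AccurateEnablePendingJobCount bool `json:"accurateEnablePendingJobCount,omitempty"`
}
```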
Hi @TsuyoshiUshio,
What you said is totally true. However, I'd like to share why I said that. I'm flexible, since your code is cool; I think it depends on the current policy of this repo.
As I said, I might be too careful. Asking @zroubalik and @tomkerkhove, as you did, might be a very good idea. Speed vs. consistency.
Even though I am generally all in for consistency, in this particular case I am more inclined towards what @thomas-lamure is suggesting. @TsuyoshiUshio I see your points, but is the current solution really the one that users would expect? I am not sure that spinning up multiple jobs for one incoming message is ideal. @tomkerkhove WDYT?
I don't see this happening either, since they would compete for the message.
OK. Then let's merge it as is!
We can add more detailed info and a warning to the changelog about this particular change. Would that be better for you, @TsuyoshiUshio?
```diff
 func (s accurateScalingStrategy) GetEffectiveMaxScale(maxScale, runningJobCount, pendingJobCount, maxReplicaCount int64) int64 {
 	if (maxScale + runningJobCount) > maxReplicaCount {
 		return maxReplicaCount - runningJobCount
 	}
-	return maxScale
+	return maxScale - pendingJobCount
 }
```
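To make the merged behavior concrete, here is a quick table-driven check (an illustrative sketch, not part of the PR; the package name is assumed):

```go
package executor // hypothetical package name for illustration

import "testing"

func TestGetEffectiveMaxScale(t *testing.T) {
	s := accurateScalingStrategy{}
	cases := []struct {
		maxScale, running, pending, maxReplica, want int64
	}{
		{10, 5, 5, 20, 5}, // under the cap: maxScale - pendingJobCount
		{10, 5, 5, 10, 5}, // cap reached: maxReplicaCount - runningJobCount
	}
	for _, c := range cases {
		got := s.GetEffectiveMaxScale(c.maxScale, c.running, c.pending, c.maxReplica)
		if got != c.want {
			t.Errorf("got %d, want %d", got, c.want)
		}
	}
}
```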
@thomas-lamure I don't think this is quite correct. I think pendingJobCount needs to be taken into account in the comparison to maxReplicaCount.
If we say

```
maxScale        = 10
runningJobCount = 5
pendingJobCount = 5
maxReplicaCount = 20
```

we get 5, which is correct:

```
(10 + 5 = 15) < 20  =>  10 - 5 = 5
```
However, if we say:

```
maxScale        = 10
runningJobCount = 5
pendingJobCount = 5
maxReplicaCount = 10
```

then we still get 5, even though runningJobCount + pendingJobCount = 10, so the maxReplicaCount is already hit:

```
(10 + 5 = 15) > 10  =>  10 - 5 = 5
```
So this should instead be

```go
if (maxScale + runningJobCount) > maxReplicaCount {
	return maxReplicaCount - runningJobCount - pendingJobCount
}
return maxScale - pendingJobCount
```

or, to simplify:

```go
return min(maxReplicaCount-runningJobCount, maxScale) - pendingJobCount
```
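As a side note on the simplified form: Go only gained a builtin min in Go 1.21, so on an older toolchain a small helper would be needed. A sketch, with the hypothetical name min64:

```go
// min64 returns the smaller of two int64 values; Go before 1.21 has no builtin min.
func min64(a, b int64) int64 {
	if a < b {
		return a
	}
	return b
}
```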
Reference: #1227 (comment)
Hi @fjmacagno,
I think the problem here is the terminology: runningJobCount is in fact "jobs that are not finished", so in your example the 5 running jobs are the 5 pending jobs. The result of 5 is therefore correct; the runner can accept 5 other jobs.
If you subtract the pending jobs again (they are already counted in runningJobCount), you will never scale to maxReplicaCount.
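To spell that reading out with the numbers from the second example above (the 5 pending jobs already counted inside runningJobCount):

```
maxScale = 10, runningJobCount = 5 (all 5 still pending), maxReplicaCount = 10
(10 + 5 = 15) > 10  =>  10 - 5 = 5 new jobs
5 unfinished + 5 new = 10 = maxReplicaCount, so the cap holds without subtracting pendingJobCount again
```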
Hi @fjmacagno, I tend to share @thomas-lamure's view on this. But I am happy to hear more from you on whether we need to change something, or update the docs in some area. Thanks!
Signed-off-by: Thomas Lamure <thomas.lamure@eridanis.com>
When using the accurate scaling strategy, if a job's pod takes time to be created or started, this PR prevents multiple jobs from being requested in place of one.
There may be several reasons for the pod start delay: pulling a new image, scaling the cluster, etc.
This PR is a port of the code from #639, which I have been using with KEDA v1 since that PR.
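For readers unfamiliar with the mechanism: the idea is to count jobs whose pods have not started running yet and subtract that count from the next scale-up. A rough sketch of such a check (illustrative only, not the PR's exact code; the package, helper name, and pod listing are assumptions):

```go
package scaling // hypothetical location

import corev1 "k8s.io/api/core/v1"

// isPendingJob reports whether a job's pods are all still waiting to start,
// e.g. while an image is pulled or the cluster scales up a node.
func isPendingJob(pods []corev1.Pod) bool {
	for _, pod := range pods {
		if pod.Status.Phase == corev1.PodRunning || pod.Status.Phase == corev1.PodSucceeded {
			return false
		}
	}
	return true
}
```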
Fixes #1323