-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Added alternative expression field in AWS CloudWatch scaler #2911
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good in general (thanks for this feature), I have left one inline comment (and it is probably the reason because test are failing).
Apart from the inline comment, there are some things that need to be done to do this PR mergeable.
- Add some cases to unit tests
- Add one e2e test to cover this new feature
- Sign all the commits
- Open a PR to docs repository explaining this feature
pkg/scalers/aws_cloudwatch_scaler.go
Outdated
@@ -288,19 +281,27 @@ func (c *awsCloudwatchScaler) GetMetrics(ctx context.Context, metricName string, | |||
|
|||
metric := external_metrics.ExternalMetricValue{ | |||
MetricName: metricName, | |||
Value: *resource.NewQuantity(int64(metricValue), resource.DecimalSI), | |||
Value: *resource.NewQuantity(metricValue*100, resource.DecimalSI), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are you doing this? Multiplying this value * 100 changes current behaviour, doesn't?
I'd say that using an expression, this multiplier can be applied inside the expression, WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@JorTurFer When debugging the scaler I've noticed that values are returned as whole numbers from the CloudWatch lib and are matching the value that I get when running the query directly in CloudWatch UI. After passing the metric to HPA, k8s stats shows that the value is not interpreted correctly and indeed auto-scale wasn't scaling up as expected, but multiplying it by 100 solved the issue. I haven't tested this scaler with a query builder, but I find it hard to believe that HPA would somehow work fine with the same value, just because it was fetched with a query builder in oppose to using expression.
If we just want to avoid breaking-change in all cost, then I'll move it into the expression logic, right after it is fetched from the CloudWatch lib (I've tried embedding it directly in the query, but the query just fails to run).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I get your point about the scaler behaviour because it's a fact that casting float64 into int64, we are losing information and multiplying it by 100, the information missed is less, but there are use cases where that situation is not a problem.
Apart from that, IMO we should maintain the value exposed by the upstream as it is given. This new feature gives to the users the option of multiplying the value as part of the expression, doesn't?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had to multiply by 100 even when the scaler received an integer value from CloudWatch (e.g. values: ['12345']
, something like that). My query that I'm testing with is actually getting the max value, so it returns integers, since it's the max number of messages that were in queue over a period of a minute.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you mean that you are getting 123,45
when you should receive 12345
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, and that's what I tried to explain. The value that you can see in the HPA is the result of the formula (returnedValue/podCount). You cannot compare it with the value from the upstream directly, you need to apply the same formula.
You are doing:
- CloudWatch -> 40
- KEDA -> 4k (40 * 100)
- HPA -> 80 (4000/50)
This is wrong because the expected values are:
- CloudWatch -> 40
- KEDA -> 40
- HPA -> 800m (40/50) *Remember that in k8s, decimal numbers don't exist, 0.8 is translated into mili scale so 0.8 -> 800m
You shouldn't multiply the upstream value * 100 because the behavior changes.
When I said to multiply inside the expression, I though that your case was related with working with rates between 0 and 1 and you need to do it because the information lost during the casting to int64.
If that'd be your case, multiplying it for 100 in the expression and taking that into account in the threshold could solve the problem, but that's not your case
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, so what is targetAverageValue
actually referring to? I wonder how can I set it up correctly? Is it 800m that I'm comparing it now?
In my example, it's 200 which seems to refer to HPA -> 80 (4000/50)
.
So if the HPA -> 800m (40/50)
, then should it be 2 (2000m)? I think I tried that before and it worked. I just couldn't understand the numbers and it somehow made more sense with multiplying the metric by 100.
Anyway, I think have everything to complete the PR. Thanks a lot for spending your time to explain this!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
targetAverageValue
is the desired value per pod. Imagine that your workload can process one message at once and you want to process all messages faster as possible, in that case, you could set targetAverageValue
to 1, to have 1 pod per message.
In the same case, if your pod can process 2 messages at once (and you want to process them faster as possible with less resources as possible), the targetAverageValue
value will be 2.
if 10 messages are pending , with targetAverageValue
1 you will have 10 pods (10/1=10) and with 2 you will have 5 pods (10/2=5)
No worries about the spent time, I'm happy to help 😄
Let me know if you have any doubt about how it works
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome, thanks. Now it makes sense. I'm hoping to get the tests & docs changes done in the next few days. Have a great day!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure, no rush at all
Have a great day too
I see that the scaler has Go test file (aws_cloudwatch_scaler_test.go), but the contributing testing guide here refers to testing it with TypeScript. Is there any guide on how to test a single scaler Go file? Is JS used for e2e tests while Go for Unit tests? |
You are right, that doc refers to use typescript but for e2e tests. Right now, e2e test are written in TS and unit tests are written in Golang. We are working on unifying this, but for the moment the working flow is that. Related with executing the tests from that file alone, AFAIK in Golang you cannot specify a single file, but you can specify the test names that you want to execute, so you can run them doing:
|
@JorTurFer I've added tests, signed the commit and synced my main branch with keda. After that, I noticed that the PR got closed, not sure why. It looks like I've done it somehow. Can you please re-open it? |
I cannot because the branch is deleted after this force push |
@JorTurFer OK, thanks. I've created PR #2997 to resume this. Can you please verify that commits are signed as expected and review the changes to the tests? I'll PR the docs change later. |
Yes, we can continue the discussion there 👍 |
Added alternative expression field in AWS CloudWatch scaler.
Here's an example ScaledObject config:
In addition, AWS CloudWatch scaler was changed to use int64 instead of float64 for metric type, since HPA doesn't work with float numbers, integers only.
Checklist
Fixes #