Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve the Datadog scaler, including a new optional parameter metricUnavailableValue to fill data when no Datadog metric was returned #2694

Merged
merged 18 commits into from
Mar 14, 2022

Conversation

arapulido
Copy link
Contributor

@arapulido arapulido commented Mar 1, 2022

Several non-breaking fixes for the Datadog scaler. Fixes #2657

The problem described in #2657 is partially caused by the selection of a time window that is too small (15 seconds). Datadog doesn't have that metric yet in many cases, and returns an empty value, that logs as a warning in the HPA.

To avoid this from happening often, the user should select a bigger window, and also added some retries before giving up. I also changed the documentation, to discourage reducing the time window: kedacore/keda-docs#682

To avoid this, we should remove the "age" parameter going forward, but as this is a breaking change, I will create a different PR for 2.7.

Other improvements in this PR:

  • If more than one point is returned, return the last one, instead of the first one, for more recent results
  • More parsing of the initial query, and some potential fixes to a incomplete query: default to avg aggregator, and only add a rollup if one wasn't added by the user
  • Moved to MilliQuantities and floats for query results for more granularity
  • Return a specific error if Datadog didn't return a metric value due to reaching rate limits for the organization
  • IsActive doesn't try to get a metric value, to reduce rate limiting

Checklist

  • Commits are signed with Developer Certificate of Origin (DCO - learn more)
  • Tests have been added
  • A PR is opened to update the documentation on (repo) (if applicable)
  • Changelog has been updated and is aligned with our changelog requirements

Fixes #2657

@arapulido arapulido requested a review from a team as a code owner March 1, 2022 13:42
@arapulido arapulido changed the title Fixes datadog Improve the Datadog scaler with non-breaking changes Mar 1, 2022
Copy link
Member

@JorTurFer JorTurFer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your contribution ❤️ ❤️
I have left some comments

pkg/scalers/datadog_scaler.go Outdated Show resolved Hide resolved
pkg/scalers/datadog_scaler.go Outdated Show resolved Hide resolved
pkg/scalers/datadog_scaler.go Outdated Show resolved Hide resolved
pkg/scalers/datadog_scaler.go Outdated Show resolved Hide resolved
pkg/scalers/datadog_scaler.go Outdated Show resolved Hide resolved
@arapulido
Copy link
Contributor Author

@tomkerkhove I implemented here the unavailableMetricValue optional parameter.

@JorTurFer I addressed your comments

Once this is approved, I will create a PR with the updated documentation

The missing thing is IsActive. I am not sure what needs implementing here. Anything that requires contacting the API will accelerate rate limiting, and it is not reliable. Any suggestions?

@JorTurFer
Copy link
Member

@tomkerkhove I implemented here the unavailableMetricValue optional parameter.

@JorTurFer I addressed your comments

Once this is approved, I will create a PR with the updated documentation

The missing thing is IsActive. I am not sure what needs implementing here. Anything that requires contacting the API will accelerate rate limiting, and it is not reliable. Any suggestions?

The point here is that KEDA uses that method for getting if the scaler is active or not and that triggers things like scale from/to zero. If we return true always, KEDA thinks that the scaler is always active and will not scale to zero

@arapulido
Copy link
Contributor Author

The point here is that KEDA uses that method for getting if the scaler is active or not and that triggers things like scale from/to zero. If we return true always, KEDA thinks that the scaler is always active and will not scale to zero

My question is: what do we mean by "a scaler is active"?

@JorTurFer
Copy link
Member

I'm AFK but this is part of the info (from docs)

KEDA polls ScaledObject object according to the pollingInterval configured in the ScaledObject; it checks the last time it was polled, it checks if the number of replicas is greater than 0, and if the scaler itself is active. So if the scaler returns false for IsActive, and if current number of replicas is greater than 0, and there is no configured minimum pods, then KEDA scales down to 0.

@JorTurFer
Copy link
Member

JorTurFer commented Mar 7, 2022

So it could be like:
IsActive=true: KEDA should scale from zero and let the HPA works
IsActive=false: KEDA should scale to zero (HPA doesn't work)

Of course, min replicas should be 0 for scaling to/from zero

@zroubalik
Copy link
Member

The point here is that KEDA uses that method for getting if the scaler is active or not and that triggers things like scale from/to zero. If we return true always, KEDA thinks that the scaler is always active and will not scale to zero

My question is: what do we mean by "a scaler is active"?

As I mentioned above: #2694 (comment)

Active is when the target metric is above threshold. In this case a the value returned by query is above queryValue.

@arapulido
Copy link
Contributor Author

@JorTurFer @zroubalik Thanks both! I have finally understood this :) I have added a commit for this, and I have also opened a PR to the Datadog scaler docs to improve the explanation related to polling intervals configuration, as it is fairly important to understand it, as it is directly affected by rate limits: kedacore/keda-docs#702

@JorTurFer
Copy link
Member

JorTurFer commented Mar 8, 2022

/run-e2e datadog*
Update: You can check the progres here

Copy link
Member

@JorTurFer JorTurFer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!
Thanks for these improvements!
(I'll wait to merge till e2e passes and docs PR is ready to merge too)

@JorTurFer
Copy link
Member

e2e test has failed and also the workflow has failed because a needed rebase (rebasing main is not important at this moment if you don't want to do it, e2e test runs and only fails after it trying to clean up all namespaces).
This is the link to datadog e2e logs

@arapulido
Copy link
Contributor Author

e2e test has failed and also the workflow has failed because a needed rebase (rebasing main is not important at this moment if you don't want to do it, e2e test runs and only fails after it trying to clean up all namespaces). This is the link to datadog e2e logs

@JorTurFer Added a couple of changes to make the tests more resilient. Can we rerun, please?

@JorTurFer
Copy link
Member

JorTurFer commented Mar 10, 2022

/run-e2e datadog*
Update: You can check the progres here

@JorTurFer
Copy link
Member

They have worked (again, ignore the error, it's because a rebase is needed for fixing it)
Have you got any other thing to add in this PR?
Could you update the PR with the issue please? Then I can merge this

As always, thanks a ton for you nice work ❤️ ❤️ ❤️ ❤️

@arapulido
Copy link
Contributor Author

@JorTurFer Thanks! I don't have plans to add anything for now, but I need to know whether this is going to be targeted for 2.6 or 2.7, as I need to add changes to the documentation related to the new metricUnavailableValue parameter

@JorTurFer
Copy link
Member

This change will be shipped as part of 2.7

@arapulido
Copy link
Contributor Author

This change will be shipped as part of 2.7

Cool! The corresponding docs PR is here: kedacore/keda-docs#710

Copy link
Member

@zroubalik zroubalik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@zroubalik
Copy link
Member

@arapulido could you please update the Changelog? then we can merge this one. thanks!

Copy link
Member

@zroubalik zroubalik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please rebase, there's a conflict in Changelog. Thanks!

CHANGELOG.md Outdated Show resolved Hide resolved
CHANGELOG.md Outdated Show resolved Hide resolved
… limiting

Signed-off-by: Ara Pulido <ara.pulido@datadoghq.com>
Signed-off-by: Ara Pulido <ara.pulido@datadoghq.com>
Signed-off-by: Ara Pulido <ara.pulido@datadoghq.com>
Signed-off-by: Ara Pulido <ara.pulido@datadoghq.com>
Signed-off-by: Ara Pulido <ara.pulido@datadoghq.com>
Signed-off-by: Ara Pulido <ara.pulido@datadoghq.com>
Signed-off-by: Ara Pulido <ara.pulido@datadoghq.com>
Signed-off-by: Ara Pulido <ara.pulido@datadoghq.com>
…ng a null metric, without blocking

Signed-off-by: Ara Pulido <ara.pulido@datadoghq.com>
…eturn that value as metric if the metric is not available for the time period selected

Signed-off-by: Ara Pulido <ara.pulido@datadoghq.com>
Signed-off-by: Ara Pulido <ara.pulido@datadoghq.com>
Signed-off-by: Ara Pulido <ara.pulido@datadoghq.com>
Signed-off-by: Ara Pulido <ara.pulido@datadoghq.com>
Signed-off-by: Ara Pulido <ara.pulido@datadoghq.com>
Signed-off-by: Ara Pulido <ara.pulido@datadoghq.com>
Signed-off-by: Ara Pulido <ara.pulido@datadoghq.com>
Signed-off-by: Ara Pulido <ara.pulido@datadoghq.com>
Signed-off-by: Ara Pulido <ara.pulido@datadoghq.com>
Copy link
Member

@zroubalik zroubalik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks!

@zroubalik
Copy link
Member

zroubalik commented Mar 14, 2022

/run-e2e datadog*
Update: You can check the progres here

@zroubalik zroubalik merged commit 877d0f6 into kedacore:main Mar 14, 2022
@zroubalik zroubalik changed the title Improve the Datadog scaler with non-breaking changes Improve the Datadog scaler, including a new optional parameter metricUnavailableValue to fill data when no Datadog metric was returned Mar 14, 2022
@arapulido arapulido deleted the fixes_datadog branch March 17, 2022 16:55
RamCohen pushed a commit to RamCohen/keda that referenced this pull request Mar 23, 2022
…vailableValue` to fill data when no Datadog metric was returned (kedacore#2694)

Signed-off-by: Ara Pulido <ara.pulido@datadoghq.com>
Signed-off-by: Ram Cohen <ram.cohen@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Datadog scaler is not able to find matching metrics
3 participants