Delay force refresh by DefaultInterval when OCI GetNodePool call retu… #6584

Merged
merged 1 commit into from
Mar 6, 2024

vbhargav875
Contributor

What type of PR is this?

/kind feature

What this PR does / why we need it:

In the OCI node pools implementation of Cluster Autoscaler (CA), no delay is applied between calls to the OKE GetNodePool API when that API returns a 404 error, which can happen when the node pool has been deleted or an incorrect node pool OCID is passed. As a result, every CA Refresh triggers forceRefresh again, and CA keeps making GetNodePool calls to CP indefinitely.
This PR introduces a maximum retry count and, once the retries are exhausted, delays the next force refresh, which prevents this.

Does this PR introduce a user-facing change?

No

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:

None

@k8s-ci-robot k8s-ci-robot added kind/feature Categorizes issue or PR as related to a new feature. area/cluster-autoscaler labels Mar 4, 2024
@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Mar 4, 2024
@k8s-ci-robot k8s-ci-robot added the size/S Denotes a PR that changes 10-29 lines, ignoring generated files. label Mar 4, 2024
@vbhargav875 vbhargav875 force-pushed the delay_retries branch 2 times, most recently from 07e885c to 23ab68f Compare March 5, 2024 07:26
@k8s-ci-robot k8s-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Mar 5, 2024
if err != nil {
if httpStatusCode == 404 {
@trungng92 trungng92 Mar 5, 2024

If the OCI SDK returns a 404, is that considered an error (i.e. is the err != nil check going to be true)?

Contributor Author

Yes, in case of 404, err is not nil.

klog.Infof("rebuilding cache")
var resp oke.GetNodePoolResponse
var statusCode int
for id := range staticNodePools {
Contributor

One consideration. We're looping over all of the node pools, but we exit/error out on the first one that errors out, which means you could have 20 node pools, and if the first one is deleted/errors out, then we won't reconcile any of them.

This is how the previous behavior worked as well, so we don't need to address it now, but we should note it as a bug (perhaps as a code comment and/or an internal ticket).

Contributor Author

Agreed

@trungng92
Contributor

Changes look good to me

@jlamillan jlamillan left a comment

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Mar 6, 2024
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jlamillan, vbhargav875

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Mar 6, 2024
@k8s-ci-robot k8s-ci-robot merged commit 06fa717 into kubernetes:master Mar 6, 2024
6 checks passed
@gjtempleton gjtempleton added the area/provider/oci Issues or PRs related to oci provider label Apr 24, 2024
Labels: approved, area/cluster-autoscaler, area/provider/oci, cncf-cla: yes, kind/feature, lgtm, size/M