Issue 4230: remove readiness check for cache exclusion #4234
Conversation
Welcome @alexanderConstantinescu!
Hi @alexanderConstantinescu. Thanks for your PR. I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with `/ok-to-test`. Once the patch is verified, the new status will be reflected by the `ok-to-test` label. I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
This will need back-ports to release-1.26 and release-1.27
Thanks for the PR!
@feiskyer what's the back-port process for AKS? I guess I need to file a PR to master in https://github.com/kubernetes/kubernetes/tree/master/staging/src/k8s.io/legacy-cloud-providers/azure? And then back-port that to release-1.26 / release-1.27? Or is everything done in this repo?
Changes LGTM based on my understanding of the code and changes reported in the linked issue
The branch was force-pushed from b9567bc to 4dbe194.
@feiskyer: could you please have a look at this quite soon? AKS has released several versions of 1.26 with this bug, and it's a pretty important one since it's really easy to trigger and cause a complete downtime for any ETP:Local services on the cluster.
/lgtm
/approve
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: alexanderConstantinescu, feiskyer. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing `/approve` in a comment.
/cherry-pick release-1.26
@feiskyer: once the present PR merges, I will cherry-pick it on top of release-1.26 in a new PR and assign it to you. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
/cherry-pick release-1.27
@feiskyer: once the present PR merges, I will cherry-pick it on top of release-1.27 in a new PR and assign it to you. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
New changes are detected. LGTM label has been removed.
Rebased due to conflicts and added the lgtm label back.
/test pull-cloud-provider-azure-e2e-ccm-capz
@feiskyer: #4234 failed to apply on top of branch "release-1.26":
In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
@feiskyer: #4234 failed to apply on top of branch "release-1.27":
In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
Hi @feiskyer, are you able to confirm which AKS releases this has/will make its way into? I see there was one 18 hours ago: https://github.com/Azure/AKS/releases/tag/2023-07-30 Thanks!
What type of PR is this?
/kind bug
What this PR does / why we need it:
Issue #4230 explains things in greater depth, but here's the TL;DR:
kubernetes/kubernetes#109706 stopped syncing load balancers when changes are observed to the readiness state of the Node object. Load balancers are essentially only re-synced whenever a Node is added, deleted, or has the exclusion label added. The service controller expectation is that the cloud providers just accept the list of nodes provided through the `UpdateLoadBalancerHost` call and try to configure the load balancers with the entire set of nodes provided. Given that Azure performs additional filtering of the nodes and excludes `NotReady` nodes, there will be situations where a node which just transitioned to `Ready` isn't added to the load balancer node set, because the service controller doesn't re-sync load balancers due to the readiness state change.
This patch removes this additional filtering of the readiness state, so that all `NotReady` nodes get added to the load balancer set. The health check probes used by the load balancer will then determine which nodes should be used for traffic load balancing.
Please have a look at the referenced enhancement proposal which merged in 1.26: kubernetes/enhancements#3458
/cc @JoelSpeed
Which issue(s) this PR fixes:
Fixes #4230
Special notes for your reviewer:
Does this PR introduce a user-facing change?
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.: