k8s: fix incorrect EndpointSlice API version #27277
This commit fixes a bug where the k8s watcher incorrectly chooses the discovery/v1beta1 EndpointSlice in environments where the discovery/v1 EndpointSlice is available. Because of this bug, the agent doesn't wait for the discovery/v1 EndpointSlice to be received and ends up removing live services.

The issue happens in the following scenario:

1. The agent initializes and runs the informers for Service and EndpointSlice.
2. The reflectors list the resources from kube-apiserver and replace the contents of the DeltaFIFO.
3. The informers pop items from the DeltaFIFO and call the onAdd handler one by one.
4. If the Service handler is called first, ServiceCache.UpdateService doesn't emit the UpdateService event, because the service is not ready yet (it has no backends).
5. The EndpointSlice handler is called next, and ServiceCache.updateEndpoints emits the UpdateService event.
6. The agent doesn't wait until the event triggered in step 5 is processed, and calls SyncWithK8sFinished.
7. The agent removes live services because it hasn't heard about them yet (it is supposed to hear about those services while processing the event triggered in step 5).

fixes: cilium#27215
Signed-off-by: Yusuke Suzuki <yusuke-suzuki@cybozu.co.jp>
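The ordering problem in steps 4 to 7 can be reproduced with a toy model. This is a minimal sketch, not Cilium's code: `serviceCache`, `syncFinished`, `drain` and the service name `default/web` are all hypothetical, and a channel drain stands in for whatever mechanism the real agent uses to wait for pending events. It only shows why pruning before the step 5 event is processed removes a live service.

```go
package main

import "fmt"

// Hypothetical, heavily simplified model; none of these types are Cilium's API.
type event struct{ service string }

type serviceCache struct {
	hasService  map[string]bool
	hasBackends map[string]bool
	events      chan event
}

func newServiceCache() *serviceCache {
	return &serviceCache{
		hasService:  map[string]bool{},
		hasBackends: map[string]bool{},
		events:      make(chan event, 16),
	}
}

// updateService mirrors step 4: no event while the service has no backends.
func (c *serviceCache) updateService(name string) {
	c.hasService[name] = true
	if c.hasBackends[name] {
		c.events <- event{name}
	}
}

// updateEndpoints mirrors step 5: backends arrive, so the event is queued.
func (c *serviceCache) updateEndpoints(name string) {
	c.hasBackends[name] = true
	if c.hasService[name] {
		c.events <- event{name}
	}
}

// drain processes every queued event, i.e. waits for the step 5 event.
func drain(events chan event, heard map[string]bool) {
	for {
		select {
		case ev := <-events:
			heard[ev.service] = true
		default:
			return
		}
	}
}

// syncFinished mirrors steps 6-7: every restored service that has not
// been heard about is treated as stale and returned for removal.
func syncFinished(restored []string, heard map[string]bool) []string {
	var removed []string
	for _, s := range restored {
		if !heard[s] {
			removed = append(removed, s)
		}
	}
	return removed
}

func run(waitForEvents bool) []string {
	c := newServiceCache()
	restored := []string{"default/web"} // services that survived the agent restart

	c.updateService("default/web")   // step 4: no event yet
	c.updateEndpoints("default/web") // step 5: event queued

	heard := map[string]bool{}
	if waitForEvents {
		drain(c.events, heard)
	}
	return syncFinished(restored, heard)
}

func main() {
	fmt.Println("without waiting:", run(false)) // the live service is removed
	fmt.Println("after draining:", run(true))   // nothing is removed
}
```

In this model the only difference between the two runs is whether the queued event is consumed before pruning, which is the gap the commit describes.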
/test-backport-1.13

Job 'Cilium-PR-K8s-1.25-kernel-4.19' failed. Jenkins URL: https://jenkins.cilium.io/job/Cilium-PR-K8s-1.25-kernel-4.19/818/

If it is a flake and a GitHub issue doesn't already exist to track it, comment [...]. Then please upload the Jenkins artifacts to that issue.
Good catch!
/test-1.25-4.19
I think we can merge this one, but we also need this change in 1.12, right?
This PR fixes a bug where the k8s watcher incorrectly chooses the discovery/v1beta1 EndpointSlice in environments where the discovery/v1 EndpointSlice is available. Because of this bug, the agent doesn't wait for the discovery/v1 EndpointSlice to be received when it restarts, and ends up removing live services.
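The intended selection rule (use discovery/v1 whenever the API server serves it, and only fall back to discovery/v1beta1 otherwise) can be sketched as a pure function over the group/version strings the server advertises, such as those printed by `kubectl api-versions`. This is a hypothetical helper for illustration, not the code changed in this PR; the empty-string result for "neither version served" is an assumption of the sketch.

```go
package main

import "fmt"

const (
	endpointSliceV1      = "discovery.k8s.io/v1"
	endpointSliceV1beta1 = "discovery.k8s.io/v1beta1"
)

// chooseEndpointSliceVersion (hypothetical) prefers the GA discovery/v1
// EndpointSlice whenever it is served and falls back to v1beta1 otherwise.
// It returns "" when neither is served; a caller might then watch Endpoints.
func chooseEndpointSliceVersion(advertised []string) string {
	served := make(map[string]bool, len(advertised))
	for _, gv := range advertised {
		served[gv] = true
	}
	switch {
	case served[endpointSliceV1]:
		return endpointSliceV1
	case served[endpointSliceV1beta1]:
		return endpointSliceV1beta1
	default:
		return ""
	}
}

func main() {
	// Kubernetes v1.21 to v1.24 serve both versions; v1 must win.
	fmt.Println(chooseEndpointSliceVersion([]string{"v1", endpointSliceV1beta1, endpointSliceV1}))
	// Older clusters only serve the beta version.
	fmt.Println(chooseEndpointSliceVersion([]string{"v1", endpointSliceV1beta1}))
}
```

Checking for v1 first means the order in which versions are advertised does not matter, so an agent using this rule would wait for the same EndpointSlice version it actually watches.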
The issue happens in the scenario described in the commit message above.
This issue doesn't occur in v1.14 and later because K8sAPIGroupEndpointSliceOrEndpoint is used in those versions. The affected versions are v1.13 and v1.12.

Fixes: #27215