[BUG] Cilium dual stack upgrades loss of connectivity #4673

Open
camrynl opened this issue Nov 26, 2024 · 0 comments

Describe the bug
Upgrading a Cilium cluster from single stack to dual stack can leave nodes NotReady and break connectivity in the cluster.
The root cause is an upstream Cilium bug: filters and BPF programs attached to an interface are not cleaned up when bpf-filter-priority changes. Single stack clusters currently default bpf-filter-priority to 1, but the upgrade to dual stack sets it to 2 to make room for our health probe BPF program.
After the upgrade from single stack to dual stack, Cilium is reconciled, but the filters from the single stack state remain attached. The result is duplicate filters at priorities 1 and 2, which breaks connectivity.
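
One way to confirm the duplicate filters on an affected node is to compare the priorities at which each BPF program is attached. The following is a minimal sketch, not an official tool: it assumes the text comes from `tc filter show dev <interface> ingress` (or `egress`), and the sample lines, program name, ids, and tags are illustrative placeholders rather than output captured from a real cluster.

```python
import re
from collections import defaultdict

# Matches tc filter lines that carry a handle and a program name, e.g.
# "filter protocol all pref 1 bpf chain 0 handle 0x1 <program> direct-action ..."
FILTER_RE = re.compile(r"\bpref (\d+) bpf\b.*?\bhandle \S+ (\S+)")


def programs_at_multiple_priorities(tc_output: str) -> dict[str, list[int]]:
    """Return {program name: sorted priorities} for programs seen at more than one priority."""
    priorities = defaultdict(set)
    for line in tc_output.splitlines():
        match = FILTER_RE.search(line)
        if match:
            priorities[match.group(2)].add(int(match.group(1)))
    return {name: sorted(p) for name, p in priorities.items() if len(p) > 1}


# Illustrative sample only (assumed format, hypothetical ids and tags).
SAMPLE = """\
filter protocol all pref 1 bpf chain 0
filter protocol all pref 1 bpf chain 0 handle 0x1 cil_from_netdev-eth0 direct-action not_in_hw id 101 tag aaaaaaaaaaaaaaaa jited
filter protocol all pref 2 bpf chain 0
filter protocol all pref 2 bpf chain 0 handle 0x1 cil_from_netdev-eth0 direct-action not_in_hw id 202 tag bbbbbbbbbbbbbbbb jited
"""

if __name__ == "__main__":
    print(programs_at_multiple_priorities(SAMPLE))
```

Grouping by program name rather than by priority alone avoids flagging an unrelated program, such as the health probe, that legitimately sits at its own priority.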

To Reproduce
Steps to reproduce the behavior:

  1. Create a single stack Cilium cluster
  2. Run az aks update <cluster> --ip-families ipv4,ipv6 --load-balancer-managed-outbound-ipv6-count <count>

Expected behavior
The upgrade should complete, and new nodes should have both IPv4 and IPv6 addresses.
Instead, non-host-network pods may be stuck in Pending or ContainerCreating, and nodes fall into a NotReady state.
The NotReady nodes have duplicate Cilium filters at different priorities.
Cluster connectivity is lost.
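
To check which nodes are affected, the Ready condition of each node can be read from the node list. This is a hedged sketch that assumes the input is the parsed JSON from `kubectl get nodes -o json`; the node names in the sample are hypothetical.

```python
import json


def not_ready_nodes(node_list: dict) -> list[str]:
    """Return names of nodes whose Ready condition is missing or not "True"."""
    names = []
    for node in node_list.get("items", []):
        conditions = node.get("status", {}).get("conditions", [])
        ready = next((c for c in conditions if c.get("type") == "Ready"), None)
        if ready is None or ready.get("status") != "True":
            names.append(node["metadata"]["name"])
    return names


# Hypothetical, trimmed-down node list for illustration only.
SAMPLE = json.loads("""
{
  "items": [
    {"metadata": {"name": "node-a"},
     "status": {"conditions": [{"type": "Ready", "status": "True"}]}},
    {"metadata": {"name": "node-b"},
     "status": {"conditions": [{"type": "Ready", "status": "False"}]}},
    {"metadata": {"name": "node-c"},
     "status": {"conditions": [{"type": "Ready", "status": "Unknown"}]}}
  ]
}
""")

if __name__ == "__main__":
    print(not_ready_nodes(SAMPLE))
```

The nodes it reports are the ones worth inspecting for duplicate filters.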

Environment (please complete the following information):

  • Cilium v1.14, v1.16 on Kubernetes v1.29, v1.30, v1.31

Additional context
This is also a known issue in upstream Cilium. A fix is in progress: cilium/cilium#36172

@camrynl camrynl added the bug label Nov 26, 2024