[BUG] Cilium dual stack upgrades loss of connectivity #4673

camrynl · 2024-11-26T21:49:23Z

Describe the bug
Upgrading a Cilium cluster from single stack to dual stack can leave nodes NotReady and break connectivity in the cluster.
The root cause is an upstream Cilium bug: filters and BPF programs already attached to an interface are not cleaned up when bpf-filter-priority changes. Single stack clusters currently default bpf-filter-priority to 1, but the upgrade to dual stack sets it to 2 to make room for our health probe BPF program.
After the upgrade from single stack to dual stack, Cilium is reconciled with the new priority while the filters from the single stack state remain attached. The interface ends up with duplicate filters at priorities 1 and 2, which breaks connectivity.

To Reproduce
Steps to reproduce the behavior:

1. Create a single stack Cilium cluster.
2. Run az aks update <cluster> --ip-families ipv4,ipv6 --load-balancer-managed-outbound-ipv6-count <count>

Expected behavior
The upgrade completes and new nodes have both IPv4 and IPv6 addresses.
Instead, the following is observed:
Non-host-network pods may be stuck in Pending or ContainerCreating, and nodes fall into a NotReady state.
The NotReady nodes have duplicate Cilium filters at different priorities.
Cluster connectivity is lost.
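
For anyone triaging a node, here is a rough sketch of how the duplicate filters could be spotted. It assumes you have captured the text output of something like tc filter show dev <iface> ingress on the affected node, and that Cilium's programs show up with names starting with cil_ after the handle field. The sample text, the name prefix, and the helper names are assumptions for illustration, not taken from an affected cluster, and the exact tc output may differ by Cilium and iproute2 version.

```python
import re
from collections import defaultdict

# Matches tc filter lines that carry a BPF program name, for example:
# "filter protocol all pref 1 bpf chain 0 handle 0x1 cil_from_netdev-eth0 direct-action ..."
# Summary lines without a "handle" field are skipped.
_FILTER_RE = re.compile(r"\bpref\s+(\d+)\s+bpf\b.*?\bhandle\s+\S+\s+(\S+)")


def cilium_filter_priorities(tc_output):
    """Map each Cilium BPF program name to the set of priorities it is attached at."""
    priorities = defaultdict(set)
    for line in tc_output.splitlines():
        match = _FILTER_RE.search(line)
        # Assumption: Cilium program names start with "cil_".
        if match and match.group(2).startswith("cil_"):
            priorities[match.group(2)].add(int(match.group(1)))
    return dict(priorities)


def duplicated_filters(tc_output):
    """Return only the programs attached at more than one priority."""
    return {
        name: sorted(prios)
        for name, prios in cilium_filter_priorities(tc_output).items()
        if len(prios) > 1
    }


# Hypothetical capture from a node in the broken state (ids and tags are made up).
SAMPLE = """\
filter protocol all pref 1 bpf chain 0
filter protocol all pref 1 bpf chain 0 handle 0x1 cil_from_netdev-eth0 direct-action not_in_hw id 101 tag aaaaaaaaaaaaaaaa jited
filter protocol all pref 2 bpf chain 0
filter protocol all pref 2 bpf chain 0 handle 0x1 cil_from_netdev-eth0 direct-action not_in_hw id 202 tag bbbbbbbbbbbbbbbb jited
"""

if __name__ == "__main__":
    print(duplicated_filters(SAMPLE))  # {'cil_from_netdev-eth0': [1, 2]}
```

An empty result would suggest the node has filters at a single priority only. A non-empty result matches the priority 1 and 2 duplication described in this issue.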

Environment (please complete the following information):

Cilium v1.14, v1.16 on Kubernetes v1.29, v1.30, v1.31

Additional context
This is a known issue in upstream Cilium as well. A fix is in progress in cilium/cilium#36172.

camrynl added the bug label Nov 26, 2024