Calico eBPF fails to init on Talos Linux #7892

Open
monoxane opened this issue Jul 31, 2023 · 8 comments · May be fixed by tigera/operator#3235


monoxane commented Jul 31, 2023

I'm provisioning a cluster using the Talos Linux + Kube distro and am finding that the calico-node mount-ebpffs container fails to mount the cgroup2 file system as called from calico/node/pkg/nodeinit/calico-init_linux.go.

W0731 07:12:21.604403       1 feature_gate.go:241] Setting GA feature gate ServiceInternalTrafficPolicy=true. It will be removed in a future release.
2023-07-31 07:12:21.607 [INFO][1] init/startup.go 432: Early log level set to info
2023-07-31 07:12:21.607 [INFO][1] init/calico-init_linux.go 57: Checking if BPF filesystem is mounted.
2023-07-31 07:12:21.607 [INFO][1] init/calico-init_linux.go 69: BPF filesystem is mounted.
2023-07-31 07:12:21.607 [INFO][1] init/calico-init_linux.go 92: Checking if cgroup2 filesystem is mounted.
2023-07-31 07:12:21.609 [INFO][1] init/calico-init_linux.go 120: Cgroup2 filesystem is not mounted. Trying to mount it...
2023-07-31 07:12:21.609 [INFO][1] init/calico-init_linux.go 126: Mount point /run/calico/cgroup is ready for mounting root cgroup2 fs.
2023-07-31 07:12:21.613 [ERROR][1] init/calico-init_linux.go 48: Failed to mount cgroup2 filesystem. error=failed to mount cgroup2 filesystem: exit status 1

Expected Behavior

Calico with eBPF dataplane works on Talos

Current Behavior

Calico with eBPF dataplane does not work on Talos due to an FS mount failure in the eBPF mount init container

Possible Solution

My current guess is that this happens because bpfdefs.CgroupV2Path is /run/calico/cgroup, which seems to be a non-writable directory under Talos (most of the rootfs is read-only, apart from specific files and all of /var). However, mounting an emptyDir at that location in both the init and the main container does not help.

I cannot change the bpfdefs const and rebuild Calico myself because of environmental constraints (no Docker installs, which the makefiles require), but if needed I can go through the process of getting an environment set up in my work gcloud tenancy and build it that way. I am also happy to run any dev builds produced by the Calico team.

Steps to Reproduce (for bugs)

Install a Talos cluster
Install Calico with the operator and this Installation CR

apiVersion: operator.tigera.io/v1
kind: Installation
metadata:
    name: default
spec:
    calicoNetwork:
        bgp: Enabled
        linuxDataplane: BPF
        ipPools:
        -   blockSize: 26
            cidr: 10.244.0.0/16
            disableBGPExport: false
            encapsulation: None
            natOutgoing: Enabled

Context

We need to use the eBPF dataplane for some shenanigans that don't work with the iptables one (mostly source-IP related), so we can't just use the non-eBPF mode. Calico is the only competent CNI with BGP + eBPF support that meets our needs.

Cilium, while not helpful to us due to BGP issues, is supported on Talos, and its eBPF dataplane works when installed by following this Talos guide. Something in there might help in working this out: https://www.talos.dev/v1.4/kubernetes-guides/network/deploying-cilium/#without-kube-proxy

Your Environment

Calico:
quay.io/tigera/operator:v1.30.4
docker.io/calico/node:v3.26.1

Other:
Talos (v1.4.6) kernel 6.1.35-talos
Containerd 1.6.21
Kubelet v1.27.3

@monoxane monoxane changed the title Calico eBPF on Talos Calico eBPF fails to init on Talos Linux Jul 31, 2023
@frezbo
Contributor

frezbo commented Jul 31, 2023

Talos already mounts the cgroupv2 and bpffs filesystems. It seems the Calico check failed to detect that and is trying to mount cgroup2 again:

Checking if cgroup2 filesystem is mounted.

Check whether Calico supports skipping those checks. In the case of Cilium, there are options in the Helm chart to skip those two checks, and it also has fewer init containers.

@Cubea01

Cubea01 commented Aug 10, 2023

I'm running into the same issue, has any progress been made?

@tomastigera tomastigera added area/bpf eBPF Dataplane issues kind/bug labels Aug 14, 2023
@tomastigera
Contributor

tomastigera commented Aug 14, 2023

It first tries to find in /nodeproc/1/mountinfo whether the fs is already mounted. So it does not seem like it is mounted yet. Interestingly, it manages to create /run/calico/cgroup. Maybe os.MkdirAll does not fail if the dir already exists. We could make the location configurable instead of hardwired 🤦

@Cubea01

Cubea01 commented Sep 13, 2023

Are there any suggested workarounds for this little issue?

@ErikLundJensen

We would also like a solution for this.

@agmimidi

Any updates on the issue? We would like to switch to eBPF but this is currently a blocker

@tomastigera
Contributor

I have merged the PR that allows you to set the cgroup path for calico-node. At the moment, if you are using the operator, you need to annotate the calico-node DaemonSet so that the operator does not change it. We will try to figure out an acceptable operator configuration as soon as possible.

@tomastigera
Contributor

tomastigera commented Feb 15, 2024

#8085 managed to get into 3.27.1
