use of cgroups v2 triggers SELinux denial when loading eBPF programs #881
Comments
Is there a decent way to start from the prior good image and upgrade specific packages to the exact versions in https://builds.coreos.fedoraproject.org/browser?stream=testing? That might be a way to try to identify the offending package.
Probably either … or …
I know there were some other issues surrounding BPF recently resolved in the kernel too (https://bugzilla.redhat.com/show_bug.cgi?id=1955585), so it could be in the kernel. Just as a sanity check, can you check with the very latest? Also maybe the following ones, just to weed out the selinux packages:
You can override packages. You'll need to find the URLs to them from koji (selinux-policy, container-selinux), and then you can do something like:
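A minimal sketch of that kind of override, for illustration only (the URLs and versions below are placeholders, not builds referenced in this issue):

```sh
# Placeholders: substitute the exact RPM URLs from koji for the builds under test.
SELINUX_POLICY_URL="https://kojipkgs.fedoraproject.org/packages/selinux-policy/<ver>/<rel>/noarch/selinux-policy-<ver>-<rel>.noarch.rpm"
SELINUX_POLICY_TARGETED_URL="https://kojipkgs.fedoraproject.org/packages/selinux-policy/<ver>/<rel>/noarch/selinux-policy-targeted-<ver>-<rel>.noarch.rpm"
CONTAINER_SELINUX_URL="https://kojipkgs.fedoraproject.org/packages/container-selinux/<ver>/<rel>/noarch/container-selinux-<ver>-<rel>.noarch.rpm"

# Pin the host to those exact packages, then reboot into the new deployment.
sudo rpm-ostree override replace "$SELINUX_POLICY_URL" "$SELINUX_POLICY_TARGETED_URL" "$CONTAINER_SELINUX_URL"
sudo systemctl reboot
```

Afterwards, `rpm-ostree override reset --all` drops the pins and returns to the base packages.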
I've only had time to confirm 34.20210626.20.0 (latest atm) is still affected.
@dghubble were you able to do any more investigation?
Could this be related to https://bugzilla.redhat.com/show_bug.cgi?id=1976858? (Temporarily) fixed in OKD with openshift/okd-machine-os#159.
He said …
In …, clusters were made using the DigitalOcean images. The change between …
Observing …
Editing just …
Inspecting is made more tricky by the builds UI, which requires selecting between 5, 10, or all builds (which becomes unusable or eventually crashes the browser), so it's not clear to me what the diff between those images is.
This isn't surfaced in the builds browser, but … Edit: filed coreos/fedora-coreos-browser#27 to make this easier to see in the future.
I'm guessing that somehow moving to cgroups v2 has activated an SELinux denial related to BPF filters which used to be accepted. @rhatdan does that sound familiar? Do we need to add a rule to allow …?
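(For anyone who needs an interim, node-local workaround while a proper policy rule lands, one generic approach is to generate a local module from the logged denials. This is a sketch of the usual audit2allow flow, not the eventual container-selinux change; the module name is arbitrary.)

```sh
# Review the recent BPF-related AVC denials (requires auditd/audit logging).
sudo ausearch -m avc -ts recent | grep -i bpf

# Generate and install a local policy module from those denials.
# audit2allow ships in the policycoreutils python utilities on Fedora.
sudo ausearch -m avc -ts recent | audit2allow -M local-bpf
sudo semodule -i local-bpf.pp
```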
This is useful to have easily accessible and matches the RHCOS release browser. Came up in: coreos/fedora-coreos-tracker#881 (comment)
@jlebon yep, reverting to cgroups v1 does seem to restore normal Cilium behavior and no denials are seen.
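(For reference, the revert is a kernel-argument change; a minimal sketch, assuming the `systemd.unified_cgroup_hierarchy` switch is what is being toggled on the node:)

```sh
# Boot the node back into the legacy (v1) cgroup hierarchy.
sudo rpm-ostree kargs --append=systemd.unified_cgroup_hierarchy=0
sudo systemctl reboot

# After reboot, verify which hierarchy is mounted:
# "cgroup2fs" means cgroups v2, "tmpfs" means the v1/hybrid layout.
stat -f -c %T /sys/fs/cgroup
```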
Using Calico on Fedora CoreOS also seems to be affected.
In order to speed the process along, I also added the rules to container-selinux: https://github.com/containers/container-selinux/releases/tag/v2.164.0. A build should be triggered later this afternoon.
OK, we'll fast-track container-selinux to get this fixed: coreos/fedora-coreos-config#1122. Does anyone have a simple reproducer handy that doesn't involve setting up a Kubernetes cluster, which we could add to our CI to catch this type of problem in the future?
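(Not a confirmed CI test, but one possible non-Kubernetes sketch: load eBPF programs from a privileged container, which runs as spc_t, and then check for fresh AVC denials. The image and install step below are assumptions for illustration.)

```sh
# "bpftool feature probe" attempts many small BPF program loads while probing,
# so it exercises BPF_PROG_LOAD from inside the container.
sudo podman run --rm --privileged registry.fedoraproject.org/fedora:34 \
  sh -c 'dnf -y -q install bpftool && bpftool feature probe kernel > /dev/null'

# On the host, look for new BPF-related denials.
sudo ausearch -m avc -ts recent | grep -i bpf
```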
I don't have a small repro, but I can try to test the image soon.
Thanks @dghubble, please test with …
The SELinux denials disappear with that image, though Cilium service connectivity remains broken, without follow-up clues or logs. I don't have other info right now, but there must be additional causes. Reverting to cgroups v1 remains a workaround, which I'm using.
Alright, I think there are a bunch of changes needed to get Cilium to be compatible with cgroups v2, which I'll need to look into separately when I can. The SELinux fix unblocks this effort, so thank you, and I think for now we can say the FCOS side isn't the blocker.
Thanks @dghubble for the info. Please let us know if there are upstream issues in Cilium or Calico we can follow for status updates.
* On Fedora CoreOS, Cilium cross-node service IP load balancing stopped working for a time (first observable as CoreDNS pods located on worker nodes not being able to reach the kubernetes API service 10.3.0.1). This turned out to have two parts:
  * Fedora CoreOS switched to cgroups v2 by default. In our early testing with cgroups v2, Calico (default) was used. With the cgroups v2 change, SELinux policy denied some eBPF operations. Since fixed in all Fedora CoreOS channels.
  * Cilium requires new mounts to support cgroups v2, which are added here:
    * coreos/fedora-coreos-tracker#292
    * coreos/fedora-coreos-tracker#881
    * cilium/cilium#16259
With cgroups v2 and now the mentioned SELinux fix, I was able to update Cilium's setup to work with cgroups v2, so what remained was indeed external to FCOS. For anyone curious, these were the changes: poseidon/terraform-render-bootstrap#271, poseidon/typhoon#1021. Thanks all!
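(For anyone hitting the same thing, the gist of the Cilium-side change is making the cgroup v2 hierarchy available where the agent expects it. A rough sketch of that kind of mount follows; the path is illustrative and not necessarily what those PRs use.)

```sh
# Ensure a cgroup v2 mount exists at the path the Cilium agent is configured to use.
CGROUP_ROOT=/run/cilium/cgroupv2   # illustrative path
sudo mkdir -p "$CGROUP_ROOT"
mountpoint -q "$CGROUP_ROOT" || sudo mount -t cgroup2 none "$CGROUP_ROOT"
```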
Thanks @dghubble for the update and links!
This will land in the …
The fix for this went into testing stream release …
The fix for this went into stable stream release …
Describe the bug
Using Cilium on Fedora CoreOS seems to have quietly broken between Fedora CoreOS testing 34.20210518.2.1 (ok) and 34.20210529.2.0 (bad). Pod-to-pod traffic works, but pod-to-service traffic seems affected for reasons that aren't clear to me yet.
There are new SELinux denials which don't appear when using the older FCOS image. I wouldn't expect to see denials for a pod that runs as privileged (spc_t).
Reproduction steps
Steps to reproduce the behavior:
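(The original steps weren't captured in this copy. An illustrative check of the symptom might look like the following; the pod name, image, and service IP are examples, not the exact steps used.)

```sh
# Run a throwaway debug pod and wait for it to be ready.
kubectl run debug --image=nicolaka/netshoot --restart=Never --command -- sleep 3600
kubectl wait --for=condition=Ready pod/debug

# Curl a service ClusterIP (the kubernetes API service here). Any HTTP response,
# even 401/403, shows the ClusterIP is reachable; a timeout shows the symptom.
kubectl exec debug -- curl -skS --max-time 5 https://10.3.0.1:443/version

# On the node hosting the pod, look for new BPF-related AVC denials.
sudo ausearch -m avc -ts recent | grep -i bpf
```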
Expected behavior
Debug pods should be able to curl service ClusterIPs.
Actual behavior
Debug pods can no longer curl the same service ClusterIPs.
System details
Additional information
I've isolated the symptom from details like the Kubernetes or Cilium version. Re-deploying a past cluster configuration (Kubernetes v1.21.1 and Cilium v1.10.0) is affected now because it's using new Fedora CoreOS images.