use of cgroups v2 triggers SELinux denial when loading eBPF programs #881

Closed
dghubble opened this issue Jun 24, 2021 · 27 comments · Fixed by coreos/fedora-coreos-config#1122

Comments

@dghubble
Member

dghubble commented Jun 24, 2021

Describe the bug

Cilium on Fedora CoreOS seems to have quietly broken between Fedora CoreOS testing 34.20210518.2.1 (ok) and 34.20210529.2.0 (bad). Pod-to-pod traffic works, but pod-to-service traffic seems affected for reasons that aren't clear to me yet.

There are new SELinux denials that don't appear when using the older FCOS image. I wouldn't expect to see denials for a pod that runs as privileged (spc_t).

Jun 24 17:39:26 kernel: audit: type=1400 audit(1624556366.467:1891): avc:  denied  { prog_run } for  pid=5227 comm="bpftool" scontext=system_u:system_r:spc_t:s0 tcontext=system_u:system_r:container_runtime_t:s0 tclass=bpf permissive=0
Jun 24 17:39:26 audit[5231]: AVC avc:  denied  { prog_run } for  pid=5231 comm="bpftool" scontext=system_u:system_r:spc_t:s0 tcontext=system_u:system_r:container_runtime_t:s0 tclass=bpf permissive=0
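
If it helps anyone check for the same denials, they can be pulled out of the audit log with something like the following (assuming the audit userspace tools are present; the journalctl variant works regardless):

# show recent AVC denials from bpftool (SELinux bpf class)
sudo ausearch -m avc -ts recent -c bpftool
# or watch kernel audit messages live while reproducing
sudo journalctl -k -f | grep -i 'avc.*bpf'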

Reproduction steps

Steps to reproduce the behavior:

  1. Deploy a Kubernetes cluster with Cilium using a specific FCOS image
  2. CoreDNS will block waiting to contact the Kubernetes API service (10.3.0.1), which is the first use of a Kubernetes Service ClusterIP

Expected behavior

Debug pods should be able to curl service ClusterIPs.

Actual behavior

Debug pods can no longer curl the same service ClusterIPs.
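
As a rough sketch of that check (the image name is just an example; any response from the API, even a 403, means the ClusterIP is reachable, while a timeout reproduces the problem):

# run a throwaway debug pod and try the kubernetes API ClusterIP
kubectl run debug --rm -it --restart=Never --image=curlimages/curl -- \
  curl -skv https://10.3.0.1/version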

System details

Additional information

I've isolated the symptom from details like Kubernetes or Cilium version. Re-deploying a past cluster configuration (Kubernetes v1.21.1 and Cilium v1.10.0) is affected now because it's using newer Fedora CoreOS images.

@dghubble
Member Author

Is there a decent way to start on the prior good image and upgrade specific packages to the exact versions in https://builds.coreos.fedoraproject.org/browser?stream=testing? That might be a way to identify the responsible package.

@dustymabe
Member

$ rpm-ostree --repo=./ db diff 40820b1c281b530dda190a91d7a019bf0d51b254bddc252061da69d70afb90c4 d7ad41d882de1a9b5652d29ea69b0aedb83e5dec66cb4ce379ff651af14536ee 
ostree diff commit from: 40820b1c281b530dda190a91d7a019bf0d51b254bddc252061da69d70afb90c4
ostree diff commit to:   d7ad41d882de1a9b5652d29ea69b0aedb83e5dec66cb4ce379ff651af14536ee
Upgraded:
  btrfs-progs 5.11.1-1.fc34 -> 5.12.1-1.fc34
  chrony 4.0-3.fc34 -> 4.1-1.fc34
  container-selinux 2:2.160.0-2.fc34 -> 2:2.162.1-3.fc34
  coreutils 8.32-24.fc34 -> 8.32-26.fc34
  coreutils-common 8.32-24.fc34 -> 8.32-26.fc34
  cups-libs 1:2.3.3op2-5.fc34 -> 1:2.3.3op2-7.fc34
  curl 7.76.1-2.fc34 -> 7.76.1-3.fc34
  fuse-common 3.10.2-1.fc34 -> 3.10.3-1.fc34
  fuse3 3.10.2-1.fc34 -> 3.10.3-1.fc34
  fuse3-libs 3.10.2-1.fc34 -> 3.10.3-1.fc34
  ignition 2.10.1-1.fc34 -> 2.10.1-3.fc34
  iptables-services 1.8.7-7.fc34 -> 1.8.7-8.fc34
  kernel 5.11.20-300.fc34 -> 5.12.7-300.fc34
  kernel-core 5.11.20-300.fc34 -> 5.12.7-300.fc34
  kernel-modules 5.11.20-300.fc34 -> 5.12.7-300.fc34
  kmod 28-2.fc34 -> 29-2.fc34
  kmod-libs 28-2.fc34 -> 29-2.fc34
  krb5-libs 1.19.1-3.fc34 -> 1.19.1-8.fc34
  libcurl 7.76.1-2.fc34 -> 7.76.1-3.fc34
  libedit 3.1-36.20210419cvs.fc34 -> 3.1-37.20210522cvs.fc34
  libgusb 0.3.6-1.fc34 -> 0.3.7-1.fc34
  libibverbs 34.0-3.fc34 -> 35.0-1.fc34
  libidn2 2.3.0-5.fc34 -> 2.3.1-1.fc34
  libipa_hbac 2.4.2-3.fc34 -> 2.5.0-2.fc34
  libldb 2.3.0-1.fc34 -> 2.3.0-2.fc34
  libreport-filesystem 2.14.0-17.fc34 -> 2.15.1-1.fc34
  libsss_certmap 2.4.2-3.fc34 -> 2.5.0-2.fc34
  libsss_idmap 2.4.2-3.fc34 -> 2.5.0-2.fc34
  libsss_nss_idmap 2.4.2-3.fc34 -> 2.5.0-2.fc34
  libsss_sudo 2.4.2-3.fc34 -> 2.5.0-2.fc34
  libtirpc 1.3.1-1.rc2.fc34 -> 1.3.2-0.fc34
  libuser 0.63-1.fc34 -> 0.63-3.fc34
  libxml2 2.9.10-12.fc34 -> 2.9.12-2.fc34
  libxmlb 0.3.0-1.fc34 -> 0.3.2-1.fc34
  mpfr 4.1.0-6.fc34 -> 4.1.0-7.fc34
  rpcbind 1.2.5-5.rc1.fc34.4 -> 1.2.6-0.fc34
  rpm-ostree 2021.4-3.fc34 -> 2021.5-1.fc34
  rpm-ostree-libs 2021.4-3.fc34 -> 2021.5-1.fc34
  selinux-policy 34.7-1.fc34 -> 34.8-1.fc34
  selinux-policy-targeted 34.7-1.fc34 -> 34.8-1.fc34
  sssd-ad 2.4.2-3.fc34 -> 2.5.0-2.fc34
  sssd-client 2.4.2-3.fc34 -> 2.5.0-2.fc34
  sssd-common 2.4.2-3.fc34 -> 2.5.0-2.fc34
  sssd-common-pac 2.4.2-3.fc34 -> 2.5.0-2.fc34
  sssd-ipa 2.4.2-3.fc34 -> 2.5.0-2.fc34
  sssd-krb5 2.4.2-3.fc34 -> 2.5.0-2.fc34
  sssd-krb5-common 2.4.2-3.fc34 -> 2.5.0-2.fc34
  sssd-ldap 2.4.2-3.fc34 -> 2.5.0-2.fc34
  tpm2-tss 3.0.3-2.fc34 -> 3.1.0-1.fc34
  vim-minimal 2:8.2.2846-1.fc34 -> 2:8.2.2879-1.fc34
  zchunk-libs 1.1.11-1.fc34 -> 1.1.14-1.fc34
  zincati 0.0.20-1.fc34 -> 0.0.21-1.fc34

Probably either container-selinux:

233e620 <v2.162.1> Fix labeling in users homedir
da28288 <v2.162.0> Allow init_t domain to manager kubernetes_file_t
e1092cd <v2.161.1> Add label for kublet to run as a container_runtime_t
9b1ebb6 <v2.161.0> Add support for lockdown:confidentiality to container_runtime
266203e <v2.160.2> Bump to v2.160.2
bbe4f19 <v2.160.1> allow (most) binaries to live in sbin and bin directories
450f56e Add SECURITY.md                                                  

or selinux-policy:

84d400b o Allow local_login_t nnp_transition to login_userdomain
d1c2be7 o Allow asterisk watch localization symlinks
78fa9be o Allow NetworkManager_t to watch /etc
af7e4b6 o Label /var/lib/kdump with kdump_var_lib_t
31a9e4a o Allow amanda get attributes of cgroup filesystems
fd22bbb o Allow sysadm_t nnp_domtrans to systemd_tmpfiles_t
634a82c o Allow install_t nnp_domtrans to setfiles_mac_t
f89885f o Allow fcoemon create sysfs files
c05289b o Allow tgtd read and write infiniband devices

@dustymabe
Member

I know there were some other issues surrounding BPF that were recently resolved in the kernel too (https://bugzilla.redhat.com/show_bug.cgi?id=1955585), so it could be in the kernel. As a sanity check, can you try with the very latest testing-devel?

Also, maybe try the following pairs, just to rule out the SELinux packages:

  • 34.20210520.20.0 and 34.20210525.20.0

    • container-selinux 2:2.160.0-2.fc34.noarch → 2:2.162.1-3.fc34.noarch
  • 34.20210527.20.0 and 34.20210528.20.0

    • selinux-policy 34.7-1.fc34.noarch → 34.8-1.fc34.noarch

Is there a decent way to start on the prior good image and upgrade specific packages to the exact versions in https://builds.coreos.fedoraproject.org/browser?stream=testing? That might be a way to identify the responsible package.

You can override packages. You'll need to find the URL to them from koji (selinux-policy, container-selinux) and then you can do something like:

sudo rpm-ostree override replace \
  https://kojipkgs.fedoraproject.org//packages/selinux-policy/34.12/1.fc34/noarch/selinux-policy-34.12-1.fc34.noarch.rpm \
  https://kojipkgs.fedoraproject.org//packages/selinux-policy/34.12/1.fc34/noarch/selinux-policy-targeted-34.12-1.fc34.noarch.rpm
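
If the override turns out not to be the culprit, it can be undone afterwards with:

# return to the versions shipped in the base image
sudo rpm-ostree override reset selinux-policy selinux-policy-targeted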

@dghubble
Member Author

I've only had time to confirm 34.20210626.20.0 (latest atm) is still affected.

@dustymabe
Copy link
Member

@dghubble were you able to do any more investigation?

@LorbusChris
Contributor

LorbusChris commented Jun 29, 2021

Could this be related to 'search .' having been added to resolv.conf recently? That did (again) break reverse DNS resolution for services in OKD.

https://bugzilla.redhat.com/show_bug.cgi?id=1976858
https://bugzilla.redhat.com/show_bug.cgi?id=1874419

(Temporarily) fixed in OKD with openshift/okd-machine-os#159

@dustymabe
Member

He said 34.20210518.2.1 was good, so it wasn't the switch to f34 but something after. See the package diff above (systemd wasn't even upgraded).

@dghubble
Member Author

dghubble commented Jul 4, 2021

In testing-devel, 34.20210525.20.0 seems ok. Meanwhile, walking backward, 34.20210528.20.0, 34.20210527.20.0, and 34.20210525.20.1 all have the issue.

Clusters were made using the DigitalOcean images. The change between 34.20210525.20.0 and 34.20210525.20.1 was indirectly observable on AWS clusters as well, going from 34.20210525.20.0 (ami-01f2a630e89eccfc5) to 34.20210526.20.0 (ami-0d1aa052cb06038c7); for some reason the 34.20210525.20.1 image (ami-0c78e764a580e6ef7) in between is missing on AWS.

  • 34.20210525.20.0 ok
  • 34.20210525.20.1 broken, AMI ami-0c78e764a580e6ef7 missing also
  • 34.20210526.20.0 broken
  • ...

Seeing bpftool denials in the system logs correlates 1:1 with the connectivity issues being present.

[  505.181371] audit: type=1400 audit(1625374777.822:1795): avc:  denied  { prog_run } for  pid=8128 comm="bpftool" scontext=system_u:system_r:spc_t:s0 tcontext=system_u:system_r:container_runtime_t:s0 tclass=bpf permissive=0

Overriding just selinux-policy on a 34.20210525.20.0 node doesn't succeed, and the diff command errors out:

$ sudo rpm-ostree override replace https://kojipkgs.fedoraproject.org//packages/selinux-policy/34.8/1.fc34/noarch/selinux-policy-34.8-1.fc34.noarch.rpm https://kojipkgs.fedoraproject.org//packages/selinux-policy/34.8/1.fc34/noarch/selinux-policy-targeted-34.8-1.fc34.noarch.rpm
...
Checkout selinux-policy-targeted-34.8-1.fc34.noarch: Hardlinking a5/8b8b3f84fa2d588c41ae5fa6615dfe387b262737198f5b2a9c5f24b0b23045.file to commit_num: Operation not permitted

$ rpm-ostree --repo=./ db diff 52bae8872c1ab0b5b0af329f7f41598050e546fb17139838fadcd24d3743a148 ca7a7214da5198f2a057199395223c06a67b728a4ef50a47b9a50c98bb004701
error: opening repo: opendir(objects): No such file or directory

Inspecting is made trickier by the builds UI, which requires choosing to show 5, 10, or all builds (the last of which becomes unusable or eventually crashes the browser). So it's not clear to me what the diff between those images is.
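
For what it's worth, the db diff error above is because --repo=./ points at the current directory, which isn't an ostree repo. A sketch that should work, assuming both commits are still available on the configured remote (named fedora by default on FCOS), is to pull them into the system repo and diff there:

# pull both commits into the system ostree repo, then diff them
sudo ostree pull fedora 52bae8872c1ab0b5b0af329f7f41598050e546fb17139838fadcd24d3743a148
sudo ostree pull fedora ca7a7214da5198f2a057199395223c06a67b728a4ef50a47b9a50c98bb004701
sudo rpm-ostree db diff 52bae8872c1ab0b5b0af329f7f41598050e546fb17139838fadcd24d3743a148 ca7a7214da5198f2a057199395223c06a67b728a4ef50a47b9a50c98bb004701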

@jlebon
Member

jlebon commented Jul 5, 2021

This isn't surfaced in the builds browser, but 34.20210525.20.1 is when we moved to cgroups v2 by default: coreos/fedora-coreos-config#1033. So it's likely related to that. Can you try re-adding the systemd.unified_cgroup_hierarchy=0 karg to check this?

Edit: filed coreos/fedora-coreos-browser#27 to make this easier to see in the future.
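
For reference, a sketch of re-adding that karg on a running node (the standard rpm-ostree kargs flow; a reboot is required for it to take effect):

# fall back to the legacy cgroup v1 hierarchy and reboot
sudo rpm-ostree kargs --append=systemd.unified_cgroup_hierarchy=0
sudo systemctl reboot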

@jlebon
Member

jlebon commented Jul 5, 2021

I'm guessing that moving to cgroups v2 has somehow triggered an SELinux denial on BPF operations that used to be allowed. @rhatdan Does that sound familiar? Do we need to add a rule to allow spc_t to load eBPF programs?
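
As a stopgap while a proper fix lands, a local policy module generated from the logged denials should unblock testing. A sketch only: audit2allow/semodule come from the policycoreutils python utilities, which may need to be layered onto FCOS.

# generate and load a local module covering the denied bpf permissions
# (local_spc_bpf is just an arbitrary module name)
sudo ausearch -m avc -c bpftool | audit2allow -M local_spc_bpf
sudo semodule -i local_spc_bpf.pp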

jlebon added a commit to jlebon/fedora-coreos-browser that referenced this issue Jul 5, 2021
This is useful to have easily accessible and matches the RHCOS release
browser. Came up in:

coreos/fedora-coreos-tracker#881 (comment)
@dghubble
Member Author

dghubble commented Jul 6, 2021

@jlebon yep, reverting to cgroups v1 does seem to restore normal Cilium behavior and no denials are seen.

@dustymabe dustymabe self-assigned this Jul 8, 2021
@dustymabe dustymabe changed the title Cilium routing issues starting in 34.20210529.2.0 use of cgroups v2 triggers SELinux denial when loading eBPF programs Jul 8, 2021
@dustymabe
Member

Nice investigation @dghubble @jlebon!


@wkruse

wkruse commented Jul 16, 2021

Calico on Fedora CoreOS also seems to be affected.

audit[7161]: AVC avc:  denied  { prog_run } for  pid=7161 comm="bpftool" scontext=system_u:system_r:spc_t:s0 tcontext=system_u:system_r:container_runtime_t:s0 tclass=bpf permissive=0
audit[7161]: SYSCALL arch=c000003e syscall=321 success=no exit=-13 a0=d a1=7ffc594ac340 a2=70 a3=7ffc594ac3e4 items=0 ppid=6991 pid=7161 auid=4294967295 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=(none) ses=4294967295 comm="bpftool" exe="/usr/bin/bpftool" subj=system_u:system_r:spc_t:s0 key=(null)
audit[7162]: AVC avc:  denied  { map_read map_write } for  pid=7162 comm="bpftool" scontext=system_u:system_r:spc_t:s0 tcontext=system_u:system_r:kernel_t:s0 tclass=bpf permissive=0
audit[7162]: SYSCALL arch=c000003e syscall=321 success=no exit=-13 a0=e a1=7ffc5223f460 a2=70 a3=6f items=0 ppid=6991 pid=7162 auid=4294967295 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=(none) ses=4294967295 comm="bpftool" exe="/usr/bin/bpftool" subj=system_u:system_r:spc_t:s0 key=(null)

@rhatdan

rhatdan commented Jul 16, 2021

In order to speed the process along, I also added the rules to container-selinux.

https://github.com/containers/container-selinux/releases/tag/v2.164.0

A build should be triggered later this afternoon.
@jnovy we will need this built for RHEL8.5.

@dustymabe
Member

ok we'll fast-track container-selinux to get this fixed: coreos/fedora-coreos-config#1122

Does anyone have a simple reproducer handy that doesn't involve setting up a Kubernetes cluster, which we could add to our CI to catch this type of problem in the future?
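
Not a tested answer, but a rough idea for a CI-friendly reproducer: on a cgroups v2 host, run bpftool from a privileged container (which runs as spc_t) and check for bpf-class AVC denials. This is an untested sketch, and the image is just a placeholder for anything that ships bpftool:

# inspect cgroup-attached BPF programs from a privileged container
sudo podman run --privileged --rm quay.io/cilium/cilium:v1.10.3 \
  bpftool cgroup tree /sys/fs/cgroup
# then check whether any bpf-class denials were logged
sudo ausearch -m avc -ts recent -c bpftool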

@dustymabe dustymabe added the status/pending-testing-release Fixed upstream. Waiting on a testing release. label Jul 19, 2021
dustymabe added a commit to coreos/fedora-coreos-config that referenced this issue Jul 19, 2021
@dghubble
Member Author

I don't have a small repro, but can try to test the image soon

@dustymabe
Member

I don't have a small repro, but can try to test the image soon

Thanks @dghubble - please test with 34.20210719.20.0.

@dghubble
Member Author

dghubble commented Jul 20, 2021

The SELinux denials disappear with that image, though Cilium service connectivity remains broken, without follow-up clues in the logs. I don't have other info right now, but there must be additional causes.

Reverting to cgroups v1 remains a workaround, which I'm using.

@dghubble
Member Author

Alright, I think there are a bunch of changes needed to get Cilium to be compatible with cgroups v2, which I'll need to look into separately when I can. The SELinux fix unblocks this effort, so thank you; I think for now we can say the FCOS side isn't the blocker.

@dustymabe
Member

Thanks @dghubble for the info. Please let us know if there are upstream issues in cilium or calico we can follow for status updates.

dghubble added a commit to poseidon/typhoon that referenced this issue Jul 24, 2021
* On Fedora CoreOS, Cilium cross-node service IP load balancing
stopped working for a time (first observable as CoreDNS pods
located on worker nodes not being able to reach the kubernetes
API service 10.3.0.1). This turned out to have two parts:
* Fedora CoreOS switched to cgroups v2 by default. In our early
testing with cgroups v2, Calico (default) was used. With the
cgroups v2 change, SELinux policy denied some eBPF operations.
Since fixed in all Fedora CoreOS channels
* Cilium requires new mounts to support cgroups v2, which are
added here

* coreos/fedora-coreos-tracker#292
* coreos/fedora-coreos-tracker#881
* cilium/cilium#16259
dghubble added a commit to poseidon/typhoon that referenced this issue Jul 24, 2021
@dghubble
Member Author

With the mentioned SELinux fix, I was able to update Cilium's setup to work with cgroups v2, so what remained was indeed external to FCOS. For anyone curious, these were the changes: poseidon/terraform-render-bootstrap#271, poseidon/typhoon#1021.

Thanks all!

@dustymabe
Member

Thanks @dghubble for the update and links!

@dustymabe
Member

This will land in the testing/next releases that should happen in the next few days.

@dustymabe
Member

The fix for this went into testing stream release 34.20210725.2.0. Please try out the new release and report issues.
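
To confirm a node picked up the fix, checking the deployed release and the container-selinux version (2.164.0 or newer, per the above) should be enough:

# check the booted FCOS deployment and the relevant packages
rpm-ostree status
rpm -q container-selinux selinux-policy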

@dustymabe dustymabe added status/pending-stable-release Fixed upstream and in testing. Waiting on stable release. and removed status/pending-testing-release Fixed upstream. Waiting on a testing release. labels Aug 9, 2021
@dustymabe
Member

The fix for this went into stable stream release 34.20210725.3.0.

@dustymabe dustymabe removed the status/pending-stable-release Fixed upstream and in testing. Waiting on stable release. label Aug 25, 2021
foltik pushed a commit to foltik/typhoon that referenced this issue Sep 1, 2021
elemental-lf pushed a commit to elemental-lf/typhoon that referenced this issue Dec 11, 2021
Snaipe pushed a commit to aristanetworks/monsoon that referenced this issue Apr 13, 2023
HuijingHei pushed a commit to HuijingHei/fedora-coreos-config that referenced this issue Oct 10, 2023
HuijingHei pushed a commit to HuijingHei/fedora-coreos-config that referenced this issue Oct 10, 2023