allow CSI helpers in the SELinux policy #3779

bcressey · 2024-02-14T23:25:58Z

Issue number:
#3684

Description of changes:
There's a fair amount of refactoring here, but the essential change is a new /opt/csi mount with a special label, csi_exec_t, where privileged containers can write binaries that systemd is allowed to execute. This is intended for the special case of FUSE mounts, where the mounted filesystem needs to survive a restart or upgrade of the CSI driver daemonset.

Ideally these binaries would either be statically linked or else wrapped by a runc invocation to minimize host dependencies, but this isn't enforced. Asking systemd to run a unit requires the break-glass super_t label, so it can be assumed that the caller knows the risks and asserts that it is correct.

In terms of policy refactoring, some of the type attribute identifiers have been renamed for clarity, and new ones have been added so that rules can be applied to the set rather than one-by-one to individual types.

At the OS level, the cni_exec_t label is now applied to all of /opt/cni rather than just /opt/cni/bin. This is done for symmetry with the new /opt/csi mount, and is expected to be safe because /opt/cni is unconditionally removed on each boot.

The only part of this change that's specific to the mountpoint S3 CSI driver is the compat symlink that redirects /opt/mountpoint-s3-csi into /opt/csi, so that those files receive the correct label. This is similar to the compat symlink added for the secrets store CSI provider.
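
To illustrate why a compat symlink is enough, here is a minimal Python sketch that uses a scratch directory in place of the real paths. It assumes the link points directly at the labeled directory and uses a hypothetical file name `helper`; the actual symlink target and the way the OS creates it are not shown in this description. The sketch does not touch SELinux at all. It only shows that files written through the link are created in the target directory, which is where the label is applied.

```python
import os
import tempfile


def write_via_compat_symlink(root: str) -> str:
    """Write a file through a compat symlink and return where it really lands."""
    # Stand-ins for /opt/csi and /opt/mountpoint-s3-csi under a scratch root.
    labeled_dir = os.path.join(root, "opt", "csi")
    compat_link = os.path.join(root, "opt", "mountpoint-s3-csi")
    os.makedirs(labeled_dir)
    # Assumption: the link points straight at the labeled directory.
    os.symlink(labeled_dir, compat_link)

    # The driver keeps writing to its old path (hypothetical file name) ...
    helper = os.path.join(compat_link, "helper")
    with open(helper, "w") as f:
        f.write("#!/bin/sh\n")

    # ... but the file is actually created inside the symlink target.
    return os.path.realpath(helper)


with tempfile.TemporaryDirectory() as root:
    landed = write_via_compat_symlink(root)
    expected_dir = os.path.realpath(os.path.join(root, "opt", "csi"))
    print(os.path.dirname(landed) == expected_dir)
```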

Testing done:
Deployed the mountpoint S3 CSI driver to my cluster and made these edits to the s3-csi-node daemonset:

# add to s3-plugin container to allow it to interact with systemd
securityContext:
    seLinuxOptions:
        type: super_t

# add to install-mp initContainer to allow it to write to `/opt/csi`
securityContext:
    privileged: true

I also ran through my SELinux-related test suite and verified that it passed.
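
For anyone who wants to repeat the test, the two daemonset edits described in the testing notes could be applied programmatically. The following is a rough Python sketch that works on a manifest already loaded into a plain dict. The container names `s3-plugin` and `install-mp` come from the notes above; the trimmed-down manifest and the function name `apply_test_edits` are made up for illustration, and a real manifest has many more fields.

```python
import copy


def apply_test_edits(daemonset: dict) -> dict:
    """Return a copy of the manifest with the two securityContext edits applied."""
    patched = copy.deepcopy(daemonset)
    pod_spec = patched["spec"]["template"]["spec"]

    for container in pod_spec.get("containers", []):
        if container["name"] == "s3-plugin":
            # Let the plugin container interact with systemd.
            ctx = container.setdefault("securityContext", {})
            ctx.setdefault("seLinuxOptions", {})["type"] = "super_t"

    for container in pod_spec.get("initContainers", []):
        if container["name"] == "install-mp":
            # Let the init container write to /opt/csi.
            container.setdefault("securityContext", {})["privileged"] = True

    return patched


# Hypothetical, heavily trimmed manifest used only to exercise the function.
manifest = {
    "kind": "DaemonSet",
    "metadata": {"name": "s3-csi-node"},
    "spec": {"template": {"spec": {
        "initContainers": [{"name": "install-mp"}],
        "containers": [{"name": "s3-plugin"}],
    }}},
}

patched = apply_test_edits(manifest)
print(patched["spec"]["template"]["spec"]["containers"][0]["securityContext"])
```

The function returns a modified copy and leaves the input untouched, so the original manifest can be restored after testing.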

Terms of contribution:

By submitting this pull request, I agree that this contribution is dual-licensed under the terms of both the Apache License, version 2.0, and the MIT license.

The distinction between "protected" (i.e. "write-restricted") and "restricted" (i.e. "read-restricted") was unclear and the attribute names did not imply that one was related to the other. Clarify this by renaming the attributes and defining the subset relationship between them.

Another distinction in the policy is between local files that can be mutated by both confined and unconfined system processes, and local files that can only be mutated by unconfined system processes. The first kind of objects are matched by rules to specific subjects; the second kind are instead defined by the absence of rules that would allow confined subjects to mutate them.

Add the "sensitive" attribute to collect these types and to clarify the policy objective: these are files that can't be mutated by containers and also can't be mutated by confined system processes.

Signed-off-by: Ben Cressey <bcressey@amazon.com>

CSI drivers that mount filesystems with FUSE need to ensure that the mounting process survives a container restart; otherwise, they cannot be updated without triggering a filesystem failure in any pod which uses the mount. One workaround for this is to have the host run the mounting process on behalf of the container, so that the lifecycles of the driver and the mount are no longer the same.

In some ways this is similar to CNI, where containers can provide plugins that the host runs while setting up new network namespaces. It's also different in that CSI mount helpers must run before the container is created, rather than during creation. CSI mount helpers may also need access to credentials or other secrets to perform the mount, so the processes must be treated as privileged rather than unprivileged containers.

Signed-off-by: Ben Cressey <bcressey@amazon.com>

Ideally, CSI drivers that want the host to run a helper process on their behalf would arrange to run that process inside a container, to avoid any dependencies on host software beyond the systemd interface. However, this isn't strictly required, and treating the process as a container fulfills the policy objective.

Allow systemd to execute such processes directly, without requiring them to be wrapped by a `runc` invocation. Note however that it requires a high level of privilege to interact with systemd via its DBUS API to create a unit and arrange for it to run. There are no plans to relax this restriction.

Signed-off-by: Ben Cressey <bcressey@amazon.com>
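
As a rough illustration of what "asking systemd to run a helper" can look like, the sketch below builds, but does not execute, a `systemd-run` command line. This swaps in the `systemd-run` CLI as a stand-in for the D-Bus call a driver would make; the unit name, helper path, and helper arguments are all hypothetical, and the path check is only an illustrative guard based on the /opt/csi labeling described in this PR.

```python
import shlex


def build_systemd_run_argv(unit: str, helper: str, helper_args: list[str]) -> list[str]:
    """Build (but do not run) a systemd-run command for a host-side mount helper."""
    if not helper.startswith("/opt/csi/"):
        # Per this PR, container-written helpers get csi_exec_t only under /opt/csi.
        raise ValueError(f"helper must live under /opt/csi: {helper}")
    return ["systemd-run", f"--unit={unit}", helper, *helper_args]


argv = build_systemd_run_argv(
    "s3-mount-example.service",   # hypothetical unit name
    "/opt/csi/bin/mount-helper",  # hypothetical helper path
    ["example-bucket", "/var/lib/kubelet/pods/example/mount"],  # hypothetical args
)
print(shlex.join(argv))
```

Running the resulting command from a container would still require the caller to hold the super_t label, as noted above.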

This is done for consistency with the new /opt/csi mount, where the helpers may need to store non-executable files as well as binaries.

Note that /opt/cni has always been cleaned up on every boot, so this will not remove any files that weren't previously removed. The main change is that files outside of /opt/cni/bin will now be labeled with "cni_exec_t" instead of "local_t". These types are largely equivalent in the current policy, in terms of file-related permissions, so the change should be safe.

Signed-off-by: Ben Cressey <bcressey@amazon.com>

bcressey added 9 commits February 14, 2024 17:43

selinux-policy: label runc with runtime_exec_t

a35575a

This ensures that the `runc` process will receive the correct label if it's started as a systemd unit instead of being invoked by some other service.

Signed-off-by: Ben Cressey <bcressey@amazon.com>

selinux-policy: simplify container transition rules

270305d

Rather than specifying the transitions for each container executable object type, group them into sets and specify the rules just once for each set.

Signed-off-by: Ben Cressey <bcressey@amazon.com>
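
The idea of writing a rule once per set can be modeled in a few lines of Python. This is only a toy model, not the policy itself: `cni_exec_t` and `csi_exec_t` appear in this PR, but placing them in the same set is an assumption, and the domain names `init_t`, `runtime_t`, and `helper_t` are hypothetical placeholders.

```python
def expand_rule(sources, exec_types, target):
    """Expand one set-level transition rule into sorted per-type rules."""
    return sorted((src, exe, target) for src in sources for exe in exec_types)


# Hypothetical grouping and domain names, for illustration only.
launcher_domains = {"init_t", "runtime_t"}
helper_exec_types = {"cni_exec_t", "csi_exec_t"}

# One rule written against the sets stands in for every pairwise rule.
for src, exe, tgt in expand_rule(launcher_domains, helper_exec_types, "helper_t"):
    print(f"{src} executing {exe} -> {tgt}")
```

Adding a new executable type to the set then extends every rule written against that set, without touching the rules themselves.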

selinux-policy: label CSI helper state directory

b7b2262

These directories will be used for overlayfs state, and unexpected modifications could disrupt the system.

Signed-off-by: Ben Cressey <bcressey@amazon.com>

release: add mount for /opt/csi

4123ecc

This sets up /opt/csi as the designated location for CSI helpers that the host system is permitted to execute.

Signed-off-by: Ben Cressey <bcressey@amazon.com>

kubelet: add compat symlink for S3 CSI driver

21434bf

Ensure that files written by the S3 CSI driver are written to a path with the correct SELinux label to allow systemd to execute them.

Signed-off-by: Ben Cressey <bcressey@amazon.com>

webern approved these changes Feb 15, 2024

arnaldo2792 approved these changes Feb 15, 2024

yeazelm approved these changes Feb 21, 2024

bcressey merged commit 1f96e7b into bottlerocket-os:develop Feb 21, 2024
50 checks passed

bcressey deleted the s3-csi-selinux branch February 21, 2024 21:30

vyaghras mentioned this pull request Feb 21, 2024

v1.19.2 💘 Tracking Issue #3795

Closed

9 tasks

bcressey mentioned this pull request Sep 12, 2024

Cannot execute binaries stored in an NFS Server running on a Bottlerocket node #4116

Closed
