{Project} implements security-related options in the container runtime. This document describes the methods users have for specifying the security scope and context when running {Project} containers.
.. note::

   It is extremely important to recognize that granting users Linux
   capabilities with the ``capability`` command group is usually
   identical to granting those users root-level access on the host
   system. Most, if not all, capabilities will allow users to "break
   out" of the container and become root on the host. This feature is
   targeted toward special use cases (like cloud-native architectures)
   where an admin/developer might want to limit the attack surface
   within a container that normally runs as root. It is not a good
   option in multi-tenant HPC environments where an admin wants to grant
   a user special privileges within a container. For that and similar
   use cases, the :ref:`fakeroot feature <fakeroot>` is a better option.

{Project} provides full support for granting and revoking Linux
capabilities on a user or group basis. For example, suppose that an
admin has decided to grant a user (named ``pinger``) the capability to
open raw sockets so that they can use ``ping`` in a container where the
``ping`` binary is controlled via capabilities. For information about
how to manage capabilities as an admin, please refer to the capability
admin docs.
This feature requires a setuid-root installation of {Project}.
To take advantage of this granted capability as a user, ``pinger`` must
also request the capability when executing a container with the
``--add-caps`` flag like so:

.. code::

   $ {command} exec --add-caps CAP_NET_RAW oras://ghcr.io/apptainer/ubuntu_ping:v1.0 ping -c 1 8.8.8.8
   PING 8.8.8.8 (8.8.8.8) 56(84) bytes of data.
   64 bytes from 8.8.8.8: icmp_seq=1 ttl=52 time=73.1 ms

   --- 8.8.8.8 ping statistics ---
   1 packets transmitted, 1 received, 0% packet loss, time 0ms
   rtt min/avg/max/mdev = 73.178/73.178/73.178/0.000 ms

If the admin decides that it is no longer necessary to allow the user
``pinger`` to open raw sockets within {Project} containers, they can
revoke the appropriate Linux capability, and ``pinger`` will no longer
be able to add that capability to their containers:

.. code::

   $ {command} exec --add-caps CAP_NET_RAW oras://ghcr.io/apptainer/ubuntu_ping:v1.0 ping -c 1 8.8.8.8
   WARNING: not authorized to add capability: CAP_NET_RAW
   ping: socket: Operation not permitted

Another scenario, atypical of shared resource environments but useful in cloud-native architectures, is dropping capabilities when spawning containers as the root user to help minimize attack surfaces. With a default installation of {Project}, containers created by the root user will maintain all capabilities. This behavior is configurable if desired. See the capability configuration and root default capabilities sections of the admin docs for more information.
Assuming the root user executes containers with the ``CAP_NET_RAW``
capability by default, running the same container that ``pinger``
executed above works without the need to grant capabilities:

.. code::

   # {command} exec oras://ghcr.io/apptainer/ubuntu_ping:v1.0 ping -c 1 8.8.8.8
   PING 8.8.8.8 (8.8.8.8) 56(84) bytes of data.
   64 bytes from 8.8.8.8: icmp_seq=1 ttl=52 time=59.6 ms

   --- 8.8.8.8 ping statistics ---
   1 packets transmitted, 1 received, 0% packet loss, time 0ms
   rtt min/avg/max/mdev = 59.673/59.673/59.673/0.000 ms

Now we can manually drop the ``CAP_NET_RAW`` capability like so:

.. code::

   # {command} exec --drop-caps CAP_NET_RAW oras://ghcr.io/apptainer/ubuntu_ping:v1.0 ping -c 1 8.8.8.8
   ping: socket: Operation not permitted

Now the container no longer has the ability to create raw sockets,
causing the ``ping`` command to fail.
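Inside a running container, the Linux kernel reports a process's capability sets as hexadecimal bitmasks in ``/proc/self/status``. The Python sketch below decodes the ``CapEff`` (effective set) line so you can check whether ``CAP_NET_RAW`` survived ``--drop-caps``. It assumes a Linux ``/proc`` filesystem and lists only a handful of bit numbers from ``<linux/capability.h>``; the helper names ``decode_caps`` and ``effective_caps`` are illustrative and not part of {Project}.

```python
from __future__ import annotations

# A few capability bit numbers from <linux/capability.h> (not the full list).
CAP_BITS = {
    "CAP_CHOWN": 0,
    "CAP_SETUID": 7,
    "CAP_NET_BIND_SERVICE": 10,
    "CAP_NET_ADMIN": 12,
    "CAP_NET_RAW": 13,
    "CAP_SYS_ADMIN": 21,
}


def decode_caps(mask_hex: str) -> set[str]:
    """Return the names from CAP_BITS whose bits are set in a hex mask."""
    mask = int(mask_hex, 16)
    return {name for name, bit in CAP_BITS.items() if mask & (1 << bit)}


def effective_caps(status_path: str = "/proc/self/status") -> set[str]:
    """Decode the CapEff (effective capabilities) line of a /proc status file."""
    with open(status_path) as f:
        for line in f:
            if line.startswith("CapEff:"):
                # Line format: "CapEff:<whitespace><16 hex digits>"
                return decode_caps(line.split()[1])
    raise ValueError(f"no CapEff line in {status_path}")
```

If the container image includes Python, calling ``effective_caps()`` in a container started with ``--drop-caps CAP_NET_RAW`` should return a set without ``CAP_NET_RAW``.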
The ``--add-caps`` and ``--drop-caps`` options also accept the ``all``
keyword. Of course, appropriate caution should be exercised when using
this keyword.
With {aProject} setuid installation, it is possible to build and run encrypted containers. The containers are decrypted at runtime entirely in kernel space, meaning that no intermediate decrypted data is ever present on disk. See :ref:`encrypted containers <encryption>` for more details.
{Project} has many security-related flags that can be passed to the
action commands ``shell``, ``exec``, and ``run``, allowing fine-grained
control of security.
As explained above, ``--add-caps`` will "activate" Linux capabilities
when a container is initiated, provided those capabilities have been
granted to the user by an administrator using the ``capability add``
command. This option also accepts the case-insensitive keyword ``all``
to add every capability granted by the administrator.
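The authorization rule described above can be modeled in a few lines. The following Python function is a hypothetical illustration, not {Project}'s implementation: it assumes ``--add-caps`` takes a comma-separated list, treats ``all`` case-insensitively as "every capability the administrator granted", and reports any requested capability that has not been granted, which corresponds to the ``not authorized to add capability`` warning shown earlier.

```python
from __future__ import annotations


def resolve_add_caps(requested: str, granted: set[str]) -> tuple[set[str], set[str]]:
    """Split a comma-separated --add-caps value into (allowed, denied) sets.

    Hypothetical model of the documented rule, not the real implementation.
    """
    names = [n.strip() for n in requested.split(",") if n.strip()]
    # The "all" keyword is case-insensitive and expands to the granted set only.
    if any(n.lower() == "all" for n in names):
        return set(granted), set()
    wanted = set(names)
    # Anything the administrator has not granted is refused.
    return wanted & granted, wanted - granted
```

With ``granted = {"CAP_NET_RAW"}``, requesting ``"CAP_NET_RAW"`` is allowed; once the administrator revokes it (``granted = set()``), the same request lands in the denied set.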
The SetUID bit allows a program to run with the privileges of the user that owns the binary, rather than those of the user who launched it. The most well-known SetUID binaries are owned by root and allow a user to execute a command with elevated privileges, but other SetUID binaries may allow a user to execute a command as a service account.
By default, SetUID is disallowed within {Project} containers as a
security precaution. However, the root user can override this
precaution and allow SetUID binaries to behave as expected within
{aProject} container with the ``--allow-setuid`` option like so:

.. code::

   $ sudo {command} shell --allow-setuid some_container.sif

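To see which binaries this precaution applies to, you can look for files whose mode has the SetUID bit (``S_ISUID``) set. The following Python sketch uses only the standard library; the helper names are illustrative, and pointing it at an unpacked container directory (a sandbox) rather than at a SIF file is an assumption of the example.

```python
from __future__ import annotations

import os
import stat


def is_setuid(path: str) -> bool:
    """True if path is a regular file with the SetUID bit (S_ISUID) set."""
    st = os.lstat(path)  # lstat: do not follow symlinks
    return stat.S_ISREG(st.st_mode) and bool(st.st_mode & stat.S_ISUID)


def find_setuid(root: str) -> list[str]:
    """Walk root (os.walk does not descend into symlinked dirs) and list SetUID files."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                if is_setuid(path):
                    hits.append(path)
            except OSError:
                # Entries that vanished or cannot be stat'ed are skipped.
                continue
    return sorted(hits)
```

For example, ``find_setuid("my_sandbox/usr/bin")`` would list the SetUID programs under that hypothetical sandbox directory.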
It is possible for an admin to set a different set of default
capabilities, or to reduce the default capabilities to zero, for the
root user by setting the ``root default capabilities`` parameter in the
``{command}.conf`` file to ``file`` or ``no``, respectively. If this
change is in effect, the root user can override the ``{command}.conf``
file and enter the container with full capabilities using the
``--keep-privs`` option.

.. code::

   $ sudo {command} exec --keep-privs docker://centos:7 ping -c 1 8.8.8.8
   PING 8.8.8.8 (8.8.8.8) 56(84) bytes of data.
   64 bytes from 8.8.8.8: icmp_seq=1 ttl=128 time=18.8 ms

   --- 8.8.8.8 ping statistics ---
   1 packets transmitted, 1 received, 0% packet loss, time 0ms
   rtt min/avg/max/mdev = 18.838/18.838/18.838/0.000 ms

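An admin who wants to confirm which ``root default capabilities`` mode is active could read it out of ``{command}.conf`` before deciding whether ``--keep-privs`` is needed. The Python sketch below is hypothetical: it assumes simple ``key = value`` lines with ``#`` comments, assumes the keep-everything value is spelled ``full`` (only ``file`` and ``no`` are named above), and falls back to ``full`` when the key is absent, matching the default behavior described above.

```python
VALID_VALUES = {"full", "file", "no"}  # "full" is an assumed spelling


def root_default_caps(conf_text: str) -> str:
    """Return the 'root default capabilities' mode from config file text."""
    value = "full"  # assumed default when the key is absent
    for raw in conf_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if "=" not in line:
            continue
        key, _, val = line.partition("=")
        if key.strip().lower() == "root default capabilities":
            value = val.strip().lower()  # a later line overrides an earlier one
    if value not in VALID_VALUES:
        raise ValueError(f"unexpected root default capabilities value: {value!r}")
    return value
```

If this returns ``file`` or ``no``, root would need ``--keep-privs`` to enter a container with full capabilities.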
By default, the root user has a full set of capabilities when they enter the container. You may choose to drop specific capabilities when you initiate a container as root to enhance security.
For instance, to drop the ability for the root user to open a raw socket inside the container:

.. code::

   $ sudo {command} exec --drop-caps CAP_NET_RAW docker://centos:7 ping -c 1 8.8.8.8
   ping: socket: Operation not permitted

The ``--drop-caps`` option also accepts the case-insensitive keyword
``all`` to drop all capabilities when entering the container.
The ``--security`` flag allows the root user to leverage security
modules such as SELinux, AppArmor, and seccomp within your {Project}
container. You can also change the UID and GID of the user within the
container at runtime.
For instance:

.. code::

   $ sudo whoami
   root
   $ sudo {command} exec --security uid:1000 my_container.sif whoami
   david

To use seccomp to blacklist a system call, follow this procedure. (It is
actually preferable from a security standpoint to whitelist system
calls, but this will suffice for a simple example.) Note that this
example was run on Ubuntu and that {Project} was installed with the
``libseccomp-dev`` and ``pkg-config`` packages as dependencies.
First, write a configuration file. An example configuration file is
installed with {Project}, normally at
``/usr/local/etc/{command}/seccomp-profiles/default.json``. For this
example, we will use a much simpler configuration file to blacklist the
``mkdir`` system call.

.. code:: json

   {
       "defaultAction": "SCMP_ACT_ALLOW",
       "archMap": [
           {
               "architecture": "SCMP_ARCH_X86_64",
               "subArchitectures": [
                   "SCMP_ARCH_X86",
                   "SCMP_ARCH_X32"
               ]
           }
       ],
       "syscalls": [
           {
               "names": [
                   "mkdir"
               ],
               "action": "SCMP_ACT_KILL",
               "args": [],
               "comment": "",
               "includes": {},
               "excludes": {}
           }
       ]
   }

We'll save the file at ``/home/david/no_mkdir.json``. Then we can invoke
the container like so:

.. code::

   $ sudo {command} shell --security seccomp:/home/david/no_mkdir.json my_container.sif
   {Project}> mkdir /tmp/foo
   Bad system call (core dumped)

Note that running ``mkdir`` invoked the blacklisted system call, so the
process was killed and dumped core.
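The ``no_mkdir.json`` profile shown above can also be generated from code, which makes it easier to extend the deny list later. This Python sketch reproduces the same structure with the standard ``json`` module; the helper names and the ``DENIED_SYSCALLS`` list are illustrative.

```python
from __future__ import annotations

import json

# System calls that should kill the calling process (illustrative list).
DENIED_SYSCALLS = ["mkdir"]


def deny_list_profile(names: list[str]) -> dict:
    """Allow every syscall by default and kill the process on the listed ones."""
    return {
        "defaultAction": "SCMP_ACT_ALLOW",
        "archMap": [
            {
                "architecture": "SCMP_ARCH_X86_64",
                "subArchitectures": ["SCMP_ARCH_X86", "SCMP_ARCH_X32"],
            }
        ],
        "syscalls": [
            {
                "names": list(names),
                "action": "SCMP_ACT_KILL",
                "args": [],
                "comment": "",
                "includes": {},
                "excludes": {},
            }
        ],
    }


def write_profile(path: str, names: list[str]) -> None:
    """Serialize the deny-list profile as indented JSON at path."""
    with open(path, "w") as f:
        json.dump(deny_list_profile(names), f, indent=2)
```

For example, ``write_profile("/home/david/no_mkdir.json", DENIED_SYSCALLS)`` writes the profile used above. Keep in mind that seccomp filters system calls, not commands: a program that creates directories through ``mkdirat(2)`` instead of ``mkdir(2)`` would not be stopped by this profile unless ``mkdirat`` is added to the list.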
The full list of arguments accepted by the ``--security`` option is as
follows:

.. code::

   --security="seccomp:/usr/local/etc/{command}/seccomp-profiles/default.json"
   --security="apparmor:/usr/bin/man"
   --security="selinux:context"
   --security="uid:1000"
   --security="gid:1000"
   --security="gid:1000:1:0" (multiple gids, first is always the primary group)

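All of the argument forms above follow a ``kind:value`` pattern. As a hypothetical illustration of that syntax (not {Project}'s own parser), the Python function below splits a ``--security`` value at the first colon, converts ``uid`` and ``gid`` values to integers, and treats the first ``gid`` as the primary group, as the list above states.

```python
from __future__ import annotations

KINDS = {"seccomp", "apparmor", "selinux", "uid", "gid"}


def parse_security(value: str) -> tuple[str, object]:
    """Split a --security value such as "gid:1000:1:0" into (kind, data)."""
    # Split only at the first colon so SELinux contexts (user:role:type...)
    # and other values containing colons stay intact.
    kind, sep, data = value.partition(":")
    if not sep or kind not in KINDS or not data:
        raise ValueError(f"bad --security value: {value!r}")
    if kind == "uid":
        return kind, int(data)
    if kind == "gid":
        gids = [int(g) for g in data.split(":")]
        # The first GID is the primary group; the rest are supplementary.
        return kind, {"primary": gids[0], "supplementary": gids[1:]}
    # seccomp, apparmor and selinux arguments are passed through as strings.
    return kind, data
```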