Title: "Use socket activation with Podman to get improved security and native network throughput"
Subtitle: "Learn how to restrict network access for a containerized network server"
Running a web server container is one of the more common uses for Podman. Normally you would need to publish the ports that need to be open by providing the option --publish (-p) to podman run.
When running rootless Podman you also need to be aware that the network traffic is processed by the user-space application slirp4netns, which comes with a performance penalty.
You might be surprised to hear that it's now possible to run a web server container with rootless Podman and get native network throughput! Even more surprising is that the --network=none option can be given to disable the network. There is also no need to publish ports.
The new way to run a network server container with Podman is to use socket activation provided by systemd. Not all software daemons support socket activation, but it is becoming more popular. For instance, the Apache HTTP server, MariaDB, DBUS, PipeWire, Gunicorn, and CUPS all have socket activation support.
Socket activation conceptually works by having systemd create a socket (e.g. TCP, UDP or Unix socket). As soon as a client connects to the socket, systemd will start the systemd service that is configured for the socket. The newly started program inherits the open file descriptor of the socket and can accept the incoming connection. The new feature is that Podman now passes such a socket to the container. Thanks to the fork/exec model of Podman, the socket will be first inherited by conmon and then by the OCI runtime and finally by the container as can be seen in the following diagram:
stateDiagram-v2
[*] --> systemd: client connects
systemd --> podman: socket inherited via fork/exec
state "OCI runtime" as s2
podman --> conmon: socket inherited via double fork/exec
conmon --> s2: socket inherited via fork/exec
s2 --> container: socket inherited via exec
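To see this file descriptor passing in isolation, you can experiment with the command-line tool systemd-socket-activate (covered in more detail later in this article). The activated process inherits the listening socket as file descriptor 3, and the environment variables LISTEN_FDS and LISTEN_PID describe what was passed. A minimal sketch, where port 2000 and the inspected command are arbitrary choices:
In one shell, let systemd-socket-activate own the listening socket
$ systemd-socket-activate -l 2000 sh -c 'env | grep ^LISTEN_; ls -l /proc/self/fd/3'
In another shell, trigger the activation by connecting
$ echo hello | socat - tcp4:127.0.0.1:2000
When the client connects, the first shell starts the sh command, which should print LISTEN_FDS=1, a LISTEN_PID matching the started process, and show that file descriptor 3 is a socket.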
Before looking into this new feature, let us take a look at another form of socket activation in Podman.
Podman has supported socket activation of its API service for a long time. Here the architecture is simpler because the socket is used by Podman itself:
stateDiagram-v2
[*] --> systemd: client connects
systemd --> podman: socket inherited via fork/exec
The file /usr/lib/systemd/user/podman.socket on a Fedora system defines the Podman API socket for rootless users:
$ cat /usr/lib/systemd/user/podman.socket
[Unit]
Description=Podman API Socket
Documentation=man:podman-system-service(1)
[Socket]
ListenStream=%t/podman/podman.sock
SocketMode=0660
[Install]
WantedBy=sockets.target
The socket is configured to be a Unix socket and can be started like this
$ systemctl --user start podman.socket
$ ls $XDG_RUNTIME_DIR/podman/podman.sock
/run/user/1000/podman/podman.sock
$
The socket can later be used by, for instance, docker-compose, which needs a Docker-compatible API
$ export DOCKER_HOST=unix://$XDG_RUNTIME_DIR/podman/podman.sock
$ docker-compose up
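You can also query the API socket directly with curl. For example, a quick connectivity check using the _ping endpoint of the Docker-compatible API (the hostname after http:// is a placeholder and is ignored when a Unix socket is used):
$ curl --unix-socket $XDG_RUNTIME_DIR/podman/podman.sock http://d/_ping
OK
$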
More recently, in version 3.4.0, Podman received support for another type of socket activation, namely socket activation of containers. Such socket activation can be used in the systemd services that are generated with the command podman generate systemd --new --name CTR.
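A rough sketch of that workflow, where CTR and IMAGE are placeholders and the image itself must support socket activation, could look like this. The matching socket unit CTR.socket tells systemd which address to listen on, and activates CTR.service when the first client connects:
$ podman create --name CTR IMAGE
$ podman generate systemd --new --name CTR > ~/.config/systemd/user/CTR.service
$ cat ~/.config/systemd/user/CTR.socket
[Socket]
ListenStream=127.0.0.1:3000
[Install]
WantedBy=sockets.target
$ systemctl --user daemon-reload
$ systemctl --user start CTR.socket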
I created a container image, ghcr.io/eriksjolund/socket-activate-echo, containing an echo server that supports socket activation. The echo server currently has limited functionality; it was written for the sole purpose of demonstrating socket activation. The source code is available in the GitHub repo eriksjolund/socket-activate-echo, where more examples can also be found.
Let's try it out. Start the echo server sockets
git clone https://github.com/eriksjolund/socket-activate-echo.git
mkdir -p ~/.config/systemd/user
cp -r socket-activate-echo/systemd/echo* ~/.config/systemd/user
systemctl --user daemon-reload
systemctl --user start echo@demo.socket
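The socket unit defines several listening sockets at once. A simplified sketch of what echo@.socket could contain (the actual file in the repository may differ, and the vsock listener is left out here):
[Socket]
ListenStream=127.0.0.1:3000
ListenStream=[::1]:3000
ListenDatagram=127.0.0.1:3000
ListenDatagram=[::1]:3000
ListenStream=%h/echo_stream_sock.%i
[Install]
WantedBy=sockets.target
ListenStream creates TCP sockets (or, when given a path, a Unix stream socket), ListenDatagram creates UDP sockets, and the specifiers %h and %i expand to the home directory and the template instance name (here demo).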
List the listening sockets that we will connect to
$ ss -lnp | grep 3000
udp UNCONN 0 0 127.0.0.1:3000 0.0.0.0:* users:(("systemd",pid=2516,fd=33))
udp UNCONN 0 0 [::1]:3000 [::]:* users:(("systemd",pid=2516,fd=35))
tcp LISTEN 0 4096 127.0.0.1:3000 0.0.0.0:* users:(("systemd",pid=2516,fd=28))
tcp LISTEN 0 4096 [::1]:3000 [::]:* users:(("systemd",pid=2516,fd=34))
v_str LISTEN 0 0 *:3000 *:* users:(("systemd",pid=2516,fd=36))
$ ss -lx | grep echo | grep u_str
u_str LISTEN 0 4096 /home/eriksjolund/echo_stream_sock.demo 49486 * 0
$
Test the echo server with the program socat
$ echo hello | socat - tcp4:127.0.0.1:3000
hello
$ echo hello | socat - tcp6:[::1]:3000
hello
$ echo hello | socat - udp4:127.0.0.1:3000
hello
$ echo hello | socat - udp6:[::1]:3000
hello
$ echo hello | socat - unix:$HOME/echo_stream_sock.demo
hello
$ echo hello | socat - VSOCK-CONNECT:1:3000
hello
If the echo server were compromised due to a security vulnerability, the container could be used to launch attacks against other PCs or devices on the network.
An echo server does not need the ability to establish outgoing connections. It just needs to accept incoming connections on the socket-activated socket it inherited.
Luckily, the command-line option --network=none, given to podman run in the service unit file, provides exactly this restriction.
$ grep -A 9 ExecStart= ~/.config/systemd/user/echo@.service
ExecStart=/usr/bin/podman run \
--cidfile=%t/%n.ctr-id \
--cgroups=no-conmon \
--rm \
--sdnotify=conmon \
--replace \
--name echo-%i \
--detach \
--network none \
ghcr.io/eriksjolund/socket-activate-echo
Assume an intruder has shell access in the container. The situation can be simulated by executing commands with podman exec.
Only the loopback interface is available
$ podman exec -ti echo-demo /bin/bash -c "ip -brief addr"
lo UNKNOWN 127.0.0.1/8 ::1/128
curl is not able to download any web page
$ podman exec -ti echo-demo /bin/bash -c "curl https://podman.io"
curl: (6) Could not resolve host: podman.io
$
If we instead remove the option --network=none and run the same commands, we see that the network interface tap0 is also available
$ podman exec -ti echo-demo /bin/bash -c "ip -brief addr"
lo UNKNOWN 127.0.0.1/8 ::1/128
tap0 UNKNOWN 10.0.2.100/24 fd00::9847:3aff:fe5d:97ea/64 fe80::9847:3aff:fe5d:97ea/64
$
and that curl is able to download the web page.
$ podman exec -ti echo-demo /bin/bash -c "curl https://podman.io" | head -2
<!doctype html>
<html lang="en-US">
$
By using the option --network=none, we thus limit the possibilities for an intruder to use the compromised container as a starting point for attacks on other PCs.
Using socket activation comes with another advantage: communication over the socket-activated socket has native network throughput. Other network traffic still needs to pass through slirp4netns and suffers the performance penalty that comes with it.
Unfortunately, using socket activation also comes with a disadvantage. The very first connection to a socket-activated container will have more latency due to container startup. To minimize this latency, consider adding the podman run option --pull=never and instead pull the container image beforehand.
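For example, the demo image used in this article could be pulled ahead of time, so that the first activation only needs to start the container:
$ podman pull ghcr.io/eriksjolund/socket-activate-echo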
It is possible to restrict Podman from accessing AF_INET and AF_INET6 sockets with the systemd directive RestrictAddressFamilies. Socket-activated sockets are unaffected by the directive.
If the option --pull=never is added to podman run, the echo container will continue to work even with the very restrictive setting RestrictAddressFamilies=AF_UNIX AF_NETLINK. All types of sockets are then inaccessible except AF_UNIX sockets, AF_NETLINK sockets, and the socket-activated sockets.
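As a sketch, the directive could be added to a service such as echo@demo.service with a systemd drop-in file (together with --pull=never on the podman run command line, as noted above):
$ cat ~/.config/systemd/user/echo@demo.service.d/override.conf
[Service]
RestrictAddressFamilies=AF_UNIX AF_NETLINK
$ systemctl --user daemon-reload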
If there were a security vulnerability in Podman, conmon, or runc, this configuration limits the possibilities an intruder has to launch attacks on other PCs on the network.
The echo-restrict.service is configured with RestrictAddressFamilies=AF_UNIX AF_NETLINK. The service is activated with echo-restrict.socket
$ grep Listen ~/.config/systemd/user/echo-restrict.socket
ListenStream=127.0.0.1:9000
To try it out, start the socket
$ systemctl --user start echo-restrict.socket
and see that it works
$ echo hello | socat - tcp4:127.0.0.1:9000
hello
$
Caveat 1: Currently, runc supports RestrictAddressFamilies=AF_UNIX AF_NETLINK, but the number of socket-activated sockets is limited to a maximum of 2 (see bug: opencontainers/runc#3488).
Caveat 2: At the time of this writing, crun does not support RestrictAddressFamilies=AF_UNIX AF_NETLINK (see feature request: containers/crun#929).
If we had used --pull=always instead of --pull=never, the service would fail as expected, because Podman is blocked from establishing connections to the container registry.
journalctl would then show error messages such as these
$ journalctl --user -xe -u echo.service | grep -A2 "Trying to pull" | tail -3
May 26 10:09:54 asus podman[28272]: Trying to pull ghcr.io/eriksjolund/socket-activate-echo:latest...
May 26 10:09:54 asus podman[28272]: Error: initializing source docker://ghcr.io/eriksjolund/socket-activate-echo:latest: pinging container registry ghcr.io: Get "https://ghcr.io/v2/": dial tcp 140.82.121.34:443: socket: address family not supported by protocol
May 26 10:09:54 asus systemd[10686]: test.service: Main process exited, code=exited, status=125/n/a
$
Instead of setting up a systemd service to test out socket activation, you can use the command-line tool systemd-socket-activate.
As an example, let us use the container image ghcr.io/eriksjolund/socket-activate-httpd, which contains an Apache HTTP server.
In one shell, start systemd-socket-activate.
$ systemd-socket-activate -l 8080 podman run --rm --network=none ghcr.io/eriksjolund/socket-activate-httpd
The TCP port number 8080 is given as an option to systemd-socket-activate. The --publish (-p) option for podman run is not used.
In another shell, fetch a web page from localhost:8080
$ curl -s localhost:8080 | head -6
<!doctype html>
<html>
<head>
<meta charset='utf-8'>
<meta name='viewport' content='width=device-width, initial-scale=1'>
<title>Test Page for the HTTP Server on Fedora</title>
$
If your computer is running SELinux, you need to have container-selinux 2.183.0 or newer installed. If container socket activation via Podman does not work and you are using an older version of container-selinux, add --security-opt label=disable to podman run as a workaround.
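Applied to the Apache HTTP server example above, the workaround could look like this (only needed with an older container-selinux):
$ systemd-socket-activate -l 8080 podman run --rm --network=none --security-opt label=disable ghcr.io/eriksjolund/socket-activate-httpd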