Linux-native "fake root" for implementing rootless containers

rootless-containers/rootlesskit

RootlessKit: Linux-native fakeroot using user namespaces

RootlessKit is a Linux-native implementation of "fake root" using user_namespaces(7).

The purpose of RootlessKit is to run Docker and Kubernetes as an unprivileged user (known as "Rootless mode"), so as to protect the real root on the host from potential container-breakout attacks.

What RootlessKit actually does

RootlessKit creates user_namespaces(7) and mount_namespaces(7), and executes newuidmap(1)/newgidmap(1) along with subuid(5) and subgid(5).

RootlessKit also supports isolating network_namespaces(7) with userspace NAT using "slirp". Kernel-mode NAT using SUID-enabled lxc-user-nic(1) is also experimentally supported.

Similar projects

Tools based on LD_PRELOAD (insufficient for running rootless containers, and lacking support for static binaries):

Tools based on ptrace(2) (insufficient for running rootless containers, and slow):

Tools based on user_namespaces(7) (as in RootlessKit, but without support for --copy-up, --net, ...):

Projects using RootlessKit

Container engines:

Container image builders:

  • BuildKit: Next-generation docker build backend

Kubernetes distributions:

  • Usernetes: Docker & Kubernetes, installable under a non-root user's $HOME.
  • k3s: Lightweight Kubernetes

Setup

Run make && sudo make install.

The following binaries will be installed:

  • /usr/local/bin/rootlesskit
  • /usr/local/bin/rootlessctl
  • /usr/local/bin/rootlesskit-docker-proxy (DEPRECATED; Only required for Docker prior to v28)

Requirements

subuid

  • newuidmap and newgidmap need to be installed on the host. These commands are provided by the uidmap package on most distributions.

  • /etc/subuid and /etc/subgid should contain at least 65536 sub-IDs, e.g. penguin:231072:65536. These files are automatically configured on most distributions.

$ id -u
1001
$ whoami
penguin
$ grep "^$(whoami):" /etc/subuid
penguin:231072:65536
$ grep "^$(whoami):" /etc/subgid
penguin:231072:65536

See also https://rootlesscontaine.rs/getting-started/common/subuid/
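
The check above can be scripted. The following is a minimal sketch, not part of RootlessKit: the function names count_subids and has_enough_subids are hypothetical, and it assumes the entries are keyed by login name (as in the penguin example) rather than by numeric UID.

```shell
#!/bin/sh
# count_subids USER FILE
# Sum the range lengths (third colon-separated field) of every entry
# for USER in a subuid(5)/subgid(5)-style FILE and print the total.
# Assumption: entries are keyed by login name, not by numeric UID.
count_subids() {
    awk -F: -v user="$1" '$1 == user { total += $3 } END { print total + 0 }' "$2"
}

# has_enough_subids USER FILE
# Succeed when USER has at least 65536 sub-IDs listed in FILE.
has_enough_subids() {
    [ "$(count_subids "$1" "$2")" -ge 65536 ]
}
```

For example, has_enough_subids "$(whoami)" /etc/subuid && has_enough_subids "$(whoami)" /etc/subgid succeeds for the penguin entries shown above.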

sysctl

Some distros require setting up sysctl:

  • Debian (excluding Ubuntu) and Arch: sudo sh -c "echo 1 > /proc/sys/kernel/unprivileged_userns_clone"
  • RHEL/CentOS 7 (excluding RHEL/CentOS 8): sudo sh -c "echo 28633 > /proc/sys/user/max_user_namespaces"

To persist sysctl configurations, edit /etc/sysctl.conf or add a file under /etc/sysctl.d.

See also https://rootlesscontaine.rs/getting-started/common/sysctl/
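
To persist one of the settings above, the /proc/sys path has to be rewritten as a sysctl key. The helper below is a small sketch of that conversion; the function name sysctl_conf_line and the file name 99-rootless.conf in the comment are hypothetical choices, not something RootlessKit provides.

```shell
#!/bin/sh
# sysctl_conf_line PROC_PATH VALUE
# Convert a /proc/sys path into the "key = value" form used by sysctl.conf(5):
# the /proc/sys/ prefix is dropped and the remaining slashes become dots.
sysctl_conf_line() {
    key=${1#/proc/sys/}
    key=$(printf '%s' "$key" | tr '/' '.')
    printf '%s = %s\n' "$key" "$2"
}

# Possible usage (the file name is an arbitrary example):
#   sysctl_conf_line /proc/sys/kernel/unprivileged_userns_clone 1 \
#     | sudo tee /etc/sysctl.d/99-rootless.conf
#   sudo sysctl --system
```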

Usage

Inside rootlesskit bash, your UID is mapped to 0, but you are not the real root:

(host)$ rootlesskit bash
(rootlesskit)# id
uid=0(root) gid=0(root) groups=0(root),65534(nogroup)
(rootlesskit)# ls -l /etc/shadow
-rw-r----- 1 nobody nogroup 1050 Aug 21 19:02 /etc/shadow
(rootlesskit)# cat /etc/shadow
cat: /etc/shadow: Permission denied
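
The output above follows from the UID mapping of the user namespace, which can be read from /proc/self/uid_map (see user_namespaces(7)). Each line has the form "ID-inside ID-outside length". The sketch below translates a UID seen inside the namespace into the UID on the other side of the mapping; host_uid is a hypothetical helper name. With the example values from this README, the map would be expected to look roughly like "0 1001 1" followed by "1 231072 65536", but the exact content depends on the host configuration.

```shell
#!/bin/sh
# host_uid NS_UID [MAP_FILE]
# Look up NS_UID in a uid_map-style file ("inside outside length" per line)
# and print the corresponding outside UID. Fails if NS_UID is not mapped.
# MAP_FILE defaults to /proc/self/uid_map.
host_uid() {
    awk -v id="$1" '
        id >= $1 && id < $1 + $3 { print $2 + (id - $1); found = 1; exit }
        END { if (!found) exit 1 }
    ' "${2:-/proc/self/uid_map}"
}
```

The real root (host UID 0) does not appear on the outside of any mapping, so files owned by it, such as /etc/shadow, show up as owned by the overflow user nobody and stay unreadable.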

Environment variables are kept untouched:

(host)$ rootlesskit bash
(rootlesskit)# echo $USER
penguin
(rootlesskit)# echo $HOME
/home/penguin
(rootlesskit)# echo $XDG_RUNTIME_DIR
/run/user/1001

Filesystems can be isolated from the host with --copy-up:

(host)$ rootlesskit --copy-up=/etc bash
(rootlesskit)# rm /etc/resolv.conf
(rootlesskit)# vi /etc/resolv.conf

You can even create network namespaces with Slirp:

(host)$ rootlesskit --copy-up=/etc --copy-up=/run --net=slirp4netns --disable-host-loopback bash
(rootlesskit)# ip netns add foo
...

Full CLI options

$ rootlesskit --help
NAME:
   rootlesskit - Linux-native fakeroot using user namespaces

USAGE:
   rootlesskit [global options] [arguments...]

VERSION:
   2.0.0-alpha.0

DESCRIPTION:
   RootlessKit is a Linux-native implementation of "fake root" using user_namespaces(7).
   
   Web site: https://github.com/rootless-containers/rootlesskit
   
   Examples:
     # spawn a shell with a new user namespace and a mount namespace
     rootlesskit bash
   
     # make /etc writable
     rootlesskit --copy-up=/etc bash
   
     # set mount propagation to rslave
     rootlesskit --propagation=rslave bash
   
     # create a network namespace with slirp4netns, and expose 80/tcp on the namespace as 8080/tcp on the host
     rootlesskit --copy-up=/etc --net=slirp4netns --disable-host-loopback --port-driver=builtin -p 127.0.0.1:8080:80/tcp bash
   
   Note: RootlessKit requires /etc/subuid and /etc/subgid to be configured by the real root user.
   See https://rootlesscontaine.rs/getting-started/common/ .

OPTIONS:
  Misc:                                                      
    --debug                                                  debug mode (default: false)
    --print-semver value                                     print a version component as a decimal integer [major, minor, patch]
    --help, -h                                               show help
    --version, -v                                            print the version
                                                             
  Mount:                                                     
    --copy-up value [ --copy-up value ]                      mount a filesystem and copy-up the contents. e.g. "--copy-up=/etc" (typically required for non-host network)
    --copy-up-mode value                                     copy-up mode [tmpfs+symlink]
    --propagation value                                      mount propagation [rprivate, rslave]
                                                             
  Network:                                                   
    --net value                                              network driver [host, pasta(experimental), slirp4netns, vpnkit, lxc-user-nic(experimental)]
    --mtu value                                              MTU for non-host network (default: 65520 for pasta and slirp4netns, 1500 for others) (default: 0)
    --cidr value                                             CIDR for pasta and slirp4netns networks (default: 10.0.2.0/24)
    --ifname value                                           Network interface name (default: tap0 for pasta, slirp4netns, and vpnkit; eth0 for lxc-user-nic)
    --disable-host-loopback                                  prohibit connecting to 127.0.0.1:* on the host namespace (default: false)
    --ipv6                                                   enable IPv6 routing. Unrelated to port forwarding. Only supported for pasta and slirp4netns. (experimental) (default: false)
    --detach-netns                                           detach network namespaces  (default: false)
                                                             
  Network [lxc-user-nic]:                                    
    --lxc-user-nic-binary value                              path of lxc-user-nic binary for --net=lxc-user-nic
    --lxc-user-nic-bridge value                              lxc-user-nic bridge name
                                                             
  Network [pasta]:                                           
    --pasta-binary value                                     path of pasta binary for --net=pasta
                                                             
  Network [slirp4netns]:                                     
    --slirp4netns-binary value                               path of slirp4netns binary for --net=slirp4netns
    --slirp4netns-sandbox value                              enable slirp4netns sandbox (experimental) [auto, true, false] (the default is planned to be "auto" in future)
    --slirp4netns-seccomp value                              enable slirp4netns seccomp (experimental) [auto, true, false] (the default is planned to be "auto" in future)
                                                             
  Network [vpnkit]:                                          
    --vpnkit-binary value                                    path of VPNKit binary for --net=vpnkit
                                                             
  Port:                                                      
    --port-driver value                                      port driver for non-host network. [none, implicit (for pasta), builtin, slirp4netns]
    --publish value, -p value [ --publish value, -p value ]  publish ports. e.g. "127.0.0.1:8080:80/tcp"
                                                             
  Process:                                                   
    --pidns                                                  create a PID namespace (default: false)
    --cgroupns                                               create a cgroup namespace (default: false)
    --utsns                                                  create a UTS namespace (default: false)
    --ipcns                                                  create an IPC namespace (default: false)
    --reaper value                                           enable process reaper. Requires --pidns. [auto,true,false]
    --evacuate-cgroup2 value                                 evacuate processes into the specified subgroup. Requires --pidns and --cgroupns
                                                             
  State:                                                     
    --state-dir value                                        state directory
                                                             
  SubID:                                                     
    --subid-source value                                     the source of the subids. "dynamic" executes /usr/bin/getsubids. "static" reads /etc/{subuid,subgid}. [auto,dynamic,static]
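
The --publish example "127.0.0.1:8080:80/tcp" reads as host IP, host port, port inside the namespace, and protocol. The following sketch splits such a spec into its four parts, for instance to validate it before calling rootlesskit. The function name parse_publish is hypothetical, and the sketch only handles the full four-part form shown in the help text; it is not RootlessKit's own parser.

```shell
#!/bin/sh
# parse_publish SPEC
# Split "PARENT_IP:PARENT_PORT:CHILD_PORT/PROTO" into four space-separated
# fields: parent IP, parent port, child port, protocol.
# Assumption: SPEC uses the full form, e.g. 127.0.0.1:8080:80/tcp.
parse_publish() {
    spec=$1
    proto=${spec##*/}         # text after the last "/"
    rest=${spec%/*}           # everything before the last "/"
    child_port=${rest##*:}    # last colon-separated field
    rest=${rest%:*}
    parent_port=${rest##*:}   # second-to-last field
    parent_ip=${rest%:*}      # whatever remains is the parent IP
    printf '%s %s %s %s\n' "$parent_ip" "$parent_port" "$child_port" "$proto"
}
```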
                                                             

State directory

The following files will be created in the state directory, which can be specified with --state-dir:

  • lock: lock file
  • child_pid: decimal PID text that can be used for nsenter(1).
  • api.sock: REST API socket. See ./docs/api.md and ./docs/port.md.
  • netns (since v2.0.0): Detached NetNS. Created only with --detach-netns. Valid only in the child mount namespace.
  • resolv.conf (since v2.0.0): resolv.conf file. Bind-mounted to /etc/resolv.conf unless --detach-netns is specified.
  • hosts (since v2.0.0): hosts file. Bind-mounted to /etc/hosts unless --detach-netns is specified.

If --state-dir is not specified, RootlessKit creates a temporary state directory under /tmp and removes it on exit.
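
As an illustration of how child_pid can be used with nsenter(1), the sketch below builds a command line that joins the user and mount namespaces of the child process. The function name nsenter_command is hypothetical, and the chosen flags (-U, --preserve-credentials, -m) are one reasonable combination rather than the only one; add -n to join the network namespace as well.

```shell
#!/bin/sh
# nsenter_command STATE_DIR
# Print an nsenter(1) command line that targets the PID stored in
# STATE_DIR/child_pid. Fails if the file cannot be read.
nsenter_command() {
    pid=$(cat "$1/child_pid") || return 1
    printf 'nsenter -U --preserve-credentials -m -t %s\n' "$pid"
}
```

From the host, the printed command can be run with a state directory that was passed to --state-dir, for example eval "$(nsenter_command /path/to/state-dir)", where the path is a placeholder.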

Undocumented files are subject to change.

Environment variables

The following environment variables will be set for the child process:

  • ROOTLESSKIT_STATE_DIR (since v0.3.0): absolute path to the state dir
  • ROOTLESSKIT_PARENT_EUID (since v0.8.0): effective UID
  • ROOTLESSKIT_PARENT_EGID (since v0.8.0): effective GID

Undocumented environment variables are subject to change.
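
A script can use these variables to guess whether it was started by RootlessKit. The sketch below is only a heuristic, since any process can set the same variables; the function names in_rootlesskit and describe_parent are hypothetical.

```shell
#!/bin/sh
# in_rootlesskit
# Succeed when ROOTLESSKIT_STATE_DIR is set and non-empty.
in_rootlesskit() {
    [ -n "${ROOTLESSKIT_STATE_DIR:-}" ]
}

# describe_parent
# Print the parent's effective UID/GID and the state dir when running
# under RootlessKit, or a short notice otherwise.
describe_parent() {
    if in_rootlesskit; then
        printf 'parent euid=%s egid=%s state=%s\n' \
            "${ROOTLESSKIT_PARENT_EUID:-unknown}" \
            "${ROOTLESSKIT_PARENT_EGID:-unknown}" \
            "$ROOTLESSKIT_STATE_DIR"
    else
        printf 'not running under RootlessKit\n'
    fi
}
```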

Additional documents