All Docker containers fail to start after update to 2983.2.0 #544

Open
meltonbw opened this issue Nov 10, 2021 · 33 comments
Labels
area/selinux Issues related to SELinux kind/bug Something isn't working

Comments

@meltonbw

Description

After an update to 2983.2.0, all Docker containers fail to start with the message: standard_init_linux.go:228: exec user process caused: operation not permitted

Impact

Cannot start any containers. All containers fail to start at boot.

Environment and steps to reproduce

  1. Set-up: Flatcar version 2983.2.0
  2. Task: Booting up
  3. Action(s): None
  4. Error: All containers fail with the error: standard_init_linux.go:228: exec user process caused: operation not permitted

Expected behavior

Containers should start without error. I cannot start a basic container with bash from the CL either:

$ docker container run --interactive --tty --rm ubuntu bash
Unable to find image 'ubuntu:latest' locally
docker.io/library/ubuntu@sha256:c95a8e48bf88e9849f3e0f723d9f49fa12c5a00cfc6e60d2bc99d87555295e4c: Pulling from library/ubuntu
da7391352a9b: Pull complete
14428a6d4bcd: Pull complete
2c2d948710f2: Pull complete
Digest: sha256:c95a8e48bf88e9849f3e0f723d9f49fa12c5a00cfc6e60d2bc99d87555295e4c
Status: Downloaded newer image for ubuntu@sha256:c95a8e48bf88e9849f3e0f723d9f49fa12c5a00cfc6e60d2bc99d87555295e4c
Tagging ubuntu@sha256:c95a8e48bf88e9849f3e0f723d9f49fa12c5a00cfc6e60d2bc99d87555295e4c as ubuntu:latest
standard_init_linux.go:228: exec user process caused: operation not permitted

Additional information

$ docker info -f "{{json .}}" | jq ".SecurityOptions"
[
  "name=seccomp,profile=default",
  "name=selinux",
  "name=userns",
  "name=cgroupns"
]
$ systemctl cat docker.service
# /run/systemd/system/docker.service
[Unit]
Requires=torcx.target
After=torcx.target
Description=Docker Application Container Engine
Documentation=http://docs.docker.com
After=containerd.service docker.socket network-online.target
Wants=network-online.target
Requires=containerd.service docker.socket

[Service]
EnvironmentFile=/run/metadata/torcx
Environment=TORCX_IMAGEDIR=/docker
Type=notify
EnvironmentFile=-/run/flannel/flannel_docker_opts.env
Environment=DOCKER_SELINUX=--selinux-enabled=true

# the default is not to use systemd for cgroups because the delegate issues still
# exists and systemd currently does not support the cgroup feature set required
# for containers run by docker
ExecStart=/usr/bin/env PATH=${TORCX_BINDIR}:${PATH} ${TORCX_BINDIR}/dockerd --host=fd:// --containerd=/var/run/docker/libcontainerd/docker-containerd.sock $DOCKER_SELINUX $DOCKER_OPTS $DOCKER_CGROUPS $DOCKER_OPT_BIP $DOCKER_OPT_MTU $DOCKER_OPT_IPMASQ
ExecReload=/bin/kill -s HUP $MAINPID
LimitNOFILE=1048576
# Having non-zero Limit*s causes performance problems due to accounting overhead
# in the kernel. We recommend using cgroups to do container-local accounting.
LimitNPROC=infinity
LimitCORE=infinity
# Uncomment TasksMax if your systemd version supports it.
# Only systemd 226 and above support this version.
TasksMax=infinity
TimeoutStartSec=0
# set delegate yes so that systemd does not reset the cgroups of docker containers
Delegate=yes
# kill only the docker process, not all processes in the cgroup
KillMode=process
# restart the docker process if it exits prematurely
Restart=on-failure
StartLimitBurst=3
StartLimitInterval=60s

[Install]
WantedBy=multi-user.target

# /etc/systemd/system/docker.service.d/docker-opts.conf
[Service]
Environment="DOCKER_OPTS="
@meltonbw meltonbw added the kind/bug Something isn't working label Nov 10, 2021
@tormath1
Contributor

Hi @meltonbw,

This looks SELinux-related. What is the output of:

$ getenforce

?

@meltonbw
Author

$ getenforce
Enforcing

Doing setenforce 0 does not seem to help.

@tormath1
Contributor

tormath1 commented Nov 10, 2021

Thanks. Could you then try disabling the userns security option?

In CI, we run every test with SELinux in enforcing mode, but we don't yet test with the userns security option, so that might be the root cause.

  • Could you share the content of /etc/docker/daemon.json or any other Docker customization you have?
  • Do you have custom SELinux policies?
  • Do you use an orchestrator like Kubernetes to run your containers?

@meltonbw
Author

Unfortunately, disabling userns didn't help.

$ cat /etc/docker/daemon.json
{
    "icc": false,
    "log-driver": "journald",
    "live-restore": true,
    "userland-proxy": false,
    "no-new-privileges": true,
    "dns": ["10.0.0.1", "9.9.9.9", "1.1.1.1"]
}

No custom SELinux policies and I don't use an orchestrator.

Here is the full docker info:

$ docker --debug info
Client:
 Context:    default
 Debug Mode: true

Server:
 Containers: 0
  Running: 0
  Paused: 0
  Stopped: 0
 Images: 4
 Server Version: 20.10.10
 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Native Overlay Diff: false
  userxattr: false
 Logging Driver: journald
 Cgroup Driver: systemd
 Cgroup Version: 2
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: inactive
 Runtimes: runc io.containerd.runc.v2 io.containerd.runtime.v1.linux
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: 997b1f9905746cccc997ab5c697f838e5be519ba
 runc version: 61ab78b58f2c0c3fbfc63477f2c020e825b9789d
 init version:
 Security Options:
  seccomp
   Profile: default
  selinux
  cgroupns
 Kernel Version: 5.10.77-flatcar
 Operating System: Flatcar Container Linux by Kinvolk 2983.2.0 (Oklo)
 OSType: linux
 Architecture: x86_64
 CPUs: 2
 Total Memory: 1.939GiB
 Name: flatcar-lan
 ID: <redacted>
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 Registry: https://index.docker.io/v1/
 Labels:
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: true

@meltonbw
Author

meltonbw commented Nov 10, 2021

Note, I am able to start a --privileged Docker container (with and without userns).

@jepio
Member

jepio commented Nov 11, 2021

I think this is connected to the no-new-privileges setting. There are multiple hits when searching for problems with that setting and SELinux.
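
To see quickly whether a host has that setting turned on, a crude text match against the daemon configuration may be enough. This is only a sketch: `has_no_new_privileges` is a hypothetical helper name, and it greps the file rather than parsing the JSON, so it assumes the key and value sit on one line as in the daemon.json posted above.

```shell
# has_no_new_privileges [FILE]
# Succeeds if FILE (default: /etc/docker/daemon.json) sets
# "no-new-privileges" to true. Hypothetical helper; a plain text match,
# not a JSON parser.
has_no_new_privileges() {
    grep -Eq '"no-new-privileges"[[:space:]]*:[[:space:]]*true' \
        "${1:-/etc/docker/daemon.json}"
}
```

For example, `has_no_new_privileges && echo "no-new-privileges is enabled"` prints the message only when the setting is present and true.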

@tormath1
Contributor

I'm still trying to reproduce it. Even with no-new-privileges and SELinux in enforcing mode, I can still start a container.

@serbaut

serbaut commented Nov 12, 2021

We have the same issue on some hosts. It looks like the SELinux policies differ between the hosts. What is the recommended way to install the default policies?

@tormath1
Contributor

@serbaut thanks for raising this too - we're currently investigating.

Do you have the same runtime spec: SELinux in enforcing mode, no-new-privileges, and so on?

@serbaut

serbaut commented Nov 12, 2021

The following seems to fix it:

rm -rf /var/lib/selinux
ln -s /usr/lib/selinux/policy /var/lib/selinux
rm -rf /etc/selinux/mcs
ln -s /usr/lib/selinux/mcs /etc/selinux/mcs
semodule -R
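
Because those commands delete the existing directories outright, a more cautious variant could rename them first so the customized policies can be restored later. The sketch below is an assumption layered on the commands above, not something tested on Flatcar: `relink_selinux` and its `ROOT` prefix argument are hypothetical, added so the steps can be rehearsed in a scratch tree.

```shell
# relink_selinux [ROOT]
# Variant of the commands above that renames the existing directories
# instead of deleting them. ROOT is a hypothetical prefix (default: empty,
# meaning the real filesystem) for rehearsing in a scratch directory.
relink_selinux() {
    root="${1:-}"
    stamp=$(date +%Y%m%d%H%M%S)
    # Move aside whatever is there now, unless it is already a symlink.
    for p in /var/lib/selinux /etc/selinux/mcs; do
        if [ -e "${root}${p}" ] && [ ! -L "${root}${p}" ]; then
            mv "${root}${p}" "${root}${p}.bak-${stamp}"
        fi
    done
    # -f replaces a stale link; -n avoids following an existing link.
    ln -sfn /usr/lib/selinux/policy "${root}/var/lib/selinux"
    ln -sfn /usr/lib/selinux/mcs "${root}/etc/selinux/mcs"
}
# On a real host this would still be followed by: semodule -R
```

Run as root with no argument, it performs the same relinking as the original commands while leaving `.bak-<timestamp>` copies next to the new links.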

@serbaut

serbaut commented Nov 12, 2021

Do you have the same runtime spec: SELinux in enforcing mode, no-new-privileges, and so on?

# sestatus
SELinux status:                 enabled
SELinuxfs mount:                /sys/fs/selinux
SELinux root directory:         /etc/selinux
Loaded policy name:             mcs
Current mode:                   permissive
Mode from config file:          permissive
Policy MLS status:              enabled
Policy deny_unknown status:     allowed
Memory protection checking:     actual (secure)
Max kernel policy version:      33

We don't have any Docker config (the node is running k8s).

@tormath1
Contributor

@serbaut thanks.

With Current mode: permissive, SELinux should not raise permission-denied errors in the containers, since denials are only logged in permissive mode. We're continuing to investigate.

@serbaut

serbaut commented Nov 12, 2021

This is how it looks on a bad node:

# ls -ld /etc/selinux/* /var/lib/selinux
lrwxrwxrwx. 1 root root   28 Apr 27  2021 /etc/selinux/config -> ../../usr/lib/selinux/config
drwxr-xr-x. 4 root root 4096 May 18 15:55 /etc/selinux/mcs
lrwxrwxrwx. 1 root root   25 Apr 27  2021 /etc/selinux/mls -> ../../usr/lib/selinux/mls
lrwxrwxrwx. 1 root root   35 May 18 13:43 /etc/selinux/semanage.conf -> ../../usr/lib/selinux/semanage.conf
lrwxrwxrwx. 1 root root   30 Apr 27  2021 /etc/selinux/targeted -> ../../usr/lib/selinux/targeted
drwxr-xr-x. 5 root root 4096 May 18 15:55 /var/lib/selinux

versus a good node:

# ls -ld /etc/selinux/* /var/lib/selinux
lrwxrwxrwx. 1 root root 28 Apr 27  2021 /etc/selinux/config -> ../../usr/lib/selinux/config
lrwxrwxrwx. 1 root root 25 Apr 27  2021 /etc/selinux/mcs -> ../../usr/lib/selinux/mcs
lrwxrwxrwx. 1 root root 25 Apr 27  2021 /etc/selinux/mls -> ../../usr/lib/selinux/mls
lrwxrwxrwx. 1 root root 35 May 18 08:42 /etc/selinux/semanage.conf -> ../../usr/lib/selinux/semanage.conf
lrwxrwxrwx. 1 root root 30 Apr 27  2021 /etc/selinux/targeted -> ../../usr/lib/selinux/targeted
lrwxrwxrwx. 1 root root 28 Apr 27  2021 /var/lib/selinux -> ../../usr/lib/selinux/policy

So, depending on which image you started with, you get different SELinux configs.
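
The difference between the two listings comes down to whether /etc/selinux/mcs and /var/lib/selinux are symlinks or real directories. A small check along these lines could flag affected hosts; it is a sketch only, and `check_selinux_links` plus its `ROOT` prefix argument are hypothetical names introduced for illustration.

```shell
# check_selinux_links [ROOT]
# Reports whether the two paths that differ between the "bad" and "good"
# listings are symlinks. ROOT is a hypothetical prefix (default: empty,
# meaning the real filesystem) so the check can be tried on a scratch tree.
check_selinux_links() {
    root="${1:-}"
    bad=0
    for p in /etc/selinux/mcs /var/lib/selinux; do
        if [ -L "${root}${p}" ]; then
            echo "ok: ${p} -> $(readlink "${root}${p}")"
        else
            echo "suspect: ${p} is not a symlink"
            bad=1
        fi
    done
    return "$bad"
}
```

The function returns non-zero when either path is a real directory or is missing, so it can be used in a loop over hosts or in a health check.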

@tormath1
Contributor

@serbaut are both nodes on the same Flatcar version?

@serbaut

serbaut commented Nov 12, 2021

@serbaut are both nodes on the same Flatcar version?

Yes, both are on VERSION_ID=2983.2.0

@tormath1
Contributor

@serbaut that's weird. I started a fresh VERSION_ID=2983.2.0 instance:

$ ls -ld /etc/selinux/* /var/lib/selinux
lrwxrwxrwx. 1 root root 28 Nov  5 17:41 /etc/selinux/config -> ../../usr/lib/selinux/config
lrwxrwxrwx. 1 root root 25 Nov  5 17:41 /etc/selinux/mcs -> ../../usr/lib/selinux/mcs
lrwxrwxrwx. 1 root root 25 Nov  5 17:41 /etc/selinux/mls -> ../../usr/lib/selinux/mls
lrwxrwxrwx. 1 root root 35 Nov 12 15:26 /etc/selinux/semanage.conf -> ../../usr/lib/selinux/semanage.conf
lrwxrwxrwx. 1 root root 30 Nov  5 17:41 /etc/selinux/targeted -> ../../usr/lib/selinux/targeted
lrwxrwxrwx. 1 root root 28 Nov  5 17:39 /var/lib/selinux -> ../../usr/lib/selinux/policy

and it looks correct according to the tmpfiles configuration:

$ systemd-tmpfiles --cat-config | grep selinux
L   /etc/audit/rules.d/80-selinux.rules -   -   -   -   /usr/share/audit/rules.d/80-selinux.rules
d	/etc/selinux/		-	-	-	-	-
L	/etc/selinux/semanage.conf	-	-	-	-	../../usr/lib/selinux/semanage.conf
# /usr/lib64/tmpfiles.d/selinux-base.conf
d	/etc/selinux/		-	-	-	-	-
L	/etc/selinux/config	-	-	-	-	../../usr/lib/selinux/config
L	/etc/selinux/mcs	-	-	-	-	../../usr/lib/selinux/mcs

Could it be related to some manual intervention on these nodes in the past?
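
The comparison between the tmpfiles configuration and the filesystem can be scripted. As far as the tmpfiles.d format goes, a plain `L` entry only creates the symlink when nothing exists at that path yet (the `L+` form replaces an existing file or directory), which would explain why a real directory left behind is never turned back into a link. The helpers below are a sketch with hypothetical names (`expected_links`, `check_expected_links`).

```shell
# expected_links: read tmpfiles.d-style lines on stdin and print
# "path target" for every plain "L" (create symlink) entry.
expected_links() {
    awk '$1 == "L" && NF >= 7 { print $2, $7 }'
}

# check_expected_links: read "path target" pairs on stdin and compare each
# one with what readlink reports. Returns non-zero on any mismatch.
check_expected_links() {
    mismatches=0
    while read -r link target; do
        actual=$(readlink "$link" 2>/dev/null || true)
        if [ "$actual" != "$target" ]; then
            echo "mismatch: $link (expected -> $target, found: ${actual:-not a symlink})"
            mismatches=1
        fi
    done
    return "$mismatches"
}
```

On a node, the two could be chained as `systemd-tmpfiles --cat-config | grep selinux | expected_links | check_expected_links`.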

@serbaut

serbaut commented Nov 12, 2021

Strange. Both nodes were created from flatcar-stable-2765.2.3, so I don't know how they can differ like that. Is some part of k8s fiddling with the SELinux configs? We have not touched them, afaik.

@tormath1
Contributor

I'd be curious to see the output of this command from @meltonbw:

ls -ld /etc/selinux/* /var/lib/selinux

He's not using Kubernetes though.

@serbaut

serbaut commented Nov 12, 2021

$ systemd-tmpfiles --cat-config | grep selinux
L   /etc/audit/rules.d/80-selinux.rules -   -   -   -   /usr/share/audit/rules.d/80-selinux.rules
d       /etc/selinux/           -       -       -       -       -
L       /etc/selinux/semanage.conf      -       -       -       -       ../../usr/lib/selinux/semanage.conf
# /usr/lib64/tmpfiles.d/selinux-base.conf
d       /etc/selinux/           -       -       -       -       -
L       /etc/selinux/config     -       -       -       -       ../../usr/lib/selinux/config
L       /etc/selinux/mcs        -       -       -       -       ../../usr/lib/selinux/mcs
$ ls -l /etc/selinux/
total 20
lrwxrwxrwx. 1 root root   28 Apr 27  2021 config -> ../../usr/lib/selinux/config
drwxr-xr-x. 4 root root 4096 May 18 15:55 mcs
lrwxrwxrwx. 1 root root   25 Apr 27  2021 mls -> ../../usr/lib/selinux/mls
lrwxrwxrwx. 1 root root   35 May 18 13:46 semanage.conf -> ../../usr/lib/selinux/semanage.conf
lrwxrwxrwx. 1 root root   30 Apr 27  2021 targeted -> ../../usr/lib/selinux/targeted

Anything else I can check?

@tormath1
Contributor

@serbaut thanks for helping; maybe you can try to run the following:

$ journalctl --no-pager | grep systemd-tmpfiles

on the "bad nodes", to identify any errors preventing systemd-tmpfiles from creating the symlink?

@serbaut

serbaut commented Nov 12, 2021

I don't see anything, but we only have 90 days of logs. I'm starting to suspect AquaSec, Prisma Cloud, or k8s, but it would be interesting to see if @meltonbw has the same issue.

@serbaut

serbaut commented Nov 12, 2021

I'll check back Monday. We don't upgrade our prod nodes until next weekend, so as long as we have a fix by then I'm fine :)

@tormath1
Contributor

@serbaut thanks for your help and the data you provided; it's really useful. You might be interested in running some Beta nodes in your cluster to catch this kind of behavior earlier.

@serbaut

serbaut commented Nov 12, 2021

@serbaut thanks for your help and the data you provided; it's really useful. You might be interested in running some Beta nodes in your cluster to catch this kind of behavior earlier.

We actually have that in another cluster, but those nodes didn't experience any issue :P

@meltonbw
Author

Very similar to @serbaut's bad setup:

$ ls -ld /etc/selinux/* /var/lib/selinux
-rw-r--r--. 1 root root  627 Nov 10 18:00 /etc/selinux/config
drwxr-xr-x. 4 root root 4096 May 31  2020 /etc/selinux/mcs
lrwxrwxrwx. 1 root root   25 May 25  2020 /etc/selinux/mls -> ../../usr/lib/selinux/mls
lrwxrwxrwx. 1 root root   35 May 30  2020 /etc/selinux/semanage.conf -> ../../usr/lib/selinux/semanage.conf
lrwxrwxrwx. 1 root root   30 May 25  2020 /etc/selinux/targeted -> ../../usr/lib/selinux/targeted
drwxr-xr-x. 4 root root 4096 May 25  2020 /var/lib/selinux

I will try the fix.

@meltonbw
Author

Ok, after trying @serbaut's fix, now Docker does not start, and the daemon is missing from systemctl. systemctl daemon-reload does not help.

$ ls -l /etc/systemd/system/docker.service.d/
total 8
-rw-r--r--. 1 root root 37 Jun  9  2020 docker-opts.conf
$ sudo systemctl list-units | grep .service
  audit-rules.service                                                                         loaded active exited    Load Security Auditing Rules
  clean-ca-certificates.service                                                               loaded active exited    Clean up broken links in /etc/ssl/certs
  dbus.service                                                                                loaded active running   D-Bus System Message Bus
  flatcar-tmpfiles.service                                                                    loaded active exited    Create missing system files
  getty@tty1.service                                                                          loaded active running   Getty on tty1
  kmod-static-nodes.service                                                                   loaded active exited    Create list of static device nodes for the current kernel
  locksmithd.service                                                                          loaded active running   Cluster reboot manager
  lvm2-activation-early.service                                                               loaded active exited    Activation of LVM2 logical volumes
  lvm2-activation.service                                                                     loaded active exited    Activation of LVM2 logical volumes
  sshd-keygen.service                                                                         loaded active exited    Generate sshd host keys
  sshd@0-192.168.87.4:22-192.168.87.49:61153.service                                          loaded active running   OpenSSH per-connection server daemon
  systemd-fsck@dev-disk-by\x2dlabel-OEM.service                                               loaded active exited    File System Check on /dev/disk/by-label/OEM
  systemd-journal-flush.service                                                               loaded active exited    Flush Journal to Persistent Storage
  systemd-journald.service                                                                    loaded active running   Journal Service
  systemd-logind.service                                                                      loaded active running   User Login Management
  systemd-networkd-wait-online.service                                                        loaded active exited    Wait for Network to be Configured
  systemd-networkd.service                                                                    loaded active running   Network Service
  systemd-random-seed.service                                                                 loaded active exited    Load/Save Random Seed
  systemd-remount-fs.service                                                                  loaded active exited    Remount Root and Kernel File Systems
  systemd-resolved.service                                                                    loaded active running   Network Name Resolution
  systemd-sysctl.service                                                                      loaded active exited    Apply Kernel Variables
  systemd-timesyncd.service                                                                   loaded active running   Network Time Synchronization
  systemd-tmpfiles-setup-dev.service                                                          loaded active exited    Create Static Device Nodes in /dev
  systemd-tmpfiles-setup.service                                                              loaded active exited    Create Volatile Files and Directories
  systemd-udev-settle.service                                                                 loaded active exited    Wait for udev To Complete Device Initialization
  systemd-udev-trigger.service                                                                loaded active exited    Coldplug All udev Devices
  systemd-udevd.service                                                                       loaded active running   Rule-based Manager for Device Events and Files
  systemd-update-utmp.service                                                                 loaded active exited    Update UTMP about System Boot/Shutdown
  systemd-user-sessions.service                                                               loaded active exited    Permit User Sessions
  update-engine.service                                                                       loaded active running   Update Engine
  user-runtime-dir@500.service                                                                loaded active exited    User Runtime Directory /run/user/500
  user@500.service
$ docker info
The program docker is managed by torcx, which did not run.

@meltonbw
Author

meltonbw commented Nov 12, 2021

Was getting this in the logs:

$ journalctl -fe | grep docker
Nov 12 20:00:17 flatcar-lan audit: PATH item=0 name="/var/lib/docker" inode=164 dev=08:09 mode=040710 ouid=0 ogid=231072 rdev=00:00 obj=system_u:object_r:var_lib_t:s0 nametype=NORMAL cap_fp=0 cap_fi=0 cap_fe=0 cap_fver=0 cap_frootid=0
Nov 12 20:00:17 flatcar-lan audit: PATH item=0 name="/etc/docker" inode=277 dev=08:09 mode=040700 ouid=0 ogid=0 rdev=00:00 obj=system_u:object_r:etc_t:s0 nametype=NORMAL cap_fp=0 cap_fi=0 cap_fe=0 cap_fver=0 cap_frootid=0
Nov 12 20:00:17 flatcar-lan audit: PATH item=0 name="/etc/docker/" inode=277 dev=08:09 mode=040700 ouid=0 ogid=0 rdev=00:00 obj=system_u:object_r:etc_t:s0 nametype=PARENT cap_fp=0 cap_fi=0 cap_fe=0 cap_fver=0 cap_frootid=0

I set selinux to permissive and now everything is back up and running! Thanks a bunch @tormath1 and @serbaut!

Any ideas what beef selinux has with Docker?

@tormath1
Contributor

@meltonbw glad to hear you worked around the issue. SELinux is always full of surprises.

Regarding this comment, we can try to make tmpfiles stricter to avoid missing some links like in here: #544 (comment)

Regarding this:

$ docker info
The program docker is managed by torcx, which did not run.

We can see an AVC denial message:

Nov 15 09:45:43 localhost audit[688]: AVC avc: denied { associate } for pid=688 comm="torcx-generator" name="docker" dev="tmpfs" ino=2 scontext=system_u:object_r:unlabeled_t:s0 tcontext=system_u:object_r:tmpfs_t:s0 tclass=filesystem permissive=0

I'm currently patching our test suite to set enforcing mode at boot time, to catch this kind of early-boot error. Of course, an SELinux patch will follow to allow this torcx "associate" permission.
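
For scanning a journal for denials like the one above, a short filter can reduce each AVC line to the command, the denied permission, the target class, and whether the denial was enforced (`permissive=0`) or only logged (`permissive=1`). This is a sketch: `summarize_avc` is a hypothetical helper, and it reports only the first permission inside the braces.

```shell
# summarize_avc: read audit/journal lines on stdin and print one summary
# line per AVC denial. Only the first permission inside { } is reported.
summarize_avc() {
    awk '/avc: +denied/ {
        perm = comm = tclass = mode = ""
        for (i = 1; i <= NF; i++) {
            if ($i == "{") perm = $(i + 1)
            else if ($i ~ /^comm=/) { comm = $i; gsub(/^comm=|"/, "", comm) }
            else if ($i ~ /^tclass=/) tclass = substr($i, 8)
            else if ($i ~ /^permissive=/) mode = ($i == "permissive=1") ? "logged only" : "blocked"
        }
        print comm ": " perm " on " tclass " (" mode ")"
    }'
}
```

It could be fed with something like `journalctl --no-pager | summarize_avc`; lines that are not AVC denials are ignored.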

@serbaut

serbaut commented Nov 15, 2021

I have looked some more and it seems the "correct" policies are permissive (the ones with symlinks to /usr).

Nov 15 10:05:06 master1 audit[4625]: AVC avc:  denied  { watch_mount } for  pid=4625 comm="fsmon" path="/foo/passwd" dev="sda9" ino=63 scontext=system_u:system_r:kernel_t:s0 tcontext=system_u:object_r:etc_t:s0 tclass=file permissive=1

I still do not understand how some clusters have a different config.

@serbaut

serbaut commented Nov 15, 2021

I'm 99% sure it's the https://www.aquasec.com/ enforcer that has created custom policies, and that's why some clusters have issues and others don't.

@meltonbw do you know why your selinux config was customized and not directly upstream via links?

@tormath1
Contributor

An SELinux patch has been submitted in flatcar-archive/coreos-overlay#1426 to fix the torcx error:

Nov 15 09:45:43 localhost audit[688]: AVC avc: denied { associate } for pid=688 comm="torcx-generator" name="docker" dev="tmpfs" ino=2 scontext=system_u:object_r:unlabeled_t:s0 tcontext=system_u:object_r:tmpfs_t:s0 tclass=filesystem permissive=0

@serbaut I was not aware of aquasec. Is it something open source that we can easily test or add to our tests?

@serbaut

serbaut commented Nov 25, 2021

I don't know if there is an enforcer available for testing, but it's not open source at least. What it seems to have done is remove the links and install its own policies.

Maybe it is enough to document how to handle customized selinux policies and possible effects.

@pothos
Member

pothos commented Feb 17, 2022

SELinux patch has been submitted in flatcar-linux/coreos-overlay#1426 to fix the torcx error

As far as I know, this was not enough and torcx will still fail.
Edit: That's why flatcar/mantle#254 reverted the test for enforcement from boot on.


5 participants