Fresh installation of Ubuntu 22.04 and Docker/Sysbox fails to create shim task #608

Closed
SeanMXD opened this issue Dec 4, 2022 · 13 comments

Comments

@SeanMXD

SeanMXD commented Dec 4, 2022

I have a fresh installation of Ubuntu 22.04, Docker and Sysbox. I was trying to start a container with sysbox to test the new installation using this command:

$ sudo docker run --runtime=sysbox-runc -it ubuntu:latest

and this was the error:

docker: Error response from daemon: failed to create shim task: OCI runtime create failed: container_linux.go:419: starting container process caused: process_linux.go:607: container init caused: process_linux.go:578: handleReqOp caused: rootfs_init_linux.go:348: failed to mkdirall /var/lib/sysbox/shiftfs/faf84eee-9fd3-4379-a027-52d63cd53fc7/var/lib/rancher/k3s: mkdir /var/lib/sysbox/shiftfs/faf84eee-9fd3-4379-a027-52d63cd53fc7/var/lib/rancher: value too large for defined data type caused: mkdir /var/lib/sysbox/shiftfs/faf84eee-9fd3-4379-a027-52d63cd53fc7/var/lib/rancher: value too large for defined data type: unknown.
ERRO[0000] error waiting for container: context canceled

Here is the core dump:

core.59640.tar.gz

@SeanMXD
Author

SeanMXD commented Dec 4, 2022

Sysbox seems to be running. I ran this command:

$ sudo systemctl list-units -t service --all | grep sysbox

and this was the result:

sysbox-fs.service loaded active running sysbox-fs (part of the Sysbox container runtime)
sysbox-mgr.service loaded active running sysbox-mgr (part of the Sysbox container runtime)
sysbox.service loaded active running Sysbox container runtime

@ctalledo
Member

ctalledo commented Dec 4, 2022

Hi @SeanMXD, thanks for giving Sysbox a shot.

Based on the error message I think you are likely hitting #596.

That is, Sysbox uses a kernel module called shiftfs, and unfortunately Ubuntu broke the functionality on 5.15.0-48 and later versions of 5.15.

Just to make sure, what's your kernel version (i.e., uname -a)?

There is a work-around fortunately: you can tell Sysbox to not use shiftfs, by configuring the systemd service for the sysbox-mgr such that sysbox-mgr starts with the --disable-shiftfs flag. Follow the example here for more info on how to reconfigure sysbox.

In general, things work better with shiftfs enabled, but they will also work fine without it.

@SeanMXD
Author

SeanMXD commented Dec 7, 2022

Hi @ctalledo! Thanks for the info, I will try that solution as soon as possible.

This is the output of uname -a:

Linux host 5.15.0-48-generic #54-Ubuntu SMP Fri Aug 26 13:26:29 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
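
Going only by the statement earlier in this thread that shiftfs broke on Ubuntu's 5.15.0-48 and later 5.15 builds, the release string above falls in the affected range. A minimal sketch of that check follows; the function name `affected_kernel` is hypothetical, and the threshold comes from this thread rather than from any official list of affected kernels.

```shell
# Return 0 if a kernel release string looks like an Ubuntu 5.15.0-N build
# with N >= 48 (the range this thread reports as affected), else return 1.
affected_kernel() {
    # $1: kernel release string, e.g. the output of `uname -r`
    case "$1" in
        5.15.0-*) ;;          # only the 5.15.0 series is considered
        *) return 1 ;;
    esac
    abi=${1#5.15.0-}          # strip the "5.15.0-" prefix, e.g. "48-generic"
    abi=${abi%%-*}            # strip the flavour suffix, e.g. "48"
    case "$abi" in
        ''|*[!0-9]*) return 1 ;;   # not a plain number: treat as unaffected
    esac
    [ "$abi" -ge 48 ]
}
```

On a host, `affected_kernel "$(uname -r)" && echo "consider --disable-shiftfs"` would run the check against the running kernel.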

@bijeebuss

What are the potential consequences of using this workaround? I'm hitting the same issue here.

@ctalledo
Member

ctalledo commented Dec 7, 2022

Hi @SeanMXD,

Without shiftfs, Sysbox needs to perform a "chown" of the container's root filesystem when the container starts. It chowns the file UID:GID from 0:0 -> 165536:165536 (or whatever unprivileged UID:GID Sysbox assigns to the container).

The chown slows down the container startup time a little bit. For a small container image (e.g., alpine, ubuntu, etc.) you won't notice it. For a larger container image (e.g., nestybox/k8s-node) you may notice a slow down of a few seconds (< 5 secs). The same occurs when the container stops, where Sysbox will revert the chown (unless the container was started with "--rm" meaning that it's going to be removed immediately after stop).
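
The UID shift described above is plain arithmetic over the container's ID mapping. A minimal sketch, assuming a single mapping line in the `inside outside count` form used by `/proc/self/uid_map`; the helper name `host_uid` is hypothetical, and the range size of 65536 used in the example is an assumption (the thread only gives the starting ID 165536).

```shell
# Translate a container UID to the host UID it maps to.
host_uid() {
    # $1: container UID
    # $2 $3 $4: first inside ID, first outside ID, and size of the mapped range
    cuid=$1 inside=$2 outside=$3 count=$4
    # IDs outside the mapped range have no host equivalent.
    if [ "$cuid" -lt "$inside" ] || [ "$cuid" -ge $((inside + count)) ]; then
        return 1
    fi
    echo $((outside + cuid - inside))
}
```

For example, `host_uid 0 0 165536 65536` corresponds to the 0 -> 165536 chown mentioned above. Inside a container with a single mapping line, `host_uid 0 $(cat /proc/self/uid_map)` would use the real mapping.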

Hope that helps.

@bijeebuss

OK, I tried adding --disable-shiftfs.
It seemed to work at first, but when I create a container from any of the nestybox/ubuntu jammy or focal images, I get an error like this when I try to run apt-upgrade:

Unpacking gzip (1.10-0ubuntu4.1) over (1.10-0ubuntu4) ...
dpkg: error processing archive /var/cache/apt/archives/gzip_1.10-0ubuntu4.1_amd64.deb (--unpack):
 unable to make backup link of './bin/uncompress' before installing new version: Operation not permitted
dpkg: error while cleaning up:
 unable to restore backup version of '/bin/gzip': Stale file handle
dpkg: error while cleaning up:
 unable to restore backup version of '/bin/gzexe': Stale file handle
dpkg: error while cleaning up:
 unable to restore backup version of '/bin/gunzip': Stale file handle
Errors were encountered while processing:
 /var/cache/apt/archives/gzip_1.10-0ubuntu4.1_amd64.deb
E: Sub-process /usr/bin/dpkg returned an error code (1)

It appears that many of the files in /usr/bin and /bin are owned by nobody and in group nogroup.

I ran
docker run --runtime=sysbox-runc -d --hostname umbrel --name umbrel nestybox/ubuntu-focal-systemd-dev
docker container exec -it umbrel /bin/bash

I also tried jammy with and without docker

@ctalledo
Member

ctalledo commented Dec 8, 2022

Hi @bijeebuss, likely the --disable-shiftfs is not getting picked up by sysbox in your setup, because I tried to repro on my end and was not able to (apt-get update worked without problem).

Inside the sysbox container, the findmnt command should show the following mount at the top:

root@umbrel:/# findmnt
TARGET                             SOURCE                                                   FSTYPE  OPTIONS
/                                  overlay                                                  overlay rw,relatime,lowerdir=/var/lib/sysbox/rootfs/ba1803d2020e381211939320cbdc4cebd30887b230c6e426d4ac6b6a8c9aa982/bottom/merged,upperdir=/var/lib/sysbox/rootfs/ba1803d2020e381211939320cbdc4cebd30887b230c6e426d4ac6b6a8c9aa982

If it shows something with shiftfs, that would confirm that Sysbox is not picking up the --disable-shiftfs option. In that case, make sure that the --disable-shiftfs flag is passed to the sysbox-mgr component via the corresponding systemd service unit (i.e., /lib/systemd/system/sysbox-mgr.service). It should look like this:

[Service]
Type=simple
Type=notify
ExecStart=/usr/bin/sysbox-mgr --disable-shiftfs
TimeoutStartSec=45
TimeoutStopSec=90
StartLimitInterval=0
NotifyAccess=main
OOMScoreAdjust=-500

NOTE: Do not pass it via the top-level sysbox service (/lib/systemd/system/sysbox.service) as that top service is simply querying the version of sysbox and ensuring the sub-services for sysbox-mgr and sysbox-fs start.

After this, make sure to reload it and restart sysbox:

sudo systemctl daemon-reload
sudo systemctl restart sysbox

More info here: https://github.com/nestybox/sysbox/blob/master/docs/user-guide/configuration.md#reconfiguration-procedure
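
As a complement to reading the findmnt output by eye, the shiftfs check can be scripted. A minimal sketch, assuming a /proc/mounts-style table in which the third field of each line is the filesystem type; `has_shiftfs` is a hypothetical helper name, not part of Sysbox.

```shell
# Return 0 if any mount in a /proc/mounts-style table has type shiftfs,
# and 1 otherwise.
has_shiftfs() {
    # $1: path to the mount table (defaults to /proc/mounts)
    awk '$3 == "shiftfs" { found = 1 } END { exit !found }' "${1:-/proc/mounts}"
}
```

Running `has_shiftfs && echo "shiftfs still in use"` inside a newly started container would then flag the case where the option was not picked up.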

@bijeebuss

I think it worked because the container wouldn’t even start without disabling shiftfs.
apt-update worked fine but it was apt-upgrade that failed

@ctalledo
Member

ctalledo commented Dec 8, 2022

Oh OK, I see it now on my side too when executing apt-get upgrade:

Preparing to unpack .../gzip_1.10-0ubuntu4.1_amd64.deb ...
Unpacking gzip (1.10-0ubuntu4.1) over (1.10-0ubuntu4) ...
dpkg: error processing archive /var/cache/apt/archives/gzip_1.10-0ubuntu4.1_amd64.deb (--unpack):
 unable to make backup link of './bin/uncompress' before installing new version: Operation not permitted
dpkg: error while cleaning up:
 unable to restore backup version of '/bin/gzip': Stale file handle
dpkg: error while cleaning up:
 unable to restore backup version of '/bin/gzexe': Stale file handle
dpkg: error while cleaning up:
 unable to restore backup version of '/bin/gunzip': Stale file handle
Errors were encountered while processing:
 /var/cache/apt/archives/gzip_1.10-0ubuntu4.1_amd64.deb
E: Sub-process /usr/bin/dpkg returned an error code (1)

I don't believe the problem is sysbox related, but rather a problem in the nestybox/ubuntu-focal-systemd image. I know this because the problem does not show up if I run a sysbox container with the ubuntu:latest image on it.

Maybe it's simply a matter of regenerating the nestybox/ubuntu-focal-systemd image from its Dockerfile to get the latest apt version and so on?

@bijeebuss

yup that fixed it thank you :)

@SeanMXD
Author

SeanMXD commented Dec 8, 2022

Hey @ctalledo and @bijeebuss! This is where I'm at so far. Sysbox service fails to start because of a dependency. Here is my terminal transcript below:

user@host:~$ egrep "ExecStart" /lib/systemd/system/sysbox-fs.service

ExecStart=/usr/bin/sysbox-fs

user@host:~$ sudo sed -i --follow-symlinks '/^ExecStart/ s/$/ --disable-shiftfs/' /lib/systemd/system/sysbox-fs.service
user@host:~$ egrep "ExecStart" /lib/systemd/system/sysbox-fs.service

ExecStart=/usr/bin/sysbox-fs --disable-shiftfs

I cut out the part where I ran commands to shut down all of the containers, but I did that right here. I accidentally did this after running the previous two commands instead of before.

user@host:~$ sudo systemctl daemon-reload
user@host:~$ sudo systemctl restart sysbox

A dependency job for sysbox.service failed. See 'journalctl -xe' for details.

user@host:~$ sudo systemctl status sysbox.service

○ sysbox.service - Sysbox container runtime
Loaded: loaded (/lib/systemd/system/sysbox.service; enabled; vendor preset: enabled)
Active: inactive (dead) since Thu 2022-12-08 03:37:35 UTC; 8s ago
Docs: https://github.com/nestybox/sysbox
Process: 59648 ExecStart=/bin/sh -c /usr/bin/sysbox-runc --version && /usr/bin/sysbox-mgr --version && /usr/bin/sysbox-fs --version && /bin/sleep infinity (code=killed, signal=TERM)
Main PID: 59648 (code=killed, signal=TERM)
CPU: 43ms

Dec 02 04:22:07 host sh[59662]: edition: Community Edition (CE)
Dec 02 04:22:07 host sh[59662]: version: 0.5.0
Dec 02 04:22:07 host sh[59662]: commit: 95a773a6ea3920f7ab454f1583465c7aea4c701f
Dec 02 04:22:07 host sh[59662]: built at: Wed Mar 23 23:35:06 UTC 2022
Dec 02 04:22:07 host sh[59662]: built by: Rodny Molina
Dec 08 03:37:35 host systemd[1]: Stopping Sysbox container runtime...
Dec 08 03:37:35 host systemd[1]: sysbox.service: Deactivated successfully.
Dec 08 03:37:35 host systemd[1]: Stopped Sysbox container runtime.
Dec 08 03:37:35 host systemd[1]: Dependency failed for Sysbox container runtime.
Dec 08 03:37:35 host systemd[1]: sysbox.service: Job sysbox.service/start failed with result 'dependency'.

@ctalledo
Member

ctalledo commented Dec 8, 2022

Hi @SeanMXD,

The --disable-shiftfs flag is passed to the sysbox-mgr service, not the sysbox-fs service. Please remove it from the sysbox-fs service and add it to the sysbox-mgr service, then do sudo systemctl daemon-reload && sudo systemctl restart sysbox.
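
One way to script that move is to reuse the sed approach from the transcript above. This is a minimal sketch assuming GNU sed and a unit file with a single ExecStart line; the helper names `remove_flag` and `add_flag` are hypothetical.

```shell
# Remove a trailing --disable-shiftfs from the ExecStart line of a unit file.
remove_flag() {
    # $1: path to the unit file
    sed -i --follow-symlinks '/^ExecStart/ s/ --disable-shiftfs$//' "$1"
}

# Append --disable-shiftfs to the ExecStart line unless it is already present.
add_flag() {
    # $1: path to the unit file
    grep -q '^ExecStart.*--disable-shiftfs' "$1" ||
        sed -i --follow-symlinks '/^ExecStart/ s/$/ --disable-shiftfs/' "$1"
}
```

From a root shell, `remove_flag /lib/systemd/system/sysbox-fs.service` followed by `add_flag /lib/systemd/system/sysbox-mgr.service` would apply the change, after which the daemon-reload and restart still need to be run.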

@SeanMXD
Author

SeanMXD commented Dec 8, 2022

Hi @ctalledo,

That worked! Thank you!
