network_mode regression in sysbox 0.5 #518

Closed
dictcp opened this issue Mar 24, 2022 · 7 comments
@dictcp

dictcp commented Mar 24, 2022

Summary

The network_mode option does not work in the recent Sysbox 0.5 release. It worked in Sysbox 0.4.

docker-compose.yaml

version: "3.3"

services:
  sysbox:
    image: busybox
    runtime: "sysbox-runc"
    network_mode: "service:vpn"
  vpn:
    image: gcr.io/google_containers/pause-amd64:3.1

Execution error:

dick@lima-ubuntu-lts:/home/dick/test$ docker compose up -d
[+] Running 4/4
 ⠿ vpn Pulled                                                                                                              3.2s
   ⠿ 67ddbfb20a22 Pull complete                                                                                                1.4s
 ⠿ sysbox Pulled                                                                                                               8.2s
   ⠿ 554879bb3004 Pull complete                                                                                                2.1s
[+] Running 2/3
 ⠿ Network test_default      Created                                                                                           0.3s
 ⠿ Container test-vpn-1  Started                                                                                           2.5s
 ⠿ Container test-sysbox-1   Starting                                                                                          3.0s
Error response from daemon: failed to create shim task: OCI runtime create failed: container_linux.go:419: starting container process caused: process_linux.go:607: container init caused: rootfs_linux.go:67: setting up rootfs mounts caused: rootfs_linux.go:1122: mounting "sysfs" to rootfs "/var/lib/sysbox/shiftfs/3068c2af-4567-450c-a390-dcb76ecb436b" at "sys" caused: operation not permitted: unknown

Environment:

dick@lima-ubuntu-lts:/home/dick/test$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 20.04.4 LTS
Release:        20.04
Codename:       focal
dick@lima-ubuntu-lts:/home/dick/test$ uname -a
Linux lima-ubuntu-lts 5.4.0-100-generic #113-Ubuntu SMP Thu Feb 3 18:43:29 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
dick@lima-ubuntu-lts:/home/dick/test$ lsmod | grep shiftfs
shiftfs                28672  0
dick@lima-ubuntu-lts:/home/dick/test$ dpkg -l | grep sysbox
ii  sysbox-ce                       0.5.0-0.linux                         amd64        Sysbox Community Edition (CE) is a next-generation container runtime,
dick@lima-ubuntu-lts:/home/dick/test$ docker ^C
dick@lima-ubuntu-lts:/home/dick/test$ dpkg -l | grep docker
ii  docker-ce                       5:20.10.14~3-0~ubuntu-focal           amd64        Docker: the open-source application container engine
ii  docker-ce-cli                   5:20.10.14~3-0~ubuntu-focal           amd64        Docker CLI: the open-source application container engine
ii  docker-ce-rootless-extras       5:20.10.14~3-0~ubuntu-focal           amd64        Rootless support for Docker.
ii  docker-scan-plugin              0.17.0~ubuntu-focal                   amd64        Docker scan cli plugin.
  • userns-remap not enabled
@rodnymolina
Member

@dictcp, thanks for reporting this. Will take a look at it asap.

In the meantime, to simplify your setup as much as possible, can you please try to launch a regular docker container with a custom network? This works in my setup (Ubuntu-Focal & Sysbox v0.5.0) ...

$ docker network create test-network

$ docker run --runtime=sysbox-runc -it --rm --network test-network ghcr.io/nestybox/ubuntu-focal-systemd-docker:latest

Also, can you please provide the output of the findmnt command on this lima-ubuntu-lts node?

PS: ARM64 is finally supported in v0.5.0 -- I know, it took longer than anticipated last time we chatted :-P. Hope that's still relevant for you.

@rodnymolina rodnymolina self-assigned this Mar 24, 2022
@rodnymolina rodnymolina added the bug Something isn't working label Mar 24, 2022
@dictcp
Author

dictcp commented Mar 24, 2022

Yes, I can reproduce a similar issue with the given command.

Update: this seems to be the result of #439.

dick@lima-test2:/home/dick$ docker network create test-network
f4155219e4393829baed1fb7798b0daaca6dfe24d44a89b02b5c6fed445b4695
dick@lima-test2:/home/dick$ docker run --runtime=sysbox-runc -it --rm --network test-network ghcr.io/nestybox/ubuntu-focal-systemd-docker:latest
Unable to find image 'ghcr.io/nestybox/ubuntu-focal-systemd-docker:latest' locally
latest: Pulling from nestybox/ubuntu-focal-systemd-docker
16ec32c2132b: Pull complete
47f6552e996c: Pull complete
ae9c913e6365: Pull complete
eed07a46b4e7: Pull complete
590a17d7d0da: Pull complete
4ee65cfaf6b8: Pull complete
825fa1db6f86: Pull complete
Digest: sha256:be60054d2afd43ecdefbdd5826969191d363561e1d2e5ecf15d73ac4e5188976
Status: Downloaded newer image for ghcr.io/nestybox/ubuntu-focal-systemd-docker:latest
docker: Error response from daemon: failed to create shim task: OCI runtime create failed: container_linux.go:419: starting container process caused: process_linux.go:607: container init caused: rootfs_linux.go:67: setting up rootfs mounts caused: open sys/devices/virtual/dmi/id/product_uuid: permission denied: unknown.

Output of findmnt:

/                                     /dev/vda1   ext4       rw,relatime
├─/sys                                sysfs       sysfs      rw,nosuid,nodev,noexec,relatime
│ ├─/sys/kernel/security              securityfs  securityfs rw,nosuid,nodev,noexec,relatime
│ ├─/sys/fs/cgroup                    tmpfs       tmpfs      ro,nosuid,nodev,noexec,mode=755
│ │ ├─/sys/fs/cgroup/unified          cgroup2     cgroup2    rw,nosuid,nodev,noexec,relatime,nsdelegate
│ │ ├─/sys/fs/cgroup/systemd          cgroup      cgroup     rw,nosuid,nodev,noexec,relatime,xattr,name=systemd
│ │ ├─/sys/fs/cgroup/memory           cgroup      cgroup     rw,nosuid,nodev,noexec,relatime,memory
│ │ ├─/sys/fs/cgroup/hugetlb          cgroup      cgroup     rw,nosuid,nodev,noexec,relatime,hugetlb
│ │ ├─/sys/fs/cgroup/net_cls,net_prio cgroup      cgroup     rw,nosuid,nodev,noexec,relatime,net_cls,net_prio
│ │ ├─/sys/fs/cgroup/devices          cgroup      cgroup     rw,nosuid,nodev,noexec,relatime,devices
│ │ ├─/sys/fs/cgroup/cpu,cpuacct      cgroup      cgroup     rw,nosuid,nodev,noexec,relatime,cpu,cpuacct
│ │ ├─/sys/fs/cgroup/freezer          cgroup      cgroup     rw,nosuid,nodev,noexec,relatime,freezer
│ │ ├─/sys/fs/cgroup/blkio            cgroup      cgroup     rw,nosuid,nodev,noexec,relatime,blkio
│ │ ├─/sys/fs/cgroup/cpuset           cgroup      cgroup     rw,nosuid,nodev,noexec,relatime,cpuset
│ │ ├─/sys/fs/cgroup/perf_event       cgroup      cgroup     rw,nosuid,nodev,noexec,relatime,perf_event
│ │ ├─/sys/fs/cgroup/rdma             cgroup      cgroup     rw,nosuid,nodev,noexec,relatime,rdma
│ │ └─/sys/fs/cgroup/pids             cgroup      cgroup     rw,nosuid,nodev,noexec,relatime,pids
│ ├─/sys/fs/pstore                    pstore      pstore     rw,nosuid,nodev,noexec,relatime
│ ├─/sys/firmware/efi/efivars         efivarfs    efivarfs   rw,nosuid,nodev,noexec,relatime
│ ├─/sys/fs/bpf                       none        bpf        rw,nosuid,nodev,noexec,relatime,mode=700
│ ├─/sys/kernel/debug                 debugfs     debugfs    rw,nosuid,nodev,noexec,relatime
│ ├─/sys/kernel/tracing               tracefs     tracefs    rw,nosuid,nodev,noexec,relatime
│ ├─/sys/fs/fuse/connections          fusectl     fusectl    rw,nosuid,nodev,noexec,relatime
│ └─/sys/kernel/config                configfs    configfs   rw,nosuid,nodev,noexec,relatime
├─/proc                               proc        proc       rw,nosuid,nodev,noexec,relatime
│ └─/proc/sys/fs/binfmt_misc          systemd-1   autofs     rw,relatime,fd=28,pgrp=1,timeout=0,minproto=5,maxproto=5,direct,pipe_in
├─/dev                                udev        devtmpfs   rw,nosuid,noexec,relatime,size=1992484k,nr_inodes=498121,mode=755
│ ├─/dev/pts                          devpts      devpts     rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=000
│ ├─/dev/shm                          tmpfs       tmpfs      rw,nosuid,nodev
│ ├─/dev/hugepages                    hugetlbfs   hugetlbfs  rw,relatime,pagesize=2M
│ └─/dev/mqueue                       mqueue      mqueue     rw,nosuid,nodev,noexec,relatime
├─/run                                tmpfs       tmpfs      rw,nosuid,nodev,noexec,relatime,size=401976k,mode=755
│ ├─/run/lock                         tmpfs       tmpfs      rw,nosuid,nodev,noexec,relatime,size=5120k
│ ├─/run/snapd/ns                     tmpfs[/snapd/ns]
│ │                                               tmpfs      rw,nosuid,nodev,noexec,relatime,size=401976k,mode=755
│ │ └─/run/snapd/ns/lxd.mnt           nsfs[mnt:[4026532277]]
│ │                                               nsfs       rw
│ └─/run/user/1000                    tmpfs       tmpfs      rw,nosuid,nodev,relatime,size=401972k,mode=700,uid=1000,gid=1000
├─/snap/core20/1361                   /dev/loop0  squashfs   ro,nodev,relatime
├─/snap/snapd/14978                   /dev/loop1  squashfs   ro,nodev,relatime
├─/snap/lxd/22526                     /dev/loop2  squashfs   ro,nodev,relatime
├─/boot/efi                           /dev/vda15  vfat       rw,relatime,fmask=0077,dmask=0077,codepage=437,iocharset=iso8859-1,shor
├─/home/dick                          :/home/dick fuse.sshfs ro,nosuid,nodev,relatime,user_id=1000,group_id=1000,allow_other
├─/mnt/lima-cidata                    /dev/sr0    iso9660    ro,relatime,nojoliet,overriderockperm,check=s,map=n,blocksize=2048,uid=
└─/tmp/lima                           :/tmp/lima  fuse.sshfs rw,nosuid,nodev,relatime,user_id=1000,group_id=1000,allow_other

@dictcp
Author

dictcp commented Mar 24, 2022

Re-ran after applying the workaround from #439:

$ docker network create test-network
$ docker run --runtime=sysbox-runc -it --rm --network test-network ghcr.io/nestybox/ubuntu-focal-systemd-docker:latest

The suggested command now works for me as well, with no error.

I then converted the compose file into the equivalent docker commands:

$ docker run -d --name vpn gcr.io/google_containers/pause-amd64:3.1
$ docker run --runtime=sysbox-runc -it --rm --network "container:vpn" busybox
docker: Error response from daemon: failed to create shim task: OCI runtime create failed: container_linux.go:419: starting container process caused: process_linux.go:607: container init caused: rootfs_linux.go:67: setting up rootfs mounts caused: rootfs_linux.go:1122: mounting "sysfs" to rootfs "/var/lib/sysbox/shiftfs/e2607695-0ef5-4255-8478-cbd9da243dc8" at "sys" caused: operation not permitted: unknown.

Removing either the runtime or the network parameter from the second docker run makes it work.

It seems the error occurs when the second container reuses the first container's network namespace.

@dictcp
Author

dictcp commented Mar 24, 2022

And here is the findmnt output after the two docker run commands:

dick@lima-sysbox-1:~/test$ findmnt
TARGET                                SOURCE           FSTYPE     OPTIONS
/                                     /dev/vda1        ext4       rw,relatime
├─/sys                                sysfs            sysfs      rw,nosuid,nodev,noexec,relatime
│ ├─/sys/kernel/security              securityfs       securityfs rw,nosuid,nodev,noexec,relatime
│ ├─/sys/fs/cgroup                    tmpfs            tmpfs      ro,nosuid,nodev,noexec,mode=755
│ │ ├─/sys/fs/cgroup/unified          cgroup2          cgroup2    rw,nosuid,nodev,noexec,relatime,nsdelegate
│ │ ├─/sys/fs/cgroup/systemd          cgroup           cgroup     rw,nosuid,nodev,noexec,relatime,xattr,name=systemd
│ │ ├─/sys/fs/cgroup/cpuset           cgroup           cgroup     rw,nosuid,nodev,noexec,relatime,cpuset
│ │ ├─/sys/fs/cgroup/hugetlb          cgroup           cgroup     rw,nosuid,nodev,noexec,relatime,hugetlb
│ │ ├─/sys/fs/cgroup/cpu,cpuacct      cgroup           cgroup     rw,nosuid,nodev,noexec,relatime,cpu,cpuacct
│ │ ├─/sys/fs/cgroup/devices          cgroup           cgroup     rw,nosuid,nodev,noexec,relatime,devices
│ │ ├─/sys/fs/cgroup/rdma             cgroup           cgroup     rw,nosuid,nodev,noexec,relatime,rdma
│ │ ├─/sys/fs/cgroup/net_cls,net_prio cgroup           cgroup     rw,nosuid,nodev,noexec,relatime,net_cls,net_prio
│ │ ├─/sys/fs/cgroup/pids             cgroup           cgroup     rw,nosuid,nodev,noexec,relatime,pids
│ │ ├─/sys/fs/cgroup/memory           cgroup           cgroup     rw,nosuid,nodev,noexec,relatime,memory
│ │ ├─/sys/fs/cgroup/blkio            cgroup           cgroup     rw,nosuid,nodev,noexec,relatime,blkio
│ │ ├─/sys/fs/cgroup/freezer          cgroup           cgroup     rw,nosuid,nodev,noexec,relatime,freezer
│ │ └─/sys/fs/cgroup/perf_event       cgroup           cgroup     rw,nosuid,nodev,noexec,relatime,perf_event
│ ├─/sys/fs/pstore                    pstore           pstore     rw,nosuid,nodev,noexec,relatime
│ ├─/sys/firmware/efi/efivars         efivarfs         efivarfs   rw,nosuid,nodev,noexec,relatime
│ ├─/sys/fs/bpf                       none             bpf        rw,nosuid,nodev,noexec,relatime,mode=700
│ ├─/sys/kernel/debug                 debugfs          debugfs    rw,nosuid,nodev,noexec,relatime
│ ├─/sys/kernel/tracing               tracefs          tracefs    rw,nosuid,nodev,noexec,relatime
│ ├─/sys/fs/fuse/connections          fusectl          fusectl    rw,nosuid,nodev,noexec,relatime
│ └─/sys/kernel/config                configfs         configfs   rw,nosuid,nodev,noexec,relatime
├─/proc                               proc             proc       rw,nosuid,nodev,noexec,relatime
│ └─/proc/sys/fs/binfmt_misc          systemd-1        autofs     rw,relatime,fd=28,pgrp=1,timeout=0,minproto=5,maxproto=5,direct,pi
├─/dev                                udev             devtmpfs   rw,nosuid,noexec,relatime,size=1992304k,nr_inodes=498076,mode=755
│ ├─/dev/pts                          devpts           devpts     rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=000
│ ├─/dev/shm                          tmpfs            tmpfs      rw,nosuid,nodev
│ ├─/dev/hugepages                    hugetlbfs        hugetlbfs  rw,relatime,pagesize=2M
│ └─/dev/mqueue                       mqueue           mqueue     rw,nosuid,nodev,noexec,relatime
├─/run                                tmpfs            tmpfs      rw,nosuid,nodev,noexec,relatime,size=401940k,mode=755
│ ├─/run/lock                         tmpfs            tmpfs      rw,nosuid,nodev,noexec,relatime,size=5120k
│ ├─/run/snapd/ns                     tmpfs[/snapd/ns] tmpfs      rw,nosuid,nodev,noexec,relatime,size=401940k,mode=755
│ │ └─/run/snapd/ns/lxd.mnt           nsfs[mnt:[4026532332]]
│ │                                                    nsfs       rw
│ ├─/run/user/503                     tmpfs            tmpfs      rw,nosuid,nodev,relatime,size=401936k,mode=700,uid=503,gid=1000
│ ├─/run/docker/netns/736f7f9c273f    nsfs[net:[4026532354]]
│ │                                                    nsfs       rw
│ ├─/run/docker/netns/b528ea859f38    nsfs[net:[4026532421]]
│ │                                                    nsfs       rw
│ └─/run/docker/netns/9beea0640cfc    nsfs[net:[4026532485]]
│                                                      nsfs       rw
├─/snap/lxd/22526                     /dev/loop1       squashfs   ro,nodev,relatime
├─/snap/core20/1361                   /dev/loop0       squashfs   ro,nodev,relatime
├─/snap/snapd/14978                   /dev/loop2       squashfs   ro,nodev,relatime
├─/boot/efi                           /dev/vda15       vfat       rw,relatime,fmask=0077,dmask=0077,codepage=437,iocharset=iso8859-1
├─/var/lib/docker/overlay2/6fefcdfed6dc516190f9f687e8d8e942092dd46ef915d382ee68a53421222e10/merged
│                                     overlay          overlay    rw,relatime,lowerdir=/var/lib/docker/overlay2/l/DWEEJDBJZL6UXKGR7B
├─/var/lib/docker/overlay2/a4e50bb0220dbcade5ea76a43ead3c720c8f9bf65a0a2bebd2202eedbe89a3e7/merged
│                                     overlay          overlay    rw,relatime,lowerdir=/var/lib/docker/overlay2/l/OAZJRJGPITEJUZ6QFK
├─/mnt/lima-cidata                    /dev/sr0         iso9660    ro,relatime,nojoliet,overriderockperm,check=s,map=n,blocksize=2048
└─/var/lib/docker/overlay2/c825bd5a5ad70ef3513dd230ac5f45dfbd4799a8d9e36b91f7a3040f5d4f61dc/merged
                                      overlay          overlay    rw,relatime,lowerdir=/var/lib/docker/overlay2/l/PVFSFYPK6K4JDYVZOU

@ctalledo
Member

Hi @dictcp ,

Thanks for the clear explanation. The problem is that in the following sequence:

$ docker run -d --name vpn gcr.io/google_containers/pause-amd64:3.1
$ docker run --runtime=sysbox-runc -it --rm --network "container:vpn" busybox

The first container is not launched with Sysbox; in order for this to work, both containers must be launched with Sysbox:

$ docker run --runtime=sysbox-runc  -d --name vpn gcr.io/google_containers/pause-amd64:3.1
6ce182814fc38ddcb64982a86d999ab126f827df3793ef9eed1b5a2b1fc7c9dc

$ docker run --runtime=sysbox-runc -it --rm --network "container:vpn" busybox
/ #
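
For the compose file in the original report, the same idea would presumably translate to setting the runtime on the vpn service as well. The following is only a sketch under that assumption: it writes the modified file to a temporary directory (the `workdir` and `compose_file` names are illustrative) and counts the services that request Sysbox.

```shell
#!/bin/sh
# Sketch only: the compose file from the report, with the Sysbox runtime set
# on BOTH services. Assumption: adding "runtime" to the vpn service is the
# compose equivalent of passing --runtime=sysbox-runc to both docker runs.
workdir=$(mktemp -d)
compose_file="$workdir/docker-compose.yaml"

cat > "$compose_file" <<'EOF'
version: "3.3"

services:
  sysbox:
    image: busybox
    runtime: "sysbox-runc"
    network_mode: "service:vpn"
  vpn:
    image: gcr.io/google_containers/pause-amd64:3.1
    runtime: "sysbox-runc"
EOF

# Count how many services request the Sysbox runtime; both should.
runtime_count=$(grep -c 'runtime: "sysbox-runc"' "$compose_file")
echo "services using sysbox-runc: $runtime_count"
```

The file can then be tried with `docker compose -f "$compose_file" up -d`.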

The reason for this is that Sysbox containers always use the Linux user namespace for extra isolation (i.e., root in the container = unprivileged user on the host). The network namespace is logically a "child" of the user namespace. Thus, for two containers to share a network (which requires sharing a network namespace), both containers must be in the same user namespace too.

If you launch the first container without Sysbox, then this won't be the case as that first container won't use the Linux user-namespace by default (unless Docker is configured in userns-remap mode).

If you launch the first container with Sysbox, then Sysbox creates a user namespace and a network namespace for it. Then, when the second container is launched, Sysbox realizes that the containers need to share the network, so it places the second container in the same user namespace as the first one, and then in the same network namespace too.
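
One way to check this on a host is to compare the namespace links under /proc for each container's init process. The helpers below are a minimal sketch, not part of Sysbox; the function names are made up for illustration, and `sysbox` in the usage comment is a hypothetical name for the second container (the commands above start it without one).

```shell
#!/bin/sh
# ns_id PID TYPE: print the namespace identifier of a process, for example
# "user:[4026531837]". TYPE is an entry under /proc/PID/ns (user, net, ...).
ns_id() {
    readlink "/proc/$1/ns/$2"
}

# same_ns PID1 PID2 TYPE: succeed only if both processes can be inspected
# and are in the same namespace of the given type.
same_ns() {
    a=$(ns_id "$1" "$3") || return 1
    b=$(ns_id "$2" "$3") || return 1
    [ -n "$a" ] && [ "$a" = "$b" ]
}

# container_pid NAME: host PID of a container's init process.
container_pid() {
    docker inspect --format '{{.State.Pid}}' "$1"
}

# Usage (typically needs root to read another user's /proc entries):
#   p1=$(container_pid vpn); p2=$(container_pid sysbox)
#   same_ns "$p1" "$p2" user && echo "same user namespace"
#   same_ns "$p1" "$p2" net  && echo "same network namespace"
```

If both containers were started with Sysbox as described, both checks would be expected to succeed.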

I don't know how this worked before though, given that what I've described above is a Linux kernel requirement regarding user-namespace and network namespaces.

Hope this helps.

@dictcp
Author

dictcp commented Mar 24, 2022

@ctalledo
Thanks for the detailed explanation!! (I probably need to re-read it to make sure I understand.)

I have just double-checked: the 0.4.1 installation was in userns-remap mode, as the documentation suggested, which is why those commands worked for me before.
I did not enable userns-remap mode in the 0.5.0 installation (since the installation doc no longer suggests it), which is why it fails.

So it seems to be a configuration issue on my side.
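
For anyone comparing two installations the same way, a quick check is to look for the userns entry in the daemon's security options. This is a rough sketch: `has_userns_remap` is a made-up helper name, and it assumes the daemon reports `name=userns` in `docker info` when userns-remap is enabled.

```shell
#!/bin/sh
# has_userns_remap: read Docker's security options on stdin and succeed if
# user-namespace remapping is listed.
# Assumption: the daemon lists "name=userns" when userns-remap is enabled.
has_userns_remap() {
    grep -q 'name=userns'
}

# Usage against a live daemon:
#   if docker info --format '{{json .SecurityOptions}}' | has_userns_remap; then
#       echo "userns-remap enabled"
#   else
#       echo "userns-remap not enabled"
#   fi
```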

Let's close this for now, since it seems to be expected behaviour.

@dictcp dictcp closed this as completed Mar 24, 2022
@ctalledo
Member

Thank you @dictcp, that makes sense. Thanks again for using Sysbox!
