libcontainer: fix a bug when setting shared rootfs propagation mode #1815
@rhvgoyal PTAL
Can we have an integration test (in this repo)?
da16461 to a61334b
@dongsupark Are you still interested in this PR?
@lifubang Thanks for the ping.
Please see #3948
The root directory mount propagation was set to `shared` in the following case:
[root@fedora38 runc]# podman info --format {{.Host.OCIRuntime.Version}}
runc version 1.1.0+dev
commit: v1.1.0-647-ga5777e87-dirty
spec: 1.1.0-rc.3
go: go1.20.5
libseccomp: 2.5.3
[root@fedora38 runc]# mkdir ~/hoge && mount -t tmpfs tmpfs ~/hoge
[root@fedora38 runc]# mount | grep "/hoge "
tmpfs on /root/hoge type tmpfs (rw,relatime,seclabel,inode64)
[root@fedora38 runc]# podman run -itd --privileged --name testcon --volume ~/hoge:/hoge:z,shared fedora-minimal:34 /bin/bash
1cb7167f2d47f18440c9f1b792a49bbc5c2f52def79dc9c746c9536a6e24f229
[root@fedora38 runc]# cat /var/lib/containers/storage/overlay-containers/1cb7167f2d47f18440c9f1b792a49bbc5c2f52def79dc9c746c9536a6e24f229/userdata/config.json | jq .linux.rootfsPropagation
"shared"
[root@fedora38 runc]# podman exec testcon mkdir /test
[root@fedora38 runc]# podman exec testcon mount -t tmpfs tmpfs /test
[root@fedora38 runc]# podman exec testcon findmnt -o "TARGET,PROPAGATION"
TARGET PROPAGATION
/ shared
|-/test shared
|-/sys private
| `-/sys/fs/cgroup private
|-/proc private
|-/dev private
| |-/dev/console private
| |-/dev/mqueue private
| |-/dev/shm private
| `-/dev/pts private
|-/hoge shared
|-/run/.containerenv private
|-/etc/hosts private
|-/run/secrets private
|-/etc/hostname private
`-/etc/resolv.conf private
In other cases, however, the mount propagation was set differently from crun. For example, in the reproduction below, `/` is reported as `shared,slave` under runc but as `shared` under crun. Steps to reproduce:
[test@fedora38 ~]$ podman info --format {{.Host.OCIRuntime.Version}}
runc version 1.1.0+dev
commit: v1.1.0-647-ga5777e87-dirty
spec: 1.1.0-rc.3
go: go1.20.5
libseccomp: 2.5.3
[test@fedora38 ~]$ mount | grep /tmp
tmpfs on /tmp type tmpfs (rw,nosuid,nodev,seclabel,nr_inodes=1048576,inode64)
[test@fedora38 ~]$ mkdir ~/{vol1,vol2}
[test@fedora38 ~]$ podman --storage-driver vfs --root /tmp/hoge run -itd --name testcon --volume ~/vol1:/myvol1:z,shared --volume ~/vol2:/myvol2:z fedora-minimal:34 /bin/bash
... snip ...
f4c1e806d6d475aec845ee0b5e29df5e36badeec00749b8d96be931ce43699a6
[test@fedora38 ~]$ cat /tmp/hoge/vfs-containers/f4c1e806d6d475aec845ee0b5e29df5e36badeec00749b8d96be931ce43699a6/userdata/config.json | jq .linux.rootfsPropagation
"shared"
[test@fedora38 ~]$ podman --root /tmp/hoge --storage-driver vfs exec testcon findmnt -o "TARGET,PROPAGATION" | grep -e "\/ " -e "\/myvol1 " -e "\/myvol2 "
/ shared,slave
|-/myvol1 shared,slave
|-/myvol2 private
[test@fedora38 ~]$ podman info --format {{.Host.OCIRuntime.Version}}
crun version 1.8.5
commit: b6f80f766c9a89eb7b1440c0a70ab287434b17ed
rundir: /run/user/1000/crun
spec: 1.0.0
+SYSTEMD +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +CRIU +LIBKRUN +WASM:wasmedge +YAJL
[test@fedora38 ~]$ mount | grep /tmp
tmpfs on /tmp type tmpfs (rw,nosuid,nodev,seclabel,nr_inodes=1048576,inode64)
[test@fedora38 ~]$ mkdir ~/{vol1,vol2}
[test@fedora38 ~]$ podman --storage-driver vfs --root /tmp/hoge run -itd --name testcon --volume ~/vol1:/myvol1:z,shared --volume ~/vol2:/myvol2:z fedora-minimal:34 /bin/bash
... snip ...
1d73a122ae6e4f7fe455b88677e91b7e20fd421ff141a0cb620b08f3d5cd2f48
[test@fedora38 ~]$ cat /tmp/hoge/vfs-containers/1d73a122ae6e4f7fe455b88677e91b7e20fd421ff141a0cb620b08f3d5cd2f48/userdata/config.json | jq .linux.rootfsPropagation
"shared"
[test@fedora38 ~]$ podman --root /tmp/hoge --storage-driver vfs exec testcon findmnt -o "TARGET,PROPAGATION" | grep -e "\/ " -e "\/myvol1 " -e "\/myvol2 "
/ shared
|-/myvol1 shared,slave
|-/myvol2 private
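The `PROPAGATION` column in the `findmnt` output above comes from the optional fields of `/proc/<pid>/mountinfo`: a `shared:N` tag means shared, a `master:N` tag means slave, and no tag means private. The following is a minimal Go sketch of that mapping, not code from runc or util-linux. The function name `propagationFromMountinfo` and the sample lines are hypothetical and are used only for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// propagationFromMountinfo derives a findmnt-style PROPAGATION string from
// one line of /proc/<pid>/mountinfo. The optional fields sit between the
// sixth field (mount options) and the "-" separator.
// Hypothetical helper for illustration only.
func propagationFromMountinfo(line string) (string, error) {
	fields := strings.Fields(line)
	if len(fields) < 7 {
		return "", fmt.Errorf("short mountinfo line: %q", line)
	}
	shared, slave, unbindable := false, false, false
	sepFound := false
	for _, f := range fields[6:] {
		if f == "-" {
			sepFound = true
			break
		}
		switch {
		case strings.HasPrefix(f, "shared:"):
			shared = true
		case strings.HasPrefix(f, "master:"):
			slave = true
		case f == "unbindable":
			unbindable = true
		}
	}
	if !sepFound {
		return "", fmt.Errorf("no separator in mountinfo line: %q", line)
	}
	// A mount without a shared tag is reported as private, optionally
	// followed by slave and unbindable.
	parts := []string{"private"}
	if shared {
		parts[0] = "shared"
	}
	if slave {
		parts = append(parts, "slave")
	}
	if unbindable {
		parts = append(parts, "unbindable")
	}
	return strings.Join(parts, ","), nil
}

func main() {
	// Made-up sample lines, not taken from the transcripts in this thread.
	samples := []string{
		"36 35 0:30 / / rw,relatime shared:1 master:2 - tmpfs tmpfs rw",
		"37 36 0:31 / /myvol2 rw,relatime - tmpfs tmpfs rw",
	}
	for _, s := range samples {
		p, err := propagationFromMountinfo(s)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println(p)
	}
}
```

Read this way, `shared,slave` on `/` means the mount is both a member of a peer group and a slave of another one, which is why it differs from the plain `shared` reported under crun.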
a61334b to 7d00e0b
Rebased the PR, addressed review comments, and added an integration test.
Yes, I am seeing that as well. I could not figure it out.
So far, when the input mount flags contained `MS_SHARED`, the flag was not applied to the container rootfs. That's because we call `rootfsParentMountPrivate()` after applying the original mount flags, so the original flags are overwritten. However, we do need to mount the container rootfs with `MS_PRIVATE` to avoid a failure from `pivot_root()` in the Linux kernel.

Thus, if the mount flags contain `MS_SHARED`, we need special-case handling. First do `pivotRoot()` (or `msMoveRoot`, `chroot`) with the rootfs mounted with `MS_PRIVATE`. Then, after `pivotRoot()`, mount the rootfs again with `MS_SHARED`.

With this fix, `validation/linux_rootfs_propagation.t` of runtime-tools finally works with the shared mode.

Fixes opencontainers#1755

Signed-off-by: Dongsu Park <dpark@linux.microsoft.com>
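The ordering described in the commit message can be sketched as a small planning function. This is an illustration under stated assumptions, not the actual libcontainer change: the `step` type and `planRootfsSetup` are hypothetical names, the plan only records which operations would run rather than issuing any syscalls, and the flag constants are declared locally with the values from the Linux `linux/mount.h` header.

```go
package main

import "fmt"

// Propagation flag values as defined in the Linux UAPI header linux/mount.h,
// declared locally so the sketch builds without platform-specific packages.
const (
	msRec        = 1 << 14
	msUnbindable = 1 << 17
	msPrivate    = 1 << 18
	msSlave      = 1 << 19
	msShared     = 1 << 20
)

// step is one planned operation. Hypothetical type for illustration.
type step struct {
	op     string // "mount" or "pivot_root"
	target string
	flags  uintptr
}

// planRootfsSetup returns the order of operations for a rootfs with the
// requested propagation flags. When MS_SHARED is requested, the rootfs is
// kept private until after pivot_root and only then remounted with the
// requested flags. Hypothetical helper, not a runc function.
func planRootfsSetup(rootfs string, requested uintptr) []step {
	if requested&msShared == 0 {
		return []step{
			{op: "mount", target: rootfs, flags: requested},
			{op: "pivot_root", target: rootfs},
		}
	}
	return []step{
		// pivot_root fails on a shared mount, so start out private.
		{op: "mount", target: rootfs, flags: msPrivate},
		{op: "pivot_root", target: rootfs},
		// The rootfs is now "/", so the requested flags can be applied.
		{op: "mount", target: "/", flags: requested},
	}
}

func main() {
	for _, s := range planRootfsSetup("/path/to/rootfs", msShared|msRec) {
		fmt.Printf("%-10s %-16s %#x\n", s.op, s.target, s.flags)
	}
}
```

Keeping the plan separate from the syscalls makes the special case easy to check without root privileges. A real implementation would perform each step with `mount(2)` and `pivot_root(2)` and handle their errors.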
7d00e0b to 86727b5
I confirmed that the mount propagation of `/` also contains `slave`; it is now reported as `private,slave`:
[test@fedora38 ~]$ podman info --format {{.Host.OCIRuntime.Version}}
runc version 1.1.0+dev
commit: v1.1.0-656-g14c7ab7f
spec: 1.1.0
go: go1.20.5
libseccomp: 2.5.3
[test@fedora38 ~]$ mount | grep /tmp
tmpfs on /tmp type tmpfs (rw,nosuid,nodev,seclabel,nr_inodes=1048576,inode64)
[test@fedora38 ~]$ mkdir ~/{vol1,vol2}
[test@fedora38 ~]$ podman --storage-driver vfs --root /tmp/hoge run -itd --name testcon --volume ~/vol1:/myvol1:z,shared --volume ~/vol2:/myvol2:z fedora-minimal:34 /bin/bash
... snip ...
15fdd0d1fce19b8755254cf913eacb8d7355fe6a9739db219c063964a47a711d
[test@fedora38 ~]$ cat /tmp/hoge/vfs-containers/15fdd0d1fce19b8755254cf913eacb8d7355fe6a9739db219c063964a47a711d/userdata/config.json | jq .linux.rootfsPropagation
"shared"
[test@fedora38 ~]$ podman --root /tmp/hoge --storage-driver vfs exec testcon findmnt -o "TARGET,PROPAGATION" | grep -e "\/ " -e "\/myvol1 " -e "\/myvol2 "
/ private,slave
|-/myvol1 shared,slave
|-/myvol2 private
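Each transcript above reads `"shared"` from `.linux.rootfsPropagation` in `config.json`. Before a runtime can act on that string, it has to be translated into mount propagation flags. The Go sketch below shows one way to do that translation. It is an assumption-laden illustration: `propagationFlags` is a hypothetical name, the `r`-prefixed recursive variants follow `mount(8)` naming and are assumed rather than taken from this thread, and the constants are declared locally with the values from `linux/mount.h`.

```go
package main

import "fmt"

// Propagation flag values as defined in the Linux UAPI header linux/mount.h.
const (
	msRec        = 1 << 14
	msUnbindable = 1 << 17
	msPrivate    = 1 << 18
	msSlave      = 1 << 19
	msShared     = 1 << 20
)

// propagationFlags translates a rootfsPropagation string into mount flags.
// Hypothetical helper; the r-prefixed names are an assumption modeled on
// mount(8) and add MS_REC to the base flag.
func propagationFlags(name string) (uintptr, error) {
	table := map[string]uintptr{
		"private":     msPrivate,
		"rprivate":    msPrivate | msRec,
		"slave":       msSlave,
		"rslave":      msSlave | msRec,
		"shared":      msShared,
		"rshared":     msShared | msRec,
		"unbindable":  msUnbindable,
		"runbindable": msUnbindable | msRec,
	}
	flags, ok := table[name]
	if !ok {
		return 0, fmt.Errorf("unknown rootfsPropagation %q", name)
	}
	return flags, nil
}

func main() {
	for _, name := range []string{"shared", "rslave", "bogus"} {
		flags, err := propagationFlags(name)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%s -> %#x\n", name, flags)
	}
}
```

Checking the result for the `MS_SHARED` bit is then enough to decide whether the private-first handling discussed in this PR is needed.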
So far when the input mount flags contain `MS_SHARED`, the flag has not been applied to the container rootfs. That's because we call `rootfsParentMountPrivate()` after applying the original mount flags. Though we actually need to mount the container rootfs with `MS_PRIVATE`, to avoid failure from `pivot_root()` in the Linux kernel.

Thus if the mount flags contain `MS_SHARED`, we need a special case handling. First do `pivotRoot()` (or `msMoveRoot`, `chroot`) with the rootfs with a mount flag `MS_PRIVATE`. Then after `pivotRoot()`, again mount the rootfs with `MS_SHARED`.

With this fix, `validation/linux_rootfs_propagation.t` of runtime-tools works well with the shared mode finally.

Fixes #1755

/cc @alban

Signed-off-by: Dongsu Park <dongsu@kinvolk.io>