Skip to content

Commit

Permalink
runc run: fix mount leak
Browse files Browse the repository at this point in the history
When preparing to mount container root, we need to make its parent mount
private (i.e. no propagation), otherwise the new in-container mounts are
leaked to the host.

To find a parent mount, we use to read mountinfo and find the longest
entry which can be a parent of the container root directory.

Unfortunately, due to kernel bug in all Linux kernels older than v5.8
(see [1], [2]), sometimes mountinfo can't be read in its entirety. In
this case, getParentMount may occasionally return a wrong parent mount.

As a result, we do not change the mount propagation to private, and
container mounts are leaked.

Alas, we can not fix the kernel, and reading mountinfo a few times to
ensure its consistency (like it's done in, say, Kubernetes) does not
look like a good solution for performance reasons.

Fortunately, we don't need mountinfo to find a parent mount. Let's just
traverse the directory tree until we find a directory with a different
device ID.

We don't need to get mount options either, it's easier just to call
remount with the flags we need.

Fixes issue 2404.

[1]: https://github.com/kolyshkin/procfs-test
[2]: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=9f6c61f96f2d97cbb5f
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
  • Loading branch information
kolyshkin committed Sep 26, 2024
1 parent e1635d5 commit 1263560
Showing 1 changed file with 29 additions and 39 deletions.
68 changes: 29 additions & 39 deletions libcontainer/rootfs_linux.go
Original file line number Diff line number Diff line change
Expand Up @@ -982,54 +982,47 @@ func mknodDevice(dest string, node *devices.Device) error {
return os.Chown(dest, int(node.Uid), int(node.Gid))
}

// Get the parent mount point of directory passed in as argument. Also return
// optional fields.
func getParentMount(rootfs string) (string, string, error) {
mi, err := mountinfo.GetMounts(mountinfo.ParentsFilter(rootfs))
// Get the parent mount point of directory passed in as argument.
func getParentMount(rootfs string) (string, error) {
// Assuming rootfs is absolute and clean (this is checked in libcontainer/validate).
path := rootfs
var st unix.Stat_t
err := unix.Stat(path, &st)
if err != nil {
return "", "", err
}
if len(mi) < 1 {
return "", "", fmt.Errorf("could not find parent mount of %s", rootfs)
return "", &os.PathError{Op: "stat", Path: path, Err: err}
}
prevDev := st.Dev

for path != "/" {
parent := filepath.Dir(path)
err = unix.Stat(parent, &st)
if err != nil {
return "", &os.PathError{Op: "stat", Path: parent, Err: err}
}

// find the longest mount point
var idx, maxlen int
for i := range mi {
if len(mi[i].Mountpoint) > maxlen {
maxlen = len(mi[i].Mountpoint)
idx = i
if st.Dev != prevDev {
return path, nil
}

path = parent
prevDev = st.Dev
}
return mi[idx].Mountpoint, mi[idx].Optional, nil
return "/", nil
}

// Make parent mount private if it was shared
// rootfsParentMountPrivate ensures rootfs parent mount is private.
// This is needed for two reasons:
// - pivot_root() will fail if parent mount is shared;
// - when we bind mount rootfs, if its parent is not private, the new mount
// will propagate (leak!) to parent namespace and we don't want that.
func rootfsParentMountPrivate(rootfs string) error {
sharedMount := false

parentMount, optionalOpts, err := getParentMount(rootfs)
parentMount, err := getParentMount(rootfs)
if err != nil {
return err
}
logrus.Warnf("found parent mount: %s", parentMount)

optsSplit := strings.Split(optionalOpts, " ")
for _, opt := range optsSplit {
if strings.HasPrefix(opt, "shared:") {
sharedMount = true
break
}
}

// Make parent mount PRIVATE if it was shared. It is needed for two
// reasons. First of all pivot_root() will fail if parent mount is
// shared. Secondly when we bind mount rootfs it will propagate to
// parent namespace and we don't want that to happen.
if sharedMount {
return mount("", parentMount, "", unix.MS_PRIVATE, "")
}

return nil
return mount("", parentMount, "", unix.MS_PRIVATE, "")
}

func prepareRoot(config *configs.Config) error {
Expand All @@ -1041,9 +1034,6 @@ func prepareRoot(config *configs.Config) error {
return err
}

// Make parent mount private to make sure following bind mount does
// not propagate in other namespaces. Also it will help with kernel
// check pass in pivot_root. (IS_SHARED(new_mnt->mnt_parent))
if err := rootfsParentMountPrivate(config.Rootfs); err != nil {
return err
}
Expand Down

0 comments on commit 1263560

Please sign in to comment.