Docker --net=host fails running in sysbox #712

Closed
spikecurtis opened this issue Jun 20, 2023 · 10 comments
Labels
bug Something isn't working

Comments

@spikecurtis

Starting with Docker v24.0, running a container with --net=host fails if Docker itself runs inside a Sysbox container.

I've raised a Docker issue, and bisected the problem to this Docker commit.

The outer Docker version doesn't seem to matter, and I've verified the problem on Sysbox CE 0.5.2 and 0.6.2.

The new version of Docker attempts to bind-mount /proc/self/task/<tid>/ns/net and this fails with "permission denied". Curiously, if I modify the Docker code to bind-mount /proc/thread-self/ns/net this succeeds. However, this is not a viable solution for Docker, because /proc/thread-self is unsupported on older kernels that Docker still supports.
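For reference, both spellings are expected to name the calling thread's network namespace. The sketch below is not part of the original report; it assumes a Linux host with /proc mounted, and the helper names are hypothetical. It pins the goroutine to one OS thread, builds the /proc/self/task/<tid> form for that thread, and prints what each namespace link points at, so you can check whether both resolve to the same net:[inode]. Per proc(5), /proc/thread-self exists only since Linux 3.17, so reading it can fail on older kernels.

```go
package main

import (
	"fmt"
	"os"
	"runtime"
	"syscall"
)

// taskNetnsPath returns the per-thread path form that Docker 24 bind-mounts
// (hypothetical helper name).
func taskNetnsPath(tid int) string {
	return fmt.Sprintf("/proc/self/task/%d/ns/net", tid)
}

// threadSelfNetnsPath is the equivalent shortcut, available on Linux 3.17+.
const threadSelfNetnsPath = "/proc/thread-self/ns/net"

func main() {
	// Pin the goroutine so both paths refer to the same OS thread.
	runtime.LockOSThread()
	defer runtime.UnlockOSThread()

	for _, p := range []string{taskNetnsPath(syscall.Gettid()), threadSelfNetnsPath} {
		// Namespace links read as "net:[<inode>]"; identical targets mean
		// the same network namespace.
		target, err := os.Readlink(p)
		if err != nil {
			fmt.Printf("%s: %v\n", p, err)
			continue
		}
		fmt.Printf("%s -> %s\n", p, target)
	}
}
```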

Below is a minimal Go program that can test bind-mounting in the way Docker does:

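package main

import (
	"fmt"
	"os"
	"path/filepath"
	"runtime"
	"syscall"
)

func main() {
	// Pin this goroutine to a single OS thread so the thread ID in the
	// path below stays valid while the mount is performed.
	runtime.LockOSThread()
	fmt.Printf("uid: %d\n", syscall.Getuid())

	// Same source-path form Docker 24 uses. Change this to
	// "/proc/thread-self/ns/net" to test the other variant.
	basePath := fmt.Sprintf("/proc/self/task/%d/ns/net", syscall.Gettid())
	fmt.Println(basePath)

	// The target of a file bind-mount must itself be an existing file.
	lnPath := "/var/namespace/test"
	if err := os.MkdirAll(filepath.Dir(lnPath), 0o755); err != nil {
		fmt.Printf("failed to create %s: %s\n", filepath.Dir(lnPath), err)
		os.Exit(1)
	}
	f, err := os.Create(lnPath)
	if err != nil {
		fmt.Printf("failed to create %s: %s\n", lnPath, err)
		os.Exit(1)
	}
	f.Close()

	// With MS_BIND, mount(2) ignores the filesystem type ("bind") and data.
	err = syscall.Mount(basePath, lnPath, "bind", syscall.MS_BIND, "")
	if err != nil {
		fmt.Printf("failed to bind-mount %s: %s\n", basePath, err)
		os.Exit(1)
	}
	runtime.UnlockOSThread()
}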

When I run it as root inside a sysbox 0.6.2 container, it fails with "permission denied." If I switch basePath to be /proc/thread-self/ns/net it succeeds.
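To compare both source paths in a single run, the test can try each spelling against the same target and report the outcome. This is a sketch, not part of the original report: the scratch target /tmp/netns-bind-test and the helper names are assumptions. Each successful mount is unmounted again so the next candidate can reuse the target, and errors.Is(err, os.ErrPermission) matches both EACCES and EPERM.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"runtime"
	"syscall"
)

// candidateSources returns the two source-path spellings compared in this
// issue, in the order they are tried (hypothetical helper).
func candidateSources(tid int) []string {
	return []string{
		fmt.Sprintf("/proc/self/task/%d/ns/net", tid),
		"/proc/thread-self/ns/net",
	}
}

// tryBindMount bind-mounts src onto target, then unmounts it so the next
// candidate can reuse the same target file. fstype is ignored for MS_BIND.
func tryBindMount(src, target string) error {
	if err := syscall.Mount(src, target, "", syscall.MS_BIND, ""); err != nil {
		return err
	}
	return syscall.Unmount(target, 0)
}

func main() {
	// Keep this goroutine on one OS thread so the tid stays meaningful.
	runtime.LockOSThread()
	defer runtime.UnlockOSThread()

	// Assumed scratch target; any existing regular file works.
	target := "/tmp/netns-bind-test"
	f, err := os.Create(target)
	if err != nil {
		fmt.Printf("failed to create %s: %v\n", target, err)
		os.Exit(1)
	}
	f.Close()
	defer os.Remove(target)

	for _, src := range candidateSources(syscall.Gettid()) {
		switch err := tryBindMount(src, target); {
		case err == nil:
			fmt.Printf("%s: ok\n", src)
		case errors.Is(err, os.ErrPermission):
			fmt.Printf("%s: permission denied (%v)\n", src, err)
		default:
			fmt.Printf("%s: %v\n", src, err)
		}
	}
}
```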

It also succeeds if I run it in a combination of user and mount namespaces, e.g. unshare --user --mount --map-root-user ./bmount. With unshare, both the /proc/self/task/<tid> and /proc/thread-self variants succeed. However, if I unshare with only a user namespace (no mount namespace), both variants fail with "permission denied".

Sysbox is the only environment I've found where /proc/thread-self succeeds, and /proc/self/task/<tid> fails. So, something more subtle than just user namespace isolation is going on.

@rodnymolina
Member

@spikecurtis, thanks for the detailed description, that helps.

At first glance, I don't have an explanation for what you're seeing, especially since Sysbox does nothing with these bind-mounts. Sysbox does trap mount() syscalls through seccomp-bpf, but it only handles the /proc/sys hierarchy (as well as certain portions of /sys). For all other paths, Sysbox lets the kernel handle the mount syscalls.
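The dispatch described above can be pictured as a path check on each trapped mount. The function below is a hypothetical illustration, not Sysbox's implementation: it treats only the /proc/sys hierarchy as emulated and omits the "certain portions of /sys", since this thread does not list them. The check compares on a path-component boundary, so a sibling such as /proc/sysvipc is not mistaken for /proc/sys.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// handledBySysbox is a hypothetical sketch: paths in the /proc/sys hierarchy
// are emulated, everything else is passed through to the kernel.
// The /sys portions are deliberately left out (not specified in the thread).
func handledBySysbox(path string) bool {
	p := filepath.Clean(path)
	// Match "/proc/sys" itself or anything strictly beneath it.
	return p == "/proc/sys" || strings.HasPrefix(p, "/proc/sys/")
}

func main() {
	for _, p := range []string{"/proc/sys/net/ipv4", "/proc/sysvipc", "/proc/self/task/1/ns/net"} {
		fmt.Printf("%s: handled by Sysbox = %v\n", p, handledBySysbox(p))
	}
}
```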

Need to think about this one in more detail.

@rodnymolina rodnymolina added the bug Something isn't working label Jun 22, 2023
@nudgegoonies

I can confirm this with Sysbox 0.6.2 and Docker 24.0.5 on Debian 11. The Docker daemon on our gitlab-runner hosts and the docker client in our build images are both 24.0.5. Our temporary workaround is to use Docker 23.0.6 instead of 24.0.5 for DinD.

@anshulpatel25

I can also confirm this with Sysbox 0.6.2 and Docker 24.0.X on Ubuntu 20.04.
We use Sysbox to create GitHub runner pods with the Actions Runner Controller. We are also sticking with the temporary workaround of using Docker 23.0.

@cruizba

cruizba commented Oct 25, 2023

I am having the same problem. I've posted a way to reproduce it and bumped the issue in the moby repo (moby/moby#45681).

What is clear is that Docker 24 does not work with --net=host when run as Docker-in-Docker inside Sysbox containers.

@ctalledo
Member

Hi @spikecurtis, thank you again for reporting this issue and your effort in helping root cause it.

The problem was a bug in the way Sysbox was intercepting mount system calls when the mount path has the form /proc/self/task/<tid>/....
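As a general illustration of why this path form is tricky for an interceptor (offered as a possible pitfall, not as a description of the actual Sysbox bug or fix): the supervisor runs in its own process, so if it resolves /proc/self/... or /proc/thread-self/... itself, those paths name the supervisor rather than the trapped caller. The hypothetical helper below rewrites them into explicit /proc/<pid>/... forms, assuming pid and tid are the caller's IDs as seen from the supervisor's PID namespace.

```go
package main

import (
	"fmt"
	"strings"
)

// resolveProcSelf (hypothetical) rewrites a /proc/self or /proc/thread-self
// path from the caller's point of view into one naming the caller explicitly.
// Other paths, including look-alikes such as "/proc/selfish", are unchanged.
func resolveProcSelf(path string, pid, tid int) string {
	switch {
	case path == "/proc/thread-self" || strings.HasPrefix(path, "/proc/thread-self/"):
		return fmt.Sprintf("/proc/%d/task/%d", pid, tid) + strings.TrimPrefix(path, "/proc/thread-self")
	case path == "/proc/self" || strings.HasPrefix(path, "/proc/self/"):
		return fmt.Sprintf("/proc/%d", pid) + strings.TrimPrefix(path, "/proc/self")
	}
	return path
}

func main() {
	// Example caller: pid 10, thread 12 (made-up IDs for illustration).
	for _, p := range []string{"/proc/self/task/12/ns/net", "/proc/thread-self/ns/net", "/var/namespace/test"} {
		fmt.Printf("%s -> %s\n", p, resolveProcSelf(p, 10, 12))
	}
}
```

Both namespace spellings map to the same explicit path for the same thread, which is the property a supervisor would need to preserve.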

I've fixed it via this commit, so the fix will be present in the upcoming v0.6.3 release (expected in 1 to 2 weeks).

Closing this ticket now.

@vaibhav-shah

Hey @ctalledo! Many thanks for fixing this issue. When can we expect v0.6.3 to be released?

@ctalledo
Member

ctalledo commented Jan 2, 2024

Hi @vaibhav-shah,

When can we expect v0.6.3 to be released?

Should be ready before the weekend, unless we hit an unforeseen problem.

@DekusDenial
Contributor

Hi @vaibhav-shah,

When can we expect v0.6.3 to be released?

Should be ready before the weekend, unless we hit an unforeseen problem.

@ctalledo, will the release cover the fix for this issue?

@ctalledo
Member

ctalledo commented Jan 8, 2024

Hi @DekusDenial,

will the release cover the fix for this issue?

Let's discuss in that issue please to avoid polluting this one.

@ctalledo
Member

Hi @vaibhav-shah,

When can we expect v0.6.3 to be released?

Sysbox v0.6.3 has been released; let us know if you hit any issues please. Thanks!
