Skip to content

Commit

Permalink
tree-wide: use /proc/thread-self for thread-local state
Browse files Browse the repository at this point in the history
With the idmap work, we will have a tainted Go thread in our
thread-group that has a different mount namespace to the other threads.
It seems that (due to some bad luck) the Go scheduler tends to make this
thread the thread-group leader in our tests, which results in very
baffling failures where /proc/self/mountinfo produces gibberish results.

In order to avoid this, switch to using /proc/thread-self for everything
that is thread-local. This primarily includes switching all file
descriptor paths (CLONE_FS), all of the places that check the current
cgroup (technically we never will run a single runc thread in a separate
cgroup, but better to be safe than sorry), and the aforementioned
mountinfo code. We don't need to do anything for the following because
the results we need aren't thread-local:

 * Checks that certain namespaces are supported by stat(2)ing
   /proc/self/ns/...

 * /proc/self/exe and /proc/self/cmdline are not thread-local.

 * While threads can be in different cgroups, we do not do this for the
   runc binary (or libcontainer) and thus we do not need to switch to
   the thread-local version of /proc/self/cgroups.

 * All of the CLONE_NEWUSER files are not thread-local because you
   cannot set the usernamespace of a single thread (setns(CLONE_NEWUSER)
   is blocked for multi-threaded programs).

Note that we have to use runtime.LockOSThread when we have an open
handle to a tid-specific procfs file that we are operating on multiple
times. Go can reschedule us such that we are running on a different
thread and then kill the original thread (causing -ENOENT or similarly
confusing errors). This is not necessary for most usages of
/proc/thread-self (such as using /proc/thread-self/fd/$n directly) but
operating on the actual inodes associated with the tid requires this
locking.

In addition, CentOS's kernel is too old for /proc/thread-self, which
requires us to emulate it -- however in rootfs_linux.go, we are in the
container pid namespace but /proc is the host's procfs. This leads to
the incredibly frustrating situation where there is no way (on pre-4.1
Linux) to figure out which /proc/self/task/... entry refers to the
current tid. We can just use /proc/self in this case.

Yes this is all pretty ugly. I also wish it wasn't necessary.

Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
  • Loading branch information
cyphar committed Aug 24, 2023
1 parent 3a518da commit be892f0
Show file tree
Hide file tree
Showing 19 changed files with 290 additions and 108 deletions.
16 changes: 12 additions & 4 deletions libcontainer/apparmor/apparmor_linux.go
Original file line number Diff line number Diff line change
Expand Up @@ -4,6 +4,7 @@ import (
"errors"
"fmt"
"os"
"runtime"
"sync"

"github.com/opencontainers/runc/libcontainer/utils"
Expand All @@ -26,12 +27,19 @@ func isEnabled() bool {
}

func setProcAttr(attr, value string) error {
// Under AppArmor you can only change your own attr, so use /proc/self/
// instead of /proc/<tid>/ like libapparmor does
attrPath := "/proc/self/attr/apparmor/" + attr
// We need to lock ourselves to the current thread, to make sure that this
// thread stays alive while we are using /proc/thread-self. Otherwise, Go
// could switch our running thread and kill the original thread underneath
// us at any point where we might yield to other goroutines.
runtime.LockOSThread()
defer runtime.UnlockOSThread()

// Under AppArmor you can only change your own attr, so use
// /proc/thread-self/ instead of /proc/<tid>/ like libapparmor does.
attrPath := utils.ProcThreadSelf("attr/apparmor/" + attr)
if _, err := os.Stat(attrPath); errors.Is(err, os.ErrNotExist) {
// fall back to the old convention
attrPath = "/proc/self/attr/" + attr
attrPath = utils.ProcThreadSelf("attr/" + attr)
}

f, err := os.OpenFile(attrPath, os.O_WRONLY, 0)
Expand Down
2 changes: 2 additions & 0 deletions libcontainer/cgroups/cgroups_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -5,6 +5,8 @@ import (
)

func TestParseCgroups(t *testing.T) {
// We don't need to use /proc/thread-self here because runc always runs
// with every thread in the same cgroup.
cgroups, err := ParseCgroupFile("/proc/self/cgroup")
if err != nil {
t.Fatal(err)
Expand Down
4 changes: 3 additions & 1 deletion libcontainer/cgroups/file.go
Original file line number Diff line number Diff line change
Expand Up @@ -10,6 +10,8 @@ import (
"strings"
"sync"

"github.com/opencontainers/runc/libcontainer/utils"

"github.com/sirupsen/logrus"
"golang.org/x/sys/unix"
)
Expand Down Expand Up @@ -147,7 +149,7 @@ func openFile(dir, file string, flags int) (*os.File, error) {
// TODO: if such usage will ever be common, amend this
// to reopen cgroupFd and retry openat2.
fdStr := strconv.Itoa(cgroupFd)
fdDest, _ := os.Readlink("/proc/self/fd/" + fdStr)
fdDest, _ := os.Readlink(utils.ProcThreadSelf("fd/" + fdStr))
if fdDest != cgroupfsDir {
// Wrap the error so it is clear that cgroupFd
// is opened to an unexpected/wrong directory.
Expand Down
2 changes: 2 additions & 0 deletions libcontainer/cgroups/fs2/defaultpath.go
Original file line number Diff line number Diff line change
Expand Up @@ -55,6 +55,8 @@ func _defaultDirPath(root, cgPath, cgParent, cgName string) (string, error) {
return filepath.Join(root, innerPath), nil
}

// we don't need to use /proc/thread-self here because runc always runs
// with every thread in the same cgroup.
ownCgroup, err := parseCgroupFile("/proc/self/cgroup")
if err != nil {
return "", err
Expand Down
33 changes: 31 additions & 2 deletions libcontainer/cgroups/v1_utils.go
Original file line number Diff line number Diff line change
Expand Up @@ -5,10 +5,13 @@ import (
"fmt"
"os"
"path/filepath"
"runtime"
"strings"
"sync"
"syscall"

"github.com/opencontainers/runc/libcontainer/utils"

securejoin "github.com/cyphar/filepath-securejoin"
"github.com/moby/sys/mountinfo"
"golang.org/x/sys/unix"
Expand Down Expand Up @@ -99,11 +102,32 @@ func tryDefaultPath(cgroupPath, subsystem string) string {
// expensive), so it is assumed that cgroup mounts are not being changed.
func readCgroupMountinfo() ([]*mountinfo.Info, error) {
readMountinfoOnce.Do(func() {
cgroupMountinfo, readMountinfoErr = mountinfo.GetMounts(
// We need to lock ourselves to the current thread, to make sure that
// this thread stays alive while we are using /proc/thread-self.
// Otherwise, Go could switch our running thread and kill the original
// thread underneath us at any point where we might yield to other
// goroutines.
runtime.LockOSThread()
defer runtime.UnlockOSThread()

// mountinfo.GetMounts uses /proc/self/mountinfo, however in runc we
// have threads that join different namespaces and thus the output of
// the thread-group's mountinfo can depend on Go scheduling. Instead,
// use thread-self as we know the current thread has the "correct"
// mount namespace (we use LockOSThread for threads which change
// namespaces).
threadSelfMountinfo, err := os.Open(utils.ProcThreadSelf("mountinfo"))
if err != nil {
readMountinfoErr = err
return
}
defer threadSelfMountinfo.Close()

cgroupMountinfo, readMountinfoErr = mountinfo.GetMountsFromReader(
threadSelfMountinfo,
mountinfo.FSTypeFilter("cgroup"),
)
})

return cgroupMountinfo, readMountinfoErr
}

Expand Down Expand Up @@ -196,6 +220,8 @@ func getCgroupMountsV1(all bool) ([]Mount, error) {
return nil, err
}

// We don't need to use /proc/thread-self here because runc always runs
// with every thread in the same cgroup.
allSubsystems, err := ParseCgroupFile("/proc/self/cgroup")
if err != nil {
return nil, err
Expand All @@ -214,6 +240,9 @@ func GetOwnCgroup(subsystem string) (string, error) {
if IsCgroup2UnifiedMode() {
return "", errUnified
}

// We don't need to use /proc/thread-self here because runc always runs
// with every thread in the same cgroup.
cgroups, err := ParseCgroupFile("/proc/self/cgroup")
if err != nil {
return "", err
Expand Down
2 changes: 2 additions & 0 deletions libcontainer/configs/namespaces_linux.go
Original file line number Diff line number Diff line change
Expand Up @@ -59,6 +59,8 @@ func IsNamespaceSupported(ns NamespaceType) bool {
if nsFile == "" {
return false
}
// We don't need to use /proc/thread-self here because the list of
// namespace types is unrelated to the thread.
_, err := os.Stat("/proc/self/ns/" + nsFile)
// a namespace is supported if it exists and we have permissions to read it
supported = err == nil
Expand Down
2 changes: 2 additions & 0 deletions libcontainer/container_linux.go
Original file line number Diff line number Diff line change
Expand Up @@ -463,6 +463,8 @@ func (c *Container) newParentProcess(p *Process) (parentProcess, error) {
}

func (c *Container) commandTemplate(p *Process, comm *processComm) *exec.Cmd {
// We don't need to use /proc/thread-self here because the exe mm of a
// thread-group is guaranteed to be the same for all threads by definition.
cmd := exec.Command("/proc/self/exe", "init")
cmd.Args[0] = os.Args[0]
cmd.Stdin = p.Stdin
Expand Down
2 changes: 1 addition & 1 deletion libcontainer/criu_linux.go
Original file line number Diff line number Diff line change
Expand Up @@ -549,7 +549,7 @@ func (c *Container) makeCriuRestoreMountpoints(m *configs.Mount) error {
if err != nil {
return err
}
if err := checkProcMount(c.config.Rootfs, dest, ""); err != nil {
if err := checkProcMount(c.config.Rootfs, dest, nil); err != nil {
return err
}
if err := os.MkdirAll(dest, 0o755); err != nil {
Expand Down
2 changes: 2 additions & 0 deletions libcontainer/init_linux.go
Original file line number Diff line number Diff line change
Expand Up @@ -463,6 +463,8 @@ func setupUser(config *initConfig) error {
return err
}

// We don't need to use /proc/thread-self here because setgroups is a
// per-userns file and thus is global to all threads in a thread-group.
setgroups, err := os.ReadFile("/proc/self/setgroups")
if err != nil && !os.IsNotExist(err) {
return err
Expand Down
37 changes: 24 additions & 13 deletions libcontainer/integration/exec_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -9,6 +9,7 @@ import (
"os/exec"
"path/filepath"
"reflect"
"runtime"
"strconv"
"strings"
"syscall"
Expand All @@ -18,6 +19,7 @@ import (
"github.com/opencontainers/runc/libcontainer/cgroups"
"github.com/opencontainers/runc/libcontainer/cgroups/systemd"
"github.com/opencontainers/runc/libcontainer/configs"
"github.com/opencontainers/runc/libcontainer/utils"
"github.com/opencontainers/runtime-spec/specs-go"

"golang.org/x/sys/unix"
Expand Down Expand Up @@ -1685,6 +1687,24 @@ func TestFdLeaksSystemd(t *testing.T) {
testFdLeaks(t, true)
}

func fdList(t *testing.T) []string {
// We need to lock ourselves to the current thread, to make sure that this
// thread stays alive while we are using /proc/thread-self. Otherwise, Go
// could switch our running thread and kill the original thread underneath
// us at any point where we might yield to other goroutines.
runtime.LockOSThread()
defer runtime.UnlockOSThread()

pfd, err := os.Open(utils.ProcThreadSelf("fd"))
ok(t, err)
defer pfd.Close()

fds, err := pfd.Readdirnames(-1)
ok(t, err)

return fds
}

func testFdLeaks(t *testing.T, systemd bool) {
if testing.Short() {
return
Expand All @@ -1699,21 +1719,12 @@ func testFdLeaks(t *testing.T, systemd bool) {
// - /sys/fs/cgroup dirfd opened by prepareOpenat2 in libct/cgroups;
// - dbus connection opened by getConnection in libct/cgroups/systemd.
_ = runContainerOk(t, config, "true")

pfd, err := os.Open("/proc/self/fd")
ok(t, err)
defer pfd.Close()
fds0, err := pfd.Readdirnames(0)
ok(t, err)
_, err = pfd.Seek(0, 0)
ok(t, err)
fds0 := fdList(t)

_ = runContainerOk(t, config, "true")
fds1 := fdList(t)

fds1, err := pfd.Readdirnames(0)
ok(t, err)

if len(fds1) == len(fds0) {
if reflect.DeepEqual(fds0, fds1) {
return
}
// Show the extra opened files.
Expand All @@ -1730,7 +1741,7 @@ next_fd:
continue next_fd
}
}
dst, _ := os.Readlink("/proc/self/fd/" + fd1)
dst, _ := os.Readlink(utils.ProcThreadSelf("fd/" + fd1))
for _, ex := range excludedPaths {
if ex == dst {
continue next_fd
Expand Down
25 changes: 20 additions & 5 deletions libcontainer/mount_linux.go
Original file line number Diff line number Diff line change
Expand Up @@ -5,13 +5,15 @@ import (
"fmt"
"io/fs"
"os"
"runtime"
"strconv"

"github.com/sirupsen/logrus"
"golang.org/x/sys/unix"

"github.com/opencontainers/runc/libcontainer/configs"
"github.com/opencontainers/runc/libcontainer/userns"
"github.com/opencontainers/runc/libcontainer/utils"
)

// mountSourceType indicates what type of file descriptor is being returned. It
Expand All @@ -23,7 +25,7 @@ const (
// An open_tree(2)-style file descriptor that needs to be installed using
// move_mount(2) to install.
mountSourceOpenTree mountSourceType = "open_tree"
// A plain file descriptor that can be mounted through /proc/self/fd.
// A plain file descriptor that can be mounted through /proc/thread-self/fd.
mountSourcePlain mountSourceType = "plain-open"
)

Expand Down Expand Up @@ -90,7 +92,7 @@ func mount(source, target, fstype string, flags uintptr, data string) error {
// will mount it according to the mountSourceType of the file descriptor.
//
// The dstFd argument, if non-empty, is expected to be in the form of a path to
// an opened file descriptor on procfs (i.e. "/proc/self/fd/NN").
// an opened file descriptor on procfs (i.e. "/proc/thread-self/fd/NN").
//
// If a file descriptor is used instead of a source or a target path, the
// corresponding path is only used to add context to an error in case the mount
Expand All @@ -101,19 +103,32 @@ func mountViaFds(source string, srcFile *mountSource, target, dstFd, fstype stri
logrus.Debugf("mount source passed along with MS_REMOUNT -- ignoring srcFile")
srcFile = nil
}

dst := target
if dstFd != "" {
dst = dstFd
}
src := source
isMoveMount := srcFile != nil && srcFile.Type == mountSourceOpenTree
if srcFile != nil {
src = "/proc/self/fd/" + strconv.Itoa(int(srcFile.file.Fd()))
// If we're going to use the /proc/thread-self/... path for classic
// mount(2), we need to lock ourselves to the current thread, to make
// sure that this thread stays alive while we are using
// /proc/thread-self. Otherwise, Go could switch our running thread and
// kill the original thread underneath us at any point where we might
// yield to other goroutines.
//
// This isn't needed for move_mount(2) because in that case the thread
// is only used for error info.
if !isMoveMount {
runtime.LockOSThread()
defer runtime.UnlockOSThread()
}
src = utils.ProcThreadSelf("fd/" + strconv.Itoa(int(srcFile.file.Fd())))
}

var op string
var err error
if srcFile != nil && srcFile.Type == mountSourceOpenTree {
if isMoveMount {
op = "move_mount"
err = unix.MoveMount(int(srcFile.file.Fd()), "",
unix.AT_FDCWD, dstFd,
Expand Down
Loading

0 comments on commit be892f0

Please sign in to comment.