A safe MkdirAll implementation #13

A fairly common usage of SecureJoin is of the form: path, _ := securejoin.SecureJoin(root, nonExistentPath) _ = os.MkdirAll(path, 0o755) Unfortunately, such usage is not safe if the root can be modified between SecureJoin and MkdirAll. This is a known issue with SecureJoin (using path strings for the API means we cannot be sure the path is safe). However, by carefully using file handles it is possible to adapt SecureJoin to be safe to use in a race-free way with MkdirAll by merging them into a single function. The implementation is designed so that we can easily add openat2 support, which is done in a follow-up commit. MkdirAllHandle is a variant of MkdirAll which takes the root as an *os.File and returns an *os.File for the final part of the created directory tree. This is a preferable interface to use if you need a handle to the final directory. We have to update the minimum Go version to 1.20 so that we can have errors that contain more than one wrapped error. Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>

An attacker could swap out the directory we created between us creating it and opening it. However, the worst thing they could do is force us to make a directory inside a directory with different permissions or a different mode. If they swap our directory with a directory that has the same owner and permissions as the one we created, they didn't gain anything. So, verify that the directory looks like the one we would've created. Unfortunately, there is no way to atomically create a directory and open it. I discussed this with Al a while ago and I believe he said he didn't like it (though I'm not sure if it was a blanket veto or not). Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>

Rather than saving a handle to /proc/self/fd, save a /proc handle and use /proc/thread-self/fd. This is necessary to handle Go programs with locked threads that have unshare(CLONE_FS), such as runc. This is modelled on similar logic that exists in runc (ProcThreadSelf), and is necessary if we want to use MkdirAll in runc. Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>

While adding openat2 support to SecureJoin doesn't make much sense (the API is wrong for handling race-safe paths), MkdirAll can use openat2 internally to make sure we safely operate on /proc and more efficiently do the partial-open necessary for MkdirAll. The safe /proc handling is based on similar logic in libpathrs. Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>

…tation openat2(2) will fail when we try to follow a broken symlink, which means that partialOpenat2() will treat the final symlink as a non-existent path. In contrast, partialOpen() will happily return a handle and remainingPath in the middle of the symlink resolution. While either of the above semantics are acceptable, we need to unify the behaviour so that users don't get different behaviour based on their kernel version. We can't change the behaviour of openat2(2), and emulating the current partialOpen() behaviour using openat2(2) is just asking for trouble. So, make partialOpen() match the behaviour of partialOpenat2(). Unfortunately, implementing this properly (including handling recursive symlinks) requires emulating a symlink stack and making sure that we return the top symlink we are in the mille of resolving if we hit a non-existent or otherwise invalid path. If there is no last symlink, we return the partial open as normal. Yeah, this is a little ugly, but I don't see a nicer way of implementing these semantics... Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>

These are primarily correctness tests (81.3% code coverage), to make sure we handle all of the ugly corner-cases with nested and dangling symlinks correctly. Almost all of the untested code is related to errors (most of which are annoying to mock). The one set of error paths it might be worth adding errors for is the isDeadInode() checks. Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>

Because we depend on procfs to be correct when operating on other filesystems, having a safe procfs handle is vital. Unfortunately, if we are an administrative program running inside a container that can modify its mount configuration, our current checks are not sufficient to avoid being tricked into thinking a path is real. Luckily, with fsopen() and open_tree() it is possible to create a completely private procfs instance that an attacker cannot modify. Note that while they are both safe, they are safe for different reasons: 1. fsopen() is safe because the created mount is completely separate to any other procfs mount, so any changes to mounts on the host are irrelevant. fsopen() can fail if we trip the mnt_too_revealing() check, so we may have to fall back to open_tree() in some cases. 2. open_tree() creates a clone of a snapshot of the mount tree (or just the top mount if can avoid using AT_RECURSIVE, but mnt_too_revealing() may force us to use AT_RECURSIVE). While the tree we clone might have been messed with by an attacker, after cloning there is no way for the attacker to affect our clone (even mount propagation won't propagate into a clone[1]). The only risk is whether there are over-mounts with AT_RECURSIVE. Because anonymous mounts don't show up in mountinfo, the best we can do is check the mount id through statx to see whether a target has an overmount. We only do this for symlink operations (in particular, readlink) because openat2 handles the non-magiclink case for us and statx gained STATX_MNT_ID support in Linux 5.8. We could do this check only for the open_tree(AT_RECURSIVE) case, but it's simpler to always do it (of course, an attacker could add the mount after the check if we use the host /proc the "standard" way, but at least there's a chance we'll detect it). listmounts(2) would let us detect any overmounts at creation time, but it's not clear whether you want to error out if there is a mount over a path you never use (lxcfs has overmounts of /proc/cpuinfo and /proc/meminfo, which we never use). Unfortunately, we can only use these when running as a privileged user. In theory we could create a user namespace to gain the necessary privileges to create these handles, but this would require spawning a proper subprocess (CLONE_NEWUSER must be done in a single-threaded program, which is not possible in Go without using Go's ForkExec) which won't always work when filepath-securejoin is used as a library. It's also far too complicated to deal with in practice. This is based on similar logic I'm working on for libpathrs. [1]: This is true since at least Linux 5.12. See commit ee2e3f50629f ("mount: fix mounting of detached mounts onto targets that reside on shared mounts"). Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>

This ensures we test all of the fallbacks even if we don't use them in practice when doing MkdirAll in our test suite. In addition, add some hooks to allow the tests to force the usage of fallbacks when using the standard getProcRoot() function everything else uses. This raises the test coverage to 87.1%. In terms of correctness tests, there are a couple of extra tests it's probably worth adding (see procfs_linux_test.go for a list). Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>

This lets us make sure we are correctly handling all sorts of lookups that aren't obvious just by looking at MkdirAll. Of particular note are the tail-chained and other deep symlink resolution stacks (MkdirAll will just return an error, but the remainingPath we get is quite important). In addition, add some explicit tests for the error paths in symlinkStack. We can't force these with actual lookups (since they indicate a programming bug), but at least make sure we're testing them properly. This raises the test coverage to 88.6% now that we are testing more of the error paths in partialLookupInRoot. Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>

By combining the coverage from all platforms as well as privileged and non-privileged Linux runs, the coverage is now 87.7% for the entire project. It was necessary to split the Windows CI scripts because mktemp doesn't work in Powershell and it's necessary to reimplement the whole thing to get what you need (not to mention that -test.gocoverdir needs to be quoted in a particular way to not make Powershell angry). Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>

In order to get some confidence in saying that our code safe against these kinds of racing attacks, add some tests to cover this. The full suite (with 50k runs of each new subtest) takes about 10 minutes, so we should use "go test -short" for the full matrix and add a single separate test run which can run for longer (we don't need to merge the coverage from the stress test because we hit the same codepaths as in the short test). While working on this, I found what seems to be a kernel bug with readlink(/proc/self/fd/$n). There is a workaround in the tests for this, but it is possible that the test will become flaky. This brings the test coverage up to 91.1% on Linux (91.3% combined). Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

A safe MkdirAll implementation #13

A safe MkdirAll implementation #13

Commits on Jul 9, 2024