Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

A safe MkdirAll implementation #13

Merged
merged 11 commits into from
Jul 9, 2024
Merged

A safe MkdirAll implementation #13

merged 11 commits into from
Jul 9, 2024

Commits on Jul 9, 2024

  1. initial safe MkdirAll and MkdirAllHandle implementations

    A fairly common usage of SecureJoin is of the form:
    
      path, _ := securejoin.SecureJoin(root, nonExistentPath)
      _ = os.MkdirAll(path, 0o755)
    
    Unfortunately, such usage is not safe if the root can be modified
    between SecureJoin and MkdirAll. This is a known issue with SecureJoin
    (using path strings for the API means we cannot be sure the path is
    safe).
    
    However, by carefully using file handles it is possible to adapt
    SecureJoin to be safe to use in a race-free way with MkdirAll by merging
    them into a single function. The implementation is designed so that we
    can easily add openat2 support, which is done in a follow-up commit.
    
    MkdirAllHandle is a variant of MkdirAll which takes the root as an
    *os.File and returns an *os.File for the final part of the created
    directory tree. This is a preferable interface to use if you need a
    handle to the final directory.
    
    We have to update the minimum Go version to 1.20 so that we can have
    errors that contain more than one wrapped error.
    
    Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
    cyphar committed Jul 9, 2024
    Configuration menu
    Copy the full SHA
    54905ce View commit details
    Browse the repository at this point in the history
  2. mkdirall: verify that the newly created directory is sane

    An attacker could swap out the directory we created between us creating
    it and opening it. However, the worst thing they could do is force us to
    make a directory inside a directory with different permissions or a
    different mode. If they swap our directory with a directory that has the
    same owner and permissions as the one we created, they didn't gain
    anything. So, verify that the directory looks like the one we would've
    created.
    
    Unfortunately, there is no way to atomically create a directory and open
    it. I discussed this with Al a while ago and I believe he said he didn't
    like it (though I'm not sure if it was a blanket veto or not).
    
    Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
    cyphar committed Jul 9, 2024
    Configuration menu
    Copy the full SHA
    b80d695 View commit details
    Browse the repository at this point in the history
  3. procfs: use /proc/thread-self to check

    Rather than saving a handle to /proc/self/fd, save a /proc handle and
    use /proc/thread-self/fd. This is necessary to handle Go programs with
    locked threads that have unshare(CLONE_FS), such as runc.
    
    This is modelled on similar logic that exists in runc (ProcThreadSelf),
    and is necessary if we want to use MkdirAll in runc.
    
    Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
    cyphar committed Jul 9, 2024
    Configuration menu
    Copy the full SHA
    ab8607e View commit details
    Browse the repository at this point in the history
  4. mkdirall: add openat2 support

    While adding openat2 support to SecureJoin doesn't make much sense (the
    API is wrong for handling race-safe paths), MkdirAll can use openat2
    internally to make sure we safely operate on /proc and more efficiently
    do the partial-open necessary for MkdirAll.
    
    The safe /proc handling is based on similar logic in libpathrs.
    
    Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
    cyphar committed Jul 9, 2024
    Configuration menu
    Copy the full SHA
    d2ccd9b View commit details
    Browse the repository at this point in the history
  5. mkdirall: make partialOpen semantically identical to openat2 implemen…

    …tation
    
    openat2(2) will fail when we try to follow a broken symlink, which means
    that partialOpenat2() will treat the final symlink as a non-existent
    path. In contrast, partialOpen() will happily return a handle and
    remainingPath in the middle of the symlink resolution.
    
    While either of the above semantics are acceptable, we need to unify the
    behaviour so that users don't get different behaviour based on their
    kernel version. We can't change the behaviour of openat2(2), and
    emulating the current partialOpen() behaviour using openat2(2) is just
    asking for trouble. So, make partialOpen() match the behaviour of
    partialOpenat2().
    
    Unfortunately, implementing this properly (including handling recursive
    symlinks) requires emulating a symlink stack and making sure that we
    return the top symlink we are in the mille of resolving if we hit a
    non-existent or otherwise invalid path. If there is no last symlink, we
    return the partial open as normal. Yeah, this is a little ugly, but I
    don't see a nicer way of implementing these semantics...
    
    Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
    cyphar committed Jul 9, 2024
    Configuration menu
    Copy the full SHA
    3d3771c View commit details
    Browse the repository at this point in the history
  6. mkdirall: add tests for MkdirAll and MkdirAllHandle

    These are primarily correctness tests (81.3% code coverage), to make
    sure we handle all of the ugly corner-cases with nested and dangling
    symlinks correctly.
    
    Almost all of the untested code is related to errors (most of which are
    annoying to mock). The one set of error paths it might be worth adding
    errors for is the isDeadInode() checks.
    
    Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
    cyphar committed Jul 9, 2024
    Configuration menu
    Copy the full SHA
    0355f24 View commit details
    Browse the repository at this point in the history
  7. procfs: harden proc handle using fsopen() and open_tree()

    Because we depend on procfs to be correct when operating on other
    filesystems, having a safe procfs handle is vital. Unfortunately, if we
    are an administrative program running inside a container that can modify
    its mount configuration, our current checks are not sufficient to avoid
    being tricked into thinking a path is real.
    
    Luckily, with fsopen() and open_tree() it is possible to create a
    completely private procfs instance that an attacker cannot modify. Note
    that while they are both safe, they are safe for different reasons:
    
     1. fsopen() is safe because the created mount is completely separate to
        any other procfs mount, so any changes to mounts on the host are
        irrelevant. fsopen() can fail if we trip the mnt_too_revealing()
        check, so we may have to fall back to open_tree() in some cases.
    
     2. open_tree() creates a clone of a snapshot of the mount tree (or just
        the top mount if can avoid using AT_RECURSIVE, but
        mnt_too_revealing() may force us to use AT_RECURSIVE). While the
        tree we clone might have been messed with by an attacker, after
        cloning there is no way for the attacker to affect our clone (even
        mount propagation won't propagate into a clone[1]).
    
        The only risk is whether there are over-mounts with AT_RECURSIVE.
        Because anonymous mounts don't show up in mountinfo, the best we can
        do is check the mount id through statx to see whether a target has
        an overmount. We only do this for symlink operations (in particular,
        readlink) because openat2 handles the non-magiclink case for us and
        statx gained STATX_MNT_ID support in Linux 5.8.
    
        We could do this check only for the open_tree(AT_RECURSIVE) case,
        but it's simpler to always do it (of course, an attacker could add
        the mount after the check if we use the host /proc the "standard"
        way, but at least there's a chance we'll detect it).
    
        listmounts(2) would let us detect any overmounts at creation time,
        but it's not clear whether you want to error out if there is a mount
        over a path you never use (lxcfs has overmounts of /proc/cpuinfo and
        /proc/meminfo, which we never use).
    
    Unfortunately, we can only use these when running as a privileged user.
    In theory we could create a user namespace to gain the necessary
    privileges to create these handles, but this would require spawning a
    proper subprocess (CLONE_NEWUSER must be done in a single-threaded
    program, which is not possible in Go without using Go's ForkExec) which
    won't always work when filepath-securejoin is used as a library. It's
    also far too complicated to deal with in practice.
    
    This is based on similar logic I'm working on for libpathrs.
    
    [1]: This is true since at least Linux 5.12. See commit ee2e3f50629f
         ("mount: fix mounting of detached mounts onto targets that reside
         on shared mounts").
    
    Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
    cyphar committed Jul 9, 2024
    Configuration menu
    Copy the full SHA
    4f530fd View commit details
    Browse the repository at this point in the history
  8. procfs: add tests for our safe /proc helpers

    This ensures we test all of the fallbacks even if we don't use them in
    practice when doing MkdirAll in our test suite. In addition, add some
    hooks to allow the tests to force the usage of fallbacks when using the
    standard getProcRoot() function everything else uses. This raises the
    test coverage to 87.1%.
    
    In terms of correctness tests, there are a couple of extra tests it's
    probably worth adding (see procfs_linux_test.go for a list).
    
    Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
    cyphar committed Jul 9, 2024
    Configuration menu
    Copy the full SHA
    45ab189 View commit details
    Browse the repository at this point in the history
  9. lookup: add tests for partialLookupInRoot

    This lets us make sure we are correctly handling all sorts of lookups
    that aren't obvious just by looking at MkdirAll. Of particular note are
    the tail-chained and other deep symlink resolution stacks (MkdirAll will
    just return an error, but the remainingPath we get is quite important).
    
    In addition, add some explicit tests for the error paths in
    symlinkStack. We can't force these with actual lookups (since they
    indicate a programming bug), but at least make sure we're testing them
    properly.
    
    This raises the test coverage to 88.6% now that we are testing more of
    the error paths in partialLookupInRoot.
    
    Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
    cyphar committed Jul 9, 2024
    Configuration menu
    Copy the full SHA
    b6aed59 View commit details
    Browse the repository at this point in the history
  10. gha: collate and calculate coverage for all test runs

    By combining the coverage from all platforms as well as privileged and
    non-privileged Linux runs, the coverage is now 87.7% for the entire
    project.
    
    It was necessary to split the Windows CI scripts because mktemp doesn't
    work in Powershell and it's necessary to reimplement the whole thing to
    get what you need (not to mention that -test.gocoverdir needs to be
    quoted in a particular way to not make Powershell angry).
    
    Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
    cyphar committed Jul 9, 2024
    Configuration menu
    Copy the full SHA
    633f3d6 View commit details
    Browse the repository at this point in the history
  11. tests: add racing tests for partialLokupInRoot and MkdirAllHandle

    In order to get some confidence in saying that our code safe against
    these kinds of racing attacks, add some tests to cover this.
    
    The full suite (with 50k runs of each new subtest) takes about 10
    minutes, so we should use "go test -short" for the full matrix and add a
    single separate test run which can run for longer (we don't need to
    merge the coverage from the stress test because we hit the same
    codepaths as in the short test).
    
    While working on this, I found what seems to be a kernel bug with
    readlink(/proc/self/fd/$n). There is a workaround in the tests for this,
    but it is possible that the test will become flaky.
    
    This brings the test coverage up to 91.1% on Linux (91.3% combined).
    
    Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
    cyphar committed Jul 9, 2024
    Configuration menu
    Copy the full SHA
    ac32743 View commit details
    Browse the repository at this point in the history