This repository has been archived by the owner on Jun 11, 2020. It is now read-only.

[18.06 backport] remove hot-fix, and apply latest upstream patch for CVE-2019-5736 #8

Closed

Commits on Feb 13, 2019

  1. Revert "Merge pull request #11 from seemethere/apply_patches_1806"

    This reverts commit a592beb, reversing
    changes made to 69663f0.
    
    Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
    thaJeztah committed Feb 13, 2019
    cc33f92
  2. nsenter: clone /proc/self/exe to avoid exposing host binary to container

    There are quite a few circumstances where /proc/self/exe pointing to a
    pretty important container binary is a _bad_ thing, so to avoid this we
    have to make a copy (preferably doing self-clean-up and not being
    writeable).
    
    We require memfd_create(2) -- though there is an O_TMPFILE fallback --
    but we can always extend this to use a scratch MNT_DETACH overlayfs or
    tmpfs. The main downside to this approach is no page-cache sharing for
    the runc binary (which overlayfs would give us) but this is far less
    complicated.
    
    This is only done during nsenter so that it happens transparently to the
    Go code, and any libcontainer users benefit from it. This also makes
    ExtraFiles and --preserve-fds handling trivial (because we don't need to
    worry about it).
    
    Fixes: CVE-2019-5736
    Co-developed-by: Christian Brauner <christian.brauner@ubuntu.com>
    Signed-off-by: Aleksa Sarai <asarai@suse.de>
    (cherry picked from commit 0a8e411)
    Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
    cyphar authored and thaJeztah committed Feb 13, 2019
    f87d8bb
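
The memfd_create(2) approach from "nsenter: clone /proc/self/exe to avoid exposing host binary to container" can be illustrated with a minimal C sketch. It assumes Linux 3.17 or later and a glibc new enough (2.27+) to provide the memfd_create() wrapper. clone_self_exe, is_sealed_clone and copy_fd are hypothetical names; this is not runc's libcontainer/nsenter/cloned_binary.c, which also re-executes itself from the clone with fexecve(3) (left out here) and copies with sendfile(2) rather than the plain read(2)/write(2) loop used below for brevity.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

/* Seals added once the copy is complete: afterwards the memfd can no longer
 * be written, grown or shrunk, and its set of seals is frozen. */
#define CLONE_SEALS (F_SEAL_SEAL | F_SEAL_SHRINK | F_SEAL_GROW | F_SEAL_WRITE)

/* Copy src_fd to dst_fd with read(2)/write(2), retrying short writes.
 * Returns 0 at EOF, -1 on error. */
static int copy_fd(int dst_fd, int src_fd)
{
	char buf[8192];
	for (;;) {
		ssize_t n = read(src_fd, buf, sizeof(buf));
		if (n < 0 && errno == EINTR)
			continue;
		if (n <= 0)
			return (int)n;
		for (ssize_t off = 0; off < n;) {
			ssize_t w = write(dst_fd, buf + off, (size_t)(n - off));
			if (w < 0 && errno == EINTR)
				continue;
			if (w < 0)
				return -1;
			off += w;
		}
	}
}

/* Return a sealed memfd holding a copy of the running binary, or -1.
 * The memfd has no path and disappears with its last reference. */
int clone_self_exe(void)
{
	int src = open("/proc/self/exe", O_RDONLY | O_CLOEXEC);
	if (src < 0)
		return -1;
	int memfd = memfd_create("runc-clone-sketch", MFD_CLOEXEC | MFD_ALLOW_SEALING);
	if (memfd >= 0 &&
	    (copy_fd(memfd, src) < 0 || fcntl(memfd, F_ADD_SEALS, CLONE_SEALS) < 0)) {
		close(memfd);
		memfd = -1;
	}
	close(src);
	return memfd;
}

/* True if fd carries every seal above, i.e. it refers to such a clone. */
int is_sealed_clone(int fd)
{
	int seals = fcntl(fd, F_GET_SEALS);
	return seals >= 0 && (seals & CLONE_SEALS) == CLONE_SEALS;
}
```

The seals cover the "not being writeable" requirement and the anonymous memfd covers self-clean-up. After re-executing from the clone, a process could open /proc/self/exe and pass that fd to is_sealed_clone() to decide whether it is already running from a copy.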

Commits on Mar 12, 2019

  1. nsexec (CVE-2019-5736): avoid parsing environ

    My first attempt to simplify this and make it less costly focused on
    the way constructors are called. I was under the impression that the ELF
    specification mandated that argc, argv, and even envp be
    passed to functions located in the .init_array section (aka
    "constructors"). In fact, the specification says (cf. [2]):
    
    SHT_INIT_ARRAY
    This section contains an array of pointers to initialization functions,
    as described in ``Initialization and Termination Functions'' in Chapter
    5. Each pointer in the array is taken as a parameterless procedure with
    a void return.
    
    which means that this becomes a libc-specific decision. Glibc passes
    down those args, musl doesn't. So this approach can't work. However, we
    can at least remove the environment parsing part, since the relevant
    Open Group (POSIX) specification [1] mandates that there should be an
    environ variable defined in unistd.h which provides access to the
    environment.
    
    [1]: http://pubs.opengroup.org/onlinepubs/9699919799/
    [2]: http://www.sco.com/developers/gabi/latest/ch4.sheader.html#init_array
    
    Fixes: CVE-2019-5736
    Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
    (cherry picked from commit bb7d8b1)
    Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
    Christian Brauner authored and thaJeztah committed Mar 12, 2019
    497c526
  2. nsenter: cloned_binary: detect and handle short copies

    For a variety of reasons, sendfile(2) can end up doing a short-copy so
    we need to just loop until we hit the binary size. Since /proc/self/exe
    is tautologically our own binary, there's no chance someone is going to
    modify it underneath us (or change its size).
    
    Signed-off-by: Aleksa Sarai <asarai@suse.de>
    (cherry picked from commit 5b775bf)
    Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
    cyphar authored and thaJeztah committed Mar 12, 2019
    7b7e23d
  3. nsenter: cloned_binary: expand and add pre-3.11 fallbacks

    In order to get around the memfd_create(2) requirement, 0a8e411
    ("nsenter: clone /proc/self/exe to avoid exposing host binary to
    container") added an O_TMPFILE fallback. However, this fallback was
    flawed in two ways:
    
     * It required O_TMPFILE which is relatively new (having been added to
       Linux 3.11).
    
     * The fallback choice was made at compile-time, not runtime. This
       results in several complications when it comes to running binaries
       on different machines to the ones they were built on.
    
    The easiest way to resolve these things is to have fallbacks work in a
    more procedural way (though it unfortunately makes the code more
    complicated) and to add a new fallback that uses mkostemp(3).
    
    Signed-off-by: Aleksa Sarai <asarai@suse.de>
    (cherry picked from commit 2429d59)
    Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
    cyphar authored and thaJeztah committed Mar 12, 2019
    22abfb2
  4. nsenter: cloned_binary: use the runc statedir for O_TMPFILE

    Writing a file to tmpfs actually incurs a memcg penalty, and thus the
    benefit of being able to disable memfd_create(2) with
    _LIBCONTAINER_DISABLE_MEMFD_CLONE is fairly minimal -- though it should
    be noted that quite a few distributions don't use tmpfs for /tmp (and
    instead have it as a regular directory or subvolume of the host
    filesystem).
    
    Since runc must have write access to the state directory anyway (and the
    state directory is usually not on a tmpfs) we can use that instead of
    /tmp -- avoiding potential memcg costs with no real downside.
    
    Signed-off-by: Aleksa Sarai <asarai@suse.de>
    (cherry picked from commit af9da0a)
    Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
    cyphar authored and thaJeztah committed Mar 12, 2019
    e8ffbab
  5. nsenter: cloned_binary: try to ro-bind /proc/self/exe before copying

    The usage of memfd_create(2) and other copying techniques is quite
    wasteful, despite attempts to minimise it with _LIBCONTAINER_STATEDIR.
    memfd_create(2) added ~10M of memory usage to the cgroup associated with
    the container, which can result in some setups getting OOM'd (or just
    hogging the host's memory when you have lots of created-but-not-started
    containers sticking around).
    
    The easiest way of solving this is by creating a read-only bind-mount of
    the binary, opening that read-only bind-mount, and then unmounting it to
    ensure that the host binary won't accidentally be re-mounted read-write. This
    avoids all copying and cleans up naturally like the other techniques
    used. Unfortunately, like the O_TMPFILE fallback, this requires being
    able to create a file inside _LIBCONTAINER_STATEDIR (since bind-mounting
    over the most obvious path -- /proc/self/exe -- is a *very bad idea*).
    
    Unfortunately, detecting this isn't foolproof -- on a system with a
    read-only root filesystem (that might become read-write during "runc
    init" execution), we cannot tell whether we have already done an ro
    remount. As a partial mitigation, we store a _LIBCONTAINER_CLONED_BINARY
    environment variable which is checked *alongside* the protection being
    present.
    
    Signed-off-by: Aleksa Sarai <asarai@suse.de>
    (cherry picked from commit 16612d7)
    Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
    cyphar authored and thaJeztah committed Mar 12, 2019
    293faf4
  6. nsenter: cloned_binary: userspace copy fallback if sendfile fails

    There are some circumstances where sendfile(2) can fail (one example is
    that AppArmor appears to block writing to deleted files with sendfile(2)
    under some circumstances) and so we need to have a userspace fallback.
    It's fairly trivial (and handles short writes).
    
    Signed-off-by: Aleksa Sarai <asarai@suse.de>
    (cherry picked from commit 2d4a37b)
    Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
    cyphar authored and thaJeztah committed Mar 12, 2019
    e89ebf3
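
The constructor point in "nsexec (CVE-2019-5736): avoid parsing environ" can be illustrated with a minimal C sketch: because .init_array entries are called as parameterless procedures, a portable constructor reads the environment through environ instead of expecting envp as an argument. It assumes GCC or Clang, whose __attribute__((constructor)) functions current toolchains emit into .init_array. env_lookup and started_as_clone are hypothetical names, and _LIBCONTAINER_CLONED_BINARY is used only because a later commit in this list mentions it; this does not reproduce runc's code.

```c
#include <stddef.h>
#include <string.h>

/* POSIX environment pointer, declared explicitly here. */
extern char **environ;

/* Hypothetical helper: return the value of NAME by walking environ,
 * or NULL if it is not set. */
const char *env_lookup(const char *name)
{
	size_t len = strlen(name);
	for (char **e = environ; e != NULL && *e != NULL; e++)
		if (strncmp(*e, name, len) == 0 && (*e)[len] == '=')
			return *e + len + 1;
	return NULL;
}

/* Hypothetical flag set before main() runs. */
int started_as_clone;

/* Called as void (*)(void): glibc happens to pass argc/argv/envp to
 * .init_array functions, musl does not, so nothing here relies on them. */
__attribute__((constructor)) static void sketch_init(void)
{
	started_as_clone = env_lookup("_LIBCONTAINER_CLONED_BINARY") != NULL;
}
```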
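
The two copy-related commits, "detect and handle short copies" and "userspace copy fallback if sendfile fails", fit together in one loop. The following is a minimal C sketch, not runc's implementation: copy_whole_file, rw_copy and copy_self_exe are hypothetical names, the sketch assumes in_fd starts at offset 0 so that st_size is the number of bytes left, and it falls back to userspace copying on any sendfile(2) error, which is a simplification.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <sys/sendfile.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Userspace fallback: copy exactly `left` bytes with read(2)/write(2),
 * looping over short reads and short writes. */
static int rw_copy(int out_fd, int in_fd, off_t left)
{
	char buf[8192];
	while (left > 0) {
		size_t want = left < (off_t)sizeof(buf) ? (size_t)left : sizeof(buf);
		ssize_t nread = read(in_fd, buf, want);
		if (nread < 0 && errno == EINTR)
			continue;
		if (nread <= 0)
			return -1; /* error, or EOF before `left` bytes */
		for (ssize_t off = 0; off < nread;) {
			ssize_t nw = write(out_fd, buf + off, (size_t)(nread - off));
			if (nw < 0 && errno == EINTR)
				continue;
			if (nw < 0)
				return -1;
			off += nw;
		}
		left -= nread;
	}
	return 0;
}

/* Copy all of in_fd into out_fd. sendfile(2) may copy less than asked,
 * so keep calling it until st_size bytes are done; if it fails outright,
 * finish in userspace from wherever it stopped. */
int copy_whole_file(int out_fd, int in_fd)
{
	struct stat st;
	if (fstat(in_fd, &st) < 0)
		return -1;
	off_t left = st.st_size;
	while (left > 0) {
		ssize_t n = sendfile(out_fd, in_fd, NULL, (size_t)left);
		if (n < 0 && errno == EINTR)
			continue;
		if (n < 0)
			return rw_copy(out_fd, in_fd, left);
		if (n == 0)
			return -1; /* premature EOF */
		left -= n;
	}
	return 0;
}

/* Example use: copy the running binary into out_fd. */
int copy_self_exe(int out_fd)
{
	int in_fd = open("/proc/self/exe", O_RDONLY | O_CLOEXEC);
	if (in_fd < 0)
		return -1;
	int ret = copy_whole_file(out_fd, in_fd);
	close(in_fd);
	return ret;
}
```

Because sendfile(2) with a NULL offset advances in_fd's file offset (and out_fd's), the userspace loop can continue from exactly where sendfile(2) left off.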
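
The runtime fallback order from "expand and add pre-3.11 fallbacks", combined with placing the O_TMPFILE file in the state directory as in "use the runc statedir for O_TMPFILE", might look roughly like this C sketch. make_anon_fd and its allow_memfd parameter are hypothetical; statedir stands in for the runc state directory (the commits refer to _LIBCONTAINER_STATEDIR), allow_memfd mimics _LIBCONTAINER_DISABLE_MEMFD_CLONE being unset, and using the state directory for the mkostemp(3) step as well is an assumption of the sketch.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: return a writable, close-on-exec fd with no lasting
 * name, choosing the mechanism at runtime rather than at compile time. */
int make_anon_fd(const char *statedir, int allow_memfd)
{
	int fd;

	/* 1. memfd_create(2): fails with ENOSYS on kernels older than 3.17. */
	if (allow_memfd) {
		fd = memfd_create("runc-clone-sketch", MFD_CLOEXEC | MFD_ALLOW_SEALING);
		if (fd >= 0)
			return fd;
	}

	/* 2. O_TMPFILE (Linux 3.11+, and only on filesystems that support it):
	 *    an unnamed inode in the state directory instead of a tmpfs /tmp. */
	fd = open(statedir, O_TMPFILE | O_EXCL | O_RDWR | O_CLOEXEC, 0700);
	if (fd >= 0)
		return fd;

	/* 3. mkostemp(3) + unlink(2): works on any kernel, but the file has a
	 *    name for a short window before it is unlinked. */
	char path[4096];
	int n = snprintf(path, sizeof(path), "%s/runc-clone.XXXXXX", statedir);
	if (n < 0 || (size_t)n >= sizeof(path))
		return -1;
	fd = mkostemp(path, O_CLOEXEC);
	if (fd < 0)
		return -1;
	unlink(path);
	return fd;
}
```

Because every step is attempted at runtime, a binary built on a machine with memfd_create(2) still works on an older kernel: the call simply fails and the next mechanism is tried.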
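
The read-only bind-mount technique from "try to ro-bind /proc/self/exe before copying" could be sketched in C as follows, assuming the caller has CAP_SYS_ADMIN in its mount namespace; without it mount(2) fails and the function returns -1 so that a caller could fall back to copying. ro_bind_self_exe is a hypothetical name, statedir again stands in for _LIBCONTAINER_STATEDIR, and the _LIBCONTAINER_CLONED_BINARY bookkeeping described in the commit message is left out.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mount.h>
#include <unistd.h>

/* Hypothetical helper: return a read-only fd for the running binary without
 * copying it, or -1 if any step fails. */
int ro_bind_self_exe(const char *statedir)
{
	char path[4096];
	int fd = -1;

	int n = snprintf(path, sizeof(path), "%s/runc-bind.XXXXXX", statedir);
	if (n < 0 || (size_t)n >= sizeof(path))
		return -1;

	/* Bind-mounting needs an existing file as the target; /proc/self/exe
	 * itself is never used as the mount point. */
	int tmp = mkostemp(path, O_CLOEXEC);
	if (tmp < 0)
		return -1;
	close(tmp);

	if (mount("/proc/self/exe", path, "", MS_BIND, "") < 0)
		goto out_unlink;
	/* MS_RDONLY is ignored on the initial bind, so remount read-only. */
	if (mount("", path, "", MS_REMOUNT | MS_BIND | MS_RDONLY, "") < 0)
		goto out_umount;

	fd = open(path, O_RDONLY | O_CLOEXEC);

out_umount:
	/* Lazy detach: the open fd keeps the read-only mount alive after the
	 * path no longer leads to it. */
	umount2(path, MNT_DETACH);
out_unlink:
	unlink(path);
	return fd;
}
```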