Handle seccomp policies that don't include ptrace(2) #846

Closed
apyrgio opened this issue Jun 26, 2024 · 3 comments · Fixed by #847
Labels
bug Something isn't working container security
Milestone

Comments

@apyrgio
Contributor

apyrgio commented Jun 26, 2024

Problem

Our recent gVisor integration (#590) requires allowlisting the ptrace(2) system call in the outer container, in order to spawn the inner container with runsc. Nowadays, this is the default [1], but we have encountered systems that don't allow this system call, and thus Dangerzone cannot run in them, at least out of the box.

Affected systems are:

  • Ubuntu Focal (via OpenSUSE's repo), with Podman version 3.4.2
  • Ubuntu Jammy, with Podman version 3.4.4
  • Debian Bullseye, with Podman version 3.0.1
  • Older Docker Desktop releases, e.g., with runc version 1.1.5

Background

Before explaining how we plan to fix this issue, we'll give some background on the ptrace(2) system call.

Why is this syscall considered dangerous? The main reason is that a malicious process can use it to escalate its privileges or thwart some system protections. A real-life example is CVE-2019-2054. This CVE is the reason why ptrace(2) is not allowed by default seccomp policies on Linux kernels < 4.8, but it's not the only ptrace-related CVE that has been reported.

To control the scope of the ptrace(2) system call, the Linux kernel offers the following mechanisms:

  1. The CAP_SYS_PTRACE Linux capability. If this capability is enabled, then the process can have full tracing capabilities, such as tracing other processes that it has not started. If this capability is not granted, then the usage of ptrace(2) is still allowed, but restricted through the mechanisms listed below.
  2. Disabling the system call (or arguments to it) via a seccomp policy. For instance:
    • Docker had originally disabled ptrace(2) in its seccomp policy, and then re-enabled it for kernels >= 4.8.
    • Podman similarly lifted this restriction a few years later.
    • Containerd did so around the same time.
  3. The YAMA Linux Security Module ptrace_scope setting. This setting controls the behavior of ptrace(2) system-wide. On the Linux platforms we support, the default seems to be 1, i.e., allow ptrace(2) only for processes that the tracer has a direct relationship with (e.g., its child processes).
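
As a rough illustration of the third mechanism, the sketch below reads the Yama setting from /proc/sys/kernel/yama/ptrace_scope and maps it to the meanings given in the kernel's Yama documentation. The helper names are hypothetical and not part of Dangerzone, and the file may be missing on kernels built without Yama.

```python
from pathlib import Path
from typing import Optional

# Meanings of the Yama ptrace_scope values, as described in the kernel docs.
PTRACE_SCOPES = {
    0: "classic: a process may trace other processes running under the same uid",
    1: "restricted: a process may only trace its descendants or declared tracers",
    2: "admin-only: only processes with CAP_SYS_PTRACE may attach",
    3: "no attach: ptrace attach is disabled for every process",
}


def parse_ptrace_scope(raw: str) -> int:
    """Parse the contents of the ptrace_scope file into a known scope value."""
    value = int(raw.strip())
    if value not in PTRACE_SCOPES:
        raise ValueError(f"unexpected ptrace_scope value: {value}")
    return value


def read_ptrace_scope(
    path: str = "/proc/sys/kernel/yama/ptrace_scope",
) -> Optional[int]:
    """Return the current scope, or None if the file cannot be read."""
    try:
        return parse_ptrace_scope(Path(path).read_text())
    except OSError:
        # Hypothetical fallback: treat a missing or unreadable file as "unknown".
        return None
```

A caller could log `PTRACE_SCOPES[scope]` when `read_ptrace_scope()` returns a value, and skip the check when it returns `None`.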

[1] See Podman's seccomp policy, Docker's seccomp policy, and containerd's seccomp policy.

@apyrgio apyrgio added bug Something isn't working container security labels Jun 26, 2024
@apyrgio apyrgio added this to the 0.7.0 milestone Jun 26, 2024
@apyrgio
Contributor Author

apyrgio commented Jun 26, 2024

Requirements

Our solution must take into account the following:

  1. It must work on kernels >= 4.8.
  2. It must work with the default ptrace_scope on Linux systems.
  3. It must work on older Podman and Docker Desktop releases.
    • Yes, these releases may be insecure by now, but if we don't support them and our users cannot update to newer ones, they will just open the suspicious file.
  4. The user must not interact with the system in order to make Dangerzone work.

On (1), we have verified that none of the systems we support has a Linux kernel < 4.8. This also applies to Windows (WSL2) and macOS (HyperKit). On (2), we have seen that the default ptrace_scope is 1 on the platforms we support. This scope is supported by gVisor.

Solution

For Podman versions < 4, we already have a workaround in our code that starts the process with Podman's default seccomp policy as of June 6th, 2024 (see seccomp.json):

if Container.get_runtime_version() < (4, 0):
    seccomp_json_path = get_resource_path("seccomp.gvisor.json")
    security_args += ["--security-opt", f"seccomp={seccomp_json_path}"]

For Docker Desktop, we don't have a similar workaround, because we don't know exactly when this restriction was lifted. We do know that Containerd 1.6.7 first allowed the ptrace() syscall, and that Docker Desktop 4.12.0 included this Containerd version. However, we have tested with Docker Desktop release 4.19.0 on macOS, and the ptrace() syscall was disabled, so we're not sure.

So, our suggestion is to:

  1. Check if the Docker Desktop release is recent. We have had good results with Docker Desktop 4.27.0, for example.
  2. If the release is older, spawn a container using the same stored seccomp.json file that we use for Podman.

This way, older releases will use our Podman seccomp policy, which guarantees that ptrace(2) is allowed. If an older Docker Desktop release already allows the ptrace(2) system call, our seccomp policy will still replace its default one, but the differences should be negligible.

Newer releases will use their default seccomp policy, and thus we will not mask any security-related fixes that happen in the future.
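
To make the "ptrace(2) will be allowed" claim checkable, here is a minimal sketch that inspects a seccomp profile in the JSON layout used by Docker and Podman (a `defaultAction` plus a `syscalls` list of rules with `names` and `action`). The function name is hypothetical, and the check is deliberately coarse: it ignores per-rule conditions such as `includes`, `excludes`, and `args`, so a positive answer only means that some allow rule names the syscall.

```python
import json


def allows_syscall(profile: dict, name: str) -> bool:
    """Return True if some SCMP_ACT_ALLOW rule names the syscall.

    Conditions attached to a rule (kernel version, capabilities, arguments)
    are not evaluated here; this is an assumption made to keep the sketch short.
    """
    for rule in profile.get("syscalls", []):
        if rule.get("action") != "SCMP_ACT_ALLOW":
            continue
        if name in rule.get("names", []):
            return True
    # No explicit rule: fall back to the profile's default action.
    return profile.get("defaultAction") == "SCMP_ACT_ALLOW"


def load_profile(path: str) -> dict:
    """Load a seccomp profile from a JSON file."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)
```

For example, `allows_syscall(load_profile(path), "ptrace")` could serve as a sanity check in a test for the bundled policy file, where `path` points at that file.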

Alternatives

Docker also allows the ptrace(2) system call, if CAP_SYS_PTRACE is specified in the container invocation. Note that we don't add this Linux capability in the current implementation:

security_args += ["--cap-drop", "all"]
security_args += ["--cap-add", "SYS_CHROOT"]

Why is that? Because with the CAP_SYS_PTRACE capability, the outer container would be able to trace any process, which significantly increases our attack surface.

For this reason, we choose not to go down that path, and simply pass our own seccomp policy.

@apyrgio
Contributor Author

apyrgio commented Jun 26, 2024

It seems that the output of docker version is not easy to parse if we just want the Docker Desktop release (i.e., the 4.27.2 part):

$ docker version -f {{.Server.Platform.Name}}
Docker Desktop 4.27.2 (137060)
$ docker version -f json
{
    "Client": {
        "CloudIntegration": "v1.0.35+desktop.10",
        "Version": "25.0.3",
        "ApiVersion": "1.44",
        "DefaultAPIVersion": "1.44",
        "GitCommit": "4debf41",
        "GoVersion": "go1.21.6",
        "Os": "darwin",
        "Arch": "arm64",
        "BuildTime": "Tue Feb  6 21:13:26 2024",
        "Context": "default"
    },
    "Server": {
        "Platform": {
            "Name": "Docker Desktop 4.27.2 (137060)"
        },
        "Components": [
            {
                "Name": "Engine",
                "Version": "25.0.3",
                "Details": {
                    "ApiVersion": "1.44",
                    "Arch": "arm64",
                    "BuildTime": "Tue Feb  6 21:14:22 2024",
                    "Experimental": "false",
                    "GitCommit": "f417435",
                    "GoVersion": "go1.21.6",
                    "KernelVersion": "6.6.12-linuxkit",
                    "MinAPIVersion": "1.24",
                    "Os": "linux"
                }
            },
            {
                "Name": "containerd",
                "Version": "1.6.28",
                "Details": {
                    "GitCommit": "ae07eda36dd25f8a1b98dfbf587313b99c0190bb"
                }
            },
            {
                "Name": "runc",
                "Version": "1.1.12",
                "Details": {
                    "GitCommit": "v1.1.12-0-g51d5e94"
                }
            },
            {
                "Name": "docker-init",
                "Version": "0.19.0",
                "Details": {
                    "GitCommit": "de40ad0"
                }
            }
        ],
        "Version": "25.0.3",
        "ApiVersion": "1.44",
        "MinAPIVersion": "1.24",
        "GitCommit": "f417435",
        "GoVersion": "go1.21.6",
        "Os": "linux",
        "Arch": "arm64",
        "KernelVersion": "6.6.12-linuxkit",
        "BuildTime": "2024-02-06T21:14:22.000000000+00:00"
    }
}
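
If the Docker Desktop release were still needed, one option would be a regular expression over the `Server.Platform.Name` field. The sketch below assumes the name always looks like the sample above ("Docker Desktop X.Y.Z (build)"), which is not guaranteed; the function name is hypothetical.

```python
import json
import re
from typing import Optional, Tuple

# Assumed format, based only on the sample output: "Docker Desktop 4.27.2 (137060)".
DESKTOP_RE = re.compile(r"^Docker Desktop (\d+)\.(\d+)\.(\d+)")


def desktop_release(version_json: str) -> Optional[Tuple[int, ...]]:
    """Extract the Docker Desktop release from `docker version -f json` output.

    Returns None when the platform name does not match the assumed format,
    e.g. when the engine is not Docker Desktop.
    """
    data = json.loads(version_json)
    name = data.get("Server", {}).get("Platform", {}).get("Name", "")
    match = DESKTOP_RE.match(name)
    if match is None:
        return None
    return tuple(int(part) for part in match.groups())
```

The returned tuple compares naturally against a threshold such as `(4, 27, 0)`, but the approach breaks silently if the name format changes.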

We can use the Docker Engine version instead:

$ docker version -f {{.Server.Version}}
25.0.3

Most likely, we can consider any Docker Engine version from 25.0 onwards as safe.
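
A minimal sketch of that check follows. The 25.0 threshold comes from the comment above and is an educated guess rather than a verified boundary; the function names are hypothetical, and only the `docker version -f {{.Server.Version}}` invocation is taken from this thread.

```python
import subprocess
from typing import Tuple

# Assumption from the discussion above: engines from 25.0 onwards allow ptrace(2).
MIN_SAFE_ENGINE = (25, 0)


def parse_engine_version(text: str) -> Tuple[int, int]:
    """Turn a version string such as '25.0.3' into (major, minor)."""
    major, minor = text.strip().split(".")[:2]
    return (int(major), int(minor))


def needs_custom_seccomp(version_text: str) -> bool:
    """Return True if the engine is older than the assumed safe version."""
    return parse_engine_version(version_text) < MIN_SAFE_ENGINE


def query_engine_version(docker: str = "docker") -> str:
    """Ask the Docker CLI for the engine version (requires a running engine)."""
    result = subprocess.run(
        [docker, "version", "-f", "{{.Server.Version}}"],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout
```

Comparing integer tuples avoids the pitfalls of comparing version strings, where "9.0" would sort after "25.0".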

apyrgio added a commit that referenced this issue Jun 26, 2024
We are aware that some Docker Desktop releases before 25.0.0 ship with a
seccomp policy which disables the `ptrace(2)` system call. In such
cases, we opt to use our own seccomp policy which allows this system
call. This seccomp policy is the default one in the latest releases of
Podman, and we use it in Linux distributions where Podman version is <
4.0.

Fixes #846
@apyrgio apyrgio changed the title Handle seccomp policies that don't include ptrace() Handle seccomp policies that don't include ptrace(2) Jun 26, 2024
@apyrgio
Contributor Author

apyrgio commented Sep 24, 2024

Based on discussions in #865, we will actually enforce this seccomp policy across all container engines. There are two reasons for doing so:

  1. The seccomp policies that are shipped by default in various container engines tend to get more lax over time, but our application needs a very specific set of syscalls, now that we have integrated gVisor. So, we can freeze the list of allowed syscalls, and thus not broaden the attack surface of the outer container.
  2. There seem to be container engines (like Orbstack, see #908, All conversions fail with "Unspecified error", using Orbstack) which use stricter seccomp policies. Our detection method that works for Docker Desktop, and enables our custom seccomp policy, does not work for them. By uniformly setting our own seccomp policy, we can interoperate with these container engines as well, even though they are not officially supported.
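
With the policy applied uniformly, the version checks are no longer needed. A minimal sketch of the resulting argument construction, reusing the flags quoted earlier in this thread, could look as follows; the function name and the example path are hypothetical, and the actual Dangerzone code may differ.

```python
from typing import List


def build_security_args(seccomp_json_path: str) -> List[str]:
    """Build container security flags that always pin our seccomp policy."""
    # Drop every capability, then add back only the one quoted in this thread.
    args = ["--cap-drop", "all"]
    args += ["--cap-add", "SYS_CHROOT"]
    # Apply the bundled seccomp policy regardless of engine or version.
    args += ["--security-opt", f"seccomp={seccomp_json_path}"]
    return args
```

For instance, `build_security_args("/path/to/seccomp.gvisor.json")` yields the same flags for Docker and Podman, since both accept `--security-opt seccomp=<file>`.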
