Support Kubernetes UserNamespacesSupport alpha feature gate #3436

Open
dgl opened this issue Nov 28, 2023 · 2 comments
Labels
kind/feature Categorizes issue or PR as related to a new feature. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release.

Comments

@dgl
Contributor

dgl commented Nov 28, 2023

What happened:

I'm working on parts of the Kubernetes user namespace support (currently an alpha feature). I'd like to use kind for testing it.

I enabled the UserNamespacesSupport feature gate. Pods that set hostUsers: false fail with:

Warning  FailedCreatePodSandBox  6s    kubelet            Failed to create pod sandbox: rpc error: code = Unknown desc = failed to start sandbox "75fd33edcf39433911025ac0e045581bd19688190cd1e5f7166d279056dc592c": failed to create containerd task: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error mounting "sysfs" to rootfs at "/sys": mount src=sysfs, dst=/sys, dstFD=/proc/self/fd/10, flags=0xf: operation not permitted: unknown

After fixing that (below), I also saw:

Warning  Failed                  5s    kubelet            Error: failed to create containerd task: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error running createContainer hook #0: fork/exec /kind/bin/mount-product-files.sh: permission denied: unknown

What you expected to happen:

Sweet user namespace based isolation.

How to reproduce it (as minimally and precisely as possible):

Update runc to main in the base image, but also set runc_nodmz (because of the bug I reported in opencontainers/runc#4125):

--- a/images/base/Dockerfile
+++ b/images/base/Dockerfile
@@ -135,13 +135,13 @@ RUN git clone --filter=tree:0 "${CONTAINERD_CLONE_URL}" /containerd \
 # stage for building runc
 FROM go-build as build-runc
 ARG TARGETARCH GO_VERSION
-ARG RUNC_VERSION="v1.1.9"
+ARG RUNC_VERSION="main"
 ARG RUNC_CLONE_URL="https://github.com/opencontainers/runc"
 RUN git clone --filter=tree:0 "${RUNC_CLONE_URL}" /runc \
     && cd /runc \
     && git checkout "${RUNC_VERSION}" \
     && eval "$(gimme "${GO_VERSION}")" \
-    && export GOARCH=$TARGETARCH && export CC=$(target-cc) && export CGO_ENABLED=1 \
+    && export GOARCH=$TARGETARCH && export CC=$(target-cc) && export CGO_ENABLED=1 && export EXTRA_BUILDTAGS=runc_nodmz \
     && make runc \
     && GOARCH=$TARGETARCH go-licenses save --save_path=/_LICENSES .

Also use a containerd v2.0.0 pre-release version. Run make quick, then build a node image based on a recent Kubernetes (something like kind build node-image ~/Code/kubernetes --image kindest/node:runc-main --base-image=gcr.io/k8s-staging-kind/base:v20231124-6a461ab5-dirty).

Create a kind cluster with:

kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
featureGates:
   "UserNamespacesSupport": true
nodes:
- role: control-plane
  image: kindest/node:latest

Run a pod, something like:

apiVersion: v1
kind: Pod
metadata:
  name: userns
spec:
  restartPolicy: Never
  hostUsers: false
  containers:
  - name: debian
    image: debian
    command: ["sh"]
    args: ["-c", "sleep infinity"]
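
Once a pod like this starts, one way to check that it really runs in its own user namespace is to look at its UID map. The sketch below is not part of the original report. It assumes the pod name userns from the manifest above, and classify_uid_map is a hypothetical helper name. It only tells the identity mapping of the initial user namespace (0 0 4294967295) apart from anything else.

```shell
# Classify the contents of a /proc/<pid>/uid_map file read from stdin.
# Prints "host" for the single-line identity mapping used by the initial
# user namespace, "userns" for any other mapping, "unknown" for empty input.
# classify_uid_map is a hypothetical helper name, not a kind or kubectl command.
classify_uid_map() {
  awk '
    NR == 1 && $1 == 0 && $2 == 0 && $3 == 4294967295 { identity = 1 }
    END {
      if (NR == 0) print "unknown"
      else if (NR == 1 && identity) print "host"
      else print "userns"
    }
  '
}
```

Usage, assuming kubectl points at the kind cluster: kubectl exec userns -- cat /proc/self/uid_map | classify_uid_map. A pod with hostUsers: false would be expected to print userns.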

Fixes

sysfs

The first failure, the sysfs mount, can be fixed by running:

docker exec kind-control-plane sh -c "mkdir /mnt/sysfs; mount -t sysfs none /mnt/sysfs"

This is because sysfs is mounted with "masks": kind bind mounts over files such as /sys/devices/virtual/dmi/id/product_name, and in that case the kernel does not let us mount a sysfs filesystem in a user namespace, because the existing sysfs is seen as masked. By (additionally) mounting sysfs elsewhere, we can make the kernel's check succeed.

(It still needs some thought/testing as to whether that mount should be read-only or read-write. I suspect it should be rw; that does seem to go against systemd's container interface, but for good reason.)
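
To see which mounts sit on top of the node's sysfs, it can help to list the mount points beneath /sys from the node's mountinfo. This is a rough sketch, not the kernel's actual visibility check. list_sys_submounts is a hypothetical helper name, and the output will also include ordinary submounts such as cgroup filesystems.

```shell
# Print mount points located beneath /sys from a mountinfo-format file.
# $1: path to the file (defaults to /proc/self/mountinfo).
# Field 5 of each mountinfo line is the mount point.
# list_sys_submounts is a hypothetical helper name used for illustration.
list_sys_submounts() {
  awk '$5 ~ /^\/sys\// { print $5 }' "${1:-/proc/self/mountinfo}"
}
```

Usage, assuming the node container name from the commands above: docker exec kind-control-plane cat /proc/self/mountinfo > /tmp/node-mountinfo, then list_sys_submounts /tmp/node-mountinfo. Entries such as the dmi product_name files would be the bind mounts described above.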

/kind/bin permissions

This just looks like a Dockerfile mistake: the directory isn't executable. A simple:

docker exec kind-control-plane chmod 755 /kind/bin

Fixes it.
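
A quick way to check a directory for this problem is to test its search (execute) bit for "other" users. The sketch below assumes GNU stat, and dir_searchable_by_others is a hypothetical helper name. That the hook needs the "other" bit because it no longer runs as host root inside the user namespace is an assumption, not something confirmed in this issue.

```shell
# Succeed if directory $1 grants search (execute) permission to "other".
# Assumes GNU stat (-c '%a' prints the octal mode, e.g. 755).
# dir_searchable_by_others is a hypothetical helper name.
dir_searchable_by_others() {
  dsbo_mode=$(stat -c '%a' "$1") || return 2
  # The last octal digit holds the bits for "other"; bit 1 is execute/search.
  dsbo_last=${dsbo_mode#"${dsbo_mode%?}"}
  [ $(( $dsbo_last & 1 )) -eq 1 ]
}
```

Usage inside the node: dir_searchable_by_others /kind/bin || echo "/kind/bin is not searchable by others".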

Anything else we need to know?:

Mostly filing an issue for tracking and so other people might find this based on errors, if they try to use it. I'll open some PRs.

Environment:

  • kind version: latest main
  • Runtime info: docker 20.10.25
  • OS: NixOS 23.05 (Stoat)
  • Kubernetes version: v1.30.0-alpha.0.5+d61cbac69aae97
  • Any proxies or other special environment settings?: as above
@dgl dgl added the kind/bug Categorizes issue or PR as related to a bug. label Nov 28, 2023
@BenTheElder
Member

I need to read up on the runc DMZ option. We avoid non-defaults since kind is for testing the project first and foremost; the other build options we set elsewhere so far compile out unused snapshotters or things of that nature.

the directory permissions seem like an oversight

more generally we intend to upgrade runc + containerd but have to be careful about it. I'm sure we'll get on it eventually but we normally only get on prerelease versions when we need a critical bug fix

@BenTheElder BenTheElder added kind/feature Categorizes issue or PR as related to a new feature. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. and removed kind/bug Categorizes issue or PR as related to a bug. labels Nov 28, 2023
@Andreagit97

I faced the same failure with a similar setup.

  1. I created a custom kind base image with the following changes
diff --git a/images/base/Dockerfile b/images/base/Dockerfile
index 63060aee..5f1e6832 100644
--- a/images/base/Dockerfile
+++ b/images/base/Dockerfile
@@ -122,7 +122,7 @@ RUN eval "$(gimme "${GO_VERSION}")" \
 # stage for building containerd
 FROM go-build AS build-containerd
 ARG TARGETARCH GO_VERSION
-ARG CONTAINERD_VERSION="v1.7.18"
+ARG CONTAINERD_VERSION="v2.0.0-rc.3"
 ARG CONTAINERD_CLONE_URL="https://github.com/containerd/containerd"
 # we don't build with optional snapshotters, we never select any of these
 # they're not ideal inside kind anyhow, and we save some disk space
@@ -140,7 +140,7 @@ RUN git clone --filter=tree:0 "${CONTAINERD_CLONE_URL}" /containerd \
 # stage for building runc
 FROM go-build AS build-runc
 ARG TARGETARCH GO_VERSION
-ARG RUNC_VERSION="v1.1.13"
+ARG RUNC_VERSION="v1.2.0-rc.2"
 ARG RUNC_CLONE_URL="https://github.com/opencontainers/runc"
 RUN git clone --filter=tree:0 "${RUNC_CLONE_URL}" /runc \
     && cd /runc \
  2. I created a new kind node image based on the above base image and k8s v1.30
kind build node-image --base-image="..." --type release v1.30.0
  3. I used the following config to create a kind cluster
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
featureGates:
   "UserNamespacesSupport": true
nodes:
- role: control-plane
  image: <above-built-image>
  4. I created this pod in the cluster
apiVersion: v1
kind: Pod
metadata:
  name: nginx
spec:
  hostUsers: false
  containers:
  - name: nginx
    image: nginx:1.27.0
    ports:
    - containerPort: 80

The kubelet reported the following error (the same one described in the initial issue):

 Warning  FailedCreatePodSandBox  2m53s             kubelet            Failed to create pod sandbox: rpc error: code = Unknown desc = failed to start sandbox "a8e2d0f7722c1bcbe361325dc1c264c6d0fe524d3a3214a387c5494bfd83fccd": failed to create containerd task: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error mounting "sysfs" to rootfs at "/sys": mount src=sysfs, dst=/sys, dstFd=/proc/thread-self/fd/8, flags=0xf: operation not permitted: unknown

I can confirm that the workaround provided by @dgl fixes the issue:

docker exec kind-control-plane sh -c "mkdir /mnt/sysfs; mount -t sysfs none /mnt/sysfs"
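
For clusters with more than one node, or when the command may be re-run, the same workaround can be wrapped so that it is safe to repeat. This is only a sketch. The helper names are hypothetical, and it assumes the mountpoint utility (util-linux) is available in the node image.

```shell
# Emit the shell snippet to run inside a node: create the directory if it is
# missing, and mount sysfs there only when nothing is mounted at that path.
# Assumes the node image ships the mountpoint utility (util-linux).
sysfs_workaround_script() {
  cat <<'EOF'
mkdir -p /mnt/sysfs
mountpoint -q /mnt/sysfs || mount -t sysfs none /mnt/sysfs
EOF
}

# Apply the snippet to every node of a kind cluster.
# $1: cluster name (defaults to "kind").
# Both function names are hypothetical helpers, not kind commands.
apply_sysfs_workaround() {
  for node in $(kind get nodes --name "${1:-kind}"); do
    docker exec "$node" sh -c "$(sysfs_workaround_script)"
  done
}
```

Usage: apply_sysfs_workaround, or apply_sysfs_workaround my-cluster for a named cluster. The mount does not survive a restart of the node container, so it would need to be re-applied after one.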

After this issue I hit another one (exactly containerd/containerd#10598), but that probably has nothing to do with kind!

Environment

  • kind version: kind v0.24.0 go1.22.6 linux/amd64
  • custom kind-node image as described above
  • Kubernetes version: v1.30.0
  • runc version: 1.2.0-rc.2
  • containerd version: v2.0.0-rc.3
  • OS of the kind-node image: Debian GNU/Linux 12 (bookworm)
  • OS of the host running kind: Ubuntu 22.04.4 LTS
