[potential regression in v0.12] unable to start container on /sys/fs/cgroup/* mount #4108

Closed
ljmf00 opened this issue Aug 4, 2023 · 13 comments · Fixed by #4308

Comments

@ljmf00

ljmf00 commented Aug 4, 2023

With the latest stable moby/buildkit image buildx-stable-1, I get the following error when trying to run a docker buildx build with my buildkit builder:

#0 0.092 runc run failed: unable to start container process: error during container init: error mounting "cgroup" to rootfs at "/sys/fs/cgroup": mount cgroup:/sys/fs/cgroup/foobar (via /proc/self/fd/6), flags: 0xf, data: foobar: invalid argument

Reverting to v0.11.6 works fine; I suspect #4003 is the trigger of this failure.
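
For reference, the flags value 0xf in that error decodes to MS_RDONLY|MS_NOSUID|MS_NODEV|MS_NOEXEC (0x1|0x2|0x4|0x8), and "foobar" is a named cgroup v1 hierarchy. A rough manual equivalent of the mount runc attempts is the following sketch (the exact data string runc passes may differ; "none,name=..." is how named hierarchies are mounted, as in the repro script later in this thread):

    # roughly what runc attempts for the named hierarchy "foobar";
    # flags 0xf = ro,nosuid,nodev,noexec; on affected hosts this
    # fails with EINVAL ("invalid argument")
    mount -t cgroup -o none,ro,nosuid,nodev,noexec,name=foobar cgroup /sys/fs/cgroup/foobar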

@AkihiroSuda
Member

Please provide a complete reproducer and the host info

@AkihiroSuda AkihiroSuda changed the title unable to start container on /sys/fs/cgroup/* mount [potential regression in v0.12] unable to start container on /sys/fs/cgroup/* mount Aug 5, 2023
@ljmf00
Author

ljmf00 commented Aug 7, 2023

Distro: Ubuntu 20.04.4 LTS
Docker version 20.10.17
Linux kernel 5.4.0 (AWS image)
Cgroup Driver: cgroupfs
Cgroup Version: 1

No special kernel flags regarding cgroups. What additional information should I gather to help with reproduction?

The steps are pretty simple: create a builder with docker buildx create using the docker-container driver; keeping the other flags at their defaults pulls the moby/buildkit:buildx-stable-1 image.

After that, just run docker buildx build --no-cache --builder <builder>. Make sure the image is not cached, because it needs to start a container. When it tries to start the container to execute the requested layer command, it fails with the following error:

#0 0.092 runc run failed: unable to start container process: error during container init: error mounting "cgroup" to rootfs at "/sys/fs/cgroup": mount cgroup:/sys/fs/cgroup/foobar (via /proc/self/fd/6), flags: 0xf, data: foobar: invalid argument
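
For completeness, the whole repro boils down to a couple of commands (a sketch; the builder name and Dockerfile are illustrative):

    # the docker-container driver pulls moby/buildkit:buildx-stable-1 by default
    docker buildx create --name repro --driver docker-container

    # any Dockerfile with a RUN step works
    printf 'FROM alpine\nRUN true\n' > Dockerfile

    # --no-cache forces the RUN step to actually start a container
    docker buildx build --no-cache --builder repro .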

I don't know of any additional configuration we have, other than being forced to use cgroups v1 due to dependency requirements.

@crazy-max
Member

Looks similar to docker/buildx#1986 (comment)

@RealHarshThakur

RealHarshThakur commented Sep 5, 2023

Faced this just now. I am using the latest tag with the remote driver configured on Docker. Got this error:

------
 > [builder 5/9] RUN go mod download:
0.154 runc run failed: unable to start container process: error during container init: error mounting "cgroup" to rootfs at "/sys/fs/cgroup": mount cgroup:/sys/fs/cgroup/openrc (via /proc/self/fd/6), flags: 0xf, data: openrc: invalid argument

Dockerfile:12
--------------------
  10 |     # cache deps before building and copying source so that we don't need to re-download as much
  11 |     # and so that source changes don't invalidate our downloaded layer
  12 | >>> RUN go mod download
  13 |
  14 |     # Copy the go source
--------------------
ERROR: failed to solve: process "/bin/sh -c go mod download" did not complete successfully: exit code: 1

For ref, the Dockerfile:

# Build the manager binary
FROM golang:1.20 as builder
ARG TARGETOS
ARG TARGETARCH

WORKDIR /workspace
# Copy the Go Modules manifests
COPY go.mod go.mod
COPY go.sum go.sum
# cache deps before building and copying source so that we don't need to re-download as much
# and so that source changes don't invalidate our downloaded layer
RUN go mod download

# Copy the go source
COPY cmd/main.go cmd/main.go
COPY api/ api/
COPY pkg/controller/ pkg/controller/

# Build
# GOARCH has no default value, so the binary is built according to the host where the
# command was called. For example, if we run make docker-build in a local env on an
# Apple Silicon M1, the docker BUILDPLATFORM arg will be linux/arm64, while for Apple
# x86 it will be linux/amd64. Therefore, by leaving it empty we ensure that the
# container and the binary shipped in it have the same platform.
RUN CGO_ENABLED=0 GOOS=${TARGETOS:-linux} GOARCH=${TARGETARCH} go build -a -o manager cmd/main.go

# Use distroless as minimal base image to package the manager binary
# Refer to https://github.com/GoogleContainerTools/distroless for more details
FROM gcr.io/distroless/static:nonroot
WORKDIR /
COPY --from=builder /workspace/manager .
USER 65532:65532

ENTRYPOINT ["/manager"]

Pod spec (trimmed down); the securityContext is privileged.


spec:
  containers:
  - args:
    - --addr
    - unix:///run/buildkit/buildkitd.sock
    - --addr
    - tcp://0.0.0.0:1234
    - --tlscacert
    - /certs/ca.crt
    - --tlscert
    - /certs/server.crt
    - --tlskey
    - /certs/server.key
    image: moby/buildkit:latest
    imagePullPolicy: IfNotPresent
    livenessProbe:
      exec:
        command:
        - buildctl
        - debug
        - workers
      failureThreshold: 3
      initialDelaySeconds: 1
      periodSeconds: 30
      successThreshold: 1
      timeoutSeconds: 1
    name: buildkitd
    ports:
    - containerPort: 1234
      name: buildkitd
      protocol: TCP
    securityContext:
      privileged: true
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /etc/buildkit
      name: config
    - mountPath: /var/lib/buildkit
      name: var-lib-buildkit
    - mountPath: /certs
      name: certs
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: kube-api-access-mf2c5
      readOnly: true
  schedulerName: default-scheduler

@ljmf00
Author

ljmf00 commented Sep 7, 2023

@AkihiroSuda can the referenced change be reverted?

@AkihiroSuda
Member

@AkihiroSuda can the referenced change be reverted?

Probably.
Has anyone identified the regression commit?

@ljmf00
Author

ljmf00 commented Sep 7, 2023

As I mentioned, probably this one: #4003

@AkihiroSuda
Member

As I mentioned, probably this one: #4003

Maybe it should just be disabled for cgroup v1 hosts or something?
But I'm not sure, as I can't repro the issue on Ubuntu 20.04.

Can anybody provide a minimal Vagrantfile or Lima yaml to repro the issue?

@RealHarshThakur

RealHarshThakur commented Sep 7, 2023

I've reverted to an older buildkit tag for now.
Here's my containerd config, though (it seems I'm on cgroups v1); does this indicate what you suspect?

version = 2

[plugins."io.containerd.internal.v1.opt"]
  path = "/var/lib/rancher/k3s/agent/containerd"
[plugins."io.containerd.grpc.v1.cri"]
  stream_server_address = "127.0.0.1"
  stream_server_port = "10010"
  enable_selinux = false
  enable_unprivileged_ports = true
  enable_unprivileged_icmp = true
  sandbox_image = "rancher/mirrored-pause:3.6"

[plugins."io.containerd.grpc.v1.cri".containerd]
  snapshotter = "overlayfs"
  disable_snapshot_annotations = true

[plugins."io.containerd.grpc.v1.cri".cni]
  bin_dir = "/var/lib/rancher/k3s/data/24a53467e274f21ca27cec302d5fbd58e7176daf0a47a2c9ce032ee877e0979a/bin"
  conf_dir = "/var/lib/rancher/k3s/agent/etc/cni/net.d"


[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc]
  runtime_type = "io.containerd.runc.v2"

[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]
  SystemdCgroup = false
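
The config alone doesn't show the cgroup version; it's quicker to check directly on the host (a sketch using standard commands):

    # "cgroup2fs" means pure cgroup v2; "tmpfs" means cgroup v1 or hybrid
    stat -fc %T /sys/fs/cgroup/
    # list any v1 hierarchies, including named ones such as name=openrc
    mount -t cgroup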

mook-as added a commit to mook-as/alpine-lima that referenced this issue Sep 7, 2023
Buildkit 0.12.0 has issues with OpenRC + hybrid cgroups; see
moby/buildkit#4108

Signed-off-by: Mark Yen <mark.yen@suse.com>
@mook-as
Contributor

mook-as commented Sep 7, 2023

I don't have a Lima YAML, but I have a shell script that runs Lima (using the alpine template and downloading nerdctl-full-1.5.0, because that has everything including buildkitd v0.12), if that helps?

(For GitHub reasons, it's attached as a .txt) repro.sh

@BenTheElder

@AkihiroSuda xref kubernetes-sigs/kind#3277. I haven't had a chance to dig into this far yet, but it seems that on Alpine hosts (OpenRC?) enabling cgroupns=private causes similar issues for KIND.

Rancher Desktop is switching to cgroups v2, which gets past runc failing, but then the cgroups are seemingly not writable within the container in the KIND case. AIUI they think the buildkit issues are resolved by v2, however.

I think the root cause is related, and the key thing for reproducing is to use an Alpine VM; colima has this issue, for example.

It's on my backlog to investigate further, but v1 + cgroupns=private + runc seems to be totally broken on Alpine/OpenRC, and v2 is at least broken for KIND.

@mook-as
Contributor

mook-as commented Oct 3, 2023

Here's a Lima yaml that reproduces the issue (based on Ubuntu 20.04):

Lima yaml for reproducing; uses system containerd
images:
  - location: "https://cloud-images.ubuntu.com/releases/20.04/release-20230922/ubuntu-20.04-server-cloudimg-amd64.img"
    arch: "x86_64"
    digest: "sha256:8ff74b99d636158fa10a0daf21c78c70227a13779013ac457050d86803540d61"
  - location: "https://cloud-images.ubuntu.com/releases/20.04/release-20230922/ubuntu-20.04-server-cloudimg-arm64.img"
    arch: "aarch64"
    digest: "sha256:9f4caa044824483baef2aeb7b239d07986a2a16f0dcc865de0d9e60deacf3843"
containerd:
  system: true
  user: false

provision:
- mode: boot
  script: |
    if ! [ -d /sys/fs/cgroup/pikachu ]; then
      mount -o remount,rw /sys/fs/cgroup
      mkdir /sys/fs/cgroup/pikachu
      mount -n -t cgroup -o none,nodev,noexec,nosuid,name=pikachu pikachu /sys/fs/cgroup/pikachu
      mount -o remount,ro /sys/fs/cgroup
    fi
- mode: user
  script: |
    mkdir -p /tmp/a
    echo FROM alpine > /tmp/a/Dockerfile
    echo RUN true >> /tmp/a/Dockerfile
    sudo nerdctl build /tmp/a
Some notes:
  • This doesn't happen with pure cgroups v2 (only hybrid mode)
  • This doesn't happen with rootless containerd
  • I copied the mount flags from OpenRC.
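
To check whether a host already has a named v1 hierarchy like the one this script creates (a quick sketch):

    # named hierarchies show up as "name=<x>" entries, e.g. "11:name=pikachu:/"
    grep name= /proc/self/cgroup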

I do expect that this would be fixed if it's disabled for cgroups v1.

As a side note:
Rancher Desktop is hitting this (that's why I'm here), and our current (very unsatisfying) workaround is to roll back to buildkit < 0.12. Going to cgroups v2 helped with this issue, but broke compatibility with other things that relied on cgroups v1, which we wanted to keep supporting for now.

mook-as added a commit to mook-as/buildkit that referenced this issue Oct 3, 2023
Fixes moby#4108

Signed-off-by: Mark Yen <mark.yen@suse.com>
@Gowiem

Gowiem commented Oct 6, 2023

For those who need to downgrade until #4308 is merged and shipped, the command you're looking for is the following:

docker buildx create --use --driver-opt image=moby/buildkit:v0.11.6
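
If an affected v0.12 builder already exists, remove it first and recreate it pinned to the older image (a sketch; the builder name is hypothetical):

    docker buildx rm mybuilder
    docker buildx create --use --driver-opt image=moby/buildkit:v0.11.6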

jedevc pushed a commit to jedevc/buildkit that referenced this issue Oct 13, 2023
Fixes moby#4108

Signed-off-by: Mark Yen <mark.yen@suse.com>
(cherry picked from commit d48bf06)
Signed-off-by: Justin Chadwell <me@jedevc.com>
nxmatic pushed a commit to nxmatic/buildkit that referenced this issue Dec 3, 2023
Fixes moby#4108

Signed-off-by: Mark Yen <mark.yen@suse.com>