Podman run with remote service is failing #14573

Closed
ryshoooo opened this issue Jun 11, 2022 · 16 comments · Fixed by #14787
Labels: kind/bug, locked - please file new issue/PR, macos, remote

Comments

@ryshoooo

Is this a BUG REPORT or FEATURE REQUEST? (leave only one on its own line)

/kind bug

Description

Unable to run containers via a podman tcp service. The ability to pull and build is there, but not running.

Steps to reproduce the issue:

  1. In one terminal, run: docker run --privileged -p 8888:8888 quay.io/podman/stable:latest podman system service --time=0 tcp://0.0.0.0:8888

  2. Set up the local podman to use the service: podman system connection add local tcp://localhost:8888

  3. Run: podman run docker.io/library/alpine echo hello

Describe the results you received:
podman run fails with a cgroup-related error from crun:

Trying to pull docker.io/library/alpine:latest...
Getting image source signatures
Copying blob sha256:b3c136eddcbf2003d3180787cef00f39d46b9fd9e4623178282ad6a8d63ad3b0
Copying blob sha256:b3c136eddcbf2003d3180787cef00f39d46b9fd9e4623178282ad6a8d63ad3b0
Copying config sha256:6e30ab57aeeef1ebca8ac5a6ea05b5dd39d54990be94e7be18bb969a02d10a3f
Writing manifest to image destination
Storing signatures
Error: error preparing container 0fab41034db4b59a0f1791aeda18a1ef185d38667da6f06019ca23341bbaaeb2 for attach: crun: writing file `/sys/fs/cgroup/libpod_parent/libpod-0fab41034db4b59a0f1791aeda18a1ef185d38667da6f06019ca23341bbaaeb2/cgroup.procs`: Operation not supported: OCI runtime error

Describe the results you expected:
The container would start and be running.

Additional information you deem important (e.g. issue happens only occasionally):
The commands podman pull, podman images list, podman ps and podman build seem to work just as expected. It's only podman run that always fails (also for other images). The logs from the service container contain:

time="2022-06-11T13:09:44Z" level=warning msg="Failed to add conmon to cgroupfs sandbox cgroup: error creating cgroup path /libpod_parent/conmon: write /sys/fs/cgroup/cgroup.subtree_control: device or resource busy"

Output of podman version:

Client:       Podman Engine
Version:      4.1.0
API Version:  4.1.0
Go Version:   go1.18.1
Built:        Thu May  5 22:07:47 2022
OS/Arch:      darwin/arm64

Server:       Podman Engine
Version:      4.1.0
API Version:  4.1.0
Go Version:   go1.18.2
Built:        Mon May 30 18:04:32 2022
OS/Arch:      linux/arm64

Output of podman info --debug:

host:
  arch: arm64
  buildahVersion: 1.26.1
  cgroupControllers:
  - cpuset
  - cpu
  - io
  - memory
  - hugetlb
  - pids
  - rdma
  cgroupManager: cgroupfs
  cgroupVersion: v2
  conmon:
    package: conmon-2.1.0-2.fc36.aarch64
    path: /usr/bin/conmon
    version: 'conmon version 2.1.0, commit: '
  cpuUtilization:
    idlePercent: 91
    systemPercent: 2.72
    userPercent: 6.29
  cpus: 4
  distribution:
    distribution: fedora
    variant: container
    version: "36"
  eventLogger: file
  hostname: 43d769c5d79b
  idMappings:
    gidmap: null
    uidmap: null
  kernel: 5.10.104-linuxkit
  linkmode: dynamic
  logDriver: k8s-file
  memFree: 1026088960
  memTotal: 10434674688
  networkBackend: netavark
  ociRuntime:
    name: crun
    package: crun-1.4.5-1.fc36.aarch64
    path: /usr/bin/crun
    version: |-
      crun version 1.4.5
      commit: c381048530aa750495cf502ddb7181f2ded5b400
      spec: 1.0.0
      +SYSTEMD +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +CRIU +YAJL
  os: linux
  remoteSocket:
    exists: true
    path: tcp://0.0.0.0:8888
  security:
    apparmorEnabled: false
    capabilities: CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_NET_BIND_SERVICE,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID,CAP_SYS_CHROOT
    rootless: false
    seccompEnabled: true
    seccompProfilePath: /usr/share/containers/seccomp.json
    selinuxEnabled: false
  serviceIsRemote: true
  slirp4netns:
    executable: /usr/bin/slirp4netns
    package: slirp4netns-1.2.0-0.2.beta.0.fc36.aarch64
    version: |-
      slirp4netns version 1.2.0-beta.0
      commit: 477db14a24ff1a3de3a705e51ca2c4c1fe3dda64
      libslirp: 4.6.1
      SLIRP_CONFIG_VERSION_MAX: 3
      libseccomp: 2.5.3
  swapFree: 930541568
  swapTotal: 1073737728
  uptime: 210h 25m 17.05s (Approximately 8.75 days)
plugins:
  log:
  - k8s-file
  - none
  - passthrough
  - journald
  network:
  - bridge
  - macvlan
  volume:
  - local
registries:
  search:
  - registry.fedoraproject.org
  - registry.access.redhat.com
  - docker.io
  - quay.io
store:
  configFile: /etc/containers/storage.conf
  containerStore:
    number: 1
    paused: 0
    running: 0
    stopped: 1
  graphDriverName: overlay
  graphOptions:
    overlay.imagestore: /var/lib/shared
    overlay.mount_program:
      Executable: /usr/bin/fuse-overlayfs
      Package: fuse-overlayfs-1.8.1-3.fc36.aarch64
      Version: |-
        fusermount3 version: 3.10.5
        fuse-overlayfs: version 1.8.1
        FUSE library version 3.10.5
        using FUSE kernel interface version 7.31
    overlay.mountopt: nodev,fsync=0
  graphRoot: /var/lib/containers/storage
  graphRootAllocated: 62725623808
  graphRootUsed: 35777851392
  graphStatus:
    Backing Filesystem: extfs
    Native Overlay Diff: "false"
    Supports d_type: "true"
    Using metacopy: "false"
  imageCopyTmpDir: /var/tmp
  imageStore:
    number: 1
  runRoot: /run/containers/storage
  volumePath: /var/lib/containers/storage/volumes
version:
  APIVersion: 4.1.0
  Built: 1653926672
  BuiltTime: Mon May 30 16:04:32 2022
  GitCommit: ""
  GoVersion: go1.18.2
  Os: linux
  OsArch: linux/arm64
  Version: 4.1.0

Package info (e.g. output of rpm -q podman or apt list podman):

/opt/homebrew/Cellar/podman/4.1.0/bin/podman
/opt/homebrew/Cellar/podman/4.1.0/bin/podman-mac-helper
/opt/homebrew/Cellar/podman/4.1.0/bin/podman-remote
/opt/homebrew/Cellar/podman/4.1.0/etc/bash_completion.d/podman
/opt/homebrew/Cellar/podman/4.1.0/libexec/gvproxy
/opt/homebrew/Cellar/podman/4.1.0/share/fish/vendor_completions.d/podman.fish
/opt/homebrew/Cellar/podman/4.1.0/share/man/ (163 files)
/opt/homebrew/Cellar/podman/4.1.0/share/zsh/site-functions/_podman

Have you tested with the latest version of Podman and have you checked the Podman Troubleshooting Guide? (https://github.com/containers/podman/blob/main/troubleshooting.md)

Yes

Additional environment details (AWS, VirtualBox, physical, etc.):
Using Arm64 MacOS platform.

openshift-ci bot added the kind/bug label Jun 11, 2022
github-actions bot added the macos and remote labels Jun 11, 2022
@mheon
Member

mheon commented Jun 11, 2022

If you do a docker exec into the container with Podman and run podman run (locally, not using the API) does it work?

@rhatdan @giuseppe PTAL

@ryshoooo
Author

> If you do a docker exec into the container with Podman and run podman run (locally, not using the API) does it work?
>
> @rhatdan @giuseppe PTAL

Yes it does, I can run podman run alpine echo hello inside of the container and it works just fine.

@rhatdan
Member

rhatdan commented Jun 12, 2022

> Run podman run docker.io/library/alpine echo hello

Are you talking about podman --remote here?

If you run podman --remote run docker.io/library/alpine echo hello inside of the container, talking to the podman service, does it work?

@ryshoooo
Author

ryshoooo commented Jun 12, 2022

> Run podman run docker.io/library/alpine echo hello
>
> Are you talking about podman --remote here?
>
> If you run podman --remote run docker.io/library/alpine echo hello inside of the container, talking to the podman service, does it work?

No, I get the same behavior.

richardnemeth@Richards-MBP-2 workspace % podman system connection list
Name                         URI                                                         Identity                                          Default
local                        tcp://localhost:8888                                                                                          true
richardnemeth@Richards-MBP-2 workspace % podman --remote run alpine echo hi
Resolved "alpine" as an alias (/etc/containers/registries.conf.d/000-shortnames.conf)
Trying to pull docker.io/library/alpine:latest...
Getting image source signatures
Copying blob sha256:b3c136eddcbf2003d3180787cef00f39d46b9fd9e4623178282ad6a8d63ad3b0
Copying blob sha256:b3c136eddcbf2003d3180787cef00f39d46b9fd9e4623178282ad6a8d63ad3b0
Copying config sha256:6e30ab57aeeef1ebca8ac5a6ea05b5dd39d54990be94e7be18bb969a02d10a3f
Writing manifest to image destination
Storing signatures
Error: error preparing container 544332fc16214619df439d8e8be73161bc49ae3ba5b913ff22f2fe99ef2e552f for attach: crun: writing file `/sys/fs/cgroup/libpod_parent/libpod-544332fc16214619df439d8e8be73161bc49ae3ba5b913ff22f2fe99ef2e552f/cgroup.procs`: Operation not supported: OCI runtime error

Interestingly, I did the same setup in minikube and it works there without any problem; the main difference is that my base environment is Ubuntu, not macOS. Scratch that: the same behavior appears there as well. It just happens during podman build instead of podman run, if the Containerfile contains a RUN statement. Quite strange.

@ryshoooo
Author

I believe this to be a problem with MacOS/Arm64 only. I tried to replicate the problem on purely linux amd64, but there everything works as intended.

@giuseppe
Member

Can you show me the cgroup that the podman daemon is running in?

You can look at it through the /proc/$PODMAN_DAEMON_PID/cgroup file.

@ryshoooo
Author

ryshoooo commented Jun 13, 2022

> Can you show me the cgroup that the podman daemon is running in?
>
> You can look at it through the /proc/$PODMAN_DAEMON_PID/cgroup file.

I'm not sure what the PODMAN_DAEMON_PID is in the container, but I've checked all of the available ones, and all of them contain

0::/
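
The check requested above can be sketched in shell. This is a minimal illustration, not something posted in the thread: the function names `cgroup_v2_path` and `in_root_cgroup` are hypothetical, and the sketch assumes a cgroup v2 (unified) hierarchy, where the relevant entry in a `/proc/<pid>/cgroup` file has the form `0::<path>`.

```sh
#!/bin/sh
# Hypothetical helpers for inspecting a /proc/<pid>/cgroup file.

# Print the cgroup v2 path recorded in the given file.
# The unified-hierarchy entry looks like "0::<path>"; other lines are ignored.
cgroup_v2_path() {
    sed -n 's/^0:://p' "$1"
}

# Succeed only if the file says the process sits in the root of its
# cgroup namespace, i.e. the "0::/" case reported above.
in_root_cgroup() {
    [ "$(cgroup_v2_path "$1")" = "/" ]
}
```

For example, `in_root_cgroup /proc/self/cgroup && echo "root cgroup"` checks the current shell. In the reproduction from this issue the service is the container's main command, so `/proc/1/cgroup` is probably the file of interest, but that is an assumption about the image.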

@1player

1player commented Jun 29, 2022

> I believe this to be a problem with MacOS/Arm64 only. I tried to replicate the problem on purely linux amd64, but there everything works as intended.

Like with #14517 this is not a macOS only issue, and I have no idea how to replicate. But I'm seeing this exact issue using VSCode in a toolbox on a Linux host. Opening a dev container fails with:

Error: crun: writing file `/sys/fs/cgroup/user.slice/user-1000.slice/user@1000.service/user.slice/user-libpod_pod_06d32d4c30512b8d0e66123d687a4d481f7f66c2e8886f61a462a1aef369a8f6.slice/libpod-94c1bbd344d6e435fe759de566ee3c070af535ef0a434f1f59a8387542ec57a0.scope/container/cgroup.procs`: No such file or directory: OCI runtime attempted to invoke a command that was not found

It had not happened before; I have now encountered it a few times, and it has always gone away by itself (or perhaps after some reboot or recreating the container, no idea).

Whatever it is, there is some heisenbug going on with podman remote which is not macOS only.

EDIT: doing podman --remote run alpine echo hi as suggested above works. In fact all my containers work, but somehow vscode isn't able to exec into one of those containers, failing with that mysterious error.

@rhatdan
Member

rhatdan commented Jun 29, 2022

@giuseppe PTAL

@giuseppe
Member

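That happens because the podman service is running in the parent cgroup, so child cgroups cannot be used: cgroup v2 does not let a non-root cgroup hold processes of its own while also enabling controllers for its children.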

You can fix it manually with (the list of cgroup controllers could be different on your system):

$ podman exec -lti bash
# mkdir /sys/fs/cgroup/init
# echo -cpuset -cpu -pids > /sys/fs/cgroup/cgroup.subtree_control
# cat /sys/fs/cgroup/cgroup.procs > /sys/fs/cgroup/init/cgroup.procs
# echo +memory +cpuset +cpu +pids > /sys/fs/cgroup/cgroup.subtree_control
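
The manual steps above can also be sketched as a small script that reads the controller list instead of hard-coding it. This is a hedged sketch, not part of Podman: `with_sign` and `move_to_init_cgroup` are hypothetical names, the `init` sub-cgroup name is taken from the commands above, and re-enabling every controller listed in `cgroup.controllers` is an assumption that may not suit every system. PIDs are written one at a time because the kernel migrates a single process per write to `cgroup.procs`.

```sh
#!/bin/sh
# Hypothetical helpers; run as root inside the container if used for real.

# Prefix each controller name with a sign: with_sign - cpu pids -> "-cpu -pids".
with_sign() {
    sign="$1"; shift
    out=""
    for c in "$@"; do
        out="${out:+$out }${sign}${c}"
    done
    printf '%s\n' "$out"
}

# Mirror the manual fix for the cgroup mounted at $1:
# disable delegated controllers, move processes to $1/init, enable controllers.
move_to_init_cgroup() {
    root="$1"
    enabled="$(cat "$root/cgroup.subtree_control")"   # currently delegated
    available="$(cat "$root/cgroup.controllers")"     # usable in this cgroup
    pids="$(cat "$root/cgroup.procs")"                # snapshot before moving
    mkdir -p "$root/init"
    if [ -n "$enabled" ]; then
        with_sign - $enabled > "$root/cgroup.subtree_control"
    fi
    # One PID per write; a PID that has already exited is skipped.
    for pid in $pids; do
        echo "$pid" > "$root/init/cgroup.procs" || true
    done
    if [ -n "$available" ]; then
        with_sign + $available > "$root/cgroup.subtree_control"
    fi
    return 0
}
```

Usage would be `move_to_init_cgroup /sys/fs/cgroup`. If enabling all available controllers fails on a given system, the list can be trimmed by hand, as in the original commands.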

@1player

1player commented Jun 30, 2022

But how would this happen?

@giuseppe
Member

> docker run --privileged -p 8888:8888 quay.io/podman/stable:latest podman system service --time=0 tcp://0.0.0.0:8888

The nested podman system service is running in the root cgroup.

@giuseppe
Member

Podman could do that automatically, but I'd prefer to avoid the cost of reading and parsing /proc/self/cgroup for every Podman invocation, since it is a rare corner case.

Maybe we could do that just for podman system service

@giuseppe
Member

I am mostly worried about network file systems (e.g. a volume on NFS) where reading the xattr could add a significant cost.

@rhatdan
Member

rhatdan commented Jun 30, 2022

Seems like doing it for podman system service makes the most sense.

giuseppe added a commit to giuseppe/libpod that referenced this issue Jun 30, 2022
at startup, when running on a cgroup v2 system, check if the current
process is running in the root cgroup and move it to a sub-cgroup,
otherwise Podman is not able to create cgroups and move processes
there.

Closes: containers#14573

[NO NEW TESTS NEEDED] it needs nested podman

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
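
The startup behavior described in the commit message can be illustrated with a shell sketch. This is not the code from the pull request, which lives in Podman's Go sources; the function name `maybe_move_self`, its three parameters, and the `init` sub-cgroup name are all assumptions made for illustration. The sketch treats the presence of `cgroup.controllers` at the mount root as the sign of a cgroup v2 system.

```sh
#!/bin/sh
# Hypothetical illustration of "if on cgroup v2 and in the root cgroup,
# move the process to a sub-cgroup".
#   $1: cgroup mount point (e.g. /sys/fs/cgroup)
#   $2: the process's cgroup file (e.g. /proc/self/cgroup)
#   $3: PID to move
maybe_move_self() {
    root="$1"; cgroup_file="$2"; pid="$3"
    # cgroup v2 exposes cgroup.controllers at the mount root; without it, do nothing.
    [ -f "$root/cgroup.controllers" ] || return 0
    # Only act when the process is in the root cgroup ("0::/").
    [ "$(sed -n 's/^0:://p' "$cgroup_file")" = "/" ] || return 0
    mkdir -p "$root/init"
    echo "$pid" > "$root/init/cgroup.procs"
}
```

A caller could run `maybe_move_self /sys/fs/cgroup /proc/self/cgroup $$` before starting the service. Doing the check once at service startup, rather than on every Podman invocation, matches the preference stated earlier in the thread.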
@giuseppe
Member

PR here: #14787

github-actions bot added the locked - please file new issue/PR label Sep 20, 2023
github-actions bot locked as resolved and limited conversation to collaborators Sep 20, 2023