
podman socket stuck in 4.5.1 #18862

Closed
p-fruck opened this issue Jun 12, 2023 · 11 comments
Labels
kind/bug Categorizes issue or PR as related to a bug. locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments.

Comments

@p-fruck
Contributor

p-fruck commented Jun 12, 2023

Issue Description

Since upgrading to Podman version 4.5.1, I am facing an issue where the podman socket just randomly stops responding to docker-compose and has to be restarted.

Steps to reproduce the issue

  1. Create a random compose project
  2. export DOCKER_HOST=unix://${XDG_RUNTIME_DIR}/podman/podman.sock
  3. Try to execute docker-compose <up|down> a couple of times
  4. Notice podman socket getting stuck

Describe the results you received

The command just hangs and doesn't return any output

Describe the results you expected

Compose working as expected

podman info output

host:
  arch: amd64
  buildahVersion: 1.30.0
  cgroupControllers:
  - cpu
  - io
  - memory
  - pids
  cgroupManager: systemd
  cgroupVersion: v2
  conmon:
    package: conmon-2.1.7-2.fc38.x86_64
    path: /usr/bin/conmon
    version: 'conmon version 2.1.7, commit: '
  cpuUtilization:
    idlePercent: 92.95
    systemPercent: 1.63
    userPercent: 5.42
  cpus: 8
  databaseBackend: boltdb
  distribution:
    distribution: fedora
    variant: silverblue
    version: "38"
  eventLogger: journald
  hostname: spectre
  idMappings:
    gidmap:
    - container_id: 0
      host_id: 1000
      size: 1
    - container_id: 1
      host_id: 524288
      size: 65536
    uidmap:
    - container_id: 0
      host_id: 1000
      size: 1
    - container_id: 1
      host_id: 524288
      size: 65536
  kernel: 6.3.6-200.fc38.x86_64
  linkmode: dynamic
  logDriver: journald
  memFree: 306589696
  memTotal: 7916462080
  networkBackend: netavark
  ociRuntime:
    name: crun
    package: crun-1.8.5-1.fc38.x86_64
    path: /usr/bin/crun
    version: |-
      crun version 1.8.5
      commit: b6f80f766c9a89eb7b1440c0a70ab287434b17ed
      rundir: /run/user/1000/crun
      spec: 1.0.0
      +SYSTEMD +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +CRIU +LIBKRUN +WASM:wasmedge +YAJL
  os: linux
  remoteSocket:
    exists: true
    path: /run/user/1000/podman/podman.sock
  security:
    apparmorEnabled: false
    capabilities: CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_NET_BIND_SERVICE,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID,CAP_SYS_CHROOT
    rootless: true
    seccompEnabled: true
    seccompProfilePath: /usr/share/containers/seccomp.json
    selinuxEnabled: true
  serviceIsRemote: false
  slirp4netns:
    executable: /usr/bin/slirp4netns
    package: slirp4netns-1.2.0-12.fc38.x86_64
    version: |-
      slirp4netns version 1.2.0
      commit: 656041d45cfca7a4176f6b7eed9e4fe6c11e8383
      libslirp: 4.7.0
      SLIRP_CONFIG_VERSION_MAX: 4
      libseccomp: 2.5.3
  swapFree: 5483261952
  swapTotal: 7915696128
  uptime: 7h 40m 54.00s (Approximately 0.29 days)
plugins:
  authorization: null
  log:
  - k8s-file
  - none
  - passthrough
  - journald
  network:
  - bridge
  - macvlan
  - ipvlan
  volume:
  - local
registries:
  search:
  - registry.fedoraproject.org
  - registry.access.redhat.com
  - docker.io
  - quay.io
store:
  configFile: /var/home/philipp/.config/containers/storage.conf
  containerStore:
    number: 26
    paused: 0
    running: 7
    stopped: 19
  graphDriverName: overlay
  graphOptions: {}
  graphRoot: /var/home/philipp/.local/share/containers/storage
  graphRootAllocated: 998483427328
  graphRootUsed: 127509688320
  graphStatus:
    Backing Filesystem: btrfs
    Native Overlay Diff: "true"
    Supports d_type: "true"
    Using metacopy: "false"
  imageCopyTmpDir: /var/tmp
  imageStore:
    number: 58
  runRoot: /run/user/1000/containers
  transientStore: false
  volumePath: /var/home/philipp/.local/share/containers/storage/volumes
version:
  APIVersion: 4.5.1
  Built: 1685123928
  BuiltTime: Fri May 26 19:58:48 2023
  GitCommit: ""
  GoVersion: go1.20.4
  Os: linux
  OsArch: linux/amd64
  Version: 4.5.1

Podman in a container

No

Privileged Or Rootless

Rootless

Upstream Latest Release

Yes

Additional environment details

Docker Compose version v2.17.2

Additional information

No response

@p-fruck p-fruck added the kind/bug Categorizes issue or PR as related to a bug. label Jun 12, 2023
@vrothberg
Member

Thanks for reaching out, @p-fruck.

Can you share the compose file you used to reproduce the issue?

@Luap99
Member

Luap99 commented Jun 13, 2023

Also, if the service hangs again, can you run kill -ABRT <podman service PID> and then take a look at the service logs? They should contain the full stack trace, so we can see where in the code it hangs.

@p-fruck
Contributor Author

p-fruck commented Jun 13, 2023

I was able to reproduce the behaviour with this minimal compose file:

version: "3"
services:
  web:
    image: nginx:alpine

Commands used:

systemctl --user restart podman
docker-compose up -d
# wait >5 seconds
docker-compose down

Output of journalctl -xe --user -u podman:

Jun 13 21:39:07 spectre podman[14345]: @ - - [13/Jun/2023:21:39:07 +0200] "GET /v1.41/containers/json?all=1&filters=%7B%22label%22%3A%7B%22com.docker.compose.oneoff%3DFalse%22%3Atrue%2C%22com.docker.compose.project%3Dpodman%22%3Atrue%7D%7D HTTP/1.1" 200 1464 "" "Docker-Client/unknown-version (linux)"
Jun 13 21:39:07 spectre podman[14345]: time="2023-06-13T21:39:07+02:00" level=debug msg="IdleTracker:idle 1m+0h/2t connection(s)" X-Reference-Id=0xc000015308
Jun 13 21:39:07 spectre podman[14345]: time="2023-06-13T21:39:07+02:00" level=debug msg="IdleTracker:closed 1m+0h/2t connection(s)" X-Reference-Id=0xc000015308
Jun 13 21:39:12 spectre podman[14345]: time="2023-06-13T21:39:12+02:00" level=debug msg="API service(s) shutting down, idle for 5s"
Jun 13 21:39:12 spectre podman[14345]: time="2023-06-13T21:39:12+02:00" level=debug msg="API service shutdown, 0/2 connection(s)"
Jun 13 21:39:12 spectre podman[14345]: time="2023-06-13T21:39:12+02:00" level=debug msg="API service forced shutdown, ignoring timeout Duration"
Jun 13 21:39:12 spectre podman[14345]: time="2023-06-13T21:39:12+02:00" level=debug msg="Called service.PersistentPostRunE(/usr/bin/podman --log-level=debug system service)"
Jun 13 21:39:12 spectre podman[14345]: time="2023-06-13T21:39:12+02:00" level=debug msg="Shutting down engines"

The entire service is actually being shut down after a short period of inactivity, so I am not able to execute the kill command because the process has already stopped.

@julioln

julioln commented Jun 14, 2023

I've seen this issue with socket activation on systemd 253.5-1 (Arch)

I use rootless podman via the socket, started through systemd: systemctl --user start podman.socket. After updating systemd to 253.5-1, the first activation of podman.service via the socket works, but after the server shuts down due to inactivity, the service does not activate again. Forcing a restart with systemctl --user restart podman.socket fixes it, until the server is shut down again.

Downgrading systemd to 253.4-1 fixes the issue. Running podman system service -t 0 in another terminal is also a workaround.

@vrothberg
Member

Thanks for sharing, @julioln!

@p-fruck, could you run the Podman socket manually via podman system service -t0 and try to reproduce? If the issue does not appear there, it would support @julioln's observation of a systemd socket-activation bug.

@vrothberg
Member

FWIW, I am unable to reproduce with systemd-253.5-1.fc38

@julioln

julioln commented Jun 15, 2023

This systemd version also seems to be causing problems with libvirtd socket activation; it looks like a PR with a fix is already out: systemd/systemd#27953

@yajo

yajo commented Jun 19, 2023

if the service hangs again can you kill -ABRT <podman service PID> and then take a look in the service logs

This can't be done because the podman service process is already stopped:

> systemctl --user status podman.service
[...]
    Process: 135690 ExecStart=/usr/bin/podman $LOGGING system service (code=exited, status=0/SUCCESS)
   Main PID: 135690 (code=exited, status=0/SUCCESS)
[...]

This also resurrects the service:

podman stop $(podman ps -aq)

Here is a live example. The button I click in VS Code actually uses docker compose to talk to podman, and it had been working perfectly until some recent update:

Grabacion.de.pantalla.desde.2023-06-19.12-45-33.webm.mp4

@EvaristeGalois11
Contributor

I encountered the same problem with testcontainers: if a test run takes longer than the podman system service timeout, everything is completely stuck when it tries to reconnect. Even pinging the socket with curl hangs indefinitely.
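A health probe with a hard timeout avoids that indefinite hang. Below is a minimal sketch in Python; the helper name is made up for illustration, and it assumes the Docker-compatible /_ping endpoint and the rootless socket path from this report:

```python
import socket

def ping_unix_socket(path: str, timeout: float = 5.0) -> bool:
    """Hypothetical helper: probe a Docker-compatible API socket with a
    hard timeout instead of blocking forever like a plain request."""
    try:
        with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as s:
            # Bound connect/send/recv so a stuck service cannot hang us.
            s.settimeout(timeout)
            s.connect(path)
            # /_ping is the Docker compat API health endpoint.
            s.sendall(b"GET /_ping HTTP/1.1\r\nHost: d\r\nConnection: close\r\n\r\n")
            return s.recv(1024).startswith(b"HTTP/1.1 200")
    except OSError:  # covers timeouts, refused connections, missing socket
        return False

# e.g. ping_unix_socket("/run/user/1000/podman/podman.sock")
```

Against a socket whose service never answers (the symptom here), this returns False after `timeout` seconds instead of blocking the whole test suite.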

Luckily, the fix that @julioln mentioned is already being backported to Arch: https://gitlab.archlinux.org/archlinux/packaging/packages/systemd/-/commit/997fc66a38dfcc25363534b94bae1b427b6a9c0e

I installed systemd and systemd-libs 253.5-2 from the testing repo and everything started working again!

If you're on Arch too, or can compile systemd from source, I would highly suggest checking whether the patch fixes your case as well.

@Luap99
Member

Luap99 commented Jun 20, 2023

If this is a systemd bug, then there is nothing we can do here. As @julioln and @vrothberg pointed out, you can run the service manually without systemd to work around it for the time being: podman system service -t0.
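For readers who want the no-timeout workaround to persist across sessions, a user-level systemd drop-in is one option. This override is an editorial sketch, not something from the thread; the drop-in path and the packaged unit's ExecStart may differ on your distribution:

```ini
# ~/.config/systemd/user/podman.service.d/override.conf
[Service]
# Clear the packaged ExecStart, then start the API service with no
# inactivity timeout so it never shuts down and hits the activation bug.
ExecStart=
ExecStart=/usr/bin/podman system service --time=0
```

After `systemctl --user daemon-reload && systemctl --user restart podman.service`, the service should stay up until stopped explicitly.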

@Luap99 Luap99 closed this as not planned Won't fix, can't repro, duplicate, stale Jun 20, 2023
@julioln

julioln commented Jun 20, 2023

I can confirm the issue is gone with systemd 253.5-2
