
Starting a container via the socket causes all socket calls to hang after that (podman 4.6.0 and systemd 252-16) #19625

Closed
carlosrodfern opened this issue Aug 14, 2023 · 6 comments
Labels: kind/bug

carlosrodfern commented Aug 14, 2023

### Issue Description

Starting a container via the socket causes all socket calls to hang after that. This happens specifically with systemd 252-16.

### Steps to reproduce the issue

The environment is a CentOS Stream 9 VM (e.g. qcow2), running as a non-root user.

  1. Install this specific version (version 4.6.0-3 has a different issue: https://bugzilla.redhat.com/show_bug.cgi?id=2231975): sudo dnf install podman-2:4.6.0-1.el9
  2. Ensure systemd 252-15 is the version installed: sudo dnf install systemd-252-15.el9
  3. Enable the socket: systemctl --user enable --now podman.socket
  4. Start nginx: podman --url unix://run/user/$(id -u)/podman/podman.sock run --name nginx-test -p 8080:80 -d docker.io/nginx
  5. List containers: podman --url unix://run/user/$(id -u)/podman/podman.sock ps. Run it multiple times; they all work.
  6. Remove the test container: podman container rm -f nginx-test
  7. Update systemd to 252-16: sudo dnf install systemd-252-16.el9
  8. Reboot.
  9. Start nginx again: podman --url unix://run/user/$(id -u)/podman/podman.sock run --name nginx-test -p 8080:80 -d docker.io/nginx
  10. List containers: podman --url unix://run/user/$(id -u)/podman/podman.sock ps. This one hangs.

Sometimes step 10 has to be run multiple times before it hangs for good; a consolidated sketch of the failing phase is shown below.
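For reference, steps 9-10 can be scripted as follows (a minimal sketch, assuming podman 4.6.0-1 and systemd 252-16 are already installed, the user socket is enabled, and the session is fresh after the reboot; the nginx-test name and the 10-second timeout are illustrative):

```bash
#!/usr/bin/env bash
# Sketch of the failing phase (steps 9-10 above).
set -euo pipefail

SOCK="unix://run/user/$(id -u)/podman/podman.sock"

# Step 9: start nginx through the API socket.
podman --url "$SOCK" run --name nginx-test -p 8080:80 -d docker.io/nginx

# Step 10: list containers through the socket. With the bug present this
# hangs, sometimes only after a few attempts, so guard each call with a
# timeout instead of letting the script block forever.
for attempt in 1 2 3; do
    if ! timeout 10 podman --url "$SOCK" ps; then
        echo "attempt ${attempt}: socket call timed out (bug reproduced?)" >&2
    fi
done
```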

What I'm seeing is that the processes started in step 9 are included in the podman.service cgroup, and even though podman itself has already exited, podman.service still appears as "active (running)", which seems to prevent systemd from starting podman again to handle subsequent socket requests.

podman.service - Podman API Service
     Loaded: loaded (/usr/lib/systemd/user/podman.service; disabled; preset: disabled)
     Active: active (running) since Mon 2023-08-14 18:03:31 EDT; 9min ago
TriggeredBy: ● podman.socket
       Docs: man:podman-system-service(1)
    Process: 1516 ExecStart=/usr/bin/podman $LOGGING system service (code=exited, status=0/SUCCESS)
   Main PID: 1516 (code=exited, status=0/SUCCESS)
      Tasks: 13 (limit: 10774)
     Memory: 30.2M
        CPU: 75ms
     CGroup: /user.slice/user-1000.slice/user@1000.service/app.slice/podman.service
             ├─1526 /usr/bin/slirp4netns --disable-host-loopback --mtu=65520 --enable-sandbox --enable-seccomp --enable-ipv6 -c -r 3 -e 4 --netns-type=path /run/user/1000/n>
             ├─1529 rootlessport
             └─1538 rootlessport-child
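
For reference, one way to confirm that those helper processes are still attributed to the unit's cgroup (a diagnostic sketch; the cgroup path is copied from the CGroup line in the status output above and varies with the user's UID):

```bash
# List the processes systemd still attributes to the user unit.
systemd-cgls --user-unit podman.service

# Or read the cgroup's process list directly (path taken from the status output).
cat /sys/fs/cgroup/user.slice/user-1000.slice/user@1000.service/app.slice/podman.service/cgroup.procs
```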

### Describe the results you received

It hangs when using the socket to list containers.

### Describe the results you expected

I can list containers using the socket.

### podman info output

host:
  arch: amd64
  buildahVersion: 1.31.0
  cgroupControllers:
  - memory
  - pids
  cgroupManager: systemd
  cgroupVersion: v2
  conmon:
    package: conmon-2.1.7-2.el9.x86_64
    path: /usr/bin/conmon
    version: 'conmon version 2.1.7, commit: 3c1e4548e778d692b2b7c2848d83708223a148b9'
  cpuUtilization:
    idlePercent: 99.61
    systemPercent: 0.2
    userPercent: 0.19
  cpus: 2
  databaseBackend: boltdb
  distribution:
    distribution: '"centos"'
    version: "9"
  eventLogger: journald
  freeLocks: 2047
  hostname: localhost.localdomain
  idMappings:
    gidmap:
    - container_id: 0
      host_id: 1000
      size: 1
    - container_id: 1
      host_id: 100000
      size: 65536
    uidmap:
    - container_id: 0
      host_id: 1000
      size: 1
    - container_id: 1
      host_id: 100000
      size: 65536
  kernel: 5.14.0-333.el9.x86_64
  linkmode: dynamic
  logDriver: journald
  memFree: 1349750784
  memTotal: 1864036352
  networkBackend: netavark
  networkBackendInfo:
    backend: netavark
    dns:
      package: aardvark-dns-1.5.0-2.el9.x86_64
      path: /usr/libexec/podman/aardvark-dns
      version: aardvark-dns 1.5.0
    package: netavark-1.5.0-2.el9.x86_64
    path: /usr/libexec/podman/netavark
    version: netavark 1.5.0
  ociRuntime:
    name: crun
    package: crun-1.8.6-1.el9.x86_64
    path: /usr/bin/crun
    version: |-
      crun version 1.8.6
      commit: 73f759f4a39769f60990e7d225f561b4f4f06bcf
      rundir: /run/user/1000/crun
      spec: 1.0.0
      +SYSTEMD +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +CRIU +YAJL
  os: linux
  pasta:
    executable: ""
    package: ""
    version: ""
  remoteSocket:
    exists: true
    path: /run/user/1000/podman/podman.sock
  security:
    apparmorEnabled: false
    capabilities: CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_NET_BIND_SERVICE,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID,CAP_SYS_CHROOT
    rootless: true
    seccompEnabled: true
    seccompProfilePath: /usr/share/containers/seccomp.json
    selinuxEnabled: true
  serviceIsRemote: false
  slirp4netns:
    executable: /usr/bin/slirp4netns
    package: slirp4netns-1.2.0-3.el9.x86_64
    version: |-
      slirp4netns version 1.2.0
      commit: 656041d45cfca7a4176f6b7eed9e4fe6c11e8383
      libslirp: 4.4.0
      SLIRP_CONFIG_VERSION_MAX: 3
      libseccomp: 2.5.2
  swapFree: 0
  swapTotal: 0
  uptime: 0h 10m 16.00s
plugins:
  authorization: null
  log:
  - k8s-file
  - none
  - passthrough
  - journald
  network:
  - bridge
  - macvlan
  - ipvlan
  volume:
  - local
registries:
  search:
  - registry.access.redhat.com
  - registry.redhat.io
  - docker.io
store:
  configFile: /home/centos/.config/containers/storage.conf
  containerStore:
    number: 1
    paused: 0
    running: 1
    stopped: 0
  graphDriverName: overlay
  graphOptions: {}
  graphRoot: /home/centos/.local/share/containers/storage
  graphRootAllocated: 8321499136
  graphRootUsed: 1567428608
  graphStatus:
    Backing Filesystem: xfs
    Native Overlay Diff: "true"
    Supports d_type: "true"
    Using metacopy: "false"
  imageCopyTmpDir: /var/tmp
  imageStore:
    number: 1
  runRoot: /run/user/1000/containers
  transientStore: false
  volumePath: /home/centos/.local/share/containers/storage/volumes
version:
  APIVersion: 4.6.0
  Built: 1689909559
  BuiltTime: Thu Jul 20 23:19:19 2023
  GitCommit: ""
  GoVersion: go1.20.4
  Os: linux
  OsArch: linux/amd64
  Version: 4.6.0


### Podman in a container

No

### Privileged Or Rootless

Rootless

### Upstream Latest Release

No

### Additional environment details

The environment is a CentOS Stream 9 VM (e.g. [qcow2](https://cloud.centos.org/centos/9-stream/x86_64/images/CentOS-Stream-GenericCloud-9-20220829.0.x86_64.qcow2)), running as a non-root user.



### Additional information

Luap99 (Member) commented Aug 15, 2023

This sounds like #18862, which is a systemd bug, not a podman one.

carlosrodfern (Author) commented Aug 15, 2023

I could post it in the systemd tracker instead; I just want to make sure it isn't a systemd change that forces podman to adjust some parameters in podman.socket and podman.service to get it working again.
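
(For reference, the effective unit definitions, including any drop-ins, can be dumped with standard systemctl commands; a sketch:)

```bash
# Print the socket and service units as systemd sees them, to check
# whether any parameters would need to change.
systemctl --user cat podman.socket podman.service
```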

carlosrodfern (Author) commented

I reported it here as well: systemd/systemd#28843

Luap99 (Member) commented Aug 15, 2023

All I can tell is that there was this issue with systemd 253.5, as mentioned in that issue, and it has since been fixed. I have no idea if the problematic patches made it into CentOS 9; the best place to report this would be the CentOS 9 bug tracker rather than the upstream one, unless you can reproduce it with the latest upstream version.

Luap99 (Member) commented Aug 15, 2023

Closing as I don't think this is a podman problem.

Luap99 closed this as not planned on Aug 15, 2023.
carlosrodfern (Author) commented

@Luap99, I reported it in the CentOS 9 Bugzilla as well. The issue appears specifically between systemd 252-15 and 252-16. I'll be looking into reported issues and fixes in systemd to see if it is the same.
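
(For reference, one quick way to review what changed between the two downstream builds is the packaged changelog; a sketch using standard RPM tooling, assuming the 252-16 build is installed:)

```bash
# Show recent downstream changelog entries for the installed systemd package;
# the 252-15 -> 252-16 delta should appear near the top.
rpm -q --changelog systemd | head -n 40
```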

The github-actions bot locked the issue as resolved and limited conversation to collaborators on Nov 14, 2023.