System boot hangs indefinitely on unclean shutdown with transient mode #22984

Closed
lambinoo opened this issue Jun 12, 2024 · 0 comments
Labels
kind/bug, locked - please file new issue/PR

lambinoo (Contributor) commented Jun 12, 2024

Issue Description

I have a setup with multiple Quadlet files that manage long-running services, one-shot jobs, pods and volumes. All of this runs on a CentOS 9 platform, with Podman in transient mode and a separate filesystem for container storage.

This runs on a system where unclean shutdowns are quite frequent.

We recently encountered a bug where the system seems to hang indefinitely at boot, waiting forever on a pod, volume or one-shot container service generated by Quadlet. Our current workaround is to set appropriate timeouts and have systemd restart the services when a timeout expires. The hang seems to happen after an unclean shutdown.

I have opened a PR that attempts to fix this issue: #22985

Steps to reproduce the issue

  1. Install the Quadlet files listed in this issue in /etc/containers/systemd, reboot the system once and wait for all the services to be started
  2. Hard-reboot the system (e.g. reboot -f)
  3. Log in and run systemctl list-jobs to observe that either the pod or the volume service is hanging the boot
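
To make step 3 easier to repeat across reboots, the check can be scripted. The following is a minimal sketch, not part of the original report: it parses the output of `systemctl list-jobs --no-legend` and reports start jobs that are still queued for the Quadlet-generated units. The unit names in `QUADLET_UNITS` are an assumption based on Quadlet's usual naming (`pod.pod` to `pod-pod.service`, `myvolume.volume` to `myvolume-volume.service`, `cntr.container` to `cntr.service`); verify them on your system with `systemctl list-units`. The function names are hypothetical helpers.

```python
import subprocess

# Assumed names of the units Quadlet generates for the files in this issue.
QUADLET_UNITS = {"pod-pod.service", "myvolume-volume.service", "cntr.service"}


def parse_list_jobs(output: str) -> list[dict]:
    """Parse `systemctl list-jobs` output into a list of job records."""
    jobs = []
    for line in output.splitlines():
        fields = line.split()
        # A job row has exactly four columns: JOB UNIT TYPE STATE.
        # Headers, footers and "No jobs running." lines are skipped.
        if len(fields) != 4 or not fields[0].isdigit():
            continue
        job_id, unit, job_type, state = fields
        jobs.append(
            {"id": int(job_id), "unit": unit, "type": job_type, "state": state}
        )
    return jobs


def pending_start_jobs(jobs: list[dict], units: set[str]) -> list[dict]:
    """Return the start jobs still queued for the given units."""
    return [j for j in jobs if j["unit"] in units and j["type"] == "start"]


def read_jobs() -> list[dict]:
    """Run systemctl and return its queued jobs (requires a systemd host)."""
    result = subprocess.run(
        ["systemctl", "list-jobs", "--no-legend"],
        capture_output=True,
        text=True,
        check=False,
    )
    return parse_list_jobs(result.stdout)
```

On an affected host, `pending_start_jobs(read_jobs(), QUADLET_UNITS)` run a few minutes after boot should return a non-empty list if the pod or volume service is still blocking the boot.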

Quadlet files:

# pod.pod
[Pod]
PodName=mypod

# myvolume.volume
[Unit]
Description=Create volume

[Volume]
Copy=false
GlobalArgs=--log-level=debug

# cntr.container
[Container]
Image=docker.io/library/ubuntu:latest
Volume=myvolume.volume:/vol
Pod=pod.pod
Exec=sleep infinity

[Install]
WantedBy=multi-user.target

Describe the results you received

System hangs forever during the boot phase

Describe the results you expected

Boot completes without hanging

podman info output

host:
  arch: amd64
  buildahVersion: 1.36.0
  cgroupControllers:
  - cpuset
  - cpu
  - io
  - memory
  - hugetlb
  - pids
  - rdma
  - misc
  cgroupManager: systemd
  cgroupVersion: v2
  conmon:
    package: conmon-2.1.12-1.el9.x86_64
    path: /usr/bin/conmon
    version: 'conmon version 2.1.12, commit: 7ba5bd6c81ff2c10e07aee8c4281d12a2878fa12'
  cpuUtilization:
    idlePercent: 75.44
    systemPercent: 5.62
    userPercent: 18.93
  cpus: 12
  databaseBackend: sqlite
  distribution:
    distribution: centos
    version: "9"
  eventLogger: journald
  freeLocks: 2031
  hostname: HOSTNAME
  idMappings:
    gidmap: null
    uidmap: null
  kernel: 5.14.0-430.el9.x86_64
  linkmode: dynamic
  logDriver: journald
  memFree: 6276349952
  memTotal: 16339382272
  networkBackend: netavark
  networkBackendInfo:
    backend: netavark
    dns:
      package: aardvark-dns-1.9.0-1.el9.x86_64
      path: /usr/libexec/podman/aardvark-dns
      version: aardvark-dns 1.9.0
    package: netavark-1.11.0-1.el9.x86_64
    path: /usr/libexec/podman/netavark
    version: netavark 1.11.0
  ociRuntime:
    name: crun
    package: crun-1.15-1.el9.x86_64
    path: /usr/bin/crun
    version: |-
      crun version 1.15
      commit: e6eacaf4034e84185fd8780ac9262bbf57082278
      rundir: /run/user/0/crun
      spec: 1.0.0
      +SYSTEMD +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +CRIU +YAJL
  os: linux
  pasta:
    executable: /usr/bin/pasta
    package: passt-0^20231204.gb86afe3-1.el9.x86_64
    version: |
      pasta 0^20231204.gb86afe3-1.el9.x86_64
      Copyright Red Hat
      GNU General Public License, version 2 or later
        <https://www.gnu.org/licenses/old-licenses/gpl-2.0.html>
      This is free software: you are free to change and redistribute it.
      There is NO WARRANTY, to the extent permitted by law.
  remoteSocket:
    exists: true
    path: /run/podman/podman.sock
  rootlessNetworkCmd: pasta
  security:
    apparmorEnabled: false
    capabilities: CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_NET_BIND_SERVICE,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID,CAP_SYS_CHROOT
    rootless: false
    seccompEnabled: true
    seccompProfilePath: /usr/share/containers/seccomp.json
    selinuxEnabled: true
  serviceIsRemote: false
  slirp4netns:
    executable: /usr/bin/slirp4netns
    package: slirp4netns-1.3.1-1.el9.x86_64
    version: |-
      slirp4netns version 1.3.1
      commit: e5e368c4f5db6ae75c2fce786e31eef9da6bf236
      libslirp: 4.4.0
      SLIRP_CONFIG_VERSION_MAX: 3
      libseccomp: 2.5.2
  swapFree: 8589930496
  swapTotal: 8589930496
  uptime: 3h 34m 39.00s (Approximately 0.12 days)
  variant: ""
plugins:
  authorization: null
  log:
  - k8s-file
  - none
  - passthrough
  - journald
  network:
  - bridge
  - macvlan
  - ipvlan
  volume:
  - local
registries:
  search:
  - registry.access.redhat.com
  - registry.redhat.io
  - docker.io
store:
  configFile: /etc/containers/storage.conf
  containerStore:
    number: 8
    paused: 0
    running: 8
    stopped: 0
  graphDriverName: overlay
  graphOptions:
    overlay.mountopt: nodev,metacopy=on
  graphRoot: /var/lib/containers/storage
  graphRootAllocated: 554240225280
  graphRootUsed: 18993541120
  graphStatus:
    Backing Filesystem: xfs
    Native Overlay Diff: "false"
    Supports d_type: "true"
    Supports shifting: "false"
    Supports volatile: "true"
    Using metacopy: "true"
  imageCopyTmpDir: /var/tmp
  imageStore:
    number: 15
  runRoot: /run/containers/storage
  transientStore: false
  volumePath: /var/lib/containers/storage/volumes
version:
  APIVersion: 5.1.0
  Built: 1717411100
  BuiltTime: Mon Jun  3 10:38:20 2024
  GitCommit: ""
  GoVersion: go1.22.3 (Red Hat 1.22.3-2.el9)
  Os: linux
  OsArch: linux/amd64
  Version: 5.1.0

Podman in a container

No

Privileged Or Rootless

Privileged

Upstream Latest Release

No

Additional environment details

Podman 5.1.0 in transient mode on a CentOS 9 based system, with a separate filesystem for the container storage in /var/lib/containers. Unclean shutdowns are frequent.

Additional information

No response

lambinoo added the kind/bug label Jun 12, 2024
stale-locking-app bot added the locked - please file new issue/PR label Oct 8, 2024
stale-locking-app bot locked as resolved and limited conversation to collaborators Oct 8, 2024