podman ps sometimes fails if container process is stopped and removed during call #11810

Closed
cvennel opened this issue Sep 30, 2021 · 0 comments · Fixed by #11820
Labels
In Progress This issue is actively being worked by the assignee, please do not work on this at this time. kind/bug Categorizes issue or PR as related to a bug. locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments.

cvennel commented Sep 30, 2021

Is this a BUG REPORT or FEATURE REQUEST? (leave only one on its own line)

/kind bug

Description

podman ps can sometimes fail if a container is being removed during the method call.

Steps to reproduce the issue:

  1. In one terminal window run:
    while true; do podman ps --format="{{.Names}}" > /dev/null || echo "FAILED WITH CODE $?"; done (This seems to occur more reliably with the --format argument. podman ps -w 1 may be sufficient, but the loop above seems more likely to hit the race condition.)

  2. In a second terminal window start a bunch of arbitrary containers
    for x in {1..30}; do podman run --rm -dt docker.io/alpine; done (I think the --rm flag is important to delete the container from the database that podman ps is using to get the info about the container)

  3. Watch the output in the first terminal while killing all of the containers from the second:
    podman kill --all (For me, this shows the error most of the time, though occasionally the race condition is not hit.)
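
The polling loop from step 1 can also be written as a bounded helper, so that a run terminates and reports how often the command failed. This is only a sketch: poll_ps is a hypothetical name, not part of podman, and it assumes nothing beyond a POSIX shell. It runs whatever command it is given, so the podman invocation appears only in the usage comments.

```shell
#!/bin/sh
# Hypothetical helper (not part of podman): run a command a fixed number
# of times, discard its stdout, and report every non-zero exit code.
# The body is a subshell "( ... )" so its variables do not leak.
poll_ps() (
  count="$1"
  shift
  failures=0
  i=0
  while [ "$i" -lt "$count" ]; do
    "$@" > /dev/null
    rc=$?
    if [ "$rc" -ne 0 ]; then
      echo "FAILED WITH CODE $rc"
      failures=$((failures + 1))
    fi
    i=$((i + 1))
  done
  echo "failures: $failures of $count"
)

# Usage, following the steps above:
#   terminal 1: poll_ps 500 podman ps --format "{{.Names}}"
#   terminal 2: for x in $(seq 1 30); do podman run --rm -dt docker.io/alpine; done
#               podman kill --all
```

Because this is a race, a run that reports zero failures does not show the problem is absent. Raising the iteration count or the number of containers may make the failure easier to hit.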

Describe the results you received:
Running the above steps gives me:

Error: container <sha> does not exist in database: no such container
FAILED WITH CODE 125

Describe the results you expected:

I expected podman ps to list the running containers. At worst, if a container is removed while podman ps is running, I would expect the output to still list that container even though it has stopped. I would not expect the entire command to error out.

Additional information you deem important (e.g. issue happens only occasionally):

Output of podman version:

podman version 3.2.3

Output of podman info --debug:

host:
  arch: amd64
  buildahVersion: 1.21.3
  cgroupControllers: []
  cgroupManager: cgroupfs
  cgroupVersion: v1
  conmon:
    package: conmon-2.0.29-1.module_el8.4.0+886+c9a8d9ad.x86_64
    path: /usr/bin/conmon
    version: 'conmon version 2.0.29, commit: 97bba1e91aaab5be2e93bacd34ec4e66655a02ae'
  cpus: 4
  distribution:
    distribution: '"centos"'
    version: "8"
  eventLogger: file
  hostname: localhost.localdomain
  idMappings:
    gidmap:
    - container_id: 0
      host_id: 1000
      size: 1
    - container_id: 1
      host_id: 100000
      size: 65536
    uidmap:
    - container_id: 0
      host_id: 1000
      size: 1
    - container_id: 1
      host_id: 100000
      size: 65536
  kernel: 4.18.0-240.22.1.el8_3.x86_64
  linkmode: dynamic
  memFree: 4867231744
  memTotal: 9193594880
  ociRuntime:
    name: runc
    package: runc-1.0.0-74.rc95.module_el8.4.0+886+c9a8d9ad.x86_64
    path: /usr/bin/runc
    version: |-
      runc version spec: 1.0.2-dev
      go: go1.15.14
      libseccomp: 2.5.1
  os: linux
  remoteSocket:
    path: /run/user/1000/podman/podman.sock
  security:
    apparmorEnabled: false
    capabilities: CAP_NET_RAW,CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_NET_BIND_SERVICE,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID,CAP_SYS_CHROOT
    rootless: true
    seccompEnabled: true
    seccompProfilePath: /usr/share/containers/seccomp.json
    selinuxEnabled: true
  serviceIsRemote: false
  slirp4netns:
    executable: /usr/bin/slirp4netns
    package: slirp4netns-1.1.8-1.module_el8.4.0+641+6116a774.x86_64
    version: |-
      slirp4netns version 1.1.8
      commit: d361001f495417b880f20329121e3aa431a8f90f
      libslirp: 4.3.1
      SLIRP_CONFIG_VERSION_MAX: 3
      libseccomp: 2.5.1
  swapFree: 4764725248
  swapTotal: 4764725248
  uptime: 26h 58m 44.96s (Approximately 1.08 days)
registries:
  search:
  - registry.access.redhat.com
  - registry.redhat.io
  - docker.io
store:
  configFile: /home/cvennel/.config/containers/storage.conf
  containerStore:
    number: 0
    paused: 0
    running: 0
    stopped: 0
  graphDriverName: overlay
  graphOptions:
    overlay.mount_program:
      Executable: /usr/bin/fuse-overlayfs
      Package: fuse-overlayfs-1.6-1.module_el8.4.0+886+c9a8d9ad.x86_64
      Version: |-
        fusermount3 version: 3.2.1
        fuse-overlayfs: version 1.6
        FUSE library version 3.2.1
        using FUSE kernel interface version 7.26
  graphRoot: /home/cvennel/.local/share/containers/storage
  graphStatus:
    Backing Filesystem: xfs
    Native Overlay Diff: "false"
    Supports d_type: "true"
    Using metacopy: "false"
  imageStore:
    number: 2
  runRoot: /run/user/1000/containers
  volumePath: /home/cvennel/.local/share/containers/storage/volumes
version:
  APIVersion: 3.2.3
  Built: 1632432139
  BuiltTime: Thu Sep 23 17:22:19 2021
  GitCommit: ""
  GoVersion: go1.15.14
  OsArch: linux/amd64
  Version: 3.2.3

Package info (e.g. output of rpm -q podman or apt list podman):

podman-3.2.3-0.11.module_el8.4.0+942+d25aada8.x86_64

Have you tested with the latest version of Podman and have you checked the Podman Troubleshooting Guide? (https://github.com/containers/podman/blob/master/troubleshooting.md)

No

Additional environment details (AWS, VirtualBox, physical, etc.):
Reproduced on VirtualBox & Physical

@mheon mheon added the kind/bug Categorizes issue or PR as related to a bug. label Sep 30, 2021
@jwhonce jwhonce self-assigned this Sep 30, 2021
@jwhonce jwhonce added the In Progress This issue is actively being worked by the assignee, please do not work on this at this time. label Sep 30, 2021
jwhonce added a commit to jwhonce/podman that referenced this issue Oct 1, 2021
* Ignore condition when containers are removed while listing them for ps output.

  No tests added at this time as they would create a race condition for CI.

* Updated godocs

See containers#11810 for reproducer.

Fixes containers#11810

Signed-off-by: Jhon Honce <jhonce@redhat.com>
@github-actions github-actions bot added the locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments. label Sep 21, 2023
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Sep 21, 2023