Interrupted build leaves hard-to-remove containers running #14523

Open
muhmuhten opened this issue Jun 7, 2022 · 7 comments
@muhmuhten

Is this a BUG REPORT or FEATURE REQUEST? (leave only one on its own line)

/kind bug

Description

In the vein of #11472

Steps to reproduce the issue:

  1. podman run -d busybox sleep 1d
  2. printf '%s\n' 'FROM busybox' 'RUN cat /dev/zero > a' | podman build -f - /var/empty
  3. ^C
  4. podman image prune -af --external (This doesn't do anything)
  5. podman container prune -f (This doesn't do anything)
  6. podman system prune -af (This doesn't do anything)
  7. watch du -hd1 ~/.local/share/containers/storage

Describe the results you received:

None of the prune commands remove anything. Disk usage continues rising as fast as the build container can write zeroes.

Describe the results you expected:

Interrupting podman build should not have left the build container alive!

Failing that, I'd expect a more obviously reasonable way to clean up build containers than the handful of options I could find:

  • Invoke buildah rm directly. This is the most obviously reasonable way to recover from the situation. Unfortunately, CoreOS does not ship the buildah CLI, so this isn't an option there.
  • Do some kind of dance with podman ps --external and podman rm the offending containers. My best attempt is podman rm -f $(podman ps --external -qf status=unknown), which seems hideously obscure and potentially dangerous.
  • podman rmi -f the build container's image. This is kind of a bad option, because that image can be an ancestor of other containers you don't want to be deleting (e.g. the sleep container in the example above).
  • podman system reset. This does make the problem go away, but has obvious consequences.

Additional information you deem important (e.g. issue happens only occasionally):

Output of podman version:

Client:       Podman Engine
Version:      4.1.0
API Version:  4.1.0
Go Version:   go1.18
Built:        Fri May  6 16:15:54 2022
OS/Arch:      linux/amd64

Output of podman info --debug:

host:
  arch: amd64
  buildahVersion: 1.26.1
  cgroupControllers:
  - memory
  - pids
  cgroupManager: systemd
  cgroupVersion: v2
  conmon:
    package: conmon-2.1.0-2.fc36.x86_64
    path: /usr/bin/conmon
    version: 'conmon version 2.1.0, commit: '
  cpuUtilization:
    idlePercent: 97.39
    systemPercent: 2.03
    userPercent: 0.58
  cpus: 2
  distribution:
    distribution: fedora
    variant: coreos
    version: "36"
  eventLogger: journald
  hostname: localhost.localdomain
  idMappings:
    gidmap:
    - container_id: 0
      host_id: 1000
      size: 1
    - container_id: 1
      host_id: 100000
      size: 65536
    uidmap:
    - container_id: 0
      host_id: 1000
      size: 1
    - container_id: 1
      host_id: 100000
      size: 65536
  kernel: 5.17.9-300.fc36.x86_64
  linkmode: dynamic
  logDriver: journald
  memFree: 1483071488
  memTotal: 2059120640
  networkBackend: netavark
  ociRuntime:
    name: crun
    package: crun-1.4.4-1.fc36.x86_64
    path: /usr/bin/crun
    version: |-
      crun version 1.4.4
      commit: 6521fcc5806f20f6187eb933f9f45130c86da230
      spec: 1.0.0
      +SYSTEMD +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +CRIU +YAJL
  os: linux
  remoteSocket:
    path: /run/user/1000/podman/podman.sock
  security:
    apparmorEnabled: false
    capabilities: CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_NET_BIND_SERVICE,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID,CAP_SYS_CHROOT
    rootless: true
    seccompEnabled: true
    seccompProfilePath: /usr/share/containers/seccomp.json
    selinuxEnabled: true
  serviceIsRemote: false
  slirp4netns:
    executable: /usr/bin/slirp4netns
    package: slirp4netns-1.2.0-0.2.beta.0.fc36.x86_64
    version: |-
      slirp4netns version 1.2.0-beta.0
      commit: 477db14a24ff1a3de3a705e51ca2c4c1fe3dda64
      libslirp: 4.6.1
      SLIRP_CONFIG_VERSION_MAX: 3
      libseccomp: 2.5.3
  swapFree: 2145906688
  swapTotal: 2147479552
  uptime: 40m 55.85s
plugins:
  log:
  - k8s-file
  - none
  - passthrough
  - journald
  network:
  - bridge
  - macvlan
  volume:
  - local
registries:
  search:
  - registry.fedoraproject.org
  - registry.access.redhat.com
  - docker.io
  - quay.io
store:
  configFile: /var/home/core/.config/containers/storage.conf
  containerStore:
    number: 0
    paused: 0
    running: 0
    stopped: 0
  graphDriverName: overlay
  graphOptions: {}
  graphRoot: /var/home/core/.local/share/containers/storage
  graphRootAllocated: 57444118528
  graphRootUsed: 90775552
  graphStatus:
    Backing Filesystem: btrfs
    Native Overlay Diff: "true"
    Supports d_type: "true"
    Using metacopy: "false"
  imageCopyTmpDir: /var/tmp
  imageStore:
    number: 0
  runRoot: /run/user/1000/containers
  volumePath: /var/home/core/.local/share/containers/storage/volumes
version:
  APIVersion: 4.1.0
  Built: 1651853754
  BuiltTime: Fri May  6 16:15:54 2022
  GitCommit: ""
  GoVersion: go1.18
  Os: linux
  OsArch: linux/amd64
  Version: 4.1.0

Package info (e.g. output of rpm -q podman or apt list podman):

podman-4.1.0-1.fc36.x86_64

Have you tested with the latest version of Podman and have you checked the Podman Troubleshooting Guide? (https://github.com/containers/podman/blob/main/troubleshooting.md)

Yes

Additional environment details (AWS, VirtualBox, physical, etc.):

@openshift-ci openshift-ci bot added the kind/bug Categorizes issue or PR as related to a bug. label Jun 7, 2022
@mheon
Member

mheon commented Jun 9, 2022

We've seen quite a few issues with Build leaving containers around, but I've never heard of it leaving running containers. That's very problematic for Podman, as we don't really have a way to stop them ourselves without knowing the container's PID. @nalind Interrupting the build should kill the build container, right?

@github-actions

A friendly reminder that this issue had no activity for 30 days.

@rhatdan
Member

rhatdan commented Jul 11, 2022

@giuseppe @flouthoc PTAL
I have verified that this still happens. We need buildah run to notice when its parent dies; it should also die.
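
A minimal sketch of that idea, assuming Linux and only Go's standard library (the sleep 1d child is just a stand-in for a RUN step; this is not Buildah's actual implementation): a child process can ask the kernel, via PR_SET_PDEATHSIG, to be signalled when its parent dies, so an interrupted build would take its RUN processes down with it.

```go
package main

// Hedged sketch, not Buildah's code: start a child process that the kernel
// will SIGKILL automatically if this (parent) process dies, e.g. because the
// build was interrupted with ^C. Linux-only (PR_SET_PDEATHSIG).

import (
	"log"
	"os/exec"
	"syscall"
)

func main() {
	cmd := exec.Command("sleep", "1d") // stand-in for a RUN step's process
	cmd.SysProcAttr = &syscall.SysProcAttr{
		// Ask the kernel to deliver SIGKILL to the child when the
		// parent exits for any reason.
		Pdeathsig: syscall.SIGKILL,
	}
	if err := cmd.Start(); err != nil {
		log.Fatal(err)
	}
	if err := cmd.Wait(); err != nil {
		log.Println("child exited:", err)
	}
}
```

One caveat with this approach: the kernel ties the death signal to the thread that created the child, so a real implementation in Go would likely need to lock the starting goroutine to its OS thread (or detect parent death some other way).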

@github-actions

A friendly reminder that this issue had no activity for 30 days.

@flouthoc flouthoc self-assigned this Aug 11, 2022
@github-actions

A friendly reminder that this issue had no activity for 30 days.

@rhatdan
Member

rhatdan commented Sep 12, 2022

@flouthoc still working on this one?

@g-suraj

g-suraj commented Sep 10, 2024

This seems to be similar to what I was running into here: #23683
