
podman exec: periodic 30-second CPU-hogging hangs #10701

Closed
edsantiago opened this issue Jun 16, 2021 · 13 comments · Fixed by #11055
Labels
flakes (Flakes from Continuous Integration)
locked - please file new issue/PR (Assist humans wanting to comment on an old issue or PR with locked comments)

Comments

@edsantiago
Member

$ ./bin/podman run -d --name foo quay.io/libpod/testimage:20210610 top
0af6b438059a10b47bc78d15226433c3a794cdb32754c608dc1756ad9dd5901d
$ while :;do ./bin/podman exec foo date;done
Wed Jun 16 20:20:57 UTC 2021
Wed Jun 16 20:20:58 UTC 2021
Wed Jun 16 20:20:58 UTC 2021

Now just keep watching; it might take a minute or so. When you hear your CPU fan go loud, hit RETURN so you can mark the spot where it hangs. The next entry will be timestamped 30 seconds after the previous one. Example:

Wed Jun 16 20:21:13 UTC 2021
Wed Jun 16 20:21:13 UTC 2021

Wed Jun 16 20:21:41 UTC 2021

Wed Jun 16 20:22:11 UTC 2021
Wed Jun 16 20:22:12 UTC 2021

master @ 2509a81, root and rootless.

@edsantiago added the flakes (Flakes from Continuous Integration) label Jun 16, 2021
@rhatdan
Member

rhatdan commented Jun 16, 2021

One would think we are leaking something. Perhaps mount points.

Have you tried this as root? If it works fine as root but not as rootless, that might point to something with fuse-overlayfs.

@edsantiago
Member Author

Yep, tried root and rootless (see final line of OP). No leaked mounts that I can see (I even checked while an exec hang was in progress; no new mounts).

@mheon
Member

mheon commented Jun 16, 2021

Can't reproduce. Had it running for the last 5 minutes without any effect. F34, latest main branch.

@edsantiago
Member Author

Oh, weird. It reproduces within seconds for me as root, and ~1 minute rootless. Also f34, 5.11.17-300.fc34. With root I can't even get three iterations of the loop without a hang.

@edsantiago
Member Author

$ sudo ./bin/podman info
host:
  arch: amd64
  buildahVersion: 1.21.1
  cgroupControllers:
  - cpuset
  - cpu
  - io
  - memory
  - hugetlb
  - pids
  cgroupManager: systemd
  cgroupVersion: v2
  conmon:
    package: conmon-2.0.27-2.fc34.x86_64
    path: /usr/bin/conmon
    version: 'conmon version 2.0.27, commit: '
  cpus: 8
  distribution:
    distribution: fedora
    version: "34"
  eventLogger: journald
  hostname: f.edsantiago.com
  idMappings:
    gidmap: null
    uidmap: null
  kernel: 5.11.17-300.fc34.x86_64
  linkmode: dynamic
  memFree: 2526560256
  memTotal: 33506283520
  ociRuntime:
    name: crun
    package: crun-0.20.1-1.fc34.x86_64
    path: /usr/bin/crun
    version: |-
      crun version 0.20.1
      commit: 0d42f1109fd73548f44b01b3e84d04a279e99d2e
      spec: 1.0.0
      +SYSTEMD +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +CRIU +YAJL
  os: linux
  remoteSocket:
    path: /run/podman/podman.sock
  security:
    apparmorEnabled: false
    capabilities: CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_NET_BIND_SERVICE,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID,CAP_SYS_CHROOT
    rootless: false
    seccompEnabled: true
    seccompProfilePath: /usr/share/containers/seccomp.json
    selinuxEnabled: true
  serviceIsRemote: false
  slirp4netns:
    executable: ""
    package: ""
    version: ""
  swapFree: 8543531008
  swapTotal: 8589930496
  uptime: 1053h 44m 24.35s (Approximately 43.88 days)
registries:
  search:
  - registry.fedoraproject.org
  - registry.access.redhat.com
  - docker.io
  - quay.io
store:
  configFile: /etc/containers/storage.conf
  containerStore:
    number: 1
    paused: 0
    running: 1
    stopped: 0
  graphDriverName: overlay
  graphOptions:
    overlay.mountopt: nodev,metacopy=on
  graphRoot: /var/lib/containers/storage
  graphStatus:
    Backing Filesystem: btrfs
    Native Overlay Diff: "false"
    Supports d_type: "true"
    Using metacopy: "true"
  imageStore:
    number: 1
  runRoot: /run/containers/storage
  volumePath: /var/lib/containers/storage/volumes
version:
  APIVersion: 3.3.0-dev
  Built: 1623872418
  BuiltTime: Wed Jun 16 13:40:18 2021
  GitCommit: 2509a81c343314fa452a0215d05e9d74ab4ec15c
  GoVersion: go1.16.3
  OsArch: linux/amd64
  Version: 3.3.0-dev

@edsantiago
Member Author

Well, drat. I brought up a 1mt VM, 5.11.15-300.fc34, and can't reproduce in almost an hour of looping (using test-and-break so I could leave it unattended):

# while :;do t=$SECONDS;podman exec foo date;if [[ $(($SECONDS - $t)) -gt 5 ]]; then break;fi;done

@edsantiago
Member Author

Hypothesis: it might be the systemd journal GC.

# journalctl --disk-usage
Archived and active journals take up 4.0G in the file system.
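
If journal size really is the trigger, shrinking the journal should make the hang disappear for a while. Untested idea, just noting it as a possible cross-check:

# untested: shrink archived journals, then re-run the exec loop
journalctl --vacuum-size=200M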

If I add --events-backend=file to my reproducer (both podman-run and podman-exec), the problem goes away. I cannot reproduce the hang, even over the course of many minutes.
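
Concretely, the variant that does not hang looks roughly like this (a sketch; --events-backend is passed here as podman's global option so it applies to both commands):

$ ./bin/podman --events-backend=file run -d --name foo quay.io/libpod/testimage:20210610 top
$ while :;do ./bin/podman --events-backend=file exec foo date;done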

I have a 1mt VM running, and am trying my best to fill up the journal by piping binaries through logger. It's not filling up fast enough. Any suggestions?
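
For the record, the crude filler loop I'm running (a sketch; any large file will do):

# spam the journal as fast as logger will accept it
while :; do base64 /usr/bin/podman | logger; done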

@rhatdan
Member

rhatdan commented Jun 18, 2021

Does journald do throttling?

@vrothberg
Member

@msekletar, have you seen such symptoms where writes to the journal are substantially delayed?

@msekletar
Contributor

@vrothberg I don't recall any recent issue regarding delayed writes. In general, writing to journald may cause clients to block (as with any other syslog daemon), so logging code should be able to deal with this condition. The question "Is the message we are about to log critical enough to stall the entire application if the logging subsystem is busy or otherwise not responsive?" should be answered (and the logging implemented accordingly) by the application developers. Btw, I know it is easy to ask the question and much harder to come up with the correct implementation of the answer.
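
To make that concrete, here is a minimal, illustrative Go sketch (not podman's actual code; it assumes the github.com/coreos/go-systemd/v22/journal package) of treating a non-critical journal write as best-effort with a timeout, instead of letting a busy journald stall the caller:

package main

import (
	"fmt"
	"time"

	"github.com/coreos/go-systemd/v22/journal"
)

// logWithTimeout attempts to write msg to the journal but gives up after
// timeout, so a busy or unresponsive journald cannot stall the caller.
// Note: if journal.Send blocks forever, the goroutine is leaked; for a
// best-effort, non-critical message that may be an acceptable trade-off.
func logWithTimeout(msg string, timeout time.Duration) error {
	done := make(chan error, 1)
	go func() {
		// journal.Send can block if journald's sockets are backed up.
		done <- journal.Send(msg, journal.PriInfo, nil)
	}()
	select {
	case err := <-done:
		return err
	case <-time.After(timeout):
		return fmt.Errorf("journal write timed out after %s; message dropped", timeout)
	}
}

func main() {
	if err := logWithTimeout("exec session event", 2*time.Second); err != nil {
		// Non-critical message: report and keep going rather than hang.
		fmt.Println("warning:", err)
	}
}

Whether dropping the message or blocking is correct is exactly the per-message decision described above.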

@github-actions

A friendly reminder that this issue had no activity for 30 days.

@edsantiago
Member Author

I can still reproduce this on my laptop, which has been rebooted since the last time I checked in here.

...
Thu Jul 22 01:07:38 UTC 2021

Thu Jul 22 01:08:04 UTC 2021
Thu Jul 22 01:08:04 UTC 2021
Thu Jul 22 01:08:05 UTC 2021

Thu Jul 22 01:08:31 UTC 2021

Thu Jul 22 01:08:59 UTC 2021
Thu Jul 22 01:08:59 UTC 2021
...

@vrothberg
Member

I bisected a bit and it looks like this started with commit 341e6a1.

vrothberg added a commit to vrothberg/libpod that referenced this issue Jul 27, 2021
Commit 341e6a1 made sure that all exec sessions are getting cleaned up.
But it also came with a performance penalty.  Fix that penalty by
spawning the cleanup process to really only clean up the exec session
without attempting to remove the container.

[NO TESTS NEEDED] since we have no means to test such performance
issues in CI.

Fixes: containers#10701
Signed-off-by: Valentin Rothberg <rothberg@redhat.com>
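
For illustration only (my reading of the commit message, not the actual diff): the exit command spawned for an exec session conceptually becomes an exec-session-only cleanup, something along the lines of

# hypothetical shape of the exec cleanup invocation, not the real command line
podman container cleanup --exec <exec-session-id> <container-id>

rather than a full container cleanup that also attempts to remove the container.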
github-actions bot added the locked - please file new issue/PR label Sep 21, 2023
github-actions bot locked as resolved and limited conversation to collaborators Sep 21, 2023