cannot remove container [...]: given PIDs did not die within timeout #17142

Closed
edsantiago opened this issue Jan 17, 2023 · 7 comments · Fixed by #17165
Labels: flakes (Flakes from Continuous Integration) · locked - please file new issue/PR

Comments

@edsantiago (Member)

New flake, failed twice in one PR:

not ok 137 podman container rm --force doesn't leave running processes
...
# podman-remote --url unix:/tmp/podman_tmp_i9cG rm -f cTG9q6WAKk9c0Bvs343yuzK8rbPiBTs
Error: cannot remove container <sha> as it could not be stopped: given PIDs did not die within timeout
edsantiago added the flakes (Flakes from Continuous Integration) label on Jan 17, 2023
@vrothberg (Member)

Currently, the timeout is 5 seconds. That may not be enough on very busy nodes. In CI, 20 seconds seems to be the current magic limit. Shall we bump it to that?

@giuseppe @mheon WDYT?
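
(For illustration, here is a minimal sketch of the kill-then-poll pattern under discussion; it is not Podman's actual implementation, and the helper name, polling interval, and 5-second default are assumptions taken from this thread.)

```go
// Hypothetical sketch, not Podman's code: send SIGKILL, then poll
// until the PID disappears or the timeout elapses.
package main

import (
	"errors"
	"fmt"
	"time"

	"golang.org/x/sys/unix"
)

// waitPIDStop polls until pid no longer exists or timeout elapses.
// Signal 0 checks for existence without delivering a signal; note
// that a zombie still "exists" until reaped, which is exactly the
// kind of process a longer timeout would block on.
func waitPIDStop(pid int, timeout time.Duration) error {
	deadline := time.Now().Add(timeout)
	for time.Now().Before(deadline) {
		if err := unix.Kill(pid, 0); err == unix.ESRCH {
			return nil // process is gone
		}
		time.Sleep(100 * time.Millisecond)
	}
	return errors.New("given PIDs did not die within timeout")
}

func main() {
	pid := 12345 // placeholder PID
	_ = unix.Kill(pid, unix.SIGKILL)
	// 5 seconds is the current timeout mentioned above.
	if err := waitPIDStop(pid, 5*time.Second); err != nil {
		fmt.Println(err)
	}
}
```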

@mheon (Member) commented Jan 17, 2023

I'm a little reluctant here, because it would basically leave Podman frozen for most of a minute if the process legitimately isn't going to die (a zombie process or similar). And I doubt that the CI VMs are slow enough that it takes more than 5 seconds for a SIGKILL to be processed (but then again, maybe I'm wrong).

@vrothberg (Member)

There may very well be (another) bug. Let's keep an eye open and wait for more reports from @edsantiago. CI is sick at the moment, so there may be other things interfering (although that's unlikely).

@edsantiago (Member, Author)

Uh-oh

not ok 521 podman kube play - multi-pod YAML
...
  # $ podman pod stop pod1 pod2
  # 3878f14ab92abfba94526a0df961abce9183781525442b4906c4f2103009af14
  # 7568d3ac6f56544faabafea7c984e568f401122e9031c43c0c75b6bf82de09a3
  # open pidfd: No such process
  # time="2023-01-18T08:24:09-06:00" level=error msg="Stopping service container 7da6d2bccb1d253b273ee65a083145215dabd290115edee7f8a0025dfa88465a: given PIDs did not die within timeout"
  # #/vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
  # #|     FAIL: podman pod stop pod1 pod2
  # #| expected: !~ 'Stopping'
  # #|   actual:    '3878f14ab92abfba94526a0df961abce9183781525442b4906c4f2103009af14'
  # #|         >    '7568d3ac6f56544faabafea7c984e568f401122e9031c43c0c75b6bf82de09a3'
  # #|         >    'open pidfd: No such process'
  # #|         >    'time="2023-01-18T08:24:09-06:00" level=error msg="Stopping service container 7da6d2bccb1d253b273ee65a083145215dabd290115edee7f8a0025dfa88465a: given PIDs did not die within timeout"'
  # #\^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
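
(Aside: the "open pidfd: No such process" line presumably means the Linux pidfd_open(2) syscall failed with ESRCH because the target process was already gone. A hypothetical sketch of that failure mode, using golang.org/x/sys/unix rather than Podman's actual code:)

```go
// Hypothetical illustration (not Podman's code) of how an
// "open pidfd: No such process" error can arise: pidfd_open(2)
// returns ESRCH once the target PID has already exited.
package main

import (
	"fmt"

	"golang.org/x/sys/unix"
)

func main() {
	pid := 12345 // placeholder PID that may already have exited

	// A pidfd refers to one specific process; unlike a raw PID it
	// cannot silently start referring to a recycled process.
	fd, err := unix.PidfdOpen(pid, 0)
	if err == unix.ESRCH {
		// The process is already gone. Treating this as fatal,
		// rather than as "already stopped", is the kind of race
		// this issue is about.
		fmt.Println("open pidfd: No such process")
		return
	} else if err != nil {
		panic(err)
	}
	defer unix.Close(fd)

	fmt.Println("process still exists; pidfd =", fd)
}
```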

@vrothberg (Member)

I'll tackle this one tomorrow. I had a look today already but then moved on to the bud flake.

@vrothberg (Member)

It smells so badly like a race.

@vrothberg (Member)

> It smells so badly like a race.

#17165 ... same race as before, but an improvement over the previous PR (details in the PR description).

vrothberg added a commit to vrothberg/libpod that referenced this issue Jan 19, 2023
Commit 067442b improved stopping/killing a container by detecting
whether the cleanup process has already fired and changed the state of
the container.  Further improve on that by returning early instead of
trying to wait for the PID to finish.  At that point we know that the
container has exited but the previous PID may have been recycled
already by the kernel.

[NO NEW TESTS NEEDED] - the absence of the two flaking tests recorded
in containers#17142 will tell.

Fixes: containers#17142
Signed-off-by: Valentin Rothberg <vrothberg@redhat.com>
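
(To make the commit message concrete, here is a minimal sketch of the early-return idea under stated assumptions: the container type, syncState helper, and waitPIDStop below are invented for illustration and do not match Podman's real API.)

```go
// Hypothetical sketch of the early return described in the commit
// message above; the types and helpers are invented, not Podman's.
package main

import (
	"errors"
	"time"

	"golang.org/x/sys/unix"
)

type container struct {
	pid     int
	stopped bool // set by the cleanup process once the container exits
}

// syncState stands in for re-reading container state after the
// cleanup process may have fired and updated it.
func (c *container) syncState() bool { return c.stopped }

// waitPIDStop polls until pid no longer exists or timeout elapses.
func waitPIDStop(pid int, timeout time.Duration) error {
	deadline := time.Now().Add(timeout)
	for time.Now().Before(deadline) {
		if unix.Kill(pid, 0) == unix.ESRCH {
			return nil
		}
		time.Sleep(100 * time.Millisecond)
	}
	return errors.New("given PIDs did not die within timeout")
}

func stopContainer(c *container, timeout time.Duration) error {
	if err := unix.Kill(c.pid, unix.SIGKILL); err != nil && err != unix.ESRCH {
		return err
	}

	// The fix: if the cleanup process already recorded the exit,
	// return early. The container is known to have exited, and c.pid
	// may already have been recycled by the kernel, so waiting on it
	// could watch an unrelated process and hit the timeout error.
	if c.syncState() {
		return nil
	}

	return waitPIDStop(c.pid, timeout)
}

func main() {
	c := &container{pid: 12345} // placeholder
	_ = stopContainer(c, 5*time.Second)
}
```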
github-actions bot added the locked - please file new issue/PR label on Sep 3, 2023
github-actions bot locked as resolved and limited conversation to collaborators on Sep 3, 2023