cannot remove container [...]: given PIDs did not die within timeout #17142

Closed
edsantiago opened this issue Jan 17, 2023 · 7 comments · Fixed by #17165
Labels: flakes (Flakes from Continuous Integration) · locked - please file new issue/PR

Comments

@edsantiago (Member)

New flake, failed twice in one PR:

not ok 137 podman container rm --force doesn't leave running processes
...
# podman-remote --url unix:/tmp/podman_tmp_i9cG rm -f cTG9q6WAKk9c0Bvs343yuzK8rbPiBTs
Error: cannot remove container <sha> as it could not be stopped: given PIDs did not die within timeout
edsantiago added the flakes (Flakes from Continuous Integration) label on Jan 17, 2023
@vrothberg (Member)

Currently, the timeout is 5 seconds. That may not be enough on very busy nodes. In CI, 20 seconds seems to be the current magic limit. Shall we bump it to that?

@giuseppe @mheon WDYT?
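
(For illustration, here is a minimal sketch of the kill-then-poll pattern under discussion; it is not Podman's actual implementation, and the helper name, polling interval, and 5-second default are assumptions taken from this thread.)

```go
// Hypothetical sketch, not Podman's code: send SIGKILL, then poll
// until the PID disappears or the timeout elapses.
package main

import (
	"errors"
	"fmt"
	"time"

	"golang.org/x/sys/unix"
)

// waitPIDStop polls until pid no longer exists or timeout elapses.
// Signal 0 checks for existence without delivering a signal; note
// that a zombie still "exists" until reaped, which is exactly the
// kind of process a longer timeout would block on.
func waitPIDStop(pid int, timeout time.Duration) error {
	deadline := time.Now().Add(timeout)
	for time.Now().Before(deadline) {
		if err := unix.Kill(pid, 0); err == unix.ESRCH {
			return nil // process is gone
		}
		time.Sleep(100 * time.Millisecond)
	}
	return errors.New("given PIDs did not die within timeout")
}

func main() {
	pid := 12345 // placeholder PID
	_ = unix.Kill(pid, unix.SIGKILL)
	// 5 seconds is the current timeout mentioned above.
	if err := waitPIDStop(pid, 5*time.Second); err != nil {
		fmt.Println(err)
	}
}
```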

@mheon (Member) commented Jan 17, 2023

I'm a little reluctant here, because it would basically leave Podman frozen for most of a minute if the process legitimately isn't going to die (a zombie process or similar). And I doubt that the CI VMs are slow enough that it takes more than 5 seconds for a SIGKILL to be processed (but then again, maybe I'm wrong).

@vrothberg (Member)

There may very well be (another) bug. Let's keep an eye open and wait for more reports from @edsantiago. CI is sick at the moment, so there may be other things interfering (although that's unlikely).

@edsantiago (Member, Author)

Uh-oh

not ok 521 podman kube play - multi-pod YAML
...
  # $ podman pod stop pod1 pod2
  # 3878f14ab92abfba94526a0df961abce9183781525442b4906c4f2103009af14
  # 7568d3ac6f56544faabafea7c984e568f401122e9031c43c0c75b6bf82de09a3
  # open pidfd: No such process
  # time="2023-01-18T08:24:09-06:00" level=error msg="Stopping service container 7da6d2bccb1d253b273ee65a083145215dabd290115edee7f8a0025dfa88465a: given PIDs did not die within timeout"
  # #/vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
  # #|     FAIL: podman pod stop pod1 pod2
  # #| expected: !~ 'Stopping'
  # #|   actual:    '3878f14ab92abfba94526a0df961abce9183781525442b4906c4f2103009af14'
  # #|         >    '7568d3ac6f56544faabafea7c984e568f401122e9031c43c0c75b6bf82de09a3'
  # #|         >    'open pidfd: No such process'
  # #|         >    'time="2023-01-18T08:24:09-06:00" level=error msg="Stopping service container 7da6d2bccb1d253b273ee65a083145215dabd290115edee7f8a0025dfa88465a: given PIDs did not die within timeout"'
  # #\^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
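
(Aside: the "open pidfd: No such process" line presumably means the Linux pidfd_open(2) syscall failed with ESRCH because the target process was already gone. A hypothetical sketch of that failure mode, using golang.org/x/sys/unix rather than Podman's actual code:)

```go
// Hypothetical illustration (not Podman's code) of how an
// "open pidfd: No such process" error can arise: pidfd_open(2)
// returns ESRCH once the target PID has already exited.
package main

import (
	"fmt"

	"golang.org/x/sys/unix"
)

func main() {
	pid := 12345 // placeholder PID that may already have exited

	// A pidfd refers to one specific process; unlike a raw PID it
	// cannot silently start referring to a recycled process.
	fd, err := unix.PidfdOpen(pid, 0)
	if err == unix.ESRCH {
		// The process is already gone. Treating this as fatal,
		// rather than as "already stopped", is the kind of race
		// this issue is about.
		fmt.Println("open pidfd: No such process")
		return
	} else if err != nil {
		panic(err)
	}
	defer unix.Close(fd)

	fmt.Println("process still exists; pidfd =", fd)
}
```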

@vrothberg (Member)

I'll tackle this one tomorrow. I had a look today already but then moved on to the bud flake.

@vrothberg (Member)

It smells so badly like a race.

@vrothberg (Member)

> It smells so badly like a race.

#17165 ... same race as before, but an improvement over the previous PR (details in the PR description).

vrothberg added a commit to vrothberg/libpod that referenced this issue Jan 19, 2023
Commit 067442b improved stopping/killing a container by detecting
whether the cleanup process has already fired and changed the state of
the container.  Further improve on that by returning early instead of
trying to wait for the PID to finish.  At that point we know that the
container has exited but the previous PID may have been recycled
already by the kernel.

[NO NEW TESTS NEEDED] - the absence of the two flaking tests recorded
in containers#17142 will tell.

Fixes: containers#17142
Signed-off-by: Valentin Rothberg <vrothberg@redhat.com>
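
(To make the commit message concrete, here is a minimal sketch of the early-return idea under stated assumptions: the container type, syncState helper, and waitPIDStop below are invented for illustration and do not match Podman's real API.)

```go
// Hypothetical sketch of the early return described in the commit
// message above; the types and helpers are invented, not Podman's.
package main

import (
	"errors"
	"time"

	"golang.org/x/sys/unix"
)

type container struct {
	pid     int
	stopped bool // set by the cleanup process once the container exits
}

// syncState stands in for re-reading container state after the
// cleanup process may have fired and updated it.
func (c *container) syncState() bool { return c.stopped }

// waitPIDStop polls until pid no longer exists or timeout elapses.
func waitPIDStop(pid int, timeout time.Duration) error {
	deadline := time.Now().Add(timeout)
	for time.Now().Before(deadline) {
		if unix.Kill(pid, 0) == unix.ESRCH {
			return nil
		}
		time.Sleep(100 * time.Millisecond)
	}
	return errors.New("given PIDs did not die within timeout")
}

func stopContainer(c *container, timeout time.Duration) error {
	if err := unix.Kill(c.pid, unix.SIGKILL); err != nil && err != unix.ESRCH {
		return err
	}

	// The fix: if the cleanup process already recorded the exit,
	// return early. The container is known to have exited, and c.pid
	// may already have been recycled by the kernel, so waiting on it
	// could watch an unrelated process and hit the timeout error.
	if c.syncState() {
		return nil
	}

	return waitPIDStop(c.pid, timeout)
}

func main() {
	c := &container{pid: 12345} // placeholder
	_ = stopContainer(c, 5*time.Second)
}
```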
github-actions bot added the locked - please file new issue/PR label on Sep 3, 2023
github-actions bot locked as resolved and limited conversation to collaborators on Sep 3, 2023