podman ps: hangs #658
Comments
Whoops: podman-0.4.4.1524346805-gitcf1d884.fc27.x86_64 |
Looks like it's hanging on the lock. |
Podman processes, I mean |
|
They're all stuck on a specific container's lock. Alright. So we have a race with many ps running against each other. |
10563 seems to have the lock. It looks like it's stuck in the critical section and not releasing the lock. |
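To make the failure mode described above concrete, here is a minimal sketch of one holder stuck inside a critical section while several `ps`-like callers wait on the same lock. It is plain Python with `threading.Lock`, not libpod's Go code, and every name in it (`simulate_stuck_holder`, `container_lock`, `try_ps`) is hypothetical. The holder's "stuck" state is simulated with an event; what actually blocked process 10563 is not known from this thread.

```python
import threading


def simulate_stuck_holder(num_waiters=3, wait_timeout=0.1):
    """Show that waiters cannot take a lock whose holder never leaves
    its critical section, and that they can once the holder is freed."""
    container_lock = threading.Lock()   # stands in for the per-container lock
    holder_ready = threading.Event()
    release_holder = threading.Event()

    def holder():
        with container_lock:
            holder_ready.set()
            # Stand-in for whatever blocks inside the critical section.
            release_holder.wait()

    def try_ps():
        # A real caller would block forever; a timeout makes the hang visible.
        acquired = container_lock.acquire(timeout=wait_timeout)
        if acquired:
            container_lock.release()
        return acquired

    thread = threading.Thread(target=holder)
    thread.start()
    holder_ready.wait()

    # Every attempt fails while the holder is stuck.
    blocked = [try_ps() for _ in range(num_waiters)]

    # Freeing the holder (the analogue of killing the stuck process).
    release_holder.set()
    thread.join()
    after = try_ps()
    return blocked, after


if __name__ == "__main__":
    print(simulate_stuck_holder())
```

The timeout on `acquire` is only there so the sketch terminates; with a plain blocking acquire, each waiter would hang in the same way the `podman ps` processes do.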
Yes, that got things unstuck. |
OK. It's 10563 holding that lock. That's |
I have a suspicion that it might be https://github.com/projectatomic/libpod/blob/master/libpod/container_api.go#L411-L416 in It's one of only two places where |
The obvious conclusion is that we're getting stuck somewhere on the boltdb lock, so we never hit the Unlock() in attach.go |
That actually can't be true. If that were the case, ps would not even get this far - it would fail to even retrieve the container. The lock issue has to be the container lock itself. |
We must be getting stuck in |
Wait, |
This is strange. First: yes, the container was indeed stopped at the time of this situation; I had to But... this should not be part of the test. 19 is very definitely excluded from the signal list. The debug logs do not show any part of the tests sending that signal. I've tried to reproduce manually and can't figure out how the container ended up stopped. But yes, it looks like this is a smoking gun. Continuing to look; will update if I find anything. |
@mheon is there any possibility that |
@edsantiago I don't think it's |
I'm seeing the same thing, but it is hanging on a different lock.
|
That looks like a c/storage lock, so it should be a separate issue, but let's keep it here until we can get it localized. @ipmb Can you give the commands you're using to recreate this? |
I'm setting up the containers in systemd with a config mgmt system. The tasks shouldn't be happening in parallel, but in very rapid succession:
This happens a handful of times as each container spins up. I've only seen it in testing on my Mac, which has a bunch of virtualization (hyperkit -> Docker -> podman on vfs), so lots of overhead. I've noticed I need to sleep for a few seconds to reliably inspect a container after the run command. On a "real" machine I can run them back-to-back with |
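As an alternative to the fixed sleep mentioned above, a config management task could retry the inspect step with a growing delay. The following is a rough sketch only: `retry` and its parameters are hypothetical names, and it assumes the operation signals "not ready yet" by raising an exception.

```python
import time


def retry(operation, attempts=5, initial_delay=0.01, backoff=2.0):
    """Call operation() until it succeeds or attempts run out.

    Sleeps between failed attempts, multiplying the delay by `backoff`
    each time, and re-raises the last error if every attempt fails.
    """
    delay = initial_delay
    last_error = None
    for attempt in range(attempts):
        try:
            return operation()
        except Exception as err:  # broad on purpose: this is only a sketch
            last_error = err
            if attempt < attempts - 1:
                time.sleep(delay)
                delay *= backoff
    raise last_error
```

In practice `operation` might wrap something like `subprocess.run(["podman", "inspect", name], check=True)`. Note that this only papers over slow startup; it would not help if the command is genuinely blocked on a lock.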
It seems we're blocking on c/storage's |
I believe this is fixed now, please reopen if it still happens. |
I saw this today when automating something with the podman varlink API.
Again, killing one of the offending processes fixed the issue:
The deadlock seemed to happen when I tried to call |
I'm experiencing the
Issuing a Maybe this is unrelated, but when a container PID is killed on the host, podman is unable to remove it too.
|
@pgporada have a reproducer we can try? |
Yes, start a container as follows
Kill the PID of the container on the host via
|
It's conmon getting killed before the container; the container dying before conmon should be fine.
On Sun, Dec 22, 2019, 13:06 Brent Baude wrote:
> @pgporada when you say kill the pid of the container, what exactly are you killing?
> @mheon don't we have issues when things get killed from underneath conmon?
|
Conmon dying could hang us as we wait for the exit file, though.
|
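The concern about hanging while waiting for the exit file can be illustrated with a bounded wait. This is a hedged sketch, not how libpod does it: `wait_for_exit_file` is a hypothetical helper, and it assumes the exit file, once written, contains the exit code as decimal text.

```python
import time


def wait_for_exit_file(path, timeout=5.0, poll_interval=0.05):
    """Poll for an exit file and return the exit code it contains.

    Returns None if the file has not appeared (or is still empty) when
    the timeout expires, instead of waiting forever.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            with open(path) as handle:
                text = handle.read().strip()
            if text:
                return int(text)
        except FileNotFoundError:
            pass  # the writer has not created the file yet
        if time.monotonic() >= deadline:
            return None
        time.sleep(poll_interval)
```

If the process responsible for writing the file has been killed, an unbounded wait never returns; with a deadline the caller gets `None` back and can decide how to clean up.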
Every so often my system gets into a state where podman ps hangs. It's in such a state right now, and I was able to strace it. I think this is the relevant portion:
^^^ this is where it's hanging