Tests are failing with timeout error #425
After bisecting podman, this is the first commit where the podman-py tests hang: containers/podman@8a94331
@Luap99 PTAL
I am not familiar with the python code here, but if you do not call the wait endpoint then the linked commit should not change anything, as the code is only used there, unless I am overlooking something. Looking at your reproducer, it doesn't even start the container, so there is no reason to call wait either. I also cannot reproduce with your small python example, unless I did something wrong; I just copied your reproducer to a file in this repo and ran it.
Apologies, I already deleted the comment with the reproducer because it was incorrect.
Your comment shed some light; now I actually have something that can reproduce it:

```python
from podman import PodmanClient
import subprocess

uri = "unix:///run/user/100/podman/podman.sock"
client = PodmanClient(base_uri=uri)

c = client.containers.create('nginx:latest', name="test", detach=True)
c.start()
```

works, while

```python
c = client.containers.run('nginx:latest', name="test", detach=True)
```

hangs. I think I know where to look now.
Yes, that makes more sense. I was able to reproduce with the test case once I figured out the dependencies. I see
Not sure about this line. I found new issues related to getting the right status. It seems like the call will indeed wait forever, since the status looks to be
In theory exited cannot happen in your reproducer; however, the wait condition should trigger on running as well, so something must be wrong with that. The logic should return once any of the given states is hit, but it currently blocks forever. I see what is causing this on the podman side and will look into a fix there tomorrow. However, I would recommend you drop the wait call there; it is not doing anything useful AFAICS.
You will have to call reload() because the python code of course doesn't really understand state changes on the podman side. The API has no real way to update the state fields on its own, so you must do an explicit reload(), which in turn does an inspect on the server and then updates the attrs field with the result. c.status itself just returns self.attrs["State"]["Status"].
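A minimal sketch of that pattern, reusing the detached reproducer from above (socket path, image, and container name are just the example values used earlier in this thread):

```python
from podman import PodmanClient

# Assumption: same socket path as in the reproducer above.
uri = "unix:///run/user/100/podman/podman.sock"
client = PodmanClient(base_uri=uri)

c = client.containers.create('nginx:latest', name="test", detach=True)
c.start()

# c.status reads the cached attrs and still reflects the state at create time;
# reload() re-inspects the container on the server and refreshes attrs.
c.reload()
print(c.status)  # now reflects the server-side state, e.g. "running"
```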
@Luap99 The wait call blocks until the container on the server is in the correct state, so a reload() is not required. These semantics are meant to be compatible.
Well, if you call start the container is always running on the server; calling wait for running afterwards is just pointless.
Should the ... I think ...
In the non-detached case it already calls wait() later (without args, so it waits for exit), which is correct.
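Pieced together from this thread rather than from the actual podman-py source, the two run() paths being discussed look roughly like the sketch below; the condition list is an assumption based on the "running as well as exited" discussion above:

```python
# Hypothetical paraphrase of the flow discussed above -- not the real
# podman-py implementation.

def run_detached(client, image, **kwargs):
    c = client.containers.create(image, detach=True, **kwargs)
    c.start()
    # The wait-for-"running" call under discussion: with the regression on the
    # podman side it blocks until conmon exits instead of returning on the
    # first condition hit, which is why run(..., detach=True) hangs.
    c.wait(condition=["running", "exited"])  # assumed condition list
    return c

def run_attached(client, image, **kwargs):
    c = client.containers.create(image, **kwargs)
    c.start()
    # Non-detached case: wait() without args waits for the container to exit,
    # which is the correct behaviour per the comment above.
    c.wait()
    return c
```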
I plead very bad git history, sorry guys. I added that condition without explanation. I want to say the idea was to make the reload() correct because there was a race condition; I infer this because the wait() was added along with the integration-tests comment. Over the last three years the server semantics may have changed enough that the race condition no longer exists.
@jwhonce, we recently had an update in our docs touching on this. Actually, to improve getting the correct status, what about trying to reload directly within the status property?
Note that an inspect call is rather expensive. I am not sure how the code is used in general, but if one does if c.status == "foo" or c.status == "bar", then having the status change in between might be a bad thing?
Yeah, that can be an issue. Let's not make the status property too smart.
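For context, the rejected idea would look roughly like the sketch below (hypothetical, not actual podman-py code); the inline comments restate the objection about cost and back-to-back comparisons:

```python
# Hypothetical sketch of the rejected "smart" status property -- not actual
# podman-py code.  The api object is a stand-in for the real client/inspect
# machinery.

class ContainerSketch:
    def __init__(self, api, attrs):
        self._api = api     # stand-in for the podman API client
        self.attrs = attrs  # cached inspect data

    def reload(self):
        # One inspect round-trip to the server per call -- the "rather
        # expensive" part mentioned above.
        self.attrs = self._api.inspect()

    @property
    def status(self):
        # The rejected idea: reload on every read.  Two consecutive reads,
        # e.g. `c.status == "foo" or c.status == "bar"`, could then observe
        # two different server-side states.
        self.reload()
        return self.attrs["State"]["Status"]
```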
As it turns out, things are not so simple after all...

In podman-py it was reported[1] that waiting might hang. Per our docs, wait on multiple conditions should exit once the first one is hit, not once all of them are. However, because the new wait logic never checked whether the context was cancelled, the goroutine kept running until conmon exited, and because we used a waitgroup to wait for all of them to finish, it blocked until that happened.

First, we can remove the waitgroup, as we only need to wait for one of them anyway via the channel. While this alone fixes the hang, it would still leak the other goroutine. As there is no way to cancel a goroutine, all the code must check for a cancelled context in the wait loop in order to not leak.

Fixes 8a94331 ("libpod: simplify WaitForExit()")

[1] containers/podman-py#425

Signed-off-by: Paul Holzinger <pholzing@redhat.com>
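The core of that fix (return on the first satisfied condition, then tell the remaining watchers to stop) translates to Python roughly as follows; all names here are invented stand-ins for illustration and have nothing to do with the actual libpod code:

```python
# Illustration only: return on the FIRST condition instead of waiting for all
# watchers, and signal the others to stop (the analogue of the cancelled-
# context check in the fix).
import concurrent.futures as cf
import threading
import time

stop = threading.Event()  # plays the role of the cancelled context

def watch_condition(name, delay):
    # Stand-in for one per-condition watcher: poll until the condition is
    # "hit" or until we are told to stop.
    deadline = time.monotonic() + delay
    while time.monotonic() < deadline:
        if stop.is_set():  # the "context cancelled" check
            return None
        time.sleep(0.05)
    return name

conditions = {"running": 0.2, "exited": 60.0}  # "exited" could take very long

with cf.ThreadPoolExecutor() as pool:
    futures = [pool.submit(watch_condition, n, d) for n, d in conditions.items()]

    # Return as soon as the first watcher reports, instead of waiting for all
    # of them (the waitgroup-style behaviour that caused the hang).
    done, pending = cf.wait(futures, return_when=cf.FIRST_COMPLETED)
    print("first condition hit:", next(iter(done)).result())

    # Without this, the slow "exited" watcher would keep running (the leak).
    stop.set()
```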
Fixes: containers#425

Signed-off-by: Nicola Sella <nsella@redhat.com>
Reproduces in GitHub CI and locally with make tests.

Tests get stuck at:
podman/tests/integration/test_containers.py::ContainersIntegrationTest::test_container_crud
See https://github.com/containers/podman-py/pull/420/checks?check_run_id=28860969899