
Tests are failing with timeout error #425

Closed
inknos opened this issue Sep 12, 2024 · 15 comments · Fixed by #428

@inknos
Contributor

inknos commented Sep 12, 2024

Reproduces in GitHub CI and locally:

  1. pull upstream
  2. make tests

Tests get stuck at podman/tests/integration/test_containers.py::ContainersIntegrationTest::test_container_crud

See https://github.com/containers/podman-py/pull/420/checks?check_run_id=28860969899

@inknos inknos self-assigned this Sep 12, 2024
@inknos inknos changed the title from "Tiests are failing in a timeout" to "Tiests are failing with timeout error" Sep 12, 2024
@inknos inknos changed the title from "Tiests are failing with timeout error" to "Tests are failing with timeout error" Sep 12, 2024
@inknos
Contributor Author

inknos commented Sep 16, 2024

After bisecting podman, this is the first commit where the podman-py tests hang: containers/podman@8a94331

@rhatdan
Member

rhatdan commented Sep 16, 2024

@Luap99 PTAL

@Luap99
Member

Luap99 commented Sep 16, 2024

I am not familiar with the Python code here, but if you do not call the wait endpoint then the linked commit should not change anything, as the code is only used there, unless I am overlooking something.

Looking at your reproducer, it doesn't even start the container, so there is no reason to call wait either, and I cannot reproduce with your small Python example either, unless I did something wrong? I just copied your reproducer to a file in this repo and ran it.

@inknos
Contributor Author

inknos commented Sep 16, 2024

Apologies, I already deleted the comment with the reproducer because it was incorrect.

if you do not call the wait endpoint then the linked commit should not change anything, as the code is only used there, unless I am overlooking something

Your comment shed some light; now I actually have something that can reproduce it:

from podman import PodmanClient

uri = "unix:///run/user/100/podman/podman.sock"
client = PodmanClient(base_url=uri)

c = client.containers.create('nginx:latest', name="test", detach=True)
c.start()

works

c = client.containers.run('nginx:latest', name="test", detach=True)

hangs

I think I know where to look now. run calls start, wait and reload in sequence
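
Roughly, the detached code path amounts to the following (a simplified sketch, not the actual podman-py source; run_detached_sketch is just an illustrative name):

from podman import PodmanClient

def run_detached_sketch(client: PodmanClient, image: str, **kwargs):
    # Approximation of what containers.run(..., detach=True) does:
    container = client.containers.create(image, detach=True, **kwargs)
    container.start()
    # This is the wait call that hangs against the affected podman commit,
    # even though the container is already running at this point.
    container.wait(condition=["running", "exited"])
    container.reload()
    return container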

@Luap99
Member

Luap99 commented Sep 16, 2024

Yes, that makes more sense. I was able to reproduce with the test case once I figured out the dependencies.

I see container.wait(condition=["running", "exited"]) being called there, but I really, really do not understand the purpose of this call. After start is done it is safe to assume the container is running (or it could be in any other state, since in the meantime the process could have exited or another podman command might have changed the state), so this call doesn't seem to do anything useful. In fact, the caller asked for detach and we still wait?

@inknos
Contributor Author

inknos commented Sep 16, 2024

Not sure about this line. I found new issues related to getting the right status: it seems the call will indeed wait forever, since the status looks to be created every time.

>>> c = client.containers.create('nginx:latest', detach=True)
>>> c.status
'created'
>>> c.start()
>>> c.status
'created'

@Luap99
Member

Luap99 commented Sep 16, 2024

In theory exited cannot happen in your reproducer; however, the wait condition should trigger on running as well, so something must be wrong with that. The logic should return as soon as any of the states is hit, yet it currently blocks forever.

I see what is causing this on the podman side and will look into a fix tomorrow.

However, I would recommend you drop the wait call there; it is not doing anything useful AFAICS.

the status looks to be created every time.

You will have to call reload(), because the Python code of course doesn't know about state changes on the podman side. The API has no real way to update the state fields on its own, so you must do an explicit reload(), which in turn does an inspect on the server and then updates the attrs field with the result. c.status itself just returns self.attrs["State"]["Status"].
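
For example (a minimal sketch; the socket path and container name are just placeholders):

from podman import PodmanClient

client = PodmanClient(base_url="unix:///run/user/1000/podman/podman.sock")

c = client.containers.create("nginx:latest", name="reload-demo")
c.start()
print(c.status)  # still 'created': attrs were cached when the object was built
c.reload()       # performs an inspect on the server and refreshes c.attrs
print(c.status)  # now reflects the server-side state, e.g. 'running'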

@jwhonce
Member

jwhonce commented Sep 16, 2024

@Luap99 The wait call blocks until the container on the server is in the correct state. So a reload() is not required. These semantics are meant to be compatible.

@Luap99
Member

Luap99 commented Sep 16, 2024

@Luap99 The wait call blocks until the container on the server is in the correct state.

Well, if you call start, the container is already running on the server; calling wait for running afterwards is just pointless.

@inknos
Contributor Author

inknos commented Sep 16, 2024

Should the run function then call wait only in certain cases? I can see that it does not make much sense for a detached container, but maybe it does for a container which is not detached. In my opinion, the behaviors should be consistent, so I would cut it, tbh.

I think wait makes a lot of sense for scripting purposes, but not within the run function context. After all, isn't run just create + start?

@Luap99
Member

Luap99 commented Sep 16, 2024

In the non-detached case it already calls wait() later (without args, so it waits for exit), which is correct: exit_status = container.wait(). I think run should largely follow what podman run is doing.
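
Roughly, the non-detached path amounts to this (again a simplified sketch, not the actual library code):

from podman import PodmanClient

def run_attached_sketch(client: PodmanClient, image: str, **kwargs):
    # Approximation of containers.run(..., detach=False):
    container = client.containers.create(image, **kwargs)
    container.start()
    exit_status = container.wait()  # no condition: block until the container exits
    logs = container.logs()         # collect output once the container is done
    return exit_status, logs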

@jwhonce
Member

jwhonce commented Sep 16, 2024

I plead very bad git history, sorry guys. I added that condition without explanation. I want to say the idea was to have the reload() be correct and that there was a race condition; I infer this because the wait() was added along with a comment about the integration tests.

Over the last three years the server semantics may have changed enough that the race condition no longer exists.

@inknos
Contributor Author

inknos commented Sep 17, 2024

@jwhonce, we recently had an update in our docs where reload() was documented explicitly as the way to get the status of a container. I believe wait can be removed, and maybe reload too.

Actually, to improve getting the correct status, what about calling reload directly within the status property?

@Luap99
Member

Luap99 commented Sep 17, 2024

Actually, to improve getting the correct status, what about calling reload directly within the status property?

Note that an inspect call is rather expensive. I am not sure how the code is used in general, but if one writes if c.status == "foo" || c.status == "bar", then having the status change in between might be a bad thing?
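
A minimal sketch of that concern, assuming status did an implicit reload() on each access (the socket path is a placeholder):

from podman import PodmanClient

client = PodmanClient(base_url="unix:///run/user/1000/podman/podman.sock")
c = client.containers.create("nginx:latest")
c.start()

# With an auto-reloading status property, each read below would be a separate
# inspect call on the server, so the two reads could observe different states.
if c.status == "created" or c.status == "running":
    # The container could exit between the two reads, so this branch would not
    # reflect a single consistent snapshot of the container state.
    print("container looked alive at the time of the check")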

@inknos
Contributor Author

inknos commented Sep 17, 2024

Yeah, that can be an issue. Let's not make the status property too smart.

Luap99 added a commit to Luap99/libpod that referenced this issue Sep 17, 2024
As it turns out, things are not so simple after all...
In podman-py it was reported[1] that waiting might hang. Per our docs, wait
on multiple conditions should exit once the first one is hit, not all
of them. However, because the new wait logic never checked whether the
context was cancelled, the goroutine kept running until conmon exited, and
because we used a waitgroup to wait for all of them to finish, it blocked
until that happened.

First, we can remove the waitgroup, as we only need to wait for one of
them anyway via the channel. While this alone fixes the hang, it would
still leak the other goroutine. As there is no way to cancel a goroutine
from the outside, the code must check for a cancelled context in the wait
loop to not leak.

Fixes 8a94331 ("libpod: simplify WaitForExit()")
[1] containers/podman-py#425

Signed-off-by: Paul Holzinger <pholzing@redhat.com>
inknos added a commit to inknos/podman-py that referenced this issue Sep 25, 2024
Fixes: containers#425

Signed-off-by: Nicola Sella <nsella@redhat.com>