start/stop operations are not atomic #369

holmanb · 2024-05-06T22:49:09Z

Calling instance.wait_for_stop() followed by instance.start() will sometimes result in the following exception:

```
>           raise RuntimeError(errmsg)
E           RuntimeError: Failure (rc=1): Error: The instance is already running
```

I think lxc reports STOPPED before the image is actually (re)bootable. I've occasionally seen the same issue when manually stopping and starting images, but didn't realize it affected our integration tests until I debugged a flaky integration test that does this.

In the case I observed, the shutdown was initiated by the image itself. I don't know whether a pycloudlib-initiated shutdown produces the same effect, but I suspect it would.

I've filed an issue against lxd, but until it is fixed we could introduce a retry loop. I don't see this case covered by the integration test, so we should probably add it there too. The comment in that test, "Test is unstable but most stable on lxd containers", leads me to suspect that other platforms are similarly affected.


This was referenced May 6, 2024:

- lxc start fails despite stopped state canonical/lxd#13453 (Closed)
- fix flaky integration test canonical/cloud-init#5269 (Open)
