start/stop operations are not atomic #369

Open
holmanb opened this issue May 6, 2024 · 0 comments

holmanb commented May 6, 2024

Calling instance.wait_for_stop() followed by instance.start() will sometimes result in the following exception:

>           raise RuntimeError(errmsg)
E           RuntimeError: Failure (rc=1): Error: The instance is already running

I think that lxc is reporting STOPPED before the image is actually (re)bootable. I've occasionally seen the same issue when manually doing a stop/start on images, but didn't realize it was a problem for our integration tests until I debugged a flaky integration test that does exactly this.

In the cases where I've observed this, the shutdown was initiated by the image. I don't know whether a pycloudlib-initiated shutdown produces the same behavior, but I suspect that it would.

I've filed an issue against lxd, but until it gets fixed we could probably introduce a retry loop. I don't see this stop/start sequence covered in the integration test, so we should probably add it too. The comment in that test, "Test is unstable but most stable on lxd containers", leads me to suspect that other platforms are similarly affected.
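
A rough sketch of what such a retry loop might look like. This is not pycloudlib code: `start_with_retry`, its parameters, and the defaults are hypothetical names chosen for illustration. It assumes only that `instance.start()` raises the `RuntimeError` shown above and that matching on the "already running" text is good enough to tell this failure apart from others.

```python
import time


def start_with_retry(instance, attempts=5, delay=1.0, sleep=time.sleep):
    """Call instance.start(), retrying while the 'already running' error persists.

    Hypothetical helper: the name, parameters, and defaults are assumptions.
    """
    for attempt in range(1, attempts + 1):
        try:
            return instance.start()
        except RuntimeError as exc:
            # Retry only the failure described in this issue; anything else,
            # or the final failed attempt, is re-raised unchanged.
            if "already running" not in str(exc) or attempt == attempts:
                raise
            # Give the backend time to finish tearing the instance down.
            sleep(delay)
```

Matching on the error string is brittle, so this would only be a stopgap until the underlying lxd behavior is fixed. The `sleep` parameter is there so a test can pass a no-op instead of actually waiting.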
