fix(lxc): race condition in container clone / reboot operations #2320
Conversation
add waiting for status "running" after update that triggered reboot

Signed-off-by: Stanislav Shamilov <shamilovstas@protonmail.com>
It seems that some tests running in parallel and using the same container were causing some of the lock timeouts on the Proxmox host used for testing. This commit splits a single test, parts of which were run in parallel and used the same container, into several independent tests using their own containers. This should reduce the overall flakiness of the tests.

Besides, a few tests were fixed. The tests for ipv4/ipv6 were using the Linux bridge "vmbr1", which wasn't created beforehand. Some of the WaitForContainerStatus calls done after updating the container were also removed, since the previous commit introduced a wait in the `updateContainer` method for the container to finish rebooting.

Also, the template used for running containers was changed from Ubuntu 24.04 to Alpine 3.22. This makes downloading the template faster, as the Alpine template is much smaller.

Signed-off-by: Stanislav Shamilov <shamilovstas@protonmail.com>
```go
	if e := containerAPI.WaitForContainerStatus(ctx, "running"); e != nil {
		return diag.FromErr(e)
	}
}
```
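For reference, a minimal sketch of where such a wait could sit in the resource's update path; the `containerClient` interface and its method signatures below are assumptions for illustration, not the provider's actual client:

```go
package main

import (
	"context"

	"github.com/hashicorp/terraform-plugin-sdk/v2/diag"
)

// containerClient is a hypothetical stand-in for the provider's container
// API client; only the two calls used below are modelled.
type containerClient interface {
	RebootContainer(ctx context.Context) error
	WaitForContainerStatus(ctx context.Context, status string) error
}

// rebootAndWait triggers a reboot and blocks until the container reports
// "running" again, so the next operation in the same apply cannot race a
// still-running reboot and hit the container's config lock.
func rebootAndWait(ctx context.Context, api containerClient) diag.Diagnostics {
	if err := api.RebootContainer(ctx); err != nil {
		return diag.FromErr(err)
	}
	if err := api.WaitForContainerStatus(ctx, "running"); err != nil {
		return diag.FromErr(err)
	}
	return nil
}
```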
I tried that before, but it didn't change anything: the container is reported as "running" almost immediately after the reboot call. I guess it depends on the PVE host, though. On slower hosts / slower IO, or in some edge cases, that might actually help.
So do you think I should leave it or delete it?
If this adds stability in your tests, then let's leave it.
still racy tho:
=== CONT TestAccResourceContainerIpv4Ipv6
=== CONT TestAccResourceContainerMountPoint
=== NAME TestAccResourceContainerDnsBlock
resource_container_test.go:329: Step 1/6 error: Error running apply: exit status 1
Error: error starting container: error starting container: received an HTTP 500 response - Reason: CT 121253 already running
with proxmox_virtual_environment_container.test_container,
on terraform_plugin_test.tf line 25, in resource "proxmox_virtual_environment_container" "test_container":
25: resource "proxmox_virtual_environment_container" "test_container" {
--- PASS: TestAccResourceContainerIpv4Ipv6 (6.61s)
--- PASS: TestAccResourceContainerMountPoint (10.95s)
=== NAME TestAccResourceContainer
resource_container_test.go:50: Step 3/3 error: Error running apply: exit status 1
Error: error updating container: received an HTTP 500 response - Reason: can't lock file '/run/lock/lxc/pve-config-135546.lock' - got timeout
with proxmox_virtual_environment_container.test_container,
on terraform_plugin_test.tf line 25, in resource "proxmox_virtual_environment_container" "test_container":
25: resource "proxmox_virtual_environment_container" "test_container" {
=== NAME TestAccResourceContainerHostname
resource_container_test.go:565: Error running post-test destroy, there may be dangling resources: exit status 1
Error: error waiting for container shut down: task "UPID:pve:001BABE4:02948904:69132015:vzshutdown:104154:root@pam:" failed to complete with exit code: can't lock file '/run/lock/lxc/pve-config-104154.lock' - got timeout
--- FAIL: TestAccResourceContainerHostname (21.57s)
--- FAIL: TestAccResourceContainerDnsBlock (21.97s)
--- PASS: TestAccResourceContainerClone (27.57s)
I think WaitForContainerStatus(...) is still needed in tests. It's possible that when we call it right after a reboot, the actual reboot hasn't completed yet; it seems to be asynchronous. So the call in the update might return "running" for a container that hasn't actually been rebooted just yet.
No harm in keeping it there tho.
I'll make the changes in this PR as I'm testing this branch atm.
As for WaitForContainerStatus, I removed it from the tests, as the "update" method of the resource now waits for the container if it was rebooted.
As for the flakiness thing, I described it here: #2313 (comment)
Overall, my changes didn't remove the flakiness completely, but they seem to have reduced it. Before them, I couldn't run the tests successfully even once. The remaining flakiness might be a symptom of some bug in the "update" method of the resource, but I'm not sure about that.
Alright, the proper fix here is to wait for the reboot / create / etc. task to complete.
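A minimal sketch of that approach, assuming a hypothetical `taskClient` poller (PVE tasks are addressed by a UPID and report status "running" until they stop, then expose an exit status such as "OK"; the provider's actual tasks API may differ):

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// taskClient is a hypothetical stand-in for the provider's tasks API.
type taskClient interface {
	// GetTaskStatus returns the task's status ("running" or "stopped")
	// and, once stopped, its exit status.
	GetTaskStatus(ctx context.Context, upid string) (status, exitStatus string, err error)
}

// waitForTask polls a PVE task UPID until it stops, then checks its exit
// status, so reboot / create / clone operations are known to be finished
// before the next call takes the container's config lock.
func waitForTask(ctx context.Context, api taskClient, upid string) error {
	ticker := time.NewTicker(time.Second)
	defer ticker.Stop()

	for {
		status, exit, err := api.GetTaskStatus(ctx, upid)
		if err != nil {
			return err
		}
		if status != "running" {
			if exit != "OK" {
				return fmt.Errorf("task %s failed: %s", upid, exit)
			}
			return nil
		}
		select {
		case <-ctx.Done():
			return ctx.Err()
		case <-ticker.C:
		}
	}
}
```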
Signed-off-by: Pavel Boldyrev <627562+bpg@users.noreply.github.com>
bpg left a comment
Thanks for raising this @shamilovstas!
I've played with the tests a bit and I think I've fixed the root cause of the flakiness -- a missing wait on the container's reboot / clone tasks.
Seems to be working well now 🤞🏼
LGTM! 🚀
Contributor's Note
- `/docs` for any user-facing features or additions.
- `/fwprovider/tests` for any new or updated resources / data sources.
- `make example` to verify that the change works as expected.

It seems that some tests running in parallel and using the same container were causing some of the lock timeouts on the Proxmox host used for testing. The single acceptance test, parts of which were run in parallel and used the same container, was split into several independent tests using their own containers. This should reduce the overall flakiness of the tests.

Besides, a few tests were fixed. The tests for ipv4/ipv6 were using the Linux bridge "vmbr1", which wasn't created beforehand. Some of the WaitForContainerStatus calls done after updating the container were also removed, since the `containerUpdate` method now waits for the container to be rebooted if a reboot was triggered.

Also, the template used for running containers was changed from Ubuntu 24.04 to Alpine 3.22. This makes downloading the template faster, as the Alpine template is much smaller.
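As a sketch of the splitting pattern described above (the test names, the `testContainer` helper, and the VM IDs are illustrative, not the PR's actual code): each test owns its own container, so tests marked `t.Parallel()` never contend for the same `/run/lock/lxc/pve-config-<id>.lock`.

```go
package tests

import "testing"

// testContainer is a hypothetical helper that would create a container
// with the given unique VM ID and run the acceptance steps against it.
func testContainer(t *testing.T, vmID int) {
	t.Helper()
	_ = vmID // acceptance steps elided in this sketch
}

// Each test operates on a distinct CT, so parallel runs never take the
// same config lock on the PVE host.
func TestAccResourceContainerDNS(t *testing.T) {
	t.Parallel()
	testContainer(t, 100101)
}

func TestAccResourceContainerHostname(t *testing.T) {
	t.Parallel()
	testContainer(t, 100102)
}
```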
Proof Of Work
Community Note
Closes #2313