Behaviour of the clone block vs explicit device configuration in VM resources #2324
Replies: 2 comments
I think this cloning issue is really interesting because cloning is such an important, core feature of Proxmox, yet it's implemented in Proxmox in a way that feels fundamentally incompatible with Terraform.

Coding infrastructure in Terraform is really about describing the end state of what that infrastructure should look like. Changes can be made over time, sometimes destructively, sometimes non-destructively, depending on the capabilities of the infrastructure provider. But the reason this works is that Terraform can track the state of the resources it provisions, so it knows whether their actual state matches the desired state or whether there's drift it needs to correct.

For me, the core issue here is that Proxmox cloning doesn't describe a state; it's a function. The Proxmox UI and the associated API for cloning are intentionally designed to make cloning as lightweight and easy as possible. This is great for humans and custom scripting, but it doesn't inherently allow an IaC tool like Terraform to track the state of a VM through its lifecycle, including changes that may need to be made later. An inherited property of a cloned resource is essentially a property with an unknown state: it can be tracked, but with no code enforcing what that state should be, Terraform isn't able to do anything with it. This creates weird issues when trying to remove something like a NIC that was inherited but has no enforced, codified state. So I think cloning a VM should be treated like performing a `terraform import`.

TL;DR: I'd advocate for Option B. It most closely represents how Terraform is intended to work, and it follows the pattern of how Terraform imports work. If someone is using Terraform to create a clone, then presumably they want Terraform to be able to act on that clone. It should be a requirement that the configuration fully declares the devices the cloned VM is expected to keep.
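A minimal sketch of what Option B could look like in practice (block and attribute names follow the provider's existing `proxmox_virtual_environment_vm` schema; the template id and values are made up for illustration):

```hcl
# Option B: the clone keeps only what is declared here. Devices that
# exist on the template but are omitted below would be removed.
resource "proxmox_virtual_environment_vm" "web" {
  name      = "web-01"
  node_name = "pve-01"

  clone {
    vm_id = 9000 # the template being cloned
  }

  # Explicitly re-declared even though the template already has a NIC.
  # Omitting this block would mean "remove the NIC", the same way an
  # imported resource must be fully described in configuration.
  network_device {
    bridge = "vmbr0"
    model  = "virtio"
  }
}
```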
I vote for "C". Cloning support is the major source of pain in the provider; almost all of the weirdness in the code comes from it. It's the reason so many attributes are defined as Computed: they can be inherited from the clone and end up being created or updated silently by the provider because they already exist on the remote. On the other hand, if I don't use clone, I don't need any of those attributes to be Computed, since nothing really changes for my VM on the PVE side (IPs maybe, but that's another story). In that case I want the local VM config to follow the actual VM state in PVE, so drift can be detected properly, and so on.

Option "B" significantly diminishes the UX benefits of using clones. A practitioner would have to redefine everything that exists in the template again in the cloned VM, just to make a basic clone use case work without losing VM devices. Practitioners in that situation really want option "A".

Introducing a special clone resource could cover those use cases instead. Since this would be a new resource, we can implement it directly in the FWK provider and use proper data structures, like maps, for things such as network devices.
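For illustration, here is one hypothetical shape such a resource could take (the resource name `proxmox_virtual_environment_vm_clone` and all of its attributes are invented for this sketch; nothing like it exists in the provider today):

```hcl
# Hypothetical dedicated clone resource: it performs the Proxmox clone
# operation and owns only the attributes it explicitly manages.
resource "proxmox_virtual_environment_vm_clone" "db" {
  node_name    = "pve-01"
  source_vm_id = 9000
  name         = "db-01"
  full         = true

  # Devices keyed by slot in a map rather than a positional list, so the
  # provider can tell exactly which device each entry refers to.
  network_devices = {
    net0 = {
      bridge = "vmbr0"
      model  = "virtio"
    }
  }
}
```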
Summary
In PR #2260, contributor @hsnprsd introduced logic to detach and delete network devices from a VM when the corresponding `network_device` config block is removed.
However, during review the repository owner @bpg raised concerns about how this change intersects with the existing semantics of the `clone` block in VM resources, particularly the expectation that a full clone from a template should inherit all devices by default.
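For context, the change concerns configurations like the following (a sketch using the provider's current `proxmox_virtual_environment_vm` schema; the values are made up). With the PR applied, deleting the second `network_device` block detaches and deletes `net1` from the VM:

```hcl
resource "proxmox_virtual_environment_vm" "example" {
  name      = "example"
  node_name = "pve-01"

  network_device {
    bridge = "vmbr0" # net0
  }

  # Before PR #2260, removing this block left net1 attached to the VM;
  # with the PR, the provider detaches and deletes net1 instead.
  network_device {
    bridge = "vmbr1" # net1
  }
}
```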
Key questions, challenges, and discussion points
Define the expected behaviour for clones
Option A: Cloned VMs inherit all devices from the template unless the user explicitly removes or overrides them.
Option B: Cloned VMs keep only those devices declared in the resource block (the config must explicitly re-declare anything it wants to inherit from the template); omission means removal.
Option C: Deprecate the "clone" operation and provide alternatives for clone use cases.
Option D: Solve this problem in a different way.
Which model aligns best with both Proxmox’s underlying API behaviour and Terraform semantics?
How should “removal” work for devices (e.g. network devices) in cloned VMs?
Under Option A, a user who omits a network_device block should not expect the cloned VM to drop that device (since it existed on the template).
Under Option B, the omission would mean removal, consistent with Terraform’s “what’s undeclared should be removed” ideal, but possibly surprising for clone use-cases (see the sketch after this list).
With Option C, the VM resource wouldn't be used to clone complete VMs, only individual devices from other VMs, e.g. network devices or storage devices. That way it is clear which devices should exist and where they are cloned from. The Proxmox clone operation could become its own resource to keep whole-VM cloning available, but with limited configuration options to prevent the inherited-vs-configured devices conflict.
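To make the contrast concrete, consider a template that has two NICs (net0 on vmbr0, net1 on vmbr1) and a clone configuration that declares only one. This is a sketch against the existing VM schema; the ids and bridges are made up:

```hcl
resource "proxmox_virtual_environment_vm" "clone_example" {
  name      = "clone-01"
  node_name = "pve-01"

  clone {
    vm_id = 9000 # template with net0 (vmbr0) and net1 (vmbr1)
  }

  network_device {
    bridge = "vmbr0" # net0 is declared
  }

  # net1 is NOT declared:
  # - Option A: net1 survives the clone and Terraform leaves it alone.
  # - Option B: net1 is removed, because anything undeclared is removed.
  # - Option C: the question goes away, since the VM resource would no
  #   longer perform whole-VM clones at all.
}
```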
Communication and documentation
If a breaking or behavioural change is introduced (e.g., moving from Option A to Option B), it must be clearly documented in the provider’s release notes and in the user guide under the “clone” section.
Tests must cover both scenarios (regular VM and cloned VM) to ensure no regressions.
Technical/implementation concerns
Representing devices as lists (rather than maps) makes it hard to tell which remote device corresponds to which config block when implementing “remove” logic. As noted in the PR, this makes the deletion logic trickier (see the sketch after this list).
It may be valuable to gather real‐world use cases from users: how many network devices, how often removal is needed post‐clone, etc.
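A sketch of the difference (fragments as they would appear inside a VM resource; the map-based form is hypothetical, since the current schema uses repeated blocks, i.e. a list):

```hcl
# Current schema: repeated blocks form a list, so identity is positional.
# Removing the first block shifts every later device's index, and the
# provider can't distinguish "net0 was removed" from "net0 was edited
# and net1 was removed".
network_device { bridge = "vmbr0" }
network_device { bridge = "vmbr1" }

# Hypothetical map-based schema: each device has a stable key, so a
# missing key unambiguously identifies the device to delete.
network_devices = {
  net0 = { bridge = "vmbr0" }
  net1 = { bridge = "vmbr1" }
}
```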
My tentative take
I lean toward Option C. The current clone operation can't reconcile how cloning works in Proxmox with how Terraform expects configuration to work. Instead, it's worth investigating why users opt to clone VMs rather than copying the Terraform config of an existing VM to create a "clone". I only use the clone operation to get the disk content from my template onto the new VM. That could be done in Terraform by specifying that just the disk should be cloned rather than the complete VM, and other use cases could likely be covered the same way, resolving the conflict we currently observe.
With a new dedicated clone resource, the actual "clone" operation of Proxmox would remain available.
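One hypothetical way the disk-only use case could be expressed (the `clone` sub-block inside `disk` is invented for this sketch and does not exist in the provider):

```hcl
resource "proxmox_virtual_environment_vm" "app" {
  name      = "app-01"
  node_name = "pve-01"

  disk {
    datastore_id = "local-lvm"
    interface    = "scsi0"

    # Hypothetical: clone only this disk's content from a template disk,
    # while every other aspect of the VM stays fully described in config.
    clone {
      source_vm_id = 9000
      source_disk  = "scsi0"
    }
  }

  network_device {
    bridge = "vmbr0"
  }
}
```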
I hope I didn't forget any major points. I'm looking forward to hearing everyone’s experiences and thoughts, especially from folks using the clone block in large deployments. :)