Behaviour of the clone block vs explicit device configuration in VM resources #2324
Replies: 2 comments
I think this cloning issue is really interesting because cloning is such an important, core feature of Proxmox, yet it's implemented in Proxmox in a way that feels fundamentally incompatible with Terraform.

Coding infrastructure in Terraform is really about describing the end state of what that infrastructure should look like. Changes can be made over time, sometimes destructively, sometimes non-destructively, depending on the capabilities of the infrastructure provider. But the reason this works is that Terraform can track the state of the resources it provisions, so it knows whether their actual state matches the desired state or whether there's drift it needs to correct.

For me, the core issue here is that Proxmox cloning doesn't describe a state; it's a function. The Proxmox UI and the associated API for cloning are intentionally designed to make cloning as lightweight and easy as possible. This is great for humans and custom scripting, but it doesn't inherently allow an IaC tool like Terraform to track the state of a VM through its lifecycle, including changes that may need to be made later. An inherited property of a cloned resource is essentially a property with an unknown state: it can be tracked, but with no code enforcing what that state should be, Terraform isn't able to do anything with it. This creates weird issues when trying to remove something like a NIC that was inherited but has no enforced, codified state. So I think cloning a VM should be treated like performing a `terraform import`.

TL;DR: I'd advocate for Option B. It most closely represents how Terraform is intended to work, and it follows the pattern of how Terraform imports work. If someone is using Terraform to create a clone, then presumably they want Terraform to be able to act on that clone. It should be a requirement that the configuration fully declares the devices the cloned VM is expected to keep.
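A minimal sketch of what Option B could look like in practice (block and attribute names follow the provider's existing `proxmox_virtual_environment_vm` schema; the template id and values are made up for illustration):

```hcl
# Option B: the clone keeps only what is declared here. Devices that
# exist on the template but are omitted below would be removed.
resource "proxmox_virtual_environment_vm" "web" {
  name      = "web-01"
  node_name = "pve-01"

  clone {
    vm_id = 9000 # the template being cloned
  }

  # Explicitly re-declared even though the template already has a NIC.
  # Omitting this block would mean "remove the NIC", the same way an
  # imported resource must be fully described in configuration.
  network_device {
    bridge = "vmbr0"
    model  = "virtio"
  }
}
```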
I vote for "C". Cloning support is the major source of pain in the provider; almost all of the weirdness in the code comes from it. It's the reason so many attributes are defined as Computed: they can be inherited from the clone and end up being created or updated silently by the provider because they already exist on the remote. On the other hand, if I don't use clone, I don't need any of those attributes to be Computed, since nothing really changes for my VM on the PVE side (IPs maybe, but that's another story). In that case I want the local VM config to follow the actual VM state in PVE, so drift can be detected properly, and so on.

Option "B" significantly diminishes the UX benefits of using clones. A practitioner would have to redefine everything that exists in the template again in the cloned VM, just to make a basic clone use case work without losing VM devices. Practitioners in that situation really want option "A".

Introducing a special clone resource could cover those use cases instead. Since this would be a new resource, we can implement it directly in the FWK provider and use proper data structures, like maps, for things such as network devices.
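For illustration, here is one hypothetical shape such a resource could take (the resource name `proxmox_virtual_environment_vm_clone` and all of its attributes are invented for this sketch; nothing like it exists in the provider today):

```hcl
# Hypothetical dedicated clone resource: it performs the Proxmox clone
# operation and owns only the attributes it explicitly manages.
resource "proxmox_virtual_environment_vm_clone" "db" {
  node_name    = "pve-01"
  source_vm_id = 9000
  name         = "db-01"
  full         = true

  # Devices keyed by slot in a map rather than a positional list, so the
  # provider can tell exactly which device each entry refers to.
  network_devices = {
    net0 = {
      bridge = "vmbr0"
      model  = "virtio"
    }
  }
}
```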
Summary
In PR #2260, contributor @hsnprsd introduced logic to detach and delete network devices from a VM when the corresponding `network_device` config block is removed.
However, during review the repository owner @bpg raised concerns about how this change intersects with the existing semantics of the `clone` block in VM resources, particularly the expectation that a full clone from a template should inherit all devices by default.
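For context, the change concerns configurations like the following (a sketch using the provider's current `proxmox_virtual_environment_vm` schema; the values are made up). With the PR applied, deleting the second `network_device` block detaches and deletes `net1` from the VM:

```hcl
resource "proxmox_virtual_environment_vm" "example" {
  name      = "example"
  node_name = "pve-01"

  network_device {
    bridge = "vmbr0" # net0
  }

  # Before PR #2260, removing this block left net1 attached to the VM;
  # with the PR, the provider detaches and deletes net1 instead.
  network_device {
    bridge = "vmbr1" # net1
  }
}
```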
Key questions, challenges, and discussion points
Define the expected behaviour for clones
Option A: Cloned VMs inherit all devices from the template unless the user explicitly removes or overrides them.
Option B: Cloned VMs keep only those devices declared in the resource block (the config must explicitly re-declare anything it wants to inherit from the template); omission means removal.
Option C: Deprecate the "clone" operation and provide alternatives for clone use cases.
Option D: Solve this problem in a different way.
Which model aligns best with both Proxmox’s underlying API behaviour and Terraform semantics?
How should “removal” work for devices (e.g. network devices) in cloned VMs?
Under Option A, a user who omits a network_device block should not expect the cloned VM to drop that device (since it existed on the template).
Under Option B, the omission would mean removal, consistent with Terraform’s “what’s undeclared should be removed” ideal, but possibly surprising for clone use-cases (see the sketch after this list).
With Option C, the VM resource wouldn't be used to clone complete VMs, only individual devices from other VMs, e.g. network devices or storage devices. That way it is clear which devices should exist and where they are cloned from. The Proxmox clone operation could become its own resource to keep whole-VM cloning available, but with limited configuration options to prevent the inherited-vs-configured devices conflict.
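To make the contrast concrete, consider a template that has two NICs (net0 on vmbr0, net1 on vmbr1) and a clone configuration that declares only one. This is a sketch against the existing VM schema; the ids and bridges are made up:

```hcl
resource "proxmox_virtual_environment_vm" "clone_example" {
  name      = "clone-01"
  node_name = "pve-01"

  clone {
    vm_id = 9000 # template with net0 (vmbr0) and net1 (vmbr1)
  }

  network_device {
    bridge = "vmbr0" # net0 is declared
  }

  # net1 is NOT declared:
  # - Option A: net1 survives the clone and Terraform leaves it alone.
  # - Option B: net1 is removed, because anything undeclared is removed.
  # - Option C: the question goes away, since the VM resource would no
  #   longer perform whole-VM clones at all.
}
```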
Communication and documentation
If a breaking or behavioural change is introduced (e.g., moving from Option A to Option B), it must be clearly documented in the provider’s release notes and in the user guide under the “clone” section.
Tests must cover both scenarios (regular VM and cloned VM) to ensure no regressions.
Technical/implementation concerns
Representing devices as lists (rather than maps) makes it hard to tell which remote device corresponds to which config block when implementing “remove” logic. As noted in the PR, this makes the deletion logic trickier (see the sketch after this list).
It may be valuable to gather real‐world use cases from users: how many network devices, how often removal is needed post‐clone, etc.
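A sketch of the difference (fragments as they would appear inside a VM resource; the map-based form is hypothetical, since the current schema uses repeated blocks, i.e. a list):

```hcl
# Current schema: repeated blocks form a list, so identity is positional.
# Removing the first block shifts every later device's index, and the
# provider can't distinguish "net0 was removed" from "net0 was edited
# and net1 was removed".
network_device { bridge = "vmbr0" }
network_device { bridge = "vmbr1" }

# Hypothetical map-based schema: each device has a stable key, so a
# missing key unambiguously identifies the device to delete.
network_devices = {
  net0 = { bridge = "vmbr0" }
  net1 = { bridge = "vmbr1" }
}
```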
My tentative take
I lean toward Option C. The current clone operation can't reconcile how cloning works in Proxmox with how Terraform expects configuration to work. Instead, it's worth investigating why users opt to clone VMs rather than copying the Terraform config of an existing VM to create a "clone". I only use the clone operation to get the disk content from my template onto the new VM. That could be done in Terraform by specifying that just the disk should be cloned rather than the complete VM, and other use cases could likely be covered the same way, resolving the conflict we currently observe.
With a new dedicated clone resource, the actual "clone" operation of Proxmox would remain available.
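One hypothetical way the disk-only use case could be expressed (the `clone` sub-block inside `disk` is invented for this sketch and does not exist in the provider):

```hcl
resource "proxmox_virtual_environment_vm" "app" {
  name      = "app-01"
  node_name = "pve-01"

  disk {
    datastore_id = "local-lvm"
    interface    = "scsi0"

    # Hypothetical: clone only this disk's content from a template disk,
    # while every other aspect of the VM stays fully described in config.
    clone {
      source_vm_id = 9000
      source_disk  = "scsi0"
    }
  }

  network_device {
    bridge = "vmbr0"
  }
}
```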
I hope I didn't forget any major points. I'm looking forward to hearing everyone’s experiences and thoughts, especially from folks using the clone block in large deployments. :)