opennebula_cluster removes resources on secondary run #389

Closed
TGM opened this issue Jan 5, 2023 · 21 comments · Fixed by #390 or #395

Comments

@TGM

TGM commented Jan 5, 2023

OpenNebula version: 6.4.0
terraform-provider-opennebula: 1.1.0

On the second run, the hosts and datastores are removed from the opennebula_cluster resource.

Configuration:

resource "opennebula_cluster" "cluster" {
  name = var.cluster_name
}

# using custom type until https://github.com/OpenNebula/terraform-provider-opennebula/issues/385 is fixed
resource "opennebula_host" "host" {
  for_each   = toset(var.host_name)

  name       = each.key
  type       = "custom"
  cluster_id = opennebula_cluster.cluster.id

  custom {
    virtualization = "qemu"
    information    = "qemu"
  }
}

resource "opennebula_datastore" "datastore" {
  for_each = var.datastore_name
  
  name = each.key
  type = each.value
  cluster_id = opennebula_cluster.cluster.id

  custom {
    datastore = "fs"
    transfer = "ssh"
  }
}

Variables:

variable "cluster_name" {
  type = string
  description = "Cluster managing a group of hosts. Naming convention: vh<team>-<env>"
  default = "vhinfra-dev"
}

variable "host_name" {
  type = list
  description = "Hosts part of this cluster"
  default = [
    "opennebula_backend_dev",
    "opennebula_backend_dev_2",
  ]
}

variable "datastore_name" {
  type = map
  description = "Datastore attached to hypervisors. This is the equivalent of /data/vm. Naming convention: vh<team>-<env>-small"
  default = {
    "vhinfra-dev-small" = "system",
    "vhinfra-dev-large" = "system",
    "vhinfra-dev-image" = "image",
  }
}

Run 1:

 terraform apply

Terraform used the selected providers to generate the following execution plan. Resource actions are indicated with the following symbols:
  + create

Terraform will perform the following actions:

  # opennebula_cluster.cluster will be created
  + resource "opennebula_cluster" "cluster" {
      + default_tags = (known after apply)
      + id           = (known after apply)
      + name         = "vhinfra-dev"
      + tags_all     = (known after apply)
    }

  # opennebula_datastore.datastore["vhinfra-dev-image"] will be created
  + resource "opennebula_datastore" "datastore" {
      + cluster_id             = (known after apply)
      + default_tags           = (known after apply)
      + driver                 = (known after apply)
      + id                     = (known after apply)
      + name                   = "vhinfra-dev-image"
      + restricted_directories = (known after apply)
      + safe_directories       = (known after apply)
      + tags_all               = (known after apply)
      + type                   = "image"

      + custom {
          + datastore = "fs"
          + transfer  = "ssh"
        }
    }

  # opennebula_datastore.datastore["vhinfra-dev-large"] will be created
  + resource "opennebula_datastore" "datastore" {
      + cluster_id             = (known after apply)
      + default_tags           = (known after apply)
      + driver                 = (known after apply)
      + id                     = (known after apply)
      + name                   = "vhinfra-dev-large"
      + restricted_directories = (known after apply)
      + safe_directories       = (known after apply)
      + tags_all               = (known after apply)
      + type                   = "system"

      + custom {
          + datastore = "fs"
          + transfer  = "ssh"
        }
    }

  # opennebula_datastore.datastore["vhinfra-dev-small"] will be created
  + resource "opennebula_datastore" "datastore" {
      + cluster_id             = (known after apply)
      + default_tags           = (known after apply)
      + driver                 = (known after apply)
      + id                     = (known after apply)
      + name                   = "vhinfra-dev-small"
      + restricted_directories = (known after apply)
      + safe_directories       = (known after apply)
      + tags_all               = (known after apply)
      + type                   = "system"

      + custom {
          + datastore = "fs"
          + transfer  = "ssh"
        }
    }

  # opennebula_host.host["opennebula_backend_dev"] will be created
  + resource "opennebula_host" "host" {
      + cluster_id   = (known after apply)
      + default_tags = (known after apply)
      + id           = (known after apply)
      + name         = "opennebula_backend_dev"
      + tags_all     = (known after apply)
      + type         = "custom"

      + custom {
          + information    = "qemu"
          + virtualization = "qemu"
        }
    }

  # opennebula_host.host["opennebula_backend_dev_2"] will be created
  + resource "opennebula_host" "host" {
      + cluster_id   = (known after apply)
      + default_tags = (known after apply)
      + id           = (known after apply)
      + name         = "opennebula_backend_dev_2"
      + tags_all     = (known after apply)
      + type         = "custom"

      + custom {
          + information    = "qemu"
          + virtualization = "qemu"
        }
    }

Plan: 6 to add, 0 to change, 0 to destroy.

Run 2:

terraform apply
opennebula_cluster.cluster: Refreshing state... [id=123]
opennebula_datastore.datastore["vhinfra-dev-small"]: Refreshing state... [id=203]
opennebula_host.host["opennebula_backend_dev"]: Refreshing state... [id=65]
opennebula_datastore.datastore["vhinfra-dev-large"]: Refreshing state... [id=202]
opennebula_datastore.datastore["vhinfra-dev-image"]: Refreshing state... [id=201]
opennebula_host.host["opennebula_backend_dev_2"]: Refreshing state... [id=64]

Terraform used the selected providers to generate the following execution plan. Resource actions are indicated with the following symbols:
  ~ update in-place

Terraform will perform the following actions:

  # opennebula_cluster.cluster will be updated in-place
  ~ resource "opennebula_cluster" "cluster" {
      ~ datastores       = [
          - 201,
          - 202,
          - 203,
        ]
      ~ hosts            = [
          - 64,
          - 65,
        ]
        id               = "123"
        name             = "vhinfra-dev"
        # (3 unchanged attributes hidden)
    }

Plan: 0 to add, 1 to change, 0 to destroy.

Expected behavior:
No change is expected.

@treywelsh
Collaborator

Thanks for the issue, good point, I forgot to check all inter-resource relationships.

There are actually two ways to tie a datastore/host to a cluster, and the way it's implemented right now is problematic.

When you attach a host to a cluster via the cluster_id field like you did above, the hosts section of the cluster stays empty in the TF file, but when the provider reads the cluster data on the cloud side it sees that some hosts are tied to it.
It then sees a diff on the cluster resource and believes it has to remove the hosts and the datastores.
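
On the user side, a possible stop-gap until the provider is fixed (a sketch, not an official recommendation) is to tell Terraform to ignore the server-populated member lists on the cluster resource:

resource "opennebula_cluster" "cluster" {
  name = var.cluster_name

  # Stop-gap sketch: ignore the hosts/datastores lists that OpenNebula fills in
  # on its side, so the second plan no longer tries to empty them.
  lifecycle {
    ignore_changes = [hosts, datastores]
  }
}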

@treywelsh
Collaborator

treywelsh commented Jan 6, 2023

We need to remove the direct dependencies between the two resources.

Adding Computed behavior (in addition to Optional) to the hosts section of the cluster solves the case above, but it doesn't solve the following one; there is still a diff after the initial apply:

resource "opennebula_cluster" "cluster" {
  name = "cluster1"

  hosts = [
    opennebula_host.host1.id,
  ]
}

resource "opennebula_host" "host1" {
  name       = "host1"
  type       = "custom"

  custom {
    virtualization = "dummy"
    information    = "dummy"
  }
}

resource "opennebula_host" "host2" {
  name       = "host2"
  type       = "custom"
  cluster_id = opennebula_cluster.cluster.id

  custom {
    virtualization = "dummy"
    information    = "dummy"
  }
}

This may be solved by reading only the configured elements of the hosts section of the cluster resource, but with that solution we wouldn't be able to import the hosts and datastores of the resource anymore.
In addition, it adds problems when updating the hosts field of the cluster resource.

Some other ideas:

  • remove cluster_id from the host resource, or remove the hosts section from the cluster resource.
    We had the same problem with user and group membership in B-190: group: filter users field #191 and it was solved this way
  • create new cluster_host and cluster_datastore resources (sketched below) to extract the cluster resource dependencies and deprecate the hosts and datastores sections. However, we would still have the diff with the TF shown above
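
These resource types don't exist in the provider today; the names and attributes below are only illustrative of the idea:

# Hypothetical association resources, for illustration only.
resource "opennebula_cluster_host" "host1" {
  cluster_id = opennebula_cluster.cluster.id
  host_id    = opennebula_host.host1.id
}

resource "opennebula_cluster_datastore" "small" {
  cluster_id   = opennebula_cluster.cluster.id
  datastore_id = opennebula_datastore.datastore["vhinfra-dev-small"].id
}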

Another thing that is inconsistent, and may be confusing across the provider's resources, is the way we delete a group vs. the way we delete a cluster:

  • for now, deleting a group first removes each user member
  • but deleting a cluster will fail, because the provider won't remove the host and datastore members first
    However, this should be tracked in another issue...

I'll probably open one or two issues (on group and users) after that to get consistent behavior on membership management between resources

@TGM
Author

TGM commented Jan 11, 2023

The proposed solution does not allow you to associate existing resources to the cluster.

@treywelsh
Collaborator

treywelsh commented Jan 11, 2023

You still have the cluster_id field in the datastore and host resources to manage membership.
If we keep two distinct places to manage cluster membership, we'll still have some diffs when both are used at the same time, as I said with:

resource "opennebula_cluster" "cluster" {
  name = "cluster1"

  hosts = [
    opennebula_host.host1.id,
  ]
}

resource "opennebula_host" "host1" {
  name       = "host1"
  type       = "custom"

  custom {
    virtualization = "dummy"
    information    = "dummy"
  }
}

resource "opennebula_host" "host2" {
  name       = "host2"
  type       = "custom"
  cluster_id = opennebula_cluster.cluster.id

  custom {
    virtualization = "dummy"
    information    = "dummy"
  }
}

To retrieve the list of cluster members, I propose to change the cluster fields hosts and datastores from Optional to Computed. This would allow knowing who is a member of the cluster from the cluster resource (but without being able to modify membership from there).
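
With hosts and datastores made Computed, membership could still be read from the cluster resource, for example through outputs (a sketch, assuming the attribute names stay as they are today):

# Read-only view of the membership lists populated by the provider.
output "cluster_hosts" {
  value = opennebula_cluster.cluster.hosts
}

output "cluster_datastores" {
  value = opennebula_cluster.cluster.datastores
}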

@treywelsh
Collaborator

treywelsh commented Jan 11, 2023

We could instead manage membership from the cluster resource, but I did it this way because:

  • it is more consistent with how I handled the similar membership problem between groups and users
  • the XML-RPC methods that allocate a datastore and a host take a cluster_id as a parameter. This parameter can be -1 to leave the resource unattached, but managing membership from the cluster resource would add an extra step: allocate the host/datastore with cluster_id set to -1, then add the resource to the cluster.
  • managing membership from the cluster resource requires slightly more complex code

I had to make a choice and I knew these trade-offs, but it's not a big deal to manage membership from the cluster resource, so I'm open to doing it that way if everyone thinks it's better.
If you have another idea, feel free to drop another comment; I'm open to discussing it and rewriting this PR

@TGM
Author

TGM commented Jan 11, 2023

The proposed solution is good, but we also need a way to associate existing resources.
For example, associating the default DS with a new cluster.
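
For instance (a sketch, assuming the cluster's datastores field stays user-configurable, and that the pre-existing "default" image datastore has the stock ID 1):

resource "opennebula_cluster" "cluster" {
  name = var.cluster_name

  # Managed datastores plus the pre-existing "default" image datastore
  # (ID 1 on a stock OpenNebula install).
  datastores = concat(
    [for ds in opennebula_datastore.datastore : ds.id],
    [1],
  )
}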

@treywelsh
Collaborator

treywelsh commented Jan 11, 2023

OK, so membership management from the cluster will be less exclusive; PR updated.

@TGM
Author

TGM commented Jan 11, 2023

Seems legit. Can you publish it as an RC release or something so we can test it in pre-prod?

@frousselet
Collaborator

frousselet commented Jan 12, 2023

The 1.1.1-rc1 will be released by the end of the week.

@TGM
Author

TGM commented Jan 18, 2023

Partially fixed: new datastores are allocated to the default cluster as well.


resource "opennebula_cluster" "cluster" {
  name       = "${var.cluster_name}-${var.cluster_env}-${var.cluster_cpu_vendor}"
  hosts      = [for host in opennebula_host.host : host.id]
  datastores = [for datastore in opennebula_datastore.datastore : datastore.id]
  # virtual_networks = [145]
}

resource "opennebula_datastore" "datastore" {
  count = var.cluster_datastores

  name       = "${var.cluster_name}-${var.cluster_env}-${var.cluster_cpu_vendor}-ds${count.index + 1}"
  type       = "system"

  custom {
    transfer  = "ssh"
  }
}

resource "opennebula_host" "host" {
  for_each = toset(var.cluster_hosts)

  name       = each.key
  type       = "kvm"
}

@treywelsh treywelsh reopened this Jan 18, 2023
@treywelsh
Collaborator

treywelsh commented Jan 18, 2023

With the current way of managing cluster membership, which we just modified for the last RC release of the provider:

A new datastore is added to the default cluster due to these points:

  • the cluster ID is a mandatory parameter of the OpenNebula API when creating a datastore (same for the host API), but it's no longer user-configurable in the last provider RC, so we set it to -1 by default (this adds the datastore to the default cluster)
  • a datastore can belong to several clusters, so adding it to a cluster won't remove it from another cluster (a host can be a member of only one cluster at a time, which is why we don't see this behavior for the host resource).

To fix this we could:

  • import the default cluster to manage its members (see the sketch below)
  • consider that as soon as we add the datastore to a cluster other than the default one, we remove it from the default cluster
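
For the first option, a rough sketch (assuming the opennebula_cluster resource supports terraform import, and that the default cluster has the stock ID 0):

# Manage the pre-existing default cluster so its membership can be controlled
# (here: emptied) from Terraform after an import.
resource "opennebula_cluster" "default" {
  name       = "default"
  datastores = []
}

followed by terraform import opennebula_cluster.default 0.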

@TGM
Author

TGM commented Jan 19, 2023

A few ideas to consider:

  • verify whether the resource is associated with another cluster on creation, and if so do not allocate it to the default cluster
  • add a flag (e.g. "remove from default") so that, if the resource is added to a cluster other than the default one, it is removed from the default cluster

@treywelsh
Collaborator

treywelsh commented Jan 19, 2023

Considering your first idea would mean using cluster_id in the datastore to specify a cluster other than the default one.
It's like going back to the initial problem of this issue and rolling back our last changes.

We could go this way, i.e. manage the dependency from the datastore/host side via the cluster_id field (and then deprecate the hosts and datastores sections of the cluster resource).
We would then need to reconsider the comment you made when I first proposed this solution, and see how to solve it in another way.
The answer could be: you need to import the default resources, and then manage their membership via the cluster_id attribute.
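
In that model it could look roughly like this (a sketch; it assumes the datastore resource keeps its cluster_id field, supports terraform import, and that the pre-existing "default" image datastore has the stock ID 1):

# Pre-existing "default" image datastore, imported and then re-attached to the
# managed cluster via cluster_id.
resource "opennebula_datastore" "default" {
  name       = "default"
  type       = "image"
  cluster_id = opennebula_cluster.cluster.id

  custom {
    datastore = "fs"
    transfer  = "ssh"
  }
}

followed by terraform import opennebula_datastore.default 1.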

It's still possible to roll back these changes, as the provider hasn't been released with the related attribute changes yet...

@TGM
Author

TGM commented Jan 19, 2023

Keep the current RC1 changes, they are way better than the previous version.
We could go this way in order to solve the problem without any further changes:

  1. data query for the default cluster.
  2. update the default cluster's resources to null.

And this might be the best idea yet. That way we don't lose the flexibility we just gained, and resources are still managed individually.

Update: just realized that updating the default cluster won't work without importing the resource ...

@TGM
Author

TGM commented Jan 19, 2023

It's interesting behavior that, when hosts are created, they are correctly associated only with the new cluster. I wonder why ...

@TGM
Author

TGM commented Jan 19, 2023

Related?

opennebula_cluster.cluster: Destroying... [id=174]
╷
│ Error: Failed to delete
│ 
│ cluster (ID: 174): OpenNebula error [ACTION]: [one.cluster.delete] Cannot delete cluster. Cluster 174 is not empty, it contains 2 hosts.
╵

@treywelsh
Collaborator

treywelsh commented Jan 19, 2023

Related?

opennebula_cluster.cluster: Destroying... [id=174]
╷
│ Error: Failed to delete
│ 
│ cluster (ID: 174): OpenNebula error [ACTION]: [one.cluster.delete] Cannot delete cluster. Cluster 174 is not empty, it contains 2 hosts.
╵

You can't delete a cluster with hosts/datastores inside it; that's an OpenNebula constraint, and the provider doesn't try to add behavior to empty the cluster of its members before deleting it.

@treywelsh
Collaborator

treywelsh commented Jan 19, 2023

It's interesting behavior that, when hosts are created, they are correctly associated only with the new cluster. I wonder why ...

As I said in this previous comment, a host is only in one cluster at a time (a datastore can be in several clusters at the same time), so when you add it to a new cluster, OpenNebula automatically removes it from the other one. This is not related to the provider.

treywelsh added a commit that referenced this issue Jan 20, 2023
treywelsh added a commit that referenced this issue Jan 23, 2023
treywelsh added a commit that referenced this issue Jan 23, 2023
treywelsh added a commit that referenced this issue Jan 23, 2023
treywelsh added a commit that referenced this issue Jan 23, 2023
treywelsh added a commit that referenced this issue Jan 23, 2023
treywelsh added a commit that referenced this issue Jan 23, 2023
treywelsh added a commit that referenced this issue Jan 24, 2023
frousselet pushed a commit that referenced this issue Jan 24, 2023
frousselet pushed a commit that referenced this issue Jan 24, 2023
frousselet pushed a commit that referenced this issue Jan 24, 2023