
Disk downsize triggers recreate #8597

Closed
przsab opened this issue Mar 3, 2021 · 11 comments

przsab commented Mar 3, 2021

The Terraform provider recreates google_compute_disk when attempting to downsize it, whereas the regular API throws the expected error.

When somebody misses a 0 in the new disk size, all data is gone.

How API works:

$ gcloud compute disks describe disk01 --format="csv(sizeGb)"
60

$ gcloud compute disks resize disk01 --size=50g
ERROR: (gcloud.compute.disks.resize) Could not fetch resource:
 - Invalid value for field 'sizeGb': '50'. New disk size '50' GiB must be larger than existing size '60' GiB.

How Terraform works:

  # module.gcp-oracle.google_compute_disk.disks["disk01"] must be replaced
-/+ resource "google_compute_disk" "disks" {
      ...
      ~ size                      = 60 -> 50 # forces replacement
      ...
    }
@edwardmedia edwardmedia self-assigned this Mar 3, 2021
@edwardmedia edwardmedia added the bug label Mar 3, 2021
Contributor

edwardmedia commented Mar 3, 2021

@przsab Can you share the Terraform config you are using and the steps to reproduce the error?

Author

przsab commented Mar 4, 2021

Absolutely. The same results are reproduced on v0.12.29, v0.13.4, and v0.14.7 using the latest plugin, 3.58.0.

$ nl disk.tf
     1  resource "google_compute_disk" "disk01" {
     2      name = "disk01"
     3      size = 10
     4  }

$ terraform init

Initializing the backend...

Initializing provider plugins...
- Checking for available provider plugins...
- Downloading plugin for provider "google" (hashicorp/google) 3.58.0...

The following providers do not have any version constraints in configuration,
so the latest version was installed.

To prevent automatic upgrades to new major versions that may contain breaking
changes, it is recommended to add version = "..." constraints to the
corresponding provider blocks in configuration, with the constraint strings
suggested below.

* provider.google: version = "~> 3.58"

Terraform has been successfully initialized!

You may now begin working with Terraform. Try running "terraform plan" to see
any changes that are required for your infrastructure. All Terraform commands
should now work.

If you ever set or change modules or backend configuration for Terraform,
rerun this command to reinitialize your working directory. If you forget, other
commands will detect it and remind you to do so if necessary.

$ terraform version
Terraform v0.12.29
Terraform v0.13.4 <- same result
Terraform v0.14.7 <- same result
+ provider.google v3.58.0

Your version of Terraform is out of date! The latest version
is 0.14.7. You can update by downloading from https://www.terraform.io/downloads.html


$ terraform apply

An execution plan has been generated and is shown below.
Resource actions are indicated with the following symbols:
  + create

Terraform will perform the following actions:

  # google_compute_disk.disk01 will be created
  + resource "google_compute_disk" "disk01" {
      + creation_timestamp        = (known after apply)
      + id                        = (known after apply)
      + label_fingerprint         = (known after apply)
      + last_attach_timestamp     = (known after apply)
      + last_detach_timestamp     = (known after apply)
      + name                      = "disk01"
      + physical_block_size_bytes = (known after apply)
      + project                   = (known after apply)
      + self_link                 = (known after apply)
      + size                      = 10
      + source_image_id           = (known after apply)
      + source_snapshot_id        = (known after apply)
      + type                      = "pd-standard"
      + users                     = (known after apply)
      + zone                      = (known after apply)
    }

Plan: 1 to add, 0 to change, 0 to destroy.

Do you want to perform these actions?
  Terraform will perform the actions described above.
  Only 'yes' will be accepted to approve.

  Enter a value: yes

google_compute_disk.disk01: Creating...
google_compute_disk.disk01: Creation complete after 7s [id=projects/some-project-1234/zones/us-central1-a/disks/disk01]

Apply complete! Resources: 1 added, 0 changed, 0 destroyed.

The disk is then downsized from 10 to 5:

$ nl disk.tf
     1  resource "google_compute_disk" "disk01" {
     2      name = "disk01"
     3      size = 5
     4  }

$ terraform apply
google_compute_disk.disk01: Refreshing state... [id=projects/some-project-1234/zones/us-central1-a/disks/disk01]

An execution plan has been generated and is shown below.
Resource actions are indicated with the following symbols:
-/+ destroy and then create replacement

Terraform will perform the following actions:

  # google_compute_disk.disk01 must be replaced
-/+ resource "google_compute_disk" "disk01" {
      ~ creation_timestamp        = "2021-03-03T22:58:34.534-08:00" -> (known after apply)
      ~ id                        = "projects/some-project-1234/zones/us-central1-a/disks/disk01" -> (known after apply)
      ~ label_fingerprint         = "42WmSpB8rSM=" -> (known after apply)
      - labels                    = {} -> null
      + last_attach_timestamp     = (known after apply)
      + last_detach_timestamp     = (known after apply)
        name                      = "disk01"
      ~ physical_block_size_bytes = 4096 -> (known after apply)
      ~ project                   = "some-project-1234" -> (known after apply)
      ~ self_link                 = "https://www.googleapis.com/compute/v1/projects/some-project-1234/zones/us-central1-a/disks/disk01" -> (known after apply)
      ~ size                      = 10 -> 5 # forces replacement
      + source_image_id           = (known after apply)
      + source_snapshot_id        = (known after apply)
        type                      = "pd-standard"
      ~ users                     = [] -> (known after apply)
      ~ zone                      = "us-central1-a" -> (known after apply)
    }

Plan: 1 to add, 0 to change, 1 to destroy.

Do you want to perform these actions?
  Terraform will perform the actions described above.
  Only 'yes' will be accepted to approve.

  Enter a value: yes

google_compute_disk.disk01: Destroying... [id=projects/some-project-1234/zones/us-central1-a/disks/disk01]
google_compute_disk.disk01: Destruction complete after 6s
google_compute_disk.disk01: Creating...
google_compute_disk.disk01: Creation complete after 5s [id=projects/some-project-1234/zones/us-central1-a/disks/disk01]

Apply complete! Resources: 1 added, 0 changed, 1 destroyed.

The same attempt with gcloud yields the expected error:

$ gcloud compute disks describe disk01 --format="csv(sizeGb)"
size_gb
5

$ gcloud compute disks resize disk01 --size=4

This command increases disk size. This change is not reversible.
For more information, see:
https://cloud.google.com/sdk/gcloud/reference/compute/disks/resize

Do you want to continue (Y/n)?  y

ERROR: (gcloud.compute.disks.resize) Could not fetch resource:
 - Invalid value for field 'sizeGb': '4'. New disk size '4' GiB must be larger than existing size '5' GiB.

@ghost ghost removed waiting-response labels Mar 4, 2021
Author

przsab commented Mar 4, 2021

And from the competition (the AWS provider):

Terraform will perform the following actions:

  # aws_ebs_volume.disk1 will be updated in-place
  ~ resource "aws_ebs_volume" "disk1" {
        id                   = "vol-0995144fb9d83d909"
      ~ size                 = 10 -> 5
        tags                 = {}
        # (7 unchanged attributes hidden)
    }

Plan: 0 to add, 1 to change, 0 to destroy.

Do you want to perform these actions?
  Terraform will perform the actions described above.
  Only 'yes' will be accepted to approve.

  Enter a value: yes

aws_ebs_volume.disk1: Modifying... [id=vol-0995144fb9d83d909]

Error: InvalidParameterValue: New size cannot be smaller than existing size
        status code: 400, request id: 12afc30f-c50f-4839-94d6-c5715514d561

  on disk.tf line 1, in resource "aws_ebs_volume" "disk1":
   1: resource "aws_ebs_volume" "disk1" {

Contributor

edwardmedia commented Mar 4, 2021

@przsab below is the plan I got in my test, and it seems fine (note: it does not say recreate on the disk). I also see the resize call in the log, without the disk being torn down. There is also a specific test for this in the provider.

POST /compute/v1/projects/myproject/zones/us-central1-a/disks/tf-test-96dybihpfv/resize?alt=json HTTP/1.1
Host: compute.googleapis.com
{
 "sizeGb": 100
}
  # google_compute_disk.disk01 will be updated in-place
  ~ resource "google_compute_disk" "disk01" {
        creation_timestamp        = "2021-03-04T08:42:04.659-08:00"
        id                        = "projects/myproject/zones/us-central1-a/disks/issue8597"
        label_fingerprint         = "42WmSpB8rSM="
        labels                    = {}
        name                      = "issue8597"
        physical_block_size_bytes = 4096
        project                   = "myproject"
        self_link                 = "https://www.googleapis.com/compute/v1/projects/myproject/zones/us-central1-a/disks/issue8597"
      ~ size                      = 5 -> 10
        type                      = "pd-standard"
        users                     = []
        zone                      = "us-central1-a"
    }

Contributor

edwardmedia commented Mar 4, 2021

@przsab downsizing a disk is different, and it does trigger recreation. This works as intended, since GCP only allows upsizing through the API. Below is what I got. If you want to prevent the destroy, you may add lifecycle.prevent_destroy to the config.

  # google_compute_disk.disk01 must be replaced
-/+ resource "google_compute_disk" "disk01" {
      ~ creation_timestamp        = "2021-03-04T10:18:30.628-08:00" -> (known after apply)
      ~ id                        = "projects/myproject/zones/us-central1-a/disks/issue8597" -> (known after apply)
      ~ label_fingerprint         = "42WmSpB8rSM=" -> (known after apply)
      - labels                    = {} -> null
      + last_attach_timestamp     = (known after apply)
      + last_detach_timestamp     = (known after apply)
        name                      = "issue8597"
      ~ physical_block_size_bytes = 4096 -> (known after apply)
      ~ project                   = "myproject" -> (known after apply)
      ~ self_link                 = "https://www.googleapis.com/compute/v1/projects/myproject/zones/us-central1-a/disks/issue8597" -> (known after apply)
      ~ size                      = 10 -> 5 # forces replacement
      + source_image_id           = (known after apply)
      + source_snapshot_id        = (known after apply)
        type                      = "pd-standard"
      ~ users                     = [] -> (known after apply)
      ~ zone                      = "us-central1-a" -> (known after apply)
    }
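For reference, the prevent_destroy suggestion looks like this when applied to the repro config (an illustrative sketch, not part of the original comment):

```hcl
resource "google_compute_disk" "disk01" {
  name = "disk01"
  size = 10

  lifecycle {
    # Makes terraform error out on any plan that would destroy
    # (and therefore replace) this disk.
    prevent_destroy = true
  }
}
```

With this set, a downsize still plans a replacement, but the plan fails with an error instead of destroying the disk.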

Contributor

edwardmedia commented Mar 5, 2021

@przsab I have updated the resource's documentation to reflect this behavior. It will show up here after the next version of the provider is released.

Author

przsab commented Mar 5, 2021

This is not the correct behavior: downsizing a disk should cause an error, not replace the disk with a smaller, empty one.

prevent_destroy is a static feature and can't be parameterized, so it breaks any attempt to automate via pipelines (no, we can't consider sed or awk a solution to this).

Maybe disk replacement is useful for hello-world examples where you serve a welcome page from an nginx container, but for anything serious Terraform should follow the API behavior and error out, just like the AWS provider does.

@ghost ghost removed the waiting-response label Mar 5, 2021
Collaborator

rileykarson commented Mar 5, 2021

Hey @przsab! Terraform is a higher-level tool than something like gcloud, and has different behaviour here. Instead of managing requests against resources, it operates against the state of the resources, with the requests it makes being an implementation detail. So instead of declaring the operation you want it to perform, like "resize my disk to XX GB", and having it return whether there was an error, you specify a desired end state for your resource and Terraform brings it there (admittedly, to the best of its abilities; it is by no means a perfect abstraction), returning an error only if it was unable to.

So, in contrast with gcloud or the API, when specifying a disk's size you're stating an end state, e.g. "my disk should have size XX GB". From there, for the google_compute_disk resource, Terraform is able to recognise that an underlying operation (resizing down) is impossible, and it suggests recreating the disk. At the plan stage, a user is able to see that resizing the disk down will recreate it, and can choose not to proceed. Additionally, as @edwardmedia raised, lifecycle.prevent_destroy can prevent it from being destroyed unless the element is removed from config. It's frustrating that it's not configurable through interpolation, I'll admit.

Our team definitely feels that destroying resources like these should be harder than it is, too. See hashicorp/terraform#24658.

Author

przsab commented Mar 8, 2021

Is the suggestion here to modify CD pipelines to scan terraform plan output for disk destroy operations in order to prevent the recreate? Could you provide a set of regular expressions for each Terraform release to implement such protection (POSIX or PCRE, 0.12 to 0.15)?
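(An aside, not a suggestion from the original thread: a pipeline doesn't strictly need regexes for this. Since Terraform 0.12, `terraform show -json <planfile>` emits a machine-readable plan whose `resource_changes` entries list the planned actions, so a gate can reject any plan containing a delete action. A minimal sketch in Python, run against an inline sample of that JSON:)

```python
import json

def find_destroys(plan_json: str) -> list[str]:
    """Return addresses of resources the plan would destroy or replace."""
    plan = json.loads(plan_json)
    return [
        rc["address"]
        for rc in plan.get("resource_changes", [])
        # A replacement shows up as ["delete", "create"] or
        # ["create", "delete"]; a plain destroy as ["delete"].
        if "delete" in rc["change"]["actions"]
    ]

# Inline sample shaped like `terraform show -json tfplan` output.
sample = json.dumps({
    "resource_changes": [
        {"address": "google_compute_disk.disk01",
         "change": {"actions": ["delete", "create"]}},
        {"address": "google_compute_disk.disk02",
         "change": {"actions": ["update"]}},
    ]
})

print(find_destroys(sample))  # prints ['google_compute_disk.disk01']
```

A CI job could run this over the real plan file and fail the build when the returned list is non-empty.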

I understand you are fully aware of the unaddressed issues below:
hashicorp/terraform#22544
hashicorp/terraform#17599

Is there anything else besides backwards compatibility stopping you from implementing this change? What if the solution relied on a new lifecycle rule, e.g. prevent_downsize = true? That way you could retain compatibility and I wouldn't lose my data.
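(Sketched as config, the proposal above would look something like this; prevent_downsize is hypothetical syntax from this comment and does not exist in Terraform:)

```hcl
resource "google_compute_disk" "disk01" {
  name = "disk01"
  size = 10

  lifecycle {
    # Hypothetical rule: fail the plan with an error, instead of
    # replacing the disk, when the new size is smaller than the
    # current one.
    prevent_downsize = true
  }
}
```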

@ghost ghost removed waiting-response labels Mar 8, 2021
@edwardmedia
Contributor

@przsab as we have explained, this behavior is by design. It is not a bug. You may file an enhancement request instead so it will be triaged. I am closing this issue now.


ghost commented Apr 8, 2021

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.

If you feel this issue should be reopened, we encourage creating a new issue linking back to this one for added context. If you feel I made an error 🤖 🙉 , please reach out to my human friends 👉 hashibot-feedback@hashicorp.com. Thanks!

@ghost ghost locked as resolved and limited conversation to collaborators Apr 8, 2021