provider 2.20.0 -> 3.40.0 google_compute_instance_template scratch disk validation anomalies #7341

Closed

@nyc3-mikekot

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request.
  • Please do not leave +1 or me too comments, they generate extra noise for issue followers and do not help prioritize the request.
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment.
  • If an issue is assigned to the modular-magician user, it is either in the process of being autogenerated, or is planned to be autogenerated soon. If an issue is assigned to a user, that user is claiming responsibility for the issue. If an issue is assigned to hashibot, a community member has claimed the issue already.

Terraform version: 0.12.18
Original Google provider version: 2.20.0; attempting to upgrade to 3.40.0

The issue is the following: from what I recall, we originally created our instance templates with 2.20.0 and did not explicitly set disk_size_gb = 375 for our scratch disks. In state, those disks show a disk_size_gb of "0" (I checked this). The new provider of course requires me to specify a disk_size_gb (so I put 375), but terraform plan then insists that the instance template and everything that depends on it needs to be recreated, since 0 is being changed to 375. I would like to avoid having to recreate everything, though, since nothing is actually being changed.

Here is the google_compute_instance_template resource example; it is slightly modified, but it should still work for testing. Create it with 2.20.0 and I expect disk_size_gb to be reported as 0 for all of the scratch disks. After switching over to 3.40.0, running another plan will fail since 0 is an invalid value, and once you uncomment the disk_size_gb = 375 line, you will see plans that say something like ~ disk_size_gb = 0 -> 375 # forces replacement.

Please let me know if there's any more information that you'd like for me to add or if anything I've mentioned previously does not make sense. Additionally, I have tried editing the state file with these steps:

  1. Edited the state file in a text editor, changing every disk_size_gb value from 0 to 375, then replaced the remote state file in GCS
  2. Ran terraform state show for the instance template in question; disk_size_gb was reported as 375
  3. Ran terraform refresh to reconcile the real world with what the (now modified) state file was showing
  4. Ran terraform state show on the instance template again (same as step 2) and the values had reverted to 0

resource "google_compute_instance_template" "default" {
  region       = "us-central1"
  name_prefix  = "sample-prefix"
  machine_type = "n1-standard-1"

  // Create a new boot disk from an image
  disk {
    boot         = true
    auto_delete  = true
    source_image = "ubuntu-os-cloud/ubuntu-1804-lts"
    disk_size_gb = 100
    disk_type    = "pd-ssd"
  }

  // local-ssd nvme block
  dynamic "disk" {
    for_each = range(5)
    content {
      type         = "SCRATCH"
      disk_type    = "local-ssd"
      interface    = "NVME"
      # disk_size_gb = 375 # omitted in the original creation with 2.20.0, so leave it commented out when creating with 2.20.0 and uncomment it for 3.40.0
    }
  }

  network_interface {
    network = "default"
  }

  lifecycle {
    create_before_destroy = true
  }
}

Example plan:

  ~ disk {
            auto_delete  = true
          ~ boot         = false -> (known after apply)
          ~ device_name  = "local-ssd-0" -> (known after apply)
          ~ disk_size_gb = 0 -> 375 # forces replacement
            disk_type    = "local-ssd"
            interface    = "NVME"
          - labels       = {} -> null
          ~ mode         = "READ_WRITE" -> (known after apply)
          + source_image = (known after apply)
            type         = "SCRATCH"
        }

@edwardmedia (Contributor) commented Sep 23, 2020

@nyc3-mikekot Below are the disk nodes from the API responses. The first one is the old node from when you created the instance template without specifying the size, while the latter was created with v3.40.0. Clearly the API does not return the diskSizeGb attribute (even though the GCP Console shows "375"), therefore there is always a difference in diskSizeGb when Terraform prepares the plan.

To solve this problem, we either need the API to return a disk node that reflects the value of diskSizeGb, or the provider to write 375 GB when the API doesn't return a value. (A sketch of why the missing field reads back as 0 follows the two responses below.)

      {
        "deviceName": "local-ssd-0", 
        "kind": "compute#attachedDisk", 
        "initializeParams": {
          "diskType": "local-ssd"
        }, 
        "autoDelete": true, 
        "index": 1, 
        "mode": "READ_WRITE", 
        "interface": "NVME", 
        "type": "SCRATCH"
      }, 
      {
        "deviceName": "local-ssd-0", 
        "kind": "compute#attachedDisk", 
        "initializeParams": {
          "diskSizeGb": "375", 
          "diskType": "local-ssd"
        }, 
        "autoDelete": true, 
        "index": 1, 
        "mode": "READ_WRITE", 
        "interface": "NVME", 
        "type": "SCRATCH"
      }, 
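
As a minimal sketch of why the missing field reads back as 0 (my own illustration, not provider code, assuming the google.golang.org/api/compute/v1 client library the provider builds on): DiskSizeGb is an int64 on the Go side, so when the API omits diskSizeGb from initializeParams it simply unmarshals to the zero value, and that 0 is what lands in state.

package main

import (
    "encoding/json"
    "fmt"

    compute "google.golang.org/api/compute/v1"
)

func main() {
    // The old disk node above, with no diskSizeGb under initializeParams.
    raw := []byte(`{
      "deviceName": "local-ssd-0",
      "initializeParams": {"diskType": "local-ssd"},
      "interface": "NVME",
      "type": "SCRATCH"
    }`)

    var disk compute.AttachedDisk
    if err := json.Unmarshal(raw, &disk); err != nil {
        panic(err)
    }

    // An absent diskSizeGb leaves the int64 field at Go's zero value.
    fmt.Println(disk.InitializeParams.DiskSizeGb) // prints 0
}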

@nyc3-mikekot (Author) commented Sep 23, 2020

@edwardmedia So would something on the API end have to be changed to remediate this, and nothing can be done at the provider level?

On the Terraform end, wouldn't it be possible to relax the validation slightly for the case where disk_size_gb is 0 and default it to 375 somehow? Scratch disks can ONLY be 375 GB anyway, so it seems superfluous to offer the option of passing a value for disk_size_gb when it can only ever be one value.

Error: 1 error occurred:
	* SCRATCH disks must be exactly 375GB, disk 1 is 0

Here is the link to the specific error validation in the source code:

return fmt.Errorf("SCRATCH disks must be exactly 375GB, disk %d is %d", i, diskSize)
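
For context, here is a simplified sketch of the kind of check that produces that error (hypothetical type and function names, not the provider's actual code). An omitted disk_size_gb reads as 0, so the exactly-375 rule fails:

package main

import "fmt"

// diskConfig is a hypothetical, stripped-down view of a configured disk block.
type diskConfig struct {
    typ        string // "SCRATCH" or "PERSISTENT"
    diskSizeGb int    // 0 when disk_size_gb is not set
}

// validateScratchDisks mimics the validation quoted above: every SCRATCH disk
// must be configured as exactly 375 GB.
func validateScratchDisks(disks []diskConfig) error {
    for i, d := range disks {
        if d.typ != "SCRATCH" {
            continue
        }
        if d.diskSizeGb != 375 {
            return fmt.Errorf("SCRATCH disks must be exactly 375GB, disk %d is %d", i, d.diskSizeGb)
        }
    }
    return nil
}

func main() {
    disks := []diskConfig{
        {typ: "PERSISTENT", diskSizeGb: 100}, // the boot disk
        {typ: "SCRATCH"},                     // disk_size_gb left unset, so it is 0
    }
    fmt.Println(validateScratchDisks(disks)) // SCRATCH disks must be exactly 375GB, disk 1 is 0
}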

@edwardmedia (Contributor)

@nyc3-mikekot Ideally it would be a change on the API end, but we can plan on writing 375 GB in the provider when the API doesn't return a value to take care of this issue.
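
A rough sketch of that provider-side approach (illustrative only, assuming the google.golang.org/api/compute/v1 types; the diskSizeFromAPI helper here is hypothetical, not the actual provider function): when a SCRATCH disk comes back from the API without diskSizeGb, fall back to the only legal size, 375 GB, instead of storing 0.

package main

import (
    "fmt"

    compute "google.golang.org/api/compute/v1"
)

// Local SSD scratch disks come in exactly one size.
const scratchDiskSizeGb = 375

// diskSizeFromAPI returns the size to record in state for an attached disk,
// defaulting SCRATCH disks to 375 GB when the API omits diskSizeGb.
func diskSizeFromAPI(d *compute.AttachedDisk) int64 {
    if d.InitializeParams != nil && d.InitializeParams.DiskSizeGb > 0 {
        return d.InitializeParams.DiskSizeGb
    }
    if d.Type == "SCRATCH" {
        return scratchDiskSizeGb
    }
    return 0
}

func main() {
    // An old template's scratch disk, as returned by the API without a size.
    old := &compute.AttachedDisk{
        Type:             "SCRATCH",
        InitializeParams: &compute.AttachedDiskInitializeParams{DiskType: "local-ssd"},
    }
    fmt.Println(diskSizeFromAPI(old)) // 375, so the plan no longer shows 0 -> 375
}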

@nyc3-mikekot (Author)

I think either one would be good enough (at least in terms of a resolution). Is the only other workaround at this point simply to recreate the resource altogether using the latest provider?

Additionally, sorry if this seems pressing, but is there any ETA for when the fix could be live?

@c2thorn (Collaborator) commented Sep 24, 2020

@nyc3-mikekot I have a PR out to assume a value of 375 GB when the API doesn't return a value for scratch disks. v3.41 is already cut and scheduled for 9/28, so the next available release with this fix will be v3.42 scheduled for 10/5.

@ghost commented Oct 26, 2020

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.

If you feel this issue should be reopened, we encourage creating a new issue linking back to this one for added context. If you feel I made an error 🤖 🙉 , please reach out to my human friends 👉 hashibot-feedback@hashicorp.com. Thanks!

@ghost locked as resolved and limited conversation to collaborators Oct 26, 2020