
Error: Provider produced inconsistent final plan #8826

Closed
juandiegopalomino opened this issue Apr 2, 2021 · 21 comments
@juandiegopalomino

I see an error when dynamically setting the backends for a google_compute_backend_service. It usually works on the second run, but that is not good enough.

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request.
  • Please do not leave +1 or me too comments, they generate extra noise for issue followers and do not help prioritize the request.
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment.
  • If an issue is assigned to the modular-magician user, it is either in the process of being autogenerated, or is planned to be autogenerated soon. If an issue is assigned to a user, that user is claiming responsibility for the issue. If an issue is assigned to hashibot, a community member has claimed the issue already.

Terraform Version

2021/04/01 22:12:54 [WARN] Log levels other than TRACE are currently unreliable, and are supported only for backward compatibility.
  Use TF_LOG=TRACE to see Terraform's internal logs.
  ----
2021/04/01 22:12:54 [INFO] Terraform version: 0.14.3
2021/04/01 22:12:54 [INFO] Go runtime version: go1.15.2
2021/04/01 22:12:54 [INFO] CLI args: []string{"/usr/local/bin/terraform", "version"}
2021/04/01 22:12:54 [DEBUG] Attempting to open CLI config file: /Users/juandiegopalomino/.terraformrc
2021/04/01 22:12:54 [DEBUG] File doesn't exist, but doesn't need to. Ignoring.
2021/04/01 22:12:54 [DEBUG] ignoring non-existing provider search directory terraform.d/plugins
2021/04/01 22:12:54 [DEBUG] ignoring non-existing provider search directory /Users/juandiegopalomino/.terraform.d/plugins
2021/04/01 22:12:54 [DEBUG] ignoring non-existing provider search directory /Users/juandiegopalomino/Library/Application Support/io.terraform/plugins
2021/04/01 22:12:54 [DEBUG] ignoring non-existing provider search directory /Library/Application Support/io.terraform/plugins
2021/04/01 22:12:54 [INFO] CLI command args: []string{"version"}
Terraform v0.14.3
+ provider registry.terraform.io/hashicorp/google v3.62.0
+ provider registry.terraform.io/hashicorp/helm v2.1.0
+ provider registry.terraform.io/hashicorp/random v3.1.0
+ provider registry.terraform.io/hashicorp/tls v3.1.0

Your version of Terraform is out of date! The latest version
is 0.14.9. You can update by downloading from https://www.terraform.io/downloads.html

Affected Resource(s)

google_compute_backend_service

Terraform Configuration Files

variable "layer_name" {
  description = "Layer name"
  type        = string
}

resource "google_compute_global_address" "load_balancer" {
  name    = "opta-${var.layer_name}"
}

resource "google_compute_health_check" "healthcheck" {
  name               = "opta-${var.layer_name}"
  http_health_check {
    port_specification = "USE_SERVING_PORT"
    request_path = "/healthz"
  }
}

data "google_compute_zones" "zones" {}

data "google_compute_network_endpoint_group" "http" {
  count = length(data.google_compute_zones.zones.names)
  name = "opta-${var.layer_name}-http"
  zone = data.google_compute_zones.zones.names[count.index]
  depends_on = [
    helm_release.ingress-nginx
  ]
}


data "google_compute_network_endpoint_group" "https" {
  count = length(data.google_compute_zones.zones.names)
  name = "opta-${var.layer_name}-https"
  zone = data.google_compute_zones.zones.names[count.index]
  depends_on = [
    helm_release.ingress-nginx
  ]
}

resource "google_compute_backend_service" "backend_service" {
  name        = "opta-${var.layer_name}"
  port_name   = "http"
  protocol    = "HTTP"

  health_checks = [google_compute_health_check.healthcheck.id]

  dynamic "backend" {
    for_each = local.negs
    content {
      balancing_mode = "RATE"
      max_rate_per_endpoint = 50
      group = backend.value
    }
  }
  depends_on = [helm_release.ingress-nginx] # A helm release that succeeded
}

Debug Output

https://gist.github.com/juandiegopalomino/b54500cc1bc98a6170d10322b818a481

Panic Output

Expected Behavior

No provider error

Actual Behavior

Error: Provider produced inconsistent final plan

Steps to Reproduce

  1. terraform apply

Important Factoids

@ghost ghost added the bug label Apr 2, 2021
@venkykuberan venkykuberan self-assigned this Apr 2, 2021
@venkykuberan
Contributor

@juandiegopalomino I am not able to repro on my end.

Can you attach the debug log of the Create (POST) request/response for google_compute_backend_service? I want to see what was sent in the creation request. Also, I don't see retry logic in the Read request; we can add it if the issue is consistent.

@juandiegopalomino
Author

@venkykuberan you mean the output.txt I captured was too short and I should include more lines? (I can, since I can endlessly repro this error; just double-checking.)

@ghost ghost removed waiting-response labels Apr 2, 2021
@venkykuberan
Contributor

@juandiegopalomino It looks like the issue is not with google_compute_backend_service. I see that this resource doesn't exist:
projects/jds-throwaway-2/zones/us-central1-a/networkEndpointGroups/opta-jd-gcp-test-2-https. Can you check that your data source exists?

@juandiegopalomino
Author

I gave you an incomplete picture: local.negs is

locals {
  negs = var.delegated ? compact(concat(
    data.google_compute_network_endpoint_group.http.*.id,
    data.google_compute_network_endpoint_group.https.*.id,
  )) : compact(data.google_compute_network_endpoint_group.http.*.id)
}

delegated is a boolean indicating whether we're ready to serve SSL or not (on inspection, I think we can get rid of it). The network endpoint groups are created by the helm chart that the data sources depend on. Furthermore, if we ever receive null values, they should be removed by the compact call.
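As a sketch of the intent (with hypothetical values, just for illustration), compact() drops any empty strings left by NEGs that haven't resolved yet:

```hcl
# Hypothetical values, purely to illustrate the intended filtering:
locals {
  http_ids  = ["projects/p/zones/us-central1-a/networkEndpointGroups/x-http", ""]
  https_ids = [""]

  # compact(concat(...)) keeps only the non-empty IDs:
  # ["projects/p/zones/us-central1-a/networkEndpointGroups/x-http"]
  filtered = compact(concat(local.http_ids, local.https_ids))
}
```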

@ghost ghost removed the waiting-response label Apr 2, 2021
@juandiegopalomino
Author

For some reason, the terraform plan does think the backend count is 1:

  + resource "google_compute_backend_service" "backend_service" {
      + connection_draining_timeout_sec = 300
      + creation_timestamp              = (known after apply)
      + fingerprint                     = (known after apply)
      + health_checks                   = (known after apply)
      + id                              = (known after apply)
      + load_balancing_scheme           = "EXTERNAL"
      + name                            = "opta-jd-gcp-test-5"
      + port_name                       = "http"
      + project                         = (known after apply)
      + protocol                        = "HTTP"
      + self_link                       = (known after apply)
      + session_affinity                = (known after apply)
      + timeout_sec                     = (known after apply)

      + backend {
          + balancing_mode        = (known after apply)
          + capacity_scaler       = 1
          + group                 = (known after apply)
          + max_rate_per_endpoint = (known after apply)
          + max_utilization       = 0.8
        }

      + cdn_policy {
          + signed_url_cache_max_age_sec = (known after apply)

          + cache_key_policy {
              + include_host           = (known after apply)
              + include_protocol       = (known after apply)
              + include_query_string   = (known after apply)
              + query_string_blacklist = (known after apply)
              + query_string_whitelist = (known after apply)
            }
        }

      + log_config {
          + enable      = (known after apply)
          + sample_rate = (known after apply)
        }
    }

even though I specified a dynamic backend

@juandiegopalomino
Author

So for some reason it's planning with a single backend, even though there are multiple and they're determined by a data source whose value is not known yet.

@juandiegopalomino
Author

Yup, successive runs have the correct number of backends:

  # module.gcpk8sbase.google_compute_backend_service.backend_service will be created
  + resource "google_compute_backend_service" "backend_service" {
      + connection_draining_timeout_sec = 300
      + creation_timestamp              = (known after apply)
      + fingerprint                     = (known after apply)
      + health_checks                   = [
          + "projects/jds-throwaway-5/global/healthChecks/opta-jd-gcp-test-5",
        ]
      + id                              = (known after apply)
      + load_balancing_scheme           = "EXTERNAL"
      + name                            = "opta-jd-gcp-test-5"
      + port_name                       = "http"
      + project                         = (known after apply)
      + protocol                        = "HTTP"
      + self_link                       = (known after apply)
      + session_affinity                = (known after apply)
      + timeout_sec                     = (known after apply)

      + backend {
          + balancing_mode        = "RATE"
          + capacity_scaler       = 1
          + group                 = "projects/jds-throwaway-5/zones/us-central1-a/networkEndpointGroups/opta-jd-gcp-test-5-http"
          + max_rate_per_endpoint = 50
          + max_utilization       = 0.8
        }
      + backend {
          + balancing_mode        = "RATE"
          + capacity_scaler       = 1
          + group                 = "projects/jds-throwaway-5/zones/us-central1-b/networkEndpointGroups/opta-jd-gcp-test-5-http"
          + max_rate_per_endpoint = 50
          + max_utilization       = 0.8
        }
      + backend {
          + balancing_mode        = "RATE"
          + capacity_scaler       = 1
          + group                 = "projects/jds-throwaway-5/zones/us-central1-c/networkEndpointGroups/opta-jd-gcp-test-5-http"
          + max_rate_per_endpoint = 50
          + max_utilization       = 0.8
        }

      + cdn_policy {
          + signed_url_cache_max_age_sec = (known after apply)

          + cache_key_policy {
              + include_host           = (known after apply)
              + include_protocol       = (known after apply)
              + include_query_string   = (known after apply)
              + query_string_blacklist = (known after apply)
              + query_string_whitelist = (known after apply)
            }
        }

      + log_config {
          + enable      = (known after apply)
          + sample_rate = (known after apply)
        }
    }

so the error really seems to be related to planning, which initially thinks there is only one backend

@dinvlad

dinvlad commented Apr 5, 2021

Not sure if related, but it's also flip-flopping the default_ttl and serve_while_stale settings when they're left at their default values.

@juandiegopalomino
Author

@venkykuberan do you need any more details?

@slevenick
Collaborator

Strange... I would guess this has to do with Terraform core if it is an issue with plans changing between applies, especially when a data source is used to populate a dynamic block.

Have you been able to run this config with other versions of Terraform core? Can you either try 0.13.x or hard-code the negs value to the expected result of the data source call?
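For instance, a hedged sketch of what hard-coding might look like, using the NEG IDs that appear in the later plan output (purely illustrative, to rule out the data sources):

```hcl
# Hypothetical sketch: replace the data-source-driven list with the
# literal IDs the data sources would return.
locals {
  negs = [
    "projects/jds-throwaway-5/zones/us-central1-a/networkEndpointGroups/opta-jd-gcp-test-5-http",
    "projects/jds-throwaway-5/zones/us-central1-b/networkEndpointGroups/opta-jd-gcp-test-5-http",
    "projects/jds-throwaway-5/zones/us-central1-c/networkEndpointGroups/opta-jd-gcp-test-5-http",
  ]
}
```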

I can do some looking around to see if we do anything weird with the backends field, but I'm suspicious of the combination of data sources used to populate a dynamic block.

@juandiegopalomino
Author

I'd imagine dynamically iterating over a hard-coded value would work (otherwise this error would be far more serious), but that's not a feasible solution at the moment: we need the values to depend on helm_release.ingress-nginx, because it's helm_release.ingress-nginx that creates them. I haven't tried it with 0.13, as downgrading is awfully difficult and I'd rather not. It also isn't the combination alone, as the error only happens some of the time, and not during the test runs I did.

@juandiegopalomino
Author

Maybe it has something to do with some default Terraform behavior for for_each when the original value is not known before apply.
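To make that hypothesis concrete, here is a hedged minimal-repro sketch (all names are illustrative): the data sources can't be read until after apply because of depends_on, so the dynamic block's for_each length is unknown at plan time.

```hcl
# Hypothetical minimal repro sketch; resource names are illustrative.
variable "zones" {
  type    = list(string)
  default = ["us-central1-a", "us-central1-b", "us-central1-c"]
}

data "google_compute_network_endpoint_group" "neg" {
  count      = length(var.zones)
  name       = "example-neg"
  zone       = var.zones[count.index]
  depends_on = [helm_release.ingress-nginx] # defers the read to apply time
}

resource "google_compute_backend_service" "example" {
  name = "example"

  dynamic "backend" {
    # Unknown length at plan time: the plan appears to show a single
    # placeholder backend {}, while apply produces three.
    for_each = compact(data.google_compute_network_endpoint_group.neg.*.id)
    content {
      group = backend.value
    }
  }
}
```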

@juandiegopalomino
Author

@slevenick anything else I can provide to help?

@slevenick
Collaborator

So, the problem seems to me to be data sources + dynamic blocks. I'd like to get a reproduction so that we can open an issue against Terraform core, as the problem is likely not caused within this provider. I'm having trouble reproducing it, though.

@juandiegopalomino
Author

You mean it works well for you?

@slevenick
Collaborator

No, I mean I haven't been able to get a working reproduction of this error using data sources + dynamic blocks. It probably requires some extra setup that I haven't done.

@juandiegopalomino
Author

Hi, so at some point this past week, this error was overshadowed by this one: #8878 (comment),
which I still could not figure out.
As the show must go on, I decided to pass the zone names in "manually" as variables, hoping that would solve the problem. Here is the updated code:

resource "google_compute_health_check" "healthcheck" {
  name               = "opta-${var.layer_name}"
  http_health_check {
    port_specification = "USE_SERVING_PORT"
    request_path = "/healthz"
  }
}

data "google_compute_network_endpoint_group" "http" {
  count = length(var.zone_names)
  name = "opta-${var.layer_name}-http"
  zone = var.zone_names[count.index]
  depends_on = [
    helm_release.ingress-nginx
  ]
}


data "google_compute_network_endpoint_group" "https" {
  count = length(var.zone_names)
  name = "opta-${var.layer_name}-https"
  zone = var.zone_names[count.index]
  depends_on = [
    helm_release.ingress-nginx
  ]
}

resource "google_compute_backend_service" "backend_service" {
  name        = "opta-${var.layer_name}"
  port_name   = "http"
  protocol    = "HTTP"

  health_checks = [google_compute_health_check.healthcheck.id]

  dynamic "backend" {
    for_each = local.negs
    content {
      balancing_mode = "RATE"
      max_rate_per_endpoint = 50
      group = backend.value
    }
  }
  depends_on = [helm_release.ingress-nginx, google_compute_health_check.healthcheck]
}

The network endpoint groups are still being created dynamically by another resource (the linkerd setup, which for convenience I have added here: https://gist.github.com/juandiegopalomino/077490e5fa409a2533ed8f927514ca93). Sadly, on the first run, the changing-plan error for the dynamic blocks persists. I hope this helps in reproducing.

@melinath
Collaborator

melinath commented May 4, 2021

serve_while_stale seems likely to be the root cause behind #8939; hopefully fixing that will eliminate one of the contributing factors.

@pawelJas

pawelJas commented Jul 6, 2024

@melinath should this be closed now too? (The root-cause issue has been closed.)

modular-magician added a commit to modular-magician/terraform-provider-google that referenced this issue Jul 11, 2024
[upstream:30ab2a2eea61cc34f439ddfe7cf840abf746ab1f]

Signed-off-by: Modular Magician <magic-modules@google.com>
modular-magician added a commit that referenced this issue Jul 11, 2024
[upstream:30ab2a2eea61cc34f439ddfe7cf840abf746ab1f]

Signed-off-by: Modular Magician <magic-modules@google.com>
@melinath
Collaborator

Closing per comment from @pawelJas - please open a new ticket if you're still seeing this problem!


I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Aug 23, 2024