FrontDoor managed SSL cert provisioning takes long time and eventually times out #4470

Closed
naikajah opened this issue Oct 1, 2019 · 13 comments


@naikajah
Contributor

naikajah commented Oct 1, 2019

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave "+1" or "me too" comments, they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment

Terraform (and AzureRM Provider) Version

Terraform Version : Terraform v0.12.8
Terraform AzureRM: 1.34.0

Affected Resource(s)

  • azurerm_frontdoor

Terraform Configuration Files

resource "azurerm_frontdoor" "example" {
  name                                         = "example-FrontDoor"
  location                                     = "${azurerm_resource_group.example.location}"
  resource_group_name                          = "${azurerm_resource_group.example.name}"
  enforce_backend_pools_certificate_name_check = false

  routing_rule {
    name               = "exampleRoutingRule1"
    accepted_protocols = ["Http", "Https"]
    patterns_to_match  = ["/*"]
    frontend_endpoints = ["exampleFrontendEndpoint1"]

    forwarding_configuration {
      forwarding_protocol = "MatchRequest"
      backend_pool_name   = "exampleBackendBing"
    }
  }

  backend_pool_load_balancing {
    name = "exampleLoadBalancingSettings1"
  }

  backend_pool_health_probe {
    name = "exampleHealthProbeSetting1"
  }

  backend_pool {
    name = "exampleBackendBing"

    backend {
      host_header = "www.bing.com"
      address     = "www.bing.com"
      http_port   = 80
      https_port  = 443
    }

    load_balancing_name = "exampleLoadBalancingSettings1"
    health_probe_name   = "exampleHealthProbeSetting1"
  }

  frontend_endpoint {
    name                              = "exampleFrontendEndpoint1"
    host_name                         = "example-FrontDoor.azurefd.net"
    custom_https_provisioning_enabled = false
  }

  frontend_endpoint {
    name                              = "exampleFrontendEndpoint2"
    host_name                         = "example.com"
    custom_https_provisioning_enabled = true

    custom_https_configuration {
      certificate_source = "FrontDoor"
    }
  }
}

The actual code can be found here: https://github.com/hmcts/azure-platform-terraform/blob/710cca00a3db882accea2a78494dffaa681f5ea9/modules/azure-landing-zone/frontdoor.tf#L26

Debug Output

Expected Behavior

A CNAME is created for the custom domain, pointing to the xyz.azurefd.net domain.
Provisioning of the custom domain completes and the request for FrontDoor-managed certificates is initiated.

Since Azure FrontDoor takes 4-6 hours to validate and provision SSL certificates, I expect Terraform to initiate the request to FrontDoor to provision the certificates and then return once that request has been successfully initiated.

Actual Behavior

Terraform keeps waiting on the resource:

azurerm_frontdoor.example: Still creating... [17m20s elapsed]
azurerm_frontdoor.example: Still creating... [17m30s elapsed]
azurerm_frontdoor.example: Still creating... [17m40s elapsed]
azurerm_frontdoor.example: Still creating... [17m50s elapsed]
azurerm_frontdoor.example: Still creating... [18m0s elapsed]
azurerm_frontdoor.example: Still creating... [18m10s elapsed]
azurerm_frontdoor.example: Still creating... [18m20s elapsed]
azurerm_frontdoor.example: Still creating... [18m30s elapsed]
azurerm_frontdoor.example: Still creating... [18m40s elapsed]
azurerm_frontdoor.example: Still creating... [18m50s elapsed]

and then eventually times out.

Steps to Reproduce

  1. terraform apply on the example code with custom_https_provisioning_enabled = true

Important Factoids

NA

References

  • #0000
@tombuildsstuff
Contributor

hi @naikajah

Thanks for opening this issue.

Taking a look into this, the underlying issue appears to be #171. Once that's supported (which we're doing as a part of 2.0, as outlined in #2807), resources will be able to define custom timeouts, so it should then be possible to get this fixed.

Thanks!

@BenMitchell1979

BenMitchell1979 commented Oct 2, 2019

The Azure API returns a 202 response for this operation. Giving it a custom timeout won't resolve this issue if TF is expecting a 200.

https://github.com/Azure/azure-rest-api-specs/blob/master/specification/frontdoor/resource-manager/Microsoft.Network/stable/2019-05-01/frontdoor.json

@tombuildsstuff
Contributor

@BenMitchell1979 the Azure Provider already polls on 202s - unfortunately, in this instance 6 hours is a rather extreme provisioning time.

Internally the Azure Provider has a hard limit for provisioning time of 3 hours per resource (which for most resources is considerably longer than they'd take) - but unfortunately this doesn't cover all of the longer-running resources (such as this one and SQL Managed Instance). This 3-hour value is a trade-off between supporting resources that require a longer provisioning time and ensuring we're able to bubble up valid errors when something's wrong (for example, most resources should be provisioned/destroyed within 30m; otherwise something's wrong and we should surface that to the user).

As a part of 2.0 we're adding more reasonable default timeouts to resources and allowing users to override them as necessary (such as in this instance). Support for that is being worked on and is tracked in #171, so this issue is currently blocked on #171.

Thanks
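
For readers following along: the per-resource override described above could look roughly like the fragment below. This is a minimal sketch, assuming a 2.x release of the AzureRM provider in which azurerm_frontdoor accepts a timeouts block; the "6h" durations are illustrative values, not figures taken from this thread, and the remaining arguments are elided rather than repeated.

```hcl
# Sketch only: assumes AzureRM provider 2.x, where resources accept a
# `timeouts` block. The durations below are illustrative assumptions.
resource "azurerm_frontdoor" "example" {
  name = "example-FrontDoor"

  # ... remaining arguments and the routing_rule, backend_pool and
  # frontend_endpoint blocks as in the configuration from the report ...

  timeouts {
    create = "6h" # wait up to six hours for the initial provisioning
    update = "6h" # same allowance when the resource is updated
  }
}
```

A longer timeout only changes how long Terraform is willing to wait; it does not shorten the certificate provisioning itself, so the run still occupies the agent for the whole period.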

@BenMitchell1979

If it's a long-running operation, then the CLI/PS would return once the API returns 202, and the user would get the status of the operation in a separate call (Get-AzFrontDoorFrontendEndpoint in the case of PowerShell). It doesn't poll the API for 6-8 hrs in a waiting state.

@tombuildsstuff
Contributor

@BenMitchell1979 Terraform intentionally waits until a resource has provisioned successfully before returning from the Create method - so unfortunately we'd not return "ready" when the resource isn't provisioned.

@BenMitchell1979

Then I'm confused as to how this would work with long-running resources like AFD with custom SSL. Surely the solution can't be to just let it run for 6-8 hrs? Most agents for pipelines won't stick around that long, and even if they did, I wouldn't tie up my build agent that long.

@tombuildsstuff
Contributor

@BenMitchell1979 if there are resources depending on a resource that takes that long to provision, there's not a lot else we can do, unfortunately, since we need confirmation that this resource has successfully completed provisioning/configuration.

@timja
Contributor

timja commented Oct 10, 2019

> Then I'm confused as to how this would work with long-running resources like AFD with custom SSL. Surely the solution can't be to just let it run for 6-8 hrs? Most agents for pipelines won't stick around that long, and even if they did, I wouldn't tie up my build agent that long.

The solution is really for this to get faster...
Microsoft are aware and plan to do some work to make it better, but they said it may take a while.

@WodansSon
Collaborator

@tombuildsstuff and @timja are correct: this is a known issue with the underlying API. AFD just takes that long to provision the SSL cert, and there is currently nothing we can do short of what @tombuildsstuff has already stated above. I have spoken to the AFD service team about the excessive amount of time and got the exact same answer that @timja did: they know it is an issue and are working on ways to speed up the process, but it will take time to implement.

@BenMitchell1979

Sounds like we are better off, for now, deploying this via the API/PowerShell like we do for App Service Environment. I am curious as to the logic behind "waiting" for the resource to fully deploy vs. just accepting the 202 and moving on. The "resource" is there; it's just provisioning the DNS, which shouldn't have any downstream dependencies.

@tombuildsstuff tombuildsstuff modified the milestones: v2.0.0, v2.1.0 Feb 22, 2020
@tombuildsstuff tombuildsstuff modified the milestones: v2.1.0, v2.2.0 Mar 11, 2020
@katbyte katbyte modified the milestones: v2.2.0, v2.3.0 Mar 18, 2020
@tombuildsstuff tombuildsstuff modified the milestones: v2.3.0, v2.5.0 Mar 25, 2020
@tombuildsstuff tombuildsstuff modified the milestones: v2.5.0, v2.7.0 Apr 7, 2020
@katbyte katbyte modified the milestones: v2.7.0, v2.8.0 Apr 23, 2020
@WodansSon
Collaborator

This is an API issue and there is nothing we can do at this point. I am going to close this issue, as the problem lies with the upstream API.

@ghost

ghost commented May 1, 2020

This has been released in version 2.8.0 of the provider. Please see the Terraform documentation on provider versioning or reach out if you need any assistance upgrading. As an example:

provider "azurerm" {
    version = "~> 2.8.0"
}
# ... other configuration ...

@ghost

ghost commented May 30, 2020

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.

If you feel this issue should be reopened, we encourage creating a new issue linking back to this one for added context. If you feel I made an error 🤖 🙉 , please reach out to my human friends 👉 hashibot-feedback@hashicorp.com. Thanks!

@ghost ghost locked and limited conversation to collaborators May 30, 2020