Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cloudflare_access_service_token should allow automatic rotation #1041

Closed
uncycler opened this issue Apr 22, 2021 · 5 comments · Fixed by #1057
Closed

cloudflare_access_service_token should allow automatic rotation #1041

uncycler opened this issue Apr 22, 2021 · 5 comments · Fixed by #1057
Labels
kind/enhancement Categorizes issue or PR as related to improving an existing feature. triage/accepted Indicates an issue or PR is ready to be actively worked on.

Comments

@uncycler
Copy link

Current Terraform and Cloudflare provider version

Terraform v0.14.10
+ provider registry.terraform.io/cloudflare/cloudflare v2.20.0

Description

When a service token expire, the cloudflare_access_service_token resource is not tainted which cause disruption on service that depends on it.

Use cases

For example, a token that is used for healthcheck. In this use case, if the token expires, healthcheck will fail. The only way to fix the issue, is to issue "terraform taint cloudflare_access_service_token.healthcheck".

resource "cloudflare_access_service_token" "healthcheck" {
  zone_id   = data.cloudflare_zones.myzone.id
  name      = "healthcheck"
}

resource "cloudflare_healthcheck" "zerotrust_healthcheck" {
  zone_id     = data.cloudflare_zones.myzone.id
  name        = "myhealthcheck"
  address     = "myservice.com"

  notification_email_addresses = ["alert@alert.com"]
  type            = "HTTPS"
  port            = "443"
  path            = "/healthz"
  expected_codes  = [200]

  follow_redirects = true
  header {
    header = "CF-Access-Client-Secret"
    values = [cloudflare_access_service_token.healthcheck.client_secret]
  }
  header {
    header = "CF-Access-Client-ID"
    values = [cloudflare_access_service_token.healthcheck.client_id]
  }
  consecutive_fails     = 3
  consecutive_successes = 2
}

Potential Terraform configuration

We could leverage the "time_rotating" resource. But cloudflare_access_service_token resource doesnt have argument that force recreation. Maybe if there is an internal argument that trigger a recreation of the resource like this:

resource "time_rotating" "service_token_healthcheck" {
  rotation_days = 180
}

resource "cloudflare_access_service_token" "healthcheck" {
  zone_id   = data.cloudflare_zones.myzone.id
  name      = "healthcheck"
  trigger   = time_rotating.service_token_healthcheck.unix
}

References

No response

@uncycler uncycler added kind/enhancement Categorizes issue or PR as related to improving an existing feature. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Apr 22, 2021
@jacobbednarz
Copy link
Member

Thanks for raising this one, it is definitely an interesting proposal.

I think for starters, we don't need an additional resource. The service token itself provides an expires on value which we could store and on validation, force a new resource if the date has passed or is within a configured threshold.

The thing that I keep coming back to though is that Terraform is encapsulating logic that isn't present in the API and that is sometimes surprising. Additionally, if an apply isn't run in that timeframe, this whole concept of renewal leeway goes out the window. For this proposal to work, apply steps need to be consistently running which very few installations are actually doing.

So far, I'm not sure if this is worth implementing for the little return the majority of users would see.

@uncycler
Copy link
Author

Running apply consistently should not be an issue for those who want automatic renewal.

Actually, this is the behavior of the acme_certificate resource for let's encrypt certificat renewal: https://registry.terraform.io/providers/vancluever/acme/latest/docs/resources/certificate#min_days_remaining

@jacobbednarz
Copy link
Member

I'll double check the ACME provider but I don't quite think that is how it works nor is it possible in the current state of Terraform.

The provider keeps track of the certificate expiry and attempts the renewal should that elapse. There isn't a way to have Terraform operate in the background without some form of interaction (either intentional apply or cron-style schedule).

@uncycler
Copy link
Author

Yes, terraform do modification only when on apply. The acme provider triggers a renewal request only if the apply is done within the min_days_remaining threshold. The expiry date is kept in the tfstate and it does it's calculation based on that which is not the case for cloudflare_access_service_token.

@jacobbednarz
Copy link
Member

jacobbednarz commented Apr 26, 2021

I think there is some misunderstanding here so let me confirm where we currently are.

Running apply consistently should not be an issue for those who want automatic renewal.

As we've mentioned above, this is a hurdle. Even if the attribute is added to the cloudflare_access_service_token resource, it doesn't do automatic renewal without an operator first running the terraform apply. For example, if I have an expiry date of a service token for 30 days, refresh value is set to 10 days before hand but no one actually runs a terraform apply within that 30 days, the rotation won't actually happen. If we add this, we need to be explicitly clear that action is in fact required for the token to renew and it is more of a managed rotation then automatic.

I don't mind if someone wants to take a pass at adding this using the following as a guideline but this isn't at the top of my list for now given the nuances I've mentioned with the implementation and expected usage.

  • Add expires_at to cloudflare_access_service_token resource schema
  • Add min_days_for_renewal to cloudflare_access_service_token resource schema defaulting to 0 for not active
    • Introduce a CustomizeDiff method to the expires_on field using the customdiff.ForceNewIfChange to determine if the timespan is within the range to renew.

To avoid downtime rotations, we will also need to include the examples with create_before_destroy otherwise there will be a brief period where the service relying on that token will not have access due to the resource being deleted.

@jacobbednarz jacobbednarz added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Apr 26, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/enhancement Categorizes issue or PR as related to improving an existing feature. triage/accepted Indicates an issue or PR is ready to be actively worked on.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants