This repository has been archived by the owner on Mar 8, 2022. It is now read-only.

WIP: Support for verifying custom domains #228


squarebracket
Contributor

@squarebracket squarebracket commented May 7, 2020

Community Note

  • Please vote on this pull request by adding a 👍 reaction to the original pull request comment to help the community and maintainers prioritize this request
  • Please do not leave "+1" comments, they generate extra noise for pull request followers and do not help prioritize the request

Fixes #227, the lazy way.

Changes proposed in this pull request:

  • Add auth0_custom_domain_verification resource

Your CHANGELOG looks auto-generated, I'm not sure if I should add an entry or not.

Also, I don't think I can really add an acceptance test without being able to modify a DNS record...

Output from acceptance testing:

Not run.

@squarebracket squarebracket marked this pull request as draft May 7, 2020 22:36
@alexkappa
Owner

Hi @squarebracket, my apologies for the late reply to this PR but since there is some interest in this functionality I would like to share my thoughts around it.

I would indeed like to support verification, but what makes me think twice about it is the volatility of DNS propagation. We would be depending on a name server to resolve a domain correctly within a reasonable time frame so we can perform the actual verification.

Using a null_resource might be a way to go, but perhaps we can do it inside the auth0_custom_domain_verification creation using some of Terraform's APIs... But the question remains: what if the DNS record hasn't propagated within the given timeout? Can this be tested reliably?

Perhaps we could treat the output of the verification as the resource regardless of a successful verification. A custom domain verification response can have status disabled, pending, pending_verification, or ready. We could save this information and allow users to verify again, perhaps using triggers. This could make our tests more robust and predictable.
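To make the idea concrete, here is a minimal Go sketch of a verification result stored regardless of outcome, with a triggers map that signals when to verify again. The four status strings come from the comment above; the type names, the `Triggers` field, and `TriggersChanged` are hypothetical and not taken from the provider or the Auth0 SDK.

```go
package main

import "fmt"

// Status holds the values a custom domain verification response can report.
// The type and constant names are hypothetical.
type Status string

const (
	StatusDisabled            Status = "disabled"
	StatusPending             Status = "pending"
	StatusPendingVerification Status = "pending_verification"
	StatusReady               Status = "ready"
)

// Verification is what the resource would store after one verify call,
// whether or not the domain actually passed.
type Verification struct {
	Status   Status
	Triggers map[string]string
}

// Verified reports whether the stored status means the domain passed.
func (v Verification) Verified() bool { return v.Status == StatusReady }

// TriggersChanged reports whether the user edited the triggers map, which
// would be the signal to call verify again on the next apply.
func TriggersChanged(prev, next map[string]string) bool {
	if len(prev) != len(next) {
		return true
	}
	for k, pv := range prev {
		if nv, ok := next[k]; !ok || nv != pv {
			return true
		}
	}
	return false
}

func main() {
	v := Verification{
		Status:   StatusPendingVerification,
		Triggers: map[string]string{"attempt": "1"},
	}
	fmt.Println("verified:", v.Verified())
	fmt.Println("re-verify:", TriggersChanged(v.Triggers, map[string]string{"attempt": "2"}))
}
```

With this shape, a test only has to assert that some status was recorded, not that DNS propagated in time.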

Repository owner deleted a comment from github-actions bot Aug 18, 2020
@squarebracket
Contributor Author

Thank you for the comment. I'm sure it's tough to find time as a humble single maintainer.

I'm not very familiar with go or with coding for terraform, so I might need you to walk me through something here or there.

I agree that the volatility of DNS creates issues, both with the resource creation and in ensuring a reliable test. My thinking here is: what does the user expect based on the declarative nature of terraform? I would personally expect that the successful creation of all my resources would mean that everything is in the state I expect -- i.e. that the custom domain is verified and working in auth0.

In my opinion, that conflicts with having the auth0_custom_domain_verification resource complete creation with a status of pending_verification, which will then likely change later to ready once it verifies the domain async.

This is also probably worth a mention:

Once verification is complete, it may take up to 10 minutes before the custom domain can start accepting requests.

So there are potentially 2 sources of "wait time" here -- waiting for the verify to complete, and waiting for the custom domain to start accepting requests.

Perhaps we can take an approach similar to the cloudfront_distribution's wait_for_deployment argument. By default, when the resource is created, it continually polls the created cloudfront distribution to see if its status has been set to Deployed -- terraform only considers the resource to be created after that occurs. However, if you set wait_for_deployment to false, it will not wait for the deployment, and return the resource with an InProgress status.

Do you think adding a wait_for_verification parameter to auth0_custom_domain_verification would be a reasonable approach here?
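Roughly, the behaviour of such a flag could look like the Go sketch below. It uses only the standard library; `verifyFunc`, `createVerification`, and the interval and timeout parameters are placeholders for illustration, not the provider's or the SDK's actual API.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// verifyFunc stands in for one call to the verify endpoint and returns the
// domain status it reports. It is a placeholder, not the real SDK signature.
type verifyFunc func() (string, error)

var errTimeout = errors.New("timed out waiting for status ready")

// createVerification always calls verify at least once. With wait=false it
// returns whatever status came back; with wait=true it keeps calling until
// the status is "ready" or the timeout would be exceeded.
func createVerification(verify verifyFunc, wait bool, interval, timeout time.Duration) (string, error) {
	deadline := time.Now().Add(timeout)
	for {
		status, err := verify()
		if err != nil {
			return "", err
		}
		if status == "ready" || !wait {
			return status, nil
		}
		if time.Now().Add(interval).After(deadline) {
			return status, errTimeout
		}
		time.Sleep(interval)
	}
}

func main() {
	calls := 0
	// Fake endpoint: reports pending_verification twice, then ready.
	fake := func() (string, error) {
		calls++
		if calls < 3 {
			return "pending_verification", nil
		}
		return "ready", nil
	}
	status, err := createVerification(fake, true, 10*time.Millisecond, time.Second)
	fmt.Println(status, err, calls)
}
```

Setting the flag to false gives the "record the status and move on" behaviour, while the default keeps the apply blocked until the domain is verified.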

We could save this information and allow users to verify again, perhaps using triggers.

I'm afraid I don't know what you mean, here. Could you elaborate? I thought triggers were only for provisioners?

@alexkappa
Owner

alexkappa commented Aug 18, 2020

I would personally expect that the successful creation of all my resources would mean that everything is in the state I expect -- i.e. that the custom domain is verified and working in auth0.

This makes sense. However I would argue that many resources created by terraform are not immediately available. With aws_instance for example terraform may wait for a CREATED status, but that doesn't mean I can already SSH into the new instance.

The same I imagine applies to a digitalocean_record. Terraform won't wait for the DNS record to propagate across the globe before it considers the resource as created.

In my opinion, that conflicts with having the auth0_custom_domain_verification resource complete creation with a status of pending_verification, which will then likely change later to ready once it verifies the domain async.

I probably should have explained myself better here. We wouldn't be doing anything async here. The api.CustomDomain.Verify() would be called once, upon creation of the validation resource (let's think of validation as a resource for the moment).

If at the time of creation of this resource, its status property has a value of something other than ready then so be it. If we wish to re-verify the domain, we could modify the resource using a technique similar to rotating a secret.

@squarebracket
Contributor Author

This makes sense. However I would argue that many resources created by terraform are not immediately available. With aws_instance for example terraform may wait for a CREATED status, but that doesn't mean I can already SSH into the new instance.

The same I imagine applies to a digitalocean_record. Terraform won't wait for the DNS record to propagate across the globe before it considers the resource as created.

This is a good point. There is a difference between a resource's concrete existence and being able to query something on / do something with that resource. That boundary isn't always clear. If you approach things from an OOP perspective in terms of objects and properties, I'm not sure a custom domain verification's status and the status of an ssh server within an aws_instance are analogous, but you could probably argue either way.

It's been a while since I've coded this, and some things are only just now coming back to me. If I remember correctly, calling api.CustomDomain.Verify() will fail if the DNS record does not exist. That may only be through the UI, though, I can't remember. Based on the documentation, that's sort of what happens; the API should return 200 whether the verification succeeds or fails.

To me, this means the custom domain will not eventually be verified, which jibes with what I remember.

This means it is fundamentally different than an SSH server or DNS record. If configured correctly, those will be "eventually correct." In the case of a custom domain verification, even if you configure everything correctly, there is no chance the domain will be verified if it fails during the terraform apply because the DNS record has not yet propagated. You simply have to call api.CustomDomain.Verify() again.

If at the time of creation of this resource, its status property has a value of something other than ready then so be it. If we wish to re-verify the domain, we could modify the resource using a technique similar to rotating a secret.

Thank you for the example. How would you envision using it? In this case, I think the only thing that would be helpful is to recreate the verification's resource if its custom domain's status is not ready, but perhaps you have something more clever in mind.

If indeed it is all synchronous, though, I still think it's more useful to the user to simply fail the resource creation. It's clear, and the user can work around it with null_resource or what-have-you if they wish.

On that note, would failing the resource creation cause downstream failures? I assume that any auth0 resources/settings that require a custom domain can be configured with an unverified custom domain, but you'd be in a better position to know the actual answer.

@alexkappa
Owner

alexkappa commented Aug 22, 2020

This means it is fundamentally different than an SSH server or DNS record...

I wasn't implying that it is. I was merely pointing out that it will likely rely on a DNS record to propagate in order to succeed.

If indeed it is all synchronous, though, I still think it's more useful to the user to simply fail the resource creation. It's clear, and the user can work around it with null_resource or what-have-you if they wish.

I disagree here. The null_resource and the use of local-exec does not guarantee that the DNS entry has propagated fully. Just because nslookup resolves the domain locally (using local-exec) doesn't mean it will also resolve from Auth0's servers. Therefore I don't like the idea of failing the resource creation.

As you pointed out, the docs say that a domain verification request will succeed whether the domain passed or failed verification.

200 Custom domain successfully verified.
200 Custom domain failed verification.

Therefore we can run the verification multiple times until it is successful. To give an example on how to do this, I made a PoC using the trigger mechanism I described earlier. Have a look at this branch.

@squarebracket
Contributor Author

squarebracket commented Aug 23, 2020

I disagree here. The null_resource and the use of local-exec does not guarantee that the DNS entry has propagated fully.

I'm not trying to claim that it does. It may work, it may not; the onus is on the user. But if the resource creation fails, you can at least know from the exit code of terraform whether your infrastructure is in its desired state.

So if I understand your branch correctly, you would have to do the following to verify a domain:

  1. Run terraform apply. This will not attempt to verify the domain, but will create a DNS record.
  2. Modify the auth0_custom_domain resource so that its verification_trigger is some map that is different from the last run.
  3. Run terraform apply again. This will attempt to verify the domain.
  4. Hope it worked. If not, goto 2.

Is that correct?

I guess if you want to know if your domain is verified in an automated fashion, you would have to print the status as an output and do e.g. terraform apply | tee verify_status && grep 'status = "ready"' < verify_status?

@alexkappa
Owner

Yep, that's what I'm thinking.

@squarebracket
Contributor Author

Personally, I think having to run terraform apply twice is undesirable.

Not saying that I necessarily have a better solution.

@emilhdiaz

@alexkappa / @squarebracket is this MR still alive? I'm running into a need for this as well. Happy to contribute if it helps revive this work.

I would recommend looking at the AWS terraform provider for a similar situation. The aws_acm_certificate_validation resource has to handle this same process for DNS verification of ACM certs.

@squarebracket
Contributor Author

@emilhdiaz are you familiar with the code of the provider and how it handles the use case?

@squarebracket
Contributor Author

squarebracket commented Feb 22, 2021

@alexkappa I took a look at how the AWS provider handles ACM certs and validations, as suggested by @emilhdiaz. It functions similarly, in that there is a resource which has validation attributes, which is then used to create a DNS record, and then some waiting must be done for the DNS change to propagate before verification can be attempted.

Here is what the use case looks like, straight from the docs of the aws_acm_certificate_validation resource:

resource "aws_acm_certificate" "example" {
  domain_name       = "example.com"
  validation_method = "DNS"
}

data "aws_route53_zone" "example" {
  name         = "example.com"
  private_zone = false
}

resource "aws_route53_record" "example" {
  for_each = {
    for dvo in aws_acm_certificate.example.domain_validation_options : dvo.domain_name => {
      name   = dvo.resource_record_name
      record = dvo.resource_record_value
      type   = dvo.resource_record_type
    }
  }

  allow_overwrite = true
  name            = each.value.name
  records         = [each.value.record]
  ttl             = 60
  type            = each.value.type
  zone_id         = data.aws_route53_zone.example.zone_id
}

resource "aws_acm_certificate_validation" "example" {
  certificate_arn         = aws_acm_certificate.example.arn
  validation_record_fqdns = [for record in aws_route53_record.example : record.fqdn]
}

Notice also the big warning message on the docs page stating that the validation resource is merely part of the validation workflow and "does not represent a real-world entity in AWS, therefore changing or deleting this resource on its own has no immediate effect."

I decided to take a look at the source code of the provider. I am not very familiar at all with coding providers, but it seems that in the Create function, they are simply calling resource.Retry (from the SDK, see also this) to continually try to validate the domain until it either succeeds or the create timeout of the resource elapses.
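As a rough illustration of that retry-until-timeout pattern, here is a standard-library Go sketch. It does not import the Terraform SDK: `retryError`, `retry`, and `verifyUntilReady` are hypothetical stand-ins that only mimic the retryable versus non-retryable distinction, and the verify callback's signature is an assumption.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// retryError pairs an error with a flag saying whether another attempt is
// worthwhile. Hypothetical type, written here only for illustration.
type retryError struct {
	err       error
	retryable bool
}

func retryable(err error) *retryError    { return &retryError{err: err, retryable: true} }
func nonRetryable(err error) *retryError { return &retryError{err: err, retryable: false} }

// retry calls f until it returns nil, returns a non-retryable error, or the
// timeout elapses; on timeout the last error is returned.
func retry(timeout, interval time.Duration, f func() *retryError) error {
	deadline := time.Now().Add(timeout)
	for {
		re := f()
		if re == nil {
			return nil
		}
		if !re.retryable || time.Now().After(deadline) {
			return re.err
		}
		time.Sleep(interval)
	}
}

var errNotReady = errors.New("custom domain not verified yet")

// verifyUntilReady wraps a verify call (placeholder signature) in retry.
func verifyUntilReady(verify func() (string, error), timeout, interval time.Duration) error {
	return retry(timeout, interval, func() *retryError {
		status, err := verify()
		if err != nil {
			return nonRetryable(err) // API failure: give up immediately
		}
		if status != "ready" {
			return retryable(errNotReady) // DNS may not have propagated yet
		}
		return nil
	})
}

func main() {
	attempts := 0
	// Fake endpoint: fails verification three times, then succeeds.
	fake := func() (string, error) {
		attempts++
		if attempts < 4 {
			return "pending_verification", nil
		}
		return "ready", nil
	}
	err := verifyUntilReady(fake, time.Second, 5*time.Millisecond)
	fmt.Println(err, attempts)
}
```

Because a failed verification is never retried by Auth0 on its own, each loop iteration has to issue a fresh verify call rather than just re-reading the status.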

Using a similar strategy for the Auth0 use case, the relevant code could look like this:

resource "auth0_custom_domain" "mydomain" {
  domain = "login.example.com"
  type = "auth0_managed_certs"
  verification_method = "txt"
}
resource "digitalocean_record" "auth0_domain" {
  domain = "example.com"
  type = upper(auth0_custom_domain.mydomain.verification[0].methods[0].name)
  name = "login"
  value = "${auth0_custom_domain.mydomain.verification[0].methods[0].record}."
}
resource "auth0_custom_domain_verification" "mydomain" {
  custom_domain_id = auth0_custom_domain.mydomain.id
  # this is just for implicit resource ordering
  dns_record = digitalocean_record.auth0_domain.value
}

This seems very reasonable to me:

  • It's straightforward for the user in the basic use case:
    • A successful terraform apply with a created auth0_custom_domain_verification means the domain is verified
    • A failed terraform apply means the domain is not verified
    • Running a failed terraform apply again will attempt verification again
  • It allows for a large "wait period" by default, but creation will complete as soon as the DNS record propagates
  • It allows for overriding the "wait period" in a normal terraform way, if needed
  • If the domain gets unverified for whatever reason, the Read function could taint the resource if the custom domain is not in the correct state (see how the AWS provider does it), meaning it can detect configuration drift
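The drift detection in the last bullet could be sketched like this in Go. The `state` struct and `getStatus` callback are hypothetical stand-ins for the SDK's resource data and the API client; clearing the ID mirrors the common SDK convention of calling `d.SetId("")` in Read so that the next plan proposes re-creating the resource.

```go
package main

import "fmt"

// state is a stand-in for the SDK's resource data. An empty ID tells
// Terraform the resource is gone and must be created again.
type state struct {
	ID     string
	Status string
}

// getStatus is a placeholder for fetching the custom domain from the API.
type getStatus func(id string) (string, error)

// readVerification refreshes the stored status and drops the resource from
// state when the domain is no longer "ready".
func readVerification(s *state, get getStatus) error {
	status, err := get(s.ID)
	if err != nil {
		return err
	}
	s.Status = status
	if status != "ready" {
		s.ID = "" // the next plan would re-create, i.e. re-verify
	}
	return nil
}

func main() {
	s := &state{ID: "cd_123", Status: "ready"}
	// Fake API response simulating a domain that lost its verification.
	drifted := func(string) (string, error) { return "pending_verification", nil }
	if err := readVerification(s, drifted); err != nil {
		fmt.Println("read failed:", err)
		return
	}
	fmt.Printf("id=%q status=%q\n", s.ID, s.Status)
}
```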

What do you think @alexkappa?

@emilhdiaz

@squarebracket this makes a lot of sense to me. For comparison, here's a workaround we currently have in place using a terraform external data source (note: we use Cloudflare for DNS management):

#auth0.tf

data "external" "auth0_custom_domain_verification" {
  program = ["bash", "${path.module}/verify-custom-domain.sh"]
  query = {
    domain = var.auth0_domain
    client_id = var.auth0_client_id
    client_secret = var.auth0_client_secret
    custom_domain_id = auth0_custom_domain._.id
  }
  depends_on = [cloudflare_record.auth0_custom_domain_verification]
}

... and the custom bash script that handles the verification logic. Note the use of an SSM param here to preserve the cname_api_key between different terraform runs, since the value is not available in terraform state. Your proposed solution, I think, would solve this by storing this value in terraform state.

# verify-custom-domain.sh

#!/bin/bash

# Exit if any of the intermediate steps fail
set -e

eval "$(jq -r '@sh "DOMAIN=\(.domain) CLIENT_ID=\(.client_id) CLIENT_SECRET=\(.client_secret) CUSTOM_DOMAIN_ID=\(.custom_domain_id)"')"

function respond() {
  local DATA=$1
  echo "${DATA}"
  exit 0
}

function bail() {
  echo '{"status": "error"}'
  exit 1
}

function clean() {
  local DATA=$1
  echo "${DATA}" | jq 'del(.primary)' | jq 'del(.verification)'
}

ACCESS_TOKEN=$(
  curl -Ls --request POST \
  --url "https://${DOMAIN}/oauth/token" \
  --header "content-type: application/x-www-form-urlencoded" \
  --data grant_type=client_credentials \
  --data "client_id=${CLIENT_ID}" \
  --data "client_secret=${CLIENT_SECRET}" \
  --data "audience=https://${DOMAIN}/api/v2/" \
  | jq -r .access_token
)

CUSTOM_DOMAIN=$(
  curl -Ls --request GET \
  --url "https://${DOMAIN}/api/v2/custom-domains/${CUSTOM_DOMAIN_ID}" \
  --header "Authorization: Bearer ${ACCESS_TOKEN}"
)

SSM_PATH="/auth0/custom_domain/cname_api_key"

# Domain was already verified and CNAME_API_KEY should have been saved to SSM
if [ "$(echo "${CUSTOM_DOMAIN}" | jq -r .status)" == "ready" ]; then
  # Under `set -e` a failed command substitution exits the script before a
  # later `$?` check can run, so test the assignment directly.
  if ! SSM_RESULT=$(
    aws ssm get-parameter \
     --name "${SSM_PATH}" \
     --with-decryption
  ); then
    # ssm param is missing!
    bail
  fi

  # return the stored cname_api_key and origin_domain_name
  VALUE=$(echo "${SSM_RESULT}" | jq -r .Parameter.Value)
  respond "${VALUE}"
fi

# Domain needs to be verified
if [ "$(echo "${CUSTOM_DOMAIN}" | jq -r .status)" == "pending_verification" ]; then
  # attempt to verify
  VERIFIED_CUSTOM_DOMAIN=$(
    curl -Ls --request POST \
    --url "https://${DOMAIN}/api/v2/custom-domains/${CUSTOM_DOMAIN_ID}/verify" \
    --header "Authorization: Bearer ${ACCESS_TOKEN}"
  )

  # bail unless verification succeeded
  if [ "$(echo "${VERIFIED_CUSTOM_DOMAIN}" | jq -r .status)" != "ready" ]; then bail; fi

  # store CNAME_API_KEY to SSM so that it can be read in the future
  VALUE=$(echo "${VERIFIED_CUSTOM_DOMAIN}" | jq "{cname_api_key, origin_domain_name}")
  SSM_RESULT=$(
    aws ssm put-parameter \
     --name "${SSM_PATH}" \
     --value "${VALUE}" \
     --description "CNAME_API_KEY of the Auth0 custom domain" \
     --overwrite \
     --type SecureString
   )
   respond "${VALUE}"
fi

bail

@yinzara
Contributor

yinzara commented Apr 7, 2021

I have a similar issue with domain verification with SendGrid in the custom provider we wrote.
https://github.com/yinzara/terraform-provider-sendgrid/blob/main/sendgrid/resource_sendgrid_domain_authentication.go

How I handled it was to add a "valid" (or in this case "verified") optional calculated field that I would just hard code to true in my terraform resources. If 'true', I ignored it during create; however, I would do an additional API call in the "update" handler, which failed the update if the domain failed to verify. I then made sure to check the status of the verification in the "read" handler and set the "verified" boolean if the verification was successful. That way, running a 2nd apply would catch that the domain wasn't verified, as long as a refresh was done (the default behavior), and would run the verification again.

This means you'd have to know to run an apply twice and the first apply would not actually perform the verification. But since DNS propagation can take a number of minutes, it was the only way to guarantee it always worked and had a happy workflow.

Just my 2 cents.

@alexkappa
Owner

  • It's straightforward for the user in the basic use case:

    • A successful terraform apply with a created auth0_custom_domain_verification means the domain is verified
    • A failed terraform apply means the domain is not verified
    • Running a failed terraform apply again will attempt verification again
  • It allows for a large "wait period" by default, but creation will complete as soon as the DNS record propagates

  • It allows for overriding the "wait period" in a normal terraform way, if needed

  • If the domain gets unverified for whatever reason, the Read function could taint the resource if the custom domain is not in the correct state (see how the AWS provider does it), meaning it can detect configuration drift

@squarebracket thank you for looking deeper into this. Your analysis makes sense, and given that the example comes from one of the most used providers out there, there's a good chance people will be used to how it works.

In this case having a separate resource for verification is more appropriate.

Would you like to revise your PR?

There is one thing we still need to figure out - how do we test this?

@alexkappa
Owner

I've been looking into this during the past couple of weeks, mainly trying to wrap my head around testing. The approach makes sense, I've tried it several times and the workflow seems pretty reasonable.

On the testing front, we can use my DigitalOcean account to create DNS entries that can help us test the verification process. I've copied parts of the DO provider so we can create digitalocean_record resources from our test cases. It's a pity terraform doesn't allow providers to be imported, but that seems to be by design.

#410 relies on your contributions and adds tests to verify the feature works as expected.

Thanks again for your effort!

@sergiught
Collaborator

Closing this PR as support for verifying custom domains was introduced with #410

@sergiught sergiught closed this Jan 26, 2022

Successfully merging this pull request may close these issues.

Verification for custom domains
5 participants