LetsEncrypt DNS-01 fails when mixing domains with wildcard subdomains. #3468

Closed
quisido opened this issue Jun 10, 2018 · 33 comments
Labels
area/acme kind/bug/confirmed a confirmed bug (reproducible). priority/P1 need to be fixed in next release status/5-frozen-due-to-age

@quisido

quisido commented Jun 10, 2018

This may be a bug with ACME, but I'm not technical enough to know, and I'm using Traefik, so reporting it here.

Do you want to request a feature or report a bug?

Bug

What did you do?

I'm using the traefik docker image. I configured and ran it, and the certificates are rejected immediately, but only when mixing wildcard domains with their root domain.

What did you expect to see?

Wildcard subdomains to be allowed.

What did you see instead?

time="2018-06-10T21:15:43Z" level=error msg="Unable to obtain ACME certificate for domains \"*.EVERYDOMAIN.com,EVERYDOMAIN.com\" : cannot obtain certificates: acme: Error -> One or more domains had a problem:\n[EVERYDOMAIN.com] acme: Error 403 - urn:ietf:params:acme:error:unauthorized - Incorrect TXT record \"HASH HERE\" found at _acme-challenge.EVERYDOMAIN.com\n"

Points of interest:
This only happens if the wildcard domain is listed with the main domain.
Setting a delay of 5s did not fix this.
It is apparently able to set the TXT record; I'm not getting a permission error.
I believe the provider (Cloudflare) has to verify that the TXT record is set before ACME even checks?

Output of traefik version: (What version of Traefik are you using?)

Version:      v1.6.3
Codename:     tetedemoine
Go version:   go1.10.2
Built:        2018-06-05_03:29:01PM
OS/Arch:      linux/amd64

What is your environment & configuration (arguments, toml, provider, platform, ...)?

defaultEntryPoints = ["http", "https"]

[acme]
  email = "..."
  entryPoint = "https"
  onHostRule = true
  storage = "acme.json"
  [acme.dnsChallenge]
    delayBeforeCheck = 0
    provider = "cloudflare"

[[acme.domains]]
  main = "*.MYDOMAIN.com"
  sans = ["MYDOMAIN.com"]

[entryPoints]
  [entryPoints.http]
    address = ":80"
    [entryPoints.http.redirect]
      entryPoint = "https"
  [entryPoints.https]
    address = ":443"
    [entryPoints.https.tls]

[web]
  address = ":8080"
  [web.auth.basic]
    users = ["..."]

The environment contains the Cloudflare email and the Cloudflare API key.

When I set the wildcard subdomains to be their own main entry and the domain to be its own main entry, it works without error.

@nmengin
Contributor

nmengin commented Jun 11, 2018

Hello @CharlesStover ,

Many thanks for your interest in the project.

It seems to be a timeout problem when our ACME client (LEGO) tries to check the TXT records.

In principle it should work, because the Cloudflare timeout is equal to the Cloudflare TXT TTL.
But I guess it would be better to have a TTL lower than the timeout.

Can you activate the Træfik DEBUG logs and the acme.acmelogging option and provide your logs please?
They will allow us to be sure that the problem comes from the timeout.

@nmengin nmengin added area/acme kind/bug/possible a possible bug that needs analysis before it is confirmed or fixed. contributor/waiting-for-feedback and removed status/0-needs-triage labels Jun 11, 2018
@quisido
Author

quisido commented Jun 12, 2018

I've already generated all the certs for about 8 domains (and wildcard subdomains separately, as per the OP). I'm not sure of the ramifications of erasing them and regenerating them just to get these debug logs. I don't want to hit LetsEncrypt's maximum requests and be stuck without TLS, as I have HSTS enabled.

Is there a way I can run this in staging from the existing docker image without losing my current certs? I'm very new to Traefik and without tutorials have no idea what I'm doing.

@leebenson

leebenson commented Jun 13, 2018

I have exactly the same issue.

Config (k8s):
kind: ConfigMap
apiVersion: v1
metadata:
  name: traefik-https-v16
  namespace: kube-system
data:
  traefik.toml: |
    # traefik.toml
    debug = true
    defaultEntryPoints = ["http","https"]
    [entryPoints]
      [entryPoints.http]
      address = ":80"
      [entryPoints.http.redirect]
      entryPoint = "https"
      [entryPoints.https]
      address = ":443"
      [entryPoints.https.tls]
    [acme]
    acmeLogging = true
    email = "lee@leebenson.com"
    storage = "/etc/traefik/acme.json"
    entryPoint = "https"
    caServer = "https://acme-v02.api.letsencrypt.org/directory"
    [[acme.domains]]
    main = "*.notmyreal.site"
    sans = ["notmyreal.site"]
    [acme.dnsChallenge]
    provider = "cloudflare"
Logs:
time="2018-06-13T10:24:45Z" level=info msg="Using TOML configuration file /config/traefik.toml"
time="2018-06-13T10:24:45Z" level=info msg="Traefik version v1.6.3 built on 2018-06-05_03:29:01PM"
time="2018-06-13T10:24:45Z" level=info msg="\nStats collection is disabled.\nHelp us improve Traefik by turning this feature on :)\nMore details on: https://docs.traefik.io/basics/#collected-data\n"
time="2018-06-13T10:24:45Z" level=info msg="Preparing server http &{Address::80 TLS:<nil> Redirect:0xc4207abd40 Auth:<nil> WhitelistSourceRange:[] WhiteList:<nil> Compress:false ProxyProtocol:<nil> ForwardedHeaders:0xc42072ab20} with readTimeout=0s writeTimeout=0s idleTimeout=3m0s"
time="2018-06-13T10:24:45Z" level=info msg="Preparing server traefik &{Address::8080 TLS:<nil> Redirect:<nil> Auth:<nil> WhitelistSourceRange:[] WhiteList:<nil> Compress:false ProxyProtocol:<nil> ForwardedHeaders:0xc42072ab40} with readTimeout=0s writeTimeout=0s idleTimeout=3m0s"
time="2018-06-13T10:24:45Z" level=info msg="Preparing server https &{Address::443 TLS:0xc420771280 Redirect:<nil> Auth:<nil> WhitelistSourceRange:[] WhiteList:<nil> Compress:false ProxyProtocol:<nil> ForwardedHeaders:0xc42072ab00} with readTimeout=0s writeTimeout=0s idleTimeout=3m0s"
time="2018-06-13T10:24:45Z" level=info msg="Starting server on :8080"
time="2018-06-13T10:24:45Z" level=info msg="Starting server on :80"
time="2018-06-13T10:24:45Z" level=info msg="Starting provider configuration.providerAggregator {}"
time="2018-06-13T10:24:45Z" level=info msg="Starting server on :443"
time="2018-06-13T10:24:45Z" level=info msg="Starting provider *kubernetes.Provider {\"Watch\":true,\"Filename\":\"\",\"Constraints\":[],\"Trace\":false,\"TemplateVersion\":0,\"DebugLogGeneratedTemplate\":false,\"Endpoint\":\"\",\"Token\":\"\",\"CertAuthFilePath\":\"\",\"DisablePassHostHeaders\":false,\"EnablePassTLSCert\":false,\"Namespaces\":null,\"LabelSelector\":\"\",\"IngressClass\":\"\"}"
time="2018-06-13T10:24:45Z" level=info msg="Starting provider *acme.Provider {\"Email\":\"lee@leebenson.com\",\"ACMELogging\":true,\"CAServer\":\"https://acme-v02.api.letsencrypt.org/directory\",\"Storage\":\"/etc/traefik/acme.json\",\"EntryPoint\":\"https\",\"OnHostRule\":false,\"OnDemand\":false,\"DNSChallenge\":{\"Provider\":\"cloudflare\",\"DelayBeforeCheck\":0},\"HTTPChallenge\":null,\"Domains\":[{\"Main\":\"*.notmyreal.site\",\"SANs\":[\"notmyreal.site\"]}],\"Store\":{}}"
time="2018-06-13T10:24:45Z" level=info msg="Testing certificate renew..."
time="2018-06-13T10:24:45Z" level=info msg="ingress label selector is: \"\""
time="2018-06-13T10:24:45Z" level=info msg="Creating in-cluster Provider client"
time="2018-06-13T10:24:46Z" level=info msg="Server configuration reloaded on :80"
time="2018-06-13T10:24:46Z" level=info msg="Server configuration reloaded on :8080"
time="2018-06-13T10:24:46Z" level=info msg="Server configuration reloaded on :443"
time="2018-06-13T10:24:46Z" level=info msg="Server configuration reloaded on :443"
time="2018-06-13T10:24:46Z" level=info msg="Server configuration reloaded on :80"
time="2018-06-13T10:24:46Z" level=info msg="Server configuration reloaded on :8080"
time="2018-06-13T10:24:50Z" level=info msg=Register...
legolog: 2018/06/13 10:24:50 [INFO] acme: Registering account for lee@leebenson.com
legolog: 2018/06/13 10:24:51 [INFO][*.notmyreal.site] acme: Obtaining bundled SAN certificate
legolog: 2018/06/13 10:24:51 [INFO][*.notmyreal.site] AuthURL: https://acme-v02.api.letsencrypt.org/acme/authz/5wwP658kTBzcxCS5utOMt7-ggUOoY3kPweQao7lWvJA
legolog: 2018/06/13 10:24:51 [INFO][notmyreal.site] acme: Trying to solve DNS-01
legolog: 2018/06/13 10:24:52 [INFO][notmyreal.site] Checking DNS record propagation using [10.0.0.10:53]
time="2018-06-13T10:25:03Z" level=error msg="Unable to obtain ACME certificate for domains \"*.notmyreal.site,notmyreal.site\" : cannot obtain certificates: acme: Error -> One or more domains had a problem:\n[notmyreal.site] acme: Error 403 - urn:ietf:params:acme:error:unauthorized - No TXT record found at _acme-challenge.notmyreal.site\n"

The error seems to alternate between:

 : Error 403 - urn:ietf:params:acme:error:unauthorized - No TXT record found at _acme-challenge.notmyreal.site

And:

Error 403 - urn:ietf:params:acme:error:unauthorized - Incorrect TXT record \"HREckyrZXY7uCVLaUkoYzadxkHbwfFavNWS_v14yMzk\" found at _acme-challenge.notmyreal.site\n"

(This is a dev domain, so I'm not too worried about rate limits/sensitive debug logs... but would prefer to stay within limits if possible, to avoid the faffing of switching to another domain.)

@ghost

ghost commented Jun 16, 2018

I've also been seeing this, although it's a different provider.

[acme]
  email = "not.me@notmysite.com"
  storage = "acme.json"
  entryPoint = "https"
  onHostRule = true

  [acme.dnsChallenge]
    provider = "digitalocean"
  [[acme.domains]]
    main = "*.notmyworkingsite.com" #this one gets the certificate
    sans = ["notmyworkingsite.com"]
  [[acme.domains]]
    main = "*.notmysite.com" #this one errors
    sans = ["notmysite.com"]

Error:

time="2018-06-16T16:21:32Z" level=error msg="Unable to obtain ACME certificate for domains \"*.notmysite.com,notmysite.com\" : cannot obtain certificates: acme: Error -> One or more domains had a problem:\n[notmysite.com] acme: Error 403 - urn:ietf:params:acme:error:unauthorized - Incorrect TXT record \"ysCnD8A34O4TJsW9bKdAU4yW3c_3l6pWQkxhqraUoOA\" found at _acme-challenge.notmysite.com\n"

Edit: as the key is cleared out each time, I can only assume the issue is that traefik is setting all domains to the same key, not the one that the request came in upon.

@nmengin
Contributor

nmengin commented Jun 18, 2018

Hello @leebenson , @pirion

Many thanks for your feedback.

Is it possible to provide all the DEBUG logs please?
You can enable them by adding Debug=true in the Træfik options.

Moreover, you can use the Let's Encrypt staging mode by setting the option caServer = "https://acme-staging-v02.api.letsencrypt.org/directory" in Træfik.
That way you will not run into problems with rate limiting.

Thanks in advance.

@nmengin nmengin self-assigned this Jun 18, 2018
@leebenson

@nmengin, the logs in my comment above were displayed after setting Debug=true - are there other logs anywhere else?

I've deployed the domain in production now, so I can't switch back to staging at this stage. The above should hopefully be enough to diagnose the issue, unless there are other logs I'm not aware of... in which case, I could try again with a separate domain.

@dgorczyca

dgorczyca commented Jun 22, 2018

@nmengin I am experiencing the same error (using the digitalocean provider and a bare-metal Kubernetes installation on DigitalOcean)
I am using traefik image v1.6.4
I adjusted my logLevel=debug, hope that provides you more info, here are the settings:

configuration
logLevel = "DEBUG"
# Force HTTPS
[entryPoints]
  [entryPoints.http]
  address = ":80"
    [entryPoints.http.redirect]
    entryPoint = "https"
  [entryPoints.https]
  address = ":443"
    [entryPoints.https.tls]
# Let's encrypt configuration
[acme]
email="admin@adomain.com"
storage="/etc/traefik/acme.json"
caServer = "https://acme-staging-v02.api.letsencrypt.org/directory"
acmeLogging=true
entryPoint="https"
[acme.dnsChallenge]
  provider = "digitalocean"
  delayBeforeCheck = 0
[[acme.domains]]
  main = "*.adomain.com"
  sans = ["adomain.com"]
log from the controller pod:
time="2018-06-22T00:12:45Z" level=info msg="Using TOML configuration file /config/traefik.toml"
time="2018-06-22T00:12:45Z" level=info msg="Traefik version v1.6.4 built on 2018-06-15_03:12:50PM"
time="2018-06-22T00:12:45Z" level=info msg="\nStats collection is disabled.\nHelp us improve Traefik by turning this feature on :)\nMore details on: https://docs.traefik.io/basics/#collected-data\n"
time="2018-06-22T00:12:45Z" level=debug msg="Global configuration loaded {\"LifeCycle\":{\"RequestAcceptGraceTimeout\":0,\"GraceTimeOut\":10000000000},\"GraceTimeOut\":0,\"Debug\":false,\"CheckNewVersion\":true,\"SendAnonymousUsage\":false,\"AccessLogsFile\":\"\",\"AccessLog\":null,\"TraefikLogsFile\":\"\",\"TraefikLog\":null,\"Tracing\":null,\"LogLevel\":\"DEBUG\",\"EntryPoints\":{\"http\":{\"Address\":\":80\",\"TLS\":null,\"Redirect\":{\"entryPoint\":\"https\"},\"Auth\":null,\"WhitelistSourceRange\":null,\"WhiteList\":null,\"Compress\":false,\"ProxyProtocol\":null,\"ForwardedHeaders\":{\"Insecure\":true,\"TrustedIPs\":null}},\"https\":{\"Address\":\":443\",\"TLS\":{\"MinVersion\":\"\",\"CipherSuites\":null,\"Certificates\":null,\"ClientCAFiles\":null,\"ClientCA\":{\"Files\":null,\"Optional\":false}},\"Redirect\":null,\"Auth\":null,\"WhitelistSourceRange\":null,\"WhiteList\":null,\"Compress\":false,\"ProxyProtocol\":null,\"ForwardedHeaders\":{\"Insecure\":true,\"TrustedIPs\":null}},\"traefik\":{\"Address\":\":8080\",\"TLS\":null,\"Redirect\":null,\"Auth\":null,\"WhitelistSourceRange\":null,\"WhiteList\":null,\"Compress\":false,\"ProxyProtocol\":null,\"ForwardedHeaders\":{\"Insecure\":true,\"TrustedIPs\":null}}},\"Cluster\":null,\"Constraints\":[],\"ACME\":null,\"DefaultEntryPoints\":[\"http\"],\"ProvidersThrottleDuration\":2000000000,\"MaxIdleConnsPerHost\":200,\"IdleTimeout\":0,\"InsecureSkipVerify\":false,\"RootCAs\":null,\"Retry\":null,\"HealthCheck\":{\"Interval\":30000000000},\"RespondingTimeouts\":null,\"ForwardingTimeouts\":null,\"AllowMinWeightZero\":false,\"Web\":null,\"Docker\":null,\"File\":null,\"Marathon\":null,\"Consul\":null,\"ConsulCatalog\":null,\"Etcd\":null,\"Zookeeper\":null,\"Boltdb\":null,\"Kubernetes\":{\"Watch\":true,\"Filename\":\"\",\"Constraints\":[],\"Trace\":false,\"TemplateVersion\":0,\"DebugLogGeneratedTemplate\":false,\"Endpoint\":\"\",\"Token\":\"\",\"CertAuthFilePath\":\"\",\"DisablePassHostHeaders\":false,\"EnablePassTLSCert\":false,\"Namespaces\":null,\"LabelSelector\":\"\",\"IngressClass\":\"\"},\"Mesos\":null,\"Eureka\":null,\"ECS\":null,\"Rancher\":null,\"DynamoDB\":null,\"ServiceFabric\":null,\"Rest\":null,\"API\":{\"EntryPoint\":\"traefik\",\"Dashboard\":true,\"Debug\":false,\"CurrentConfigurations\":null,\"Statistics\":null},\"Metrics\":null,\"Ping\":null}"
time="2018-06-22T00:12:45Z" level=info msg="Preparing server http &{Address::80 TLS:<nil> Redirect:0xc4203aaf00 Auth:<nil> WhitelistSourceRange:[] WhiteList:<nil> Compress:false ProxyProtocol:<nil> ForwardedHeaders:0xc4205d8480} with readTimeout=0s writeTimeout=0s idleTimeout=3m0s"
time="2018-06-22T00:12:45Z" level=info msg="Preparing server https &{Address::443 TLS:0xc42038ed00 Redirect:<nil> Auth:<nil> WhitelistSourceRange:[] WhiteList:<nil> Compress:false ProxyProtocol:<nil> ForwardedHeaders:0xc4205d84a0} with readTimeout=0s writeTimeout=0s idleTimeout=3m0s"
time="2018-06-22T00:12:45Z" level=info msg="Starting server on :80"
time="2018-06-22T00:12:45Z" level=info msg="Preparing server traefik &{Address::8080 TLS:<nil> Redirect:<nil> Auth:<nil> WhitelistSourceRange:[] WhiteList:<nil> Compress:false ProxyProtocol:<nil> ForwardedHeaders:0xc4205d84c0} with readTimeout=0s writeTimeout=0s idleTimeout=3m0s"
time="2018-06-22T00:12:45Z" level=info msg="Starting server on :443"
time="2018-06-22T00:12:45Z" level=info msg="Starting server on :8080"
time="2018-06-22T00:12:45Z" level=info msg="Starting provider configuration.providerAggregator {}"
time="2018-06-22T00:12:45Z" level=info msg="Starting provider *kubernetes.Provider {\"Watch\":true,\"Filename\":\"\",\"Constraints\":[],\"Trace\":false,\"TemplateVersion\":0,\"DebugLogGeneratedTemplate\":false,\"Endpoint\":\"\",\"Token\":\"\",\"CertAuthFilePath\":\"\",\"DisablePassHostHeaders\":false,\"EnablePassTLSCert\":false,\"Namespaces\":null,\"LabelSelector\":\"\",\"IngressClass\":\"\"}"
time="2018-06-22T00:12:45Z" level=debug msg="Using Ingress label selector: \"\""
time="2018-06-22T00:12:45Z" level=info msg="Starting provider *acme.Provider {\"Email\":\"admin@mydomain.com\",\"ACMELogging\":false,\"CAServer\":\"https://acme-staging-v02.api.letsencrypt.org/directory\",\"Storage\":\"/etc/traefik/acme.json\",\"EntryPoint\":\"https\",\"OnHostRule\":false,\"OnDemand\":false,\"DNSChallenge\":{\"Provider\":\"digitalocean\",\"DelayBeforeCheck\":0},\"HTTPChallenge\":null,\"Domains\":[{\"Main\":\"*.mydomain.com\",\"SANs\":[\"mydomain.com\"]}],\"Store\":{}}"
time="2018-06-22T00:12:45Z" level=info msg="ingress label selector is: \"\""
time="2018-06-22T00:12:45Z" level=info msg="Creating in-cluster Provider client"
time="2018-06-22T00:12:45Z" level=info msg="Testing certificate renew..."
time="2018-06-22T00:12:45Z" level=debug msg="Configuration received from provider ACME: {}"
time="2018-06-22T00:12:45Z" level=debug msg="Looking for provided certificate(s) to validate [\"*.mydomain.com\" \"mydomain.com\"]..."
time="2018-06-22T00:12:45Z" level=debug msg="Domains [\"*.mydomain.com\" \"mydomain.com\"] need ACME certificates generation for domains \"*.mydomain.com,mydomain.com\"."
time="2018-06-22T00:12:45Z" level=debug msg="Loading ACME certificates [*.mydomain.com mydomain.com]..."
time="2018-06-22T00:12:45Z" level=info msg="Server configuration reloaded on :80"
time="2018-06-22T00:12:45Z" level=info msg="Server configuration reloaded on :443"
time="2018-06-22T00:12:45Z" level=info msg="Server configuration reloaded on :8080"
time="2018-06-22T00:12:45Z" level=debug msg="Received Kubernetes event kind *v1.Service"
time="2018-06-22T00:12:45Z" level=debug msg="Configuration received from provider kubernetes: {\"backends\":{\"dev.mydomain.com/uaa\":{\"servers\":{\"authserver-5746466cb9-4r5pl\":{\"url\":\"http://10.244.1.127:8008\",\"weight\":1}},\"loadBalancer\":{\"method\":\"wrr\"}}},\"frontends\":{\"dev.mydomain.com/uaa\":{\"entryPoints\":[\"http\"],\"backend\":\"dev.mydomain.com/uaa\",\"routes\":{\"/uaa\":{\"rule\":\"PathPrefix:/uaa\"},\"dev.mydomain.com\":{\"rule\":\"Host:dev.mydomain.com\"}},\"passHostHeader\":true,\"priority\":0,\"basicAuth\":[]}}}"
time="2018-06-22T00:12:45Z" level=debug msg="Creating frontend dev.mydomain.com/uaa"
time="2018-06-22T00:12:45Z" level=debug msg="Wiring frontend dev.mydomain.com/uaa to entryPoint http"
time="2018-06-22T00:12:45Z" level=debug msg="Creating route /uaa PathPrefix:/uaa"
time="2018-06-22T00:12:45Z" level=debug msg="Creating route dev.mydomain.com Host:dev.mydomain.com"
time="2018-06-22T00:12:45Z" level=debug msg="Creating entry point redirect http -> https"
time="2018-06-22T00:12:45Z" level=debug msg="Creating backend dev.mydomain.com/uaa"
time="2018-06-22T00:12:45Z" level=debug msg="Creating load-balancer wrr"
time="2018-06-22T00:12:45Z" level=debug msg="Creating server authserver-5746466cb9-4r5pl at http://10.244.1.127:8008 with weight 1"
time="2018-06-22T00:12:45Z" level=info msg="Server configuration reloaded on :8080"
time="2018-06-22T00:12:45Z" level=info msg="Server configuration reloaded on :80"
time="2018-06-22T00:12:45Z" level=info msg="Server configuration reloaded on :443"
time="2018-06-22T00:12:45Z" level=debug msg="Received Kubernetes event kind *v1.Secret"
time="2018-06-22T00:12:45Z" level=debug msg="Skipping Kubernetes event kind *v1.Secret"
time="2018-06-22T00:12:45Z" level=debug msg="Received Kubernetes event kind *v1.Endpoints"
time="2018-06-22T00:12:45Z" level=debug msg="Skipping Kubernetes event kind *v1.Endpoints"
time="2018-06-22T00:12:47Z" level=debug msg="Building ACME client..."
time="2018-06-22T00:12:47Z" level=debug msg="https://acme-staging-v02.api.letsencrypt.org/directory"
time="2018-06-22T00:12:47Z" level=debug msg="Received Kubernetes event kind *v1.Endpoints"
time="2018-06-22T00:12:47Z" level=debug msg="Skipping Kubernetes event kind *v1.Endpoints"
time="2018-06-22T00:12:47Z" level=info msg=Register...
time="2018-06-22T00:12:48Z" level=debug msg="Using DNS Challenge provider: digitalocean"
time="2018-06-22T00:12:49Z" level=debug msg="Received Kubernetes event kind *v1.Endpoints"
...
time="2018-06-22T00:13:11Z" level=debug msg="Skipping Kubernetes event kind *v1.Endpoints"
time="2018-06-22T00:13:12Z" level=error msg="Unable to obtain ACME certificate for domains \"*.mydomain.com,mydomain.com\" : cannot obtain certificates: acme: Error -> One or more domains had a problem:\n[mydomain.com] acme: Error 403 - urn:ietf:params:acme:error:unauthorized - Incorrect TXT record \"HASH_HERE\" found at _acme-challenge.mydomain.com\n"
time="2018-06-22T00:13:13Z" level=debug msg="Received Kubernetes event kind *v1.Endpoints"
time="2018-06-22T00:13:13Z" level=debug msg="Skipping Kubernetes event kind *v1.Endpoints"

As a side note, when I change the acme settings to:

[[acme.domains]]
      main = "mydomain.com"
      sans = ["dev.mydomain.com"]

I am getting the certificate

@BrianSo

BrianSo commented Jun 27, 2018

@nmengin I am experiencing the same error too.

Could you remove the waiting-for-feedback label? I think leebenson has given enough information.

@jjgraham

@BrianSo Agree.
@nmengin This needs to be fixed. We are seeing the same problem.

@nmengin
Contributor

nmengin commented Jul 19, 2018

Many thanks for all this information.

As I said previously, the problem is due to the TXT record names required by ACME and the fixed TTL set for these records in LEGO.

The best fix would be to rework the DNS challenge in LEGO so that all the TXT records are created before ACME tries to validate the challenges.
But this solution is not possible.

We are currently discussing another solution, maybe a workaround for this specific case (wildcard and root domain), so that it can be implemented ASAP.

@nmengin nmengin added status/0-needs-triage kind/bug/confirmed a confirmed bug (reproducible). and removed kind/bug/possible a possible bug that needs analysis before it is confirmed or fixed. labels Jul 19, 2018
@dimm0

dimm0 commented Jul 19, 2018

I just realized... We're using it in kubernetes, with several nodes getting the certs. Can it be because they try to get the cert simultaneously? What's the right way to do it for a DaemonSet?

@leebenson

I don’t think that’s the issue (or the only issue, anyway); my setup was a single Pod and I got the error just the same.

@anatolinicolae

anatolinicolae commented Jul 19, 2018

Can you guys try requesting the certs without SANs? I got a wildcard generated correctly when not specifying SANs... 🤔

i.e.:

[[acme.domains]]
      main = "*.mydomain.com"

@nmengin
Contributor

nmengin commented Jul 20, 2018

Hello @anatolinicolae , as said previously, the issue comes from the TXT record names needed by ACME (the domain *.foo.com and the domain foo.com need the same TXT record name). That's why you can successfully generate the certificate if the root domain is not a SAN of the wildcard domain.
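To illustrate the collision described above, here is a minimal Python sketch of how the DNS-01 record name and value are derived under RFC 8555. The function names and the key-authorization strings are hypothetical and only for illustration; this is not code from Træfik or LEGO.

```python
import base64
import hashlib


def challenge_record_name(domain: str) -> str:
    """Return the TXT record name used to validate a certificate identifier."""
    # A wildcard identifier is validated against its base domain,
    # so the leading "*." label is dropped before building the name.
    base = domain[2:] if domain.startswith("*.") else domain
    return "_acme-challenge." + base.rstrip(".")


def challenge_record_value(key_authorization: str) -> str:
    """Return the TXT value: unpadded base64url of SHA-256(key authorization)."""
    digest = hashlib.sha256(key_authorization.encode("ascii")).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


if __name__ == "__main__":
    # Hypothetical key authorizations: each authorization gets its own token.
    for domain, key_auth in (("*.foo.com", "token-a.thumbprint"),
                             ("foo.com", "token-b.thumbprint")):
        print(domain, challenge_record_name(domain), challenge_record_value(key_auth))
```

Both identifiers map to _acme-challenge.foo.com but expect different values, which matches the "Incorrect TXT record" errors reported above when only one value is published at validation time.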

@dimm0 , as @leebenson said, it's not the only issue. If you need ACME with multiple instances of Træfik, you may want to look at how to share configuration and ACME certificates between multiple Træfik instances here.

Note that it is possible to generate a certificate for a wildcard domain and its root domain, but the results are flaky. This flakiness is due to the different timeouts involved and, sometimes, to the time ACME takes to get a response...

For the moment, as a workaround, you can:

  • Set the wildcard domain and the root domain in 2 different certificates (not the best option because of rate limiting but, in a lot of cases, it can help)
  • Set the option acme.dnschallenge.provider to manual and check that the first TXT record has been cleared before creating the second one (with a dig command, for example).
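For the manual workaround, the "check that the first TXT record has been cleared" step could be scripted roughly as follows. This is a sketch under stated assumptions: wait_until_cleared and dig_txt are hypothetical helper names, dig is assumed to be installed, and the attempt count and delay are arbitrary.

```python
import subprocess
import time
from typing import Callable, Iterable, List


def dig_txt(record_name: str) -> List[str]:
    """Return the TXT values for record_name using `dig +short TXT` (assumes dig is installed)."""
    result = subprocess.run(
        ["dig", "+short", "TXT", record_name],
        capture_output=True, text=True, check=True,
    )
    # dig prints one quoted string per line; strip whitespace and the quotes.
    return [line.strip().strip('"') for line in result.stdout.splitlines() if line.strip()]


def wait_until_cleared(
    lookup: Callable[[str], Iterable[str]],
    record_name: str,
    stale_value: str,
    attempts: int = 10,
    delay_seconds: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Return True once stale_value is no longer published at record_name."""
    for attempt in range(attempts):
        if stale_value not in set(lookup(record_name)):
            return True
        if attempt < attempts - 1:
            sleep(delay_seconds)  # wait before asking the resolver again
    return False
```

A call such as wait_until_cleared(dig_txt, "_acme-challenge.example.com", first_value) would then gate the creation of the second record. The lookup is passed in as a parameter so that a different resolver, or a fake one for testing, can be substituted.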

These solutions are certainly not perfect, but we are currently looking for a better one.
I'll keep you posted as the work progresses.

@nmengin nmengin added priority/P1 need to be fixed in next release and removed status/0-needs-triage labels Jul 20, 2018
@BirkhoffLee

Same here. Tried gcloud and dnsimple, neither of them worked.

traefik    | time="2018-07-26T07:58:54Z" level=error msg="Unable to obtain ACME certificate for domains \"*.birkhoff.me,birkhoff.me\" : cannot obtain certificates: acme: Error -> One or more domains had a problem:\n[birkhoff.me] acme: Error 403 - urn:ietf:params:acme:error:unauthorized - Incorrect TXT record \"YSRmYnyTWtFu3xrVXdTP7WY87VV16_QM-iR0P0HH0v8\" (and 1 more) found at _acme-challenge.birkhoff.me\n"

@traefiker
Contributor

Closed by #3675.

@nmengin
Contributor

nmengin commented Aug 3, 2018

Hello @anatolinicolae , @dimm0 , @leebenson , @BirkhoffLee .

Can you try the new Træfik release candidate (1.7-RC3) please?

This RC contains the bugfix I made (#3675).

For information, I discussed the problem with Let's Encrypt developers in letsencrypt/boulder#3572.

@BirkhoffLee

BirkhoffLee commented Aug 3, 2018

@nmengin I'm using dnsimple, traefik:v1.7.0-rc3 and got the following.

traefik    | time="2018-08-03T08:59:02Z" level=error msg="Error obtaining certificate retrying in 380.813265ms"
traefik    | time="2018-08-03T08:59:26Z" level=error msg="Error obtaining certificate retrying in 390.075615ms"
...
traefik    | time="2018-08-03T09:00:11Z" level=error msg="Error obtaining certificate: acme: Error -> One or more domains had a problem:\n[birkhoff.me] invalid KeyType: \n"
traefik    | time="2018-08-03T09:00:11Z" level=error msg="Unable to obtain ACME certificate for domains \"*.birkhoff.me,birkhoff.me\" : unable to generate a certificate for the domains [*.birkhoff.me birkhoff.me]: acme: Error -> One or more domains had a problem:\n[birkhoff.me] invalid KeyType: \n"

My env in docker-compose.yml looks like this:

environment:
  DNSIMPLE_OAUTH_TOKEN: "adadasdsadsadds"
  DNSIMPLE_BASE_URL: "https://api.dnsimple.com"

@nmengin
Contributor

nmengin commented Aug 3, 2018

@BirkhoffLee

It seems to be related to the kind of keytype you have provided.

"KeyType used for generating certificate private key. Allow value 'EC256', 'EC384', 'RSA2048', 'RSA4096', 'RSA8192'. Default to 'RSA4096'"

But the problem mentioned in the other comments is solved in 1.7.0-rc3, not rc1.

@BirkhoffLee

Sorry, I was using rc3, that was a typo @nmengin

@BirkhoffLee

BTW I didn't explicitly give a KeyType I think. This is my config:

[acme]
email = "[redacted]"
storage = "acme.json"
entryPoint = "https"
onHostRule = true
[acme.httpChallenge]
entryPoint = "http"
[acme.dnsChallenge]
  provider = "dnsimple"
  delayBeforeCheck = 0
[[acme.domains]]
  main = "*.birkhoff.me"
  sans = ["birkhoff.me"]

@nmengin
Contributor

nmengin commented Aug 6, 2018

Hello @BirkhoffLee ,

Can you edit your acme.json file and look for the field KeyType please?

If it's possible for you, the quickest way to solve your problem may be to recreate the acme.json file, but be careful: all the certificates it contains will be regenerated.

Moreover, I guess you can delete the option acme.httpchallenge: you cannot specify more than one challenge. In your example, only the DNSChallenge is taken into account.

@nmengin
Contributor

nmengin commented Aug 6, 2018

@BirkhoffLee

I guess I found the problem. As described below, if you can reset your acme.json file, that can be a workaround.

If you cannot delete this file, you can set the field KeyType: "4096" in place of KeyType: "" in the file.

PR is coming soon.
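To check whether a stored acme.json is affected before editing it, a small script along these lines could list every empty KeyType field. This is a sketch only: the exact layout of acme.json is not assumed, so the search walks the whole JSON document, and find_empty_key_types and check_file are hypothetical names.

```python
import json
from typing import Any, Iterator, List


def find_empty_key_types(node: Any, path: str = "$") -> Iterator[str]:
    """Yield the JSON path of every "KeyType" field whose value is an empty string."""
    if isinstance(node, dict):
        for key, value in node.items():
            child = f"{path}.{key}"
            if key == "KeyType" and value == "":
                yield child
            else:
                yield from find_empty_key_types(value, child)
    elif isinstance(node, list):
        for index, item in enumerate(node):
            yield from find_empty_key_types(item, f"{path}[{index}]")


def check_file(filename: str) -> List[str]:
    """Load a JSON file (for example acme.json) and return the paths of empty KeyType fields."""
    with open(filename, encoding="utf-8") as handle:
        return list(find_empty_key_types(json.load(handle)))
```

The script only reports locations and leaves the file untouched, so the manual edit or the file reset suggested above remains a deliberate step.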

@BirkhoffLee

@nmandery I guess I'll stick with the bugfix, thanks for all the hard work!

@BirkhoffLee

It's working now, thanks @nmengin ;)
By the way, is there any plan to support falling back to the HTTP challenge when the DNS challenge fails?
For example, I have a.com and b.com, and have direct control of a.com but not b.com, and I want to host both of them on Traefik. If I configure the DNS challenge for a.com, b.com won't be able to verify with the DNS challenge.
Thanks again for the hard work.

@RRAlex
Contributor

RRAlex commented Sep 6, 2018

So I'm still getting this with 1.7.0rc3 when I tried to add another domain to traefik.
The old ones, already set up before the upgrade, are working fine, but the new ones give me the same invalid KeyType error.

This was merged after the 1.7.0rc3 release, so hopefully 1.7.0rc4 is coming out soon? :)

@ylbeethoven

ylbeethoven commented Sep 25, 2018

Can anyone please tell me how you solved the issue?

Here is my traefik.toml

defaultEntryPoints = ["http","https"]
[entryPoints]
  [entryPoints.http]
  address = ":80"
    [entryPoints.http.redirect]
      entryPoint = "https"
  [entryPoints.https]
  address = ":443"
    [entryPoints.https.tls]
	
#Let's encrypt setup
[acme]
  email = "myemail@example.com"
  storage = "acme.json"
  entryPoint = "https"
    [acme.dnsChallenge]
    provider = "cloudflare" 
    delayBeforeCheck = 0
[[acme.domains]]
  main = "*.example.com"
  sans = ["example.com"]

I tested with Wildcard only, it was working fine.
I also tested it as

  main = "abc.example.com"
  sans = ["abc.example.com","def.example.com"]

and it was working fine too.

It is just that the wildcard and the SAN cannot be put together.

I deleted acme.json and created a new file every time I ran docker-compose up -d.

Can anyone please give me any advice? Thank you

@gitsf

gitsf commented Sep 25, 2018

I set mine up back when this was still an issue; I haven't tried it since this issue was fixed and closed.
Mine is still set with no sans and multiple main entries

so my config is like so:

[[acme.domains]]
  main = "*.example.com"
[[acme.domains]]
  main = "example.com"

I just removed my acme.json file and changed my config to be this style and it successfully repulled my certificates with the sans url

[[acme.domains]]
  main = "*.example.com"
  sans = ["example.com"]

So for me it worked both ways

Do you have any log info? Can you verify what version you are running?

@ylbeethoven

ylbeethoven commented Sep 26, 2018

Hi gitsf,

Thanks for the reply.

Because you asked me to check what version I was using, I noticed that the default Traefik version was 1.6.6.

Therefore, I changed my docker-compose.yml file to use 1.7 version.

  reverse-proxy:
    image: traefik:1.7 # The official Traefik docker image

With 1.7 version, it can pull certificate with SANs with no issue.

Thanks again for your help.

@gurumark

This comment has been minimized.

@ldez
Contributor

ldez commented Sep 30, 2018

@gurumark Could you open a new issue?
