All hosts return error in Firefox Error Code: MOZILLA_PKIX_ERROR_SELF_SIGNED_CERT #693

Closed
arunk opened this issue Sep 16, 2020 · 21 comments
Labels
kind/failing-authorization Issue concerning failing ACME challenge

Comments

@arunk

arunk commented Sep 16, 2020

First off, thanks for this amazing piece of software; it's been a big help for us running multiple sites on our server. I have installed jwilder/nginx-proxy and jrcs/letsencrypt-nginx-proxy-companion on the server, and it has successfully created certificates for containers once LETSENCRYPT_HOST, LETSENCRYPT_EMAIL and VIRTUAL_HOST were added to them.

But all of a sudden, one of the containers that was created with these same values started throwing an error in Firefox with Error Code: MOZILLA_PKIX_ERROR_SELF_SIGNED_CERT. So I ran a docker pull of the latest images of nginx-proxy and letsencrypt-nginx-proxy-companion and updated both the proxy container and the proxy companion container. And now all my containers that require HTTPS are down. They are all throwing this same error in Firefox and the equivalent error in Chrome.

The logs for the proxy companion show some errors in generating certificates:

ERROR:simp_le:1417: CA marked some of the authorizations as invalid, which likely means it could not access http://example.com/.well-known/acme-challenge/X. Did you set correct path in -d example.com:path or --default_root? Are all your domains accessible from the internet? Please check your domains' DNS entries, your host's network/firewall setup and your webserver config. If a domain's DNS entry has both A and AAAA fields set up, some CAs such as Let's Encrypt will perform the challenge validation over IPv6. If your DNS provider does not answer correctly to CAA records request, Let's Encrypt won't issue a certificate for your domain (see https://letsencrypt.org/docs/caa/). Failing authorizations: https://acme-v02.api.letsencrypt.org/acme/authz-v3/7254435604, https://acme-v02.api.letsencrypt.org/acme/authz-v3/7254435607

But even for domains where a certificate already exists, I am seeing this error in the browser. Can you please tell me what the problem might be? Let me know if you would like me to post more of the server logs here.

Here is a check your site link for one of the hosts - https://check-your-website.server-daten.de/?q=docs.janastu.org

Thanks,
arun

@salus-sage

salus-sage commented Sep 17, 2020

It looks like the certificate's subject Common Name is incorrect. For example, see the certificate below in the browser: Subject Name -> Common Name is letsencrypt-nginx-proxy-companion, when it should be the domain of the site.

about:certificate?cert=MIIFOTCCAyGgAwIBAgIUYS50gGzaJrQ6YkvP08aD5C5aIw0wDQYJKoZIhvcNAQELBQAwLDEqMCgGA1UEAwwhbGV0c2VuY3J5cHQtbmdpbngtcHJveHktY29tcGFuaW9uMB4XDTIwMDMwNTEzNTc1M1oXDTIxMDMwNTEzNTc1M1owLDEqMCgGA1UEAwwhbGV0c2VuY3J5cHQtbmdpbngtcHJveHktY29tcGFuaW9uMIICIjANBgkqhkiG9w0BAQEFAAOCAg8AMIICCgKCAgEAsSdP9ZwzOWBQ1T%2B1c%2BAnQ9EDLGhyWfaTd6mT4B1qEX93jT18mMB7lbEUQOWYzOSQI5Z%2FuoQLF1bmgdeLX7ndk5xbIIiPMSZiWIAXVjLNo84XWaii%2FYHktT2E%2FuX1%2FmHBgUE27666oeMJrGzk3zCzEKRvKqfiQH%2B2hl14pU9ANdTTLuoPGwf%2BnCNuN%2FlTggSUC1l1oS5q7%2FtgGQlRD%2FOagKQ44%2FS2quehjMqQtYqZ2aET6vEvIdnVREnTqPi%2Bi4uxpviCkAL7LJQMoYS2tNE1FLWtl7OzjE28VWxF1Pp2BH7bdvqQztmnx6cdGNzaHGHjuVSumyL5NgLP9vc%2Fog%2FrOJkiH9FwRpfWo5OQWhZQArJPneVlCmkY5SVMS%2BE6EzhCNBAXJ0QaqHyGkb3r%2Fp9g0CJuR83lRXBTrmky%2BxvnevScqwySICEy5VvpWZot3%2FdnzivOUCoww0qqWABXsdK%2BXucm%2BUQNVt%2FxNq7z3%2FzcIp4tQBXzVC8rsJmxomBxZlJOUP2n%2BbWlBExxN0gqRCvXzqq1G5vdak0EJu6Wi7Tlvs7UOqcFog7AErrerIjMyfXLsUyhiBUVKB1KzF%2Fa8Z94twfUpSWhjnorYJbHc1TbG79YzgLpTLTHgwAINsyDVIGYXiTplT4Hk8GARsXsrW%2B8AMuvyRu4B6SVCGQMc9tcoLsCAwEAAaNTMFEwHQYDVR0OBBYEFMyjygVYCU2aBx2WwVjJ%2FqSIJimvMB8GA1UdIwQYMBaAFMyjygVYCU2aBx2WwVjJ%2FqSIJimvMA8GA1UdEwEB%2FwQFMAMBAf8wDQYJKoZIhvcNAQELBQADggIBAFRQ8yNMOYEMjsncs2QMwvCvpDVesrMTHnDsEid%2FhlU2fuSMaZ8KKeBFPnCN4gdy9xGr4FkG7TH47DMhy%2FBP3aMP%2FGiWSZDU7k3QUwuY2ityVfXCyKKMjj2EdLucHNnAoylrp5%2Bsr%2FQZ8OGrrc6BEVXT06py62VkCjGO96Jvno85U5jy%2FZzEFB934ZhVRwmDRLB0AR2hggNZ5h5S5Ha8PF4liTjoWiPHSWwv73dkB0QZcO21hvbC1pwBzOTOkyvimutC3sB6Uq5nkzqARabbv%2BkVzPnDALKNsmOIBHVH7Yf4DGC0CxpgCMbvY3XfQ1Rl%2F6b%2F4xJHAK0CyuQO9%2FXa8OX%2F2yC7PWJsmeqsBkIyJeMsw4NieMbunKRSEdVEZz%2BofSoom8OQ6oIvyyDGAJlslUe7iE93wGOQ2H4ampqMhTS%2B4DEuaGnsOznV5%2BTlwkntpQbZll%2B3jSqPRxtn7A56CN%2B1acZaBl59btCbloWEgyWnR2m7KTt7SKNNyvE3EQ597yUMjcJS36BFzsB0J7EsgGf3G9kJVwf1FZ7ugFecdJQl%2B65IzcSIcgcXzzereDpTOBwaPVjY4mVmQaUKKy0Q0RNQiWVMOH%2B%2FxNR4pa6obcQBqbM4rw8MD8kJKKuovvdcBP8RUqivEraOhJH68GvRZNB%2Fvd3YpJMqETFPSilAx1Xg

@buchdag buchdag added the kind/failing-authorization Issue concerning failing ACME challenge label Sep 17, 2020
@salus-sage

My bad: the valid certificate in the setup is not being recognized and the default certificate is being served instead. How can this be debugged?

@seljuck

seljuck commented Sep 27, 2020

I've run into this issue as well. After rebuilding my containers on 9/26/20, I get the same error message. Everything was working fine prior to the update. I've not changed any settings or modified any configurations.

@buchdag
Member

buchdag commented Sep 27, 2020

@arunk what is happening is that your cert can't be issued because of a failing authorisation. If you can't get a cert issued for a given domain, nginx-proxy will instead serve the default certificate, which is a self-signed certificate with letsencrypt-nginx-proxy-companion as its Common Name.

You can start here to troubleshoot why this authorisation might be failing: https://github.com/nginx-proxy/docker-letsencrypt-nginx-proxy-companion/blob/master/docs/Invalid-authorizations.md
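
A quick way to confirm from outside the browser that a host is serving a self-signed leaf certificate (which, per the explanation above, would be consistent with the proxy falling back to its default certificate) might look like the sketch below. It uses only the Python standard library. The function names `describe_verify_code` and `check_host` are hypothetical helpers, not part of nginx-proxy or the companion; the value 18 is OpenSSL's `X509_V_ERR_DEPTH_ZERO_SELF_SIGNED_CERT` verification code.

```python
import socket
import ssl

# OpenSSL verify code for a self-signed leaf certificate
# (X509_V_ERR_DEPTH_ZERO_SELF_SIGNED_CERT).
SELF_SIGNED_LEAF = 18


def describe_verify_code(code):
    """Turn an OpenSSL verification error code into a short diagnosis."""
    if code == SELF_SIGNED_LEAF:
        return "self-signed leaf certificate (possibly the proxy's default cert)"
    return f"other verification failure (OpenSSL code {code})"


def check_host(host, port=443, timeout=10.0):
    """Attempt a verified TLS handshake and report how it went.

    Hypothetical helper: needs network access to the host being checked.
    """
    context = ssl.create_default_context()
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            with context.wrap_socket(sock, server_hostname=host):
                return "certificate chain verified"
    except ssl.SSLCertVerificationError as exc:
        return describe_verify_code(exc.verify_code)
```

Calling `check_host("docs.janastu.org")` against an affected host would be expected to report the self-signed case, but this only shows the symptom; the cause of the failed authorisation still has to be found in the companion logs.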

@seljuck

seljuck commented Sep 27, 2020

I just tried renaming the default.crt, and default.key files to .old and restarted the containers and all is working now. It created new default.crt and new default.key files.

@buchdag
Member

buchdag commented Sep 28, 2020

@seljuck what fixed your issue most probably wasn't the renaming and automatic re-creation of the default key / cert but rather the container restart 😕

@arunk
Author

arunk commented Sep 30, 2020

@buchdag I set the environment variable DEBUG=true on the proxy companion and got some information about why some of the certificates are failing. There are some configuration errors, such as some domains being mentioned in LETSENCRYPT_HOST but not in VIRTUAL_HOST. I have fixed those issues. But there are some certificates which are failing for an unknown reason. There is no mention of them in the log, even with DEBUG=true. The domains do appear in the generated file /app/letsencrypt_service_data, but when the letsencrypt_service update_certs runs, it doesn't appear to do anything with them. I see Symlinked domains, Enabled domains and Disabled domains once when the container is started, but I don't see it being called after that. The mechanism for /app/letsencrypt_service is that it runs update_certs then waits for one hour (3600 seconds) then runs the script again right? I'm wondering how to debug these missing domains. Thanks for your help.
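
The configuration error described above (a domain listed in LETSENCRYPT_HOST but not in VIRTUAL_HOST) can be checked for mechanically. The following is a minimal sketch, assuming both variables hold plain comma-separated hostnames; it does not handle wildcard or regex entries in VIRTUAL_HOST, and the helper names are hypothetical.

```python
def split_hosts(value):
    """Split a comma-separated host list into a set of lowercase names."""
    return {host.strip().lower() for host in value.split(",") if host.strip()}


def missing_from_virtual_host(env):
    """Return LETSENCRYPT_HOST entries that are absent from VIRTUAL_HOST.

    `env` is a mapping of a container's environment variables.
    Assumes plain hostnames only (no wildcards or regexes).
    """
    letsencrypt_hosts = split_hosts(env.get("LETSENCRYPT_HOST", ""))
    virtual_hosts = split_hosts(env.get("VIRTUAL_HOST", ""))
    return sorted(letsencrypt_hosts - virtual_hosts)


# Example with made-up domains:
container_env = {
    "VIRTUAL_HOST": "example.com",
    "LETSENCRYPT_HOST": "example.com, www.example.com",
}
print(missing_from_virtual_host(container_env))
```

Any name this reports is one the proxy has no virtual host for, so an HTTP-01 challenge request for it is unlikely to be routed to the right place.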

@buchdag
Member

buchdag commented Sep 30, 2020

@arunk could you post your whole config, either command line or Docker compose file(s), for nginx-proxy (or nginx + docker-gen), the companion and your proxied containers?

The mechanism for /app/letsencrypt_service is that it runs update_certs then waits for one hour (3600 seconds) then runs the script again right?

Yep.
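
The run-then-wait mechanism confirmed above can be sketched as follows. This is an illustration in Python, not the companion's actual bash script; `run_periodically` and its parameters are hypothetical names. It also shows why a loop like this should sit idle between runs: if the wait were skipped or returned immediately, the loop would spin and pin a CPU core.

```python
import time


def run_periodically(task, interval_seconds=3600, max_runs=None, sleep=time.sleep):
    """Run `task`, wait `interval_seconds`, and repeat.

    `max_runs` and `sleep` exist only so the loop can be bounded and
    exercised without really waiting; with the defaults it runs forever.
    Returns the number of times `task` was run.
    """
    runs = 0
    while max_runs is None or runs < max_runs:
        task()
        runs += 1
        if max_runs is not None and runs >= max_runs:
            break  # no need to wait after the final run
        sleep(interval_seconds)
    return runs
```

For example, `run_periodically(update, max_runs=3, sleep=lambda s: None)` would call a hypothetical `update` function three times without pausing, which is handy for dry runs.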

@arunk
Author

arunk commented Sep 30, 2020

@buchdag here are the docker-compose.yml files.
There is services-docker-compose.yml, which contains the configuration for the nginx-proxy and letsencrypt proxy companion, and static-docker-compose.yml, which is a short version of our production containers that use this proxy and proxy companion. Multiple containers that work are shown, along with the one that doesn't. I have made notes in the comments. The certs for the one that doesn't work never get created.

github-issue-693.zip

@arunk
Author

arunk commented Oct 1, 2020

@buchdag I found an update to the nginx proxy companion and updated it. But now I find CPU usage for the letsencrypt_service is very high. It's consistently hitting 100% CPU usage and is using 2GB of RAM. This wasn't the case before; has some recent update changed how this works? The high CPU usage doesn't last just 5 minutes or so while certs are being generated; it's high throughout. Both letsencrypt_service processes show high CPU usage, but only the 2nd one uses 2GB of RAM.

@masilver99

I just tried renaming the default.crt, and default.key files to .old and restarted the containers and all is working now. It created new default.crt and new default.key files.

FYI, this worked for me as well. Restarting the containers didn't.

...Michael...

@salus-sage

I just tried renaming the default.crt, and default.key files to .old and restarted the containers and all is working now. It created new default.crt and new default.key files.

FYI, this worked for me as well. Restarting the containers didn't.

...Michael...

This might work in a single-site environment, but @arunk has a multi-site production environment with different domain names, so this won't cut it.

@buchdag
Member

buchdag commented Oct 7, 2020

But now I find CPU usage for the letsencrypt_service is very high. It's consistently hitting 100% CPU usage and is using 2GB of RAM.

@arunk It should absolutely not happen unless the bash script got trapped in an infinite loop. Have you identified exactly which process is consuming the resources?

@arunk
Author

arunk commented Oct 8, 2020

@buchdag it is the process - /bin/bash /app/letsencrypt_service .

@buchdag
Member

buchdag commented Oct 8, 2020

The

volumes_from:
      - proxy

on your letsencrypt container definition is of no use, as you are using the label method (by the way, the correct label is com.github.jrcs.letsencrypt_nginx_proxy_companion.nginx_proxy without =true at the end) and re-specifying the volumes anyway.

What version of jwilder/nginx-proxy are you using? If you are using latest, when was it pulled?

Can you use the v1.13 tagged version of the companion instead of latest? If you still hit the resource consumption issue, try using v1.12 instead.

Could you provide both container logs up to the point it starts consuming 100% CPU so I can check if there is an apparent loop?

I'm assuming you are running this on amd64 arch?

@buchdag
Member

buchdag commented Oct 8, 2020

@seljuck @masilver99 the companion container automatically generates a default key and self-signed certificate pair for nginx-proxy to use.

Useful read on that subject: #529

If a requested certificate creation fails for whatever reason (99% of the time an ACME authorisation failure), the proxy will serve this default certificate instead of the intended one. Renaming or deleting the default key and certificate will trigger the generation again on the next container startup, but shouldn't do a thing for your non-issued cert / failed authorisation unless we have a very weird and unidentified race condition with nginx.

If doing the former appears to fix the latter, chances of it being coincidental are very high. I could be wrong, but I really don't see yet how the two could be tied.

@arunk
Author

arunk commented Oct 12, 2020

@buchdag using proxy companion v1.13 has solved a lot of the problems, though a few cases still remain. I'm investigating what the issue is with these few remaining ones. Anyway in the meantime, here is the log of the proxy and letsencrypt companion containers after restarting them, attached to the issue. This is using the latest version of companion.
gh-issue.zip

@buchdag
Member

buchdag commented Oct 12, 2020

You are getting a pile of errors from nginx-proxy:

proxy_1        | nginx.1    | 2020/10/12 15:48:58 [warn] 32#32: could not build optimal server_names_hash, you should increase either server_names_hash_max_size: 512 or server_names_hash_bucket_size: 128; ignoring server_names_hash_bucket_size
proxy_1        | nginx.1    | 2020/10/12 15:48:58 [warn] 32#32: could not build optimal server_names_hash, you should increase either server_names_hash_max_size: 512 or server_names_hash_bucket_size: 128; ignoring server_names_hash_bucket_size
proxy_1        | nginx.1    | 2020/10/12 15:48:58 [emerg] 107#107: io_setup() failed (11: Resource temporarily unavailable)
proxy_1        | nginx.1    | 2020/10/12 15:48:58 [emerg] 108#108: io_setup() failed (11: Resource temporarily unavailable)
proxy_1        | nginx.1    | 2020/10/12 15:48:58 [emerg] 109#109: io_setup() failed (11: Resource temporarily unavailable)
proxy_1        | nginx.1    | 2020/10/12 15:48:58 [emerg] 106#106: io_setup() failed (11: Resource temporarily unavailable)
proxy_1        | nginx.1    | 2020/10/12 15:48:58 [emerg] 110#110: io_setup() failed (11: Resource temporarily unavailable)
proxy_1        | nginx.1    | 2020/10/12 15:48:58 [warn] 32#32: server name "iaw2020.milli.link/iaw2020" has suspicious symbols in /etc/nginx/conf.d/default.conf:1925
proxy_1        | nginx.1    | 2020/10/12 15:48:58 [warn] 32#32: server name "iaw2020.milli.link/iaw2020" has suspicious symbols in /etc/nginx/conf.d/default.conf:1934
proxy_1        | nginx.1    | 2020/10/12 15:48:58 [warn] 32#32: server name "www.milli.link/iaw2020" has suspicious symbols in /etc/nginx/conf.d/default.conf:6372
proxy_1        | nginx.1    | 2020/10/12 15:48:58 [warn] 32#32: server name "www.milli.link/iaw2020" has suspicious symbols in /etc/nginx/conf.d/default.conf:6381
proxy_1        | nginx.1    | 2020/10/12 15:48:58 [emerg] 32#32: cannot load certificate "/etc/nginx/certs/360.pantoto.net.crt": BIO_new_file() failed (SSL: error:02001002:system library:fopen:No such file or directory:fopen('/etc/nginx/certs/360.pantoto.net.crt','r') error:2006D080:BIO routines:BIO_new_file:no such file)

And this goes on and on. Your nginx-proxy container clearly isn't working properly.
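
When a log "goes on and on" like this, it can help to reduce it to the distinct problems it reports. The sketch below is one possible approach, assuming the nginx message formats shown in the excerpt above; the function name `summarize_nginx_log` is hypothetical. It collects the certificate files nginx could not load and the server names it flagged as suspicious.

```python
import re

# Patterns based on the nginx messages quoted in this thread.
MISSING_CERT = re.compile(r'cannot load certificate "([^"]+)"')
SUSPICIOUS_NAME = re.compile(r'server name "([^"]+)" has suspicious symbols')


def summarize_nginx_log(lines):
    """Return (missing certificate paths, suspicious server names), each sorted and de-duplicated."""
    missing_certs = set()
    suspicious_names = set()
    for line in lines:
        match = MISSING_CERT.search(line)
        if match:
            missing_certs.add(match.group(1))
        match = SUSPICIOUS_NAME.search(line)
        if match:
            suspicious_names.add(match.group(1))
    return sorted(missing_certs), sorted(suspicious_names)


sample = [
    '[warn] 32#32: server name "iaw2020.milli.link/iaw2020" has suspicious symbols in /etc/nginx/conf.d/default.conf:1925',
    '[warn] 32#32: server name "iaw2020.milli.link/iaw2020" has suspicious symbols in /etc/nginx/conf.d/default.conf:1934',
    '[emerg] 32#32: cannot load certificate "/etc/nginx/certs/360.pantoto.net.crt": BIO_new_file() failed',
]
missing, suspicious = summarize_nginx_log(sample)
print(missing)
print(suspicious)
```

In practice the lines would come from the saved proxy container log rather than a hard-coded list.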

@arunk
Author

arunk commented Oct 13, 2020

@buchdag any thoughts on what is causing the errors? The warnings about server names are something we need to fix: someone has added paths to the domains, which is causing the warning. I'm not sure if that's the cause of the errors as well. I'll try to fix that and see if the errors persist.

@buchdag
Member

buchdag commented Oct 13, 2020

any thoughts on what is causing the errors?

No idea, searches on nginx-proxy issues didn't return anything.

@buchdag
Member

buchdag commented May 16, 2021

Inactive issue, closing.

@buchdag buchdag closed this as completed May 16, 2021
5 participants