Self check always fail #863
Comments
The same happens to me. I'm setting up an HA cluster and this is blocking us from moving the apps. We used the Helm chart to install it. Is there any workaround that would let us continue deploying our infrastructure?
Fixed in my case. The problem was that the nginx configuration on the load balancer was redirecting connections from port 80 to 443.
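For anyone hitting the same redirect problem, a minimal sketch of how such a front-end nginx can exempt the ACME challenge path from the HTTPS redirect (server name, upstream name, and backend address are placeholders, not taken from this thread):

upstream ingress_backend {
    server 10.0.0.10:80;   # placeholder: the ingress/node address behind the LB
}
server {
    listen 80;
    server_name www.example.com;

    # Let ACME HTTP-01 challenges pass through on plain HTTP
    location /.well-known/acme-challenge/ {
        proxy_pass http://ingress_backend;
    }

    # Redirect everything else to HTTPS
    location / {
        return 301 https://$host$request_uri;
    }
}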
The same here. I have my Kubernetes cluster with the ingress-nginx controller configured like this: apiVersion: v1
This way, when I use cert-manager to get my cert, I always get a self check error (by the way, all ACME challenges check out if I do them manually, both inside and outside the cluster). If I change my DNS entry to point at one of the Kubernetes nodes' public IPs, all is good and the certificate is issued (but this is a big SPOF if the node the DNS entry points to goes down).
The same happens here, but using DNAT from a public IP to an internal MetalLB load balancer configuration.
I found out that the problem was that the cluster wasn't able to resolve the DNS. I solved that and it worked.
Solved this myself too after a long time of messing about. The self-check is tricky depending on your network configuration. The cert-manager resolver tries to connect to itself to verify that Let's Encrypt can access the data at .well-known/acme-challenge/. This is deceptively complicated in many networks: it requires the resolver to be able to connect to itself via what usually resolves to a public IP address. Do a wget/curl to the .well-known/acme-challenge URL to see if it succeeds from inside the cluster. In my case, I had to set up hairpin NAT at the router. Would it be a good idea to optionally skip the self-check?
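A sketch of that check using a throwaway curl pod (domain and token are placeholders; the cert-manager pod itself may not ship a shell, so a separate pod is easier):

# Run a temporary pod inside the cluster and fetch the challenge URL
kubectl run acme-check --rm -it --restart=Never \
  --image=curlimages/curl -- \
  curl -v http://www.example.com/.well-known/acme-challenge/test
# If this hangs or is refused while the same URL works from outside the
# cluster, the cluster cannot reach its own public IP (hairpin NAT / DNAT).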
I'm going to close this issue out as it seems to be more related to network configuration than anything else. Let's Encrypt needs to be able to access your Ingress controller on port 80 in order to validate challenges, and exposing your ingress controller to the public internet (either via a LoadBalancer service or a NodePort) is outside the scope of cert-manager itself. We just need port 80 to work 😄
Port 80 isn't the issue, that's a given. The IP address is, though. Any installation behind NAT is likely to fail without a hairpin config. If the self-check can't be disabled, maybe mention this in the docs?
I guess this means "Cloudflare Always Use HTTPS" was causing this for me. Perhaps something about requiring port 80 and HTTP access to the domain here would be good: https://docs.cert-manager.io/en/latest/getting-started/troubleshooting.html
Same issue here. I would like to disable the self-check or provide the IP address of the load balancer because of hairpinning.
The problem is in Kubernetes networking if you use a LoadBalancer provided by the hosting provider. I use DigitalOcean. Kubernetes does not route traffic through the LB's public interface, so nothing adds the PROXY protocol header (or SSL, if you terminate it outside Kubernetes). I use the PROXY protocol, and the moment I enable it and update nginx to handle it, everything works except cert-manager, which fails because it tries to connect to the public domain name and that fails. It works from my computer, since I am outside and the LB adds the needed headers, but not from within the cluster. cert-manager is not to blame for this, but if we could add some switches to instruct the validator to add the PROXY protocol, or to disable validation for that domain, it would help a lot. If I curl (from inside the cluster):
it fails. If I do (from inside the cluster):
it works.
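The exact commands were lost from this thread; a hedged reconstruction of the kind of comparison being described (domain and path are placeholders) might look like:

# Through the public name: from inside the cluster this bypasses the LB,
# so nginx (configured to expect a PROXY protocol header) rejects the request.
curl -v http://www.example.com/.well-known/acme-challenge/test

# Sending a PROXY protocol v1 header ourselves (curl >= 7.60) succeeds,
# because nginx now receives the header it was configured to require.
curl -v --haproxy-protocol http://www.example.com/.well-known/acme-challenge/test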
I was informed by the DigitalOcean team that there is a fix for this behavior. They added an additional annotation to the nginx-ingress controller Service that makes Kubernetes use a domain name instead of the public IP, which tricks Kubernetes into thinking the address is not "ours" and routes traffic out through the LB. https://github.com/digitalocean/digitalocean-cloud-controller-manager/blob/master/docs/controllers/services/examples/README.md#accessing-pods-over-a-managed-load-balancer-from-inside-the-cluster
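As far as I can tell from the linked docs, the annotation in question is do-loadbalancer-hostname; a sketch, where the service layout, selector, and hostname are assumptions for illustration:

apiVersion: v1
kind: Service
metadata:
  name: ingress-nginx-controller
  annotations:
    # Expose the LB under a hostname rather than its IP, so in-cluster
    # clients resolving that name are routed out through the LB.
    service.beta.kubernetes.io/do-loadbalancer-hostname: "www.example.com"
spec:
  type: LoadBalancer
  selector:
    app.kubernetes.io/name: ingress-nginx
  ports:
    - name: http
      port: 80
      targetPort: 80
    - name: https
      port: 443
      targetPort: 443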
@MichaelOrtho Hi, do you know if a similar workaround exists for Scaleway? I am testing their managed Kubernetes and am having the same problem. Thanks
@vitobotta I have found that on Scaleway you need to restart CoreDNS and it will usually succeed.
@AlexsJones Not for me. I had to add the annotation below.
After changing that, it worked. The reason being, the pod with the certificate issuer wound up on a different node than the load balancer, so it couldn't talk to itself through the ingress.
Hi all, I ran into the same issue. I've recently published a small project that works around it. It uses CoreDNS rewriting to intercept traffic that would otherwise head toward the external load balancer, then adds a PROXY line to requests originating from within the cluster. This allows cert-manager's self-check to pass.
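The CoreDNS rewriting part on its own (without the PROXY header injection) can be sketched roughly like this in the coredns ConfigMap; the domain and the in-cluster service name are placeholders, not from this thread:

.:53 {
    errors
    health
    # Resolve the public name to the in-cluster ingress service instead of
    # the external load balancer, so self-check traffic stays inside.
    rewrite name www.example.com ingress-nginx-controller.ingress-nginx.svc.cluster.local
    kubernetes cluster.local in-addr.arpa ip6.arpa {
        pods insecure
        fallthrough in-addr.arpa ip6.arpa
    }
    forward . /etc/resolv.conf
    cache 30
    loop
    reload
}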
@munnerz I think you misunderstood the problem here. You wrote:
The problem is not that Let's Encrypt can't reach the LoadBalancer... the problem is that the cert-manager self-check can't reach it. The connection from LE to the LoadBalancer is fine, thanks to destination NAT. The cert-manager inside the cluster, however, tries to resolve the domain name to the external IP, and this will fail in DNAT scenarios. @munnerz there is already a whole project just for fixing this issue. Is there really no option to just disable self-checks?
Here is another possible solution: you can use CoreDNS to serve overriding DNS records. Just create host aliases for the domains, link them to the internal cluster IPs, and propagate these host/IP tuples in your CoreDNS config. This way you can use the internal IP addresses inside your cluster. You just have to maintain another list (or you might automate this via a custom operator or script). A sketch follows this comment.
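The elided config was presumably something like CoreDNS's hosts plugin; a sketch, with the IP and domain purely illustrative:

# Fragment of the Corefile inside the coredns ConfigMap
hosts {
    # Map the public domain to the cluster-internal ingress address
    10.43.0.20 www.example.com
    fallthrough
}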
In DNAT scenarios, just set the externalIPs of the ingress Service to your external IP addresses (a sketch follows the iptables output below).
Kubernetes, configured with iptables, mostly standard setup:
$ sudo iptables-save | grep 11.22.33.44
-A KUBE-SERVICES -d 11.22.33.44/32 -p tcp -m comment --comment "nginx-ingress/nginx-ingress-ext:http external IP" -m tcp --dport 80 -j KUBE-MARK-MASQ
-A KUBE-SERVICES -d 11.22.33.44/32 -p tcp -m comment --comment "nginx-ingress/nginx-ingress-ext:http external IP" -m tcp --dport 80 -m physdev ! --physdev-is-in -m addrtype ! --src-type LOCAL -j KUBE-SVC-VMPDTJD5TKOUD6KL
-A KUBE-SERVICES -d 11.22.33.44/32 -p tcp -m comment --comment "nginx-ingress/nginx-ingress-ext:http external IP" -m tcp --dport 80 -m addrtype --dst-type LOCAL -j KUBE-SVC-VMPDTJD5TKOUD6KL
-A KUBE-SERVICES -d 11.22.33.44/32 -p tcp -m comment --comment "nginx-ingress/nginx-ingress-ext:https external IP" -m tcp --dport 443 -j KUBE-MARK-MASQ
-A KUBE-SERVICES -d 11.22.33.44/32 -p tcp -m comment --comment "nginx-ingress/nginx-ingress-ext:https external IP" -m tcp --dport 443 -m physdev ! --physdev-is-in -m addrtype ! --src-type LOCAL -j KUBE-SVC-SUC36V4R4VKNMIWK
-A KUBE-SERVICES -d 11.22.33.44/32 -p tcp -m comment --comment "nginx-ingress/nginx-ingress-ext:https external IP" -m tcp --dport 443 -m addrtype --dst-type LOCAL -j KUBE-SVC-SUC36V4R4VKNMIWK
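A minimal sketch of the externalIPs approach mentioned above, reusing the illustrative 11.22.33.44 address (service name, namespace, selector, and ports are assumptions, not from this thread):

apiVersion: v1
kind: Service
metadata:
  name: nginx-ingress-ext
  namespace: nginx-ingress
spec:
  type: NodePort
  selector:
    app: nginx-ingress
  # Listing the DNAT'ed public address here makes kube-proxy install the
  # KUBE-SERVICES rules shown above, so in-cluster clients hitting the
  # public IP are forwarded straight to the ingress pods.
  externalIPs:
    - 11.22.33.44
  ports:
    - name: http
      port: 80
      targetPort: 80
    - name: https
      port: 443
      targetPort: 443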
Switching from
I was stuck with this self-check issue for the longest time. I just came back to say thank you to @ptjhuang for the solution of setting up hairpin NAT on the gateway/router. I wanted to let people know that this worked for me, since I felt lost for the longest time. Hope this inspires others to try this solution.
As @vitobotta points out, though without much context: for cert-manager running in a Scaleway Kubernetes cluster
This annotation should be applied to the
If you're configuring ingress-nginx with Helm, you can set the value |
Describe the bug:
Unable to pass the "self check" when the Ingress Service is using NodePort and the public IP is on HAProxy (TCP mode) outside the Kubernetes cluster. We can simulate the test from the cert-manager container (kubectl exec) using curl (fetching /.well-known/...), which is successful. The same applies from outside the cluster.
Logs:
We replaced the real domain name in this bug report with www.example.com.
cert-manager works only when the public IP is on the Kubernetes cluster and the Ingress Service uses the LoadBalancer type.
Expected behaviour:
Self check to pass with NodePort on the Ingress Service.
Steps to reproduce the bug:
Anything else we need to know?:
It is not clear to us what exactly the self check expects to find, because the fetch of the /.well-known key is successful (confirmed via Wireshark), yet the self check runs again and again and keeps failing. Some more details about the reason for the failure would be great.
Wireshark-captured data - request from cluster node to HAProxy:
Environment details:
/kind bug