
Internal error occurred: failed calling webhook "validate.nginx.ingress.kubernetes.io" #5401

Closed
aduncmj opened this issue Apr 19, 2020 · 177 comments · Fixed by #5445
Labels
triage/needs-information Indicates an issue needs more information in order to work on it.

Comments

@aduncmj

aduncmj commented Apr 19, 2020

Hi all,

When I applied the ingress configuration file ingress-myapp.yaml with the command kubectl apply -f ingress-myapp.yaml, I got an error. The complete error is as follows:

Error from server (InternalError): error when creating "ingress-myapp.yaml": Internal error occurred: failed calling webhook "validate.nginx.ingress.kubernetes.io": Post https://ingress-nginx-controller-admission.ingress-nginx.svc:443/extensions/v1beta1/ingresses?timeout=30s: context deadline exceeded

This is my ingress:

apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: ingress-myapp
  namespace: default
  annotations: 
    kubernetes.io/ingress.class: "nginx"
spec:
  rules: 
  - host: myapp.magedu.com
    http:
      paths:
      - path: 
        backend: 
          serviceName: myapp
          servicePort: 80

Has anyone encountered this problem?

@aduncmj aduncmj added the label kind/support (Categorizes issue or PR as a support question.) Apr 19, 2020
@moljor

moljor commented Apr 21, 2020

Hi,

I have.

The validatingwebhook service is not reachable in my private GKE cluster. I needed to open the 8443 port from the master to the pods.
On top of that, I then received a certificate error on the endpoint "x509: certificate signed by unknown authority". To fix this, I needed to include the caBundle from the generated secret in the validatingwebhookconfiguration.

A quick fix, if you don't want to do the above to get the webhook fully operational, is to remove the validatingwebhookconfiguration or to set the failurePolicy to Ignore.
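For reference, the two workarounds above can be applied with kubectl along these lines (a sketch; the webhook configuration name `ingress-nginx-admission` assumes the standard ingress-nginx deployment and may differ in your cluster):

```shell
# Option 1: remove the validating webhook configuration entirely
kubectl delete validatingwebhookconfiguration ingress-nginx-admission

# Option 2: keep the webhook, but let API requests succeed when it is unreachable
kubectl patch validatingwebhookconfiguration ingress-nginx-admission \
  --type='json' \
  -p='[{"op": "replace", "path": "/webhooks/0/failurePolicy", "value": "Ignore"}]'
```

Note that both options disable the validation the webhook provides, so invalid Ingress objects will no longer be rejected at admission time.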

I believe some fixes are needed in the deploy/static/provider/cloud/deploy.yaml as the webhooks will not always work out of the box.

@moljor

moljor commented Apr 21, 2020

A quick update on the above, the certificate error should be managed by the patch job that exists in the deployment so that part should be a non-issue.
Only the port 8443 needed to be opened from master to pods for me.

@Cspellz

Cspellz commented Apr 22, 2020

A quick update on the above, the certificate error should be managed by the patch job that exists in the deployment so that part should be a non-issue.
Only the port 8443 needed to be opened from master to pods for me.

Hi, I am a beginner in setting a k8s and ingress.
I am facing a similar issue, but in a bare-metal scenario. I would be very grateful if you could share more details on what you mean by 'opening a port between master and pods'.

Update:
Sorry, as I said, I am new to this. I checked and there is a service (ingress-nginx-controller-admission) exposed on port 443 in the ingress-nginx namespace. For some reason my ingress resource, created in the default namespace, is not able to communicate with it. Please suggest how I can resolve this.

error is :
Error from server (InternalError): error when creating "test-nginx-ingress.yaml": Internal error occurred: failed calling webhook "validate.nginx.ingress.kubernetes.io": Post https://ingress-nginx-controller-admission.ingress-nginx.svc:443/extensions/v1beta1/ingresses?timeout=30s: context deadline exceeded

@johan-lejdung

I'm also facing this issue, on a fresh cluster from AWS where I only did

helm install nginx-ing ingress-nginx/ingress-nginx --set rbac.create=true

And deployed a react service (which I can port-forward to and it works fine).

I then tried to apply both my own ingress and the example ingress

  apiVersion: networking.k8s.io/v1beta1
  kind: Ingress
  metadata:
    annotations:
      kubernetes.io/ingress.class: nginx
    name: example
    namespace: foo
  spec:
    rules:
      - host: www.example.com
        http:
          paths:
            - backend:
                serviceName: exampleService
                servicePort: 80
              path: /

I'm getting this error:

Error from server (InternalError): error when creating "k8s/ingress/test.yaml": Internal error occurred: failed calling webhook "validate.nginx.ingress.kubernetes.io": Post https://nginx-ing-ingress-nginx-controller-admission.default.svc:443/extensions/v1beta1/ingresses?timeout=30s: stream error: stream ID 7; INTERNAL_ERROR

I traced it down to this line of code by looking at the logs in the controller:
https://github.com/kubernetes/ingress-nginx/blob/master/internal/ingress/controller/controller.go#L532

Logs:

I0427 11:52:35.894902       6 server.go:61] handling admission controller request /extensions/v1beta1/ingresses?timeout=30s
2020/04/27 11:52:35 http2: panic serving 172.31.16.27:39304: runtime error: invalid memory address or nil pointer dereference
goroutine 2514 [running]:
net/http.(*http2serverConn).runHandler.func1(0xc00000f2c0, 0xc0009a9f8e, 0xc000981980)
	/home/ubuntu/.gimme/versions/go1.14.2.linux.amd64/src/net/http/h2_bundle.go:5713 +0x16b
panic(0x1662d00, 0x27c34c0)
	/home/ubuntu/.gimme/versions/go1.14.2.linux.amd64/src/runtime/panic.go:969 +0x166
k8s.io/ingress-nginx/internal/ingress/controller.(*NGINXController).getBackendServers(0xc000119a40, 0xc00000f308, 0x1, 0x1, 0x187c833, 0x1b, 0x185e388, 0x0, 0x185e388, 0x0)
	/tmp/go/src/k8s.io/ingress-nginx/internal/ingress/controller/controller.go:532 +0x6d2
k8s.io/ingress-nginx/internal/ingress/controller.(*NGINXController).getConfiguration(0xc000119a40, 0xc00000f308, 0x1, 0x1, 0x1, 0xc00000f308, 0x0, 0x1, 0x0)
	/tmp/go/src/k8s.io/ingress-nginx/internal/ingress/controller/controller.go:402 +0x80
k8s.io/ingress-nginx/internal/ingress/controller.(*NGINXController).CheckIngress(0xc000119a40, 0xc000bfc300, 0x50a, 0x580)
	/tmp/go/src/k8s.io/ingress-nginx/internal/ingress/controller/controller.go:228 +0x2c9
k8s.io/ingress-nginx/internal/admission/controller.(*IngressAdmission).HandleAdmission(0xc0002d4fb0, 0xc000943080, 0x7f8ffce8b1b8, 0xc000942ff0)
	/tmp/go/src/k8s.io/ingress-nginx/internal/admission/controller/main.go:73 +0x924
k8s.io/ingress-nginx/internal/admission/controller.(*AdmissionControllerServer).ServeHTTP(0xc000219820, 0x1b05080, 0xc00000f2c0, 0xc000457d00)
	/tmp/go/src/k8s.io/ingress-nginx/internal/admission/controller/server.go:70 +0x229
net/http.serverHandler.ServeHTTP(0xc000119ce0, 0x1b05080, 0xc00000f2c0, 0xc000457d00)
	/home/ubuntu/.gimme/versions/go1.14.2.linux.amd64/src/net/http/server.go:2807 +0xa3
net/http.initALPNRequest.ServeHTTP(0x1b07440, 0xc00067f170, 0xc0002dc700, 0xc000119ce0, 0x1b05080, 0xc00000f2c0, 0xc000457d00)
	/home/ubuntu/.gimme/versions/go1.14.2.linux.amd64/src/net/http/server.go:3381 +0x8d
net/http.(*http2serverConn).runHandler(0xc000981980, 0xc00000f2c0, 0xc000457d00, 0xc000a81480)
	/home/ubuntu/.gimme/versions/go1.14.2.linux.amd64/src/net/http/h2_bundle.go:5720 +0x8b
created by net/http.(*http2serverConn).processHeaders
	/home/ubuntu/.gimme/versions/go1.14.2.linux.amd64/src/net/http/h2_bundle.go:5454 +0x4e1

Any ideas? It seems strange to get this on a newly set up cluster where I followed the instructions correctly.

@johan-lejdung

I might have solved it...

I followed this guide for the helm installation: https://kubernetes.github.io/ingress-nginx/deploy/

But when I followed this guide instead: https://docs.nginx.com/nginx-ingress-controller/installation/installation-with-helm/

The error doesn't occur.

If you have this issue, try it out by deleting your current Helm installation.

Get the name:

helm list

Delete and apply stable release:

helm delete <release-name>
helm repo add nginx-stable https://helm.nginx.com/stable
helm install nginx-ing nginx-stable/nginx-ingress

@aledbf
Member

aledbf commented Apr 27, 2020

@johan-lejdung not really, that is a different ingress controller.

@s977120

s977120 commented May 1, 2020

@aledbf I'm using 0.31.1 and still have the same problem:

bash-5.0$ /nginx-ingress-controller --version
-------------------------------------------------------------------------------
NGINX Ingress controller
  Release:       0.31.1
  Build:         git-b68839118
  Repository:    https://github.com/kubernetes/ingress-nginx
  nginx version: nginx/1.17.10

-------------------------------------------------------------------------------

Error: UPGRADE FAILED: failed to create resource: Internal error occurred: failed calling webhook "validate.nginx.ingress.kubernetes.io": Post https://ingress-nginx-controller-admission.ingress-nginx.svc:443/extensions/v1beta1/ingresses?timeout=30s: context deadline exceeded

@nicholaspier

@aledbf Same error. Bare-metal installation.

-------------------------------------------------------------------------------
NGINX Ingress controller
  Release:       0.31.1
  Build:         git-b68839118
  Repository:    https://github.com/kubernetes/ingress-nginx
  nginx version: nginx/1.17.10

-------------------------------------------------------------------------------

Error from server (InternalError): error when creating "./**ommitted**.yaml": Internal error occurred: failed calling webhook "validate.nginx.ingress.kubernetes.io": Post https://ingress-nginx-controller-admission.ingress-nginx.svc:443/extensions/v1beta1/ingresses?timeout=30s: context deadline exceeded

@aledbf
Member

aledbf commented May 1, 2020

I added a note about the webhook port in https://kubernetes.github.io/ingress-nginx/deploy/ and the links for the additional steps in GKE

@AbbetWang

AbbetWang commented May 4, 2020

I still have the problem.

Update

I disabled the webhook and the error went away.

Workaround

helm install my-release ingress-nginx/ingress-nginx \
  --set controller.service.type=NodePort \
  --set controller.admissionWebhooks.enabled=false

Caution: this may not resolve the underlying issue properly.

Current status

  • using Helm 3:
    helm install my-release ingress-nginx/ingress-nginx \
      --set controller.service.type=NodePort
exec kubectl get svc,pods

NAME                                                    TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)                      AGE
service/a-service                                       ClusterIP   10.105.159.98   <none>        80/TCP                       28h
service/b-service                                       ClusterIP   10.106.17.65    <none>        80/TCP                       28h
service/kubernetes                                      ClusterIP   10.96.0.1       <none>        443/TCP                      3d4h
service/my-release-ingress-nginx-controller             NodePort    10.97.224.8     <none>        80:30684/TCP,443:32294/TCP   111m
service/my-release-ingress-nginx-controller-admission   ClusterIP   10.101.44.242   <none>        443/TCP                      111m

NAME                                                       READY   STATUS    RESTARTS   AGE
pod/a-deployment-84dcd8bbcc-tgp6d                          1/1     Running   0          28h
pod/b-deployment-f649cd86d-7ss9f                           1/1     Running   0          28h
pod/configmap-pod                                          1/1     Running   0          54m
pod/configmap-pod-1                                        1/1     Running   0          3h33m
pod/my-release-ingress-nginx-controller-7859896977-bfrxp   1/1     Running   0          111m
pod/redis                                                  1/1     Running   1          6h11m
pod/test                                                   1/1     Running   1          5h9m

my ingress.yaml

apiVersion: networking.k8s.io/v1beta1
kind: Ingress
metadata:
  annotations:
    kubernetes.io/ingress.class: nginx
  name: example
  namespace: foo
spec:
  rules:
  - host: b.abbetwang.top
    http:
      paths:
      - path: /b
        backend:
          serviceName: b-service
          servicePort: 80
      - path: /a
        backend:
          serviceName: a-service
          servicePort: 80
  tls:
  - hosts:
    - b.abbetwang.top

What I did

When I run kubectl apply -f new-ingress.yaml
I get Failed calling webhook, failing closed validate.nginx.ingress.kubernetes.io:

My apiserver log is below:

I0504 06:22:13.286582 1 trace.go:116] Trace[1725513257]: "Create" url:/apis/networking.k8s.io/v1beta1/namespaces/default/ingresses,user-agent:kubectl/v1.18.2 (linux/amd64) kubernetes/52c56ce,client:192.168.0.133 (started: 2020-05-04 06:21:43.285686113 +0000 UTC m=+59612.475819043) (total time: 30.000880829s):
Trace[1725513257]: [30.000880829s] [30.000785964s] END
W0504 09:21:19.861015 1 watcher.go:199] watch chan error: etcdserver: mvcc: required revision has been compacted
W0504 09:31:49.897548 1 watcher.go:199] watch chan error: etcdserver: mvcc: required revision has been compacted
I0504 09:36:17.637753 1 trace.go:116] Trace[615862040]: "Call validating webhook" configuration:my-release-ingress-nginx-admission,webhook:validate.nginx.ingress.kubernetes.io,resource:networking.k8s.io/v1beta1, Resource=ingresses,subresource:,operation:CREATE,UID:41f47c75-9ce1-49c0-a898-4022dbc0d7a1 (started: 2020-05-04 09:35:47.637591858 +0000 UTC m=+71256.827724854) (total time: 30.000128816s):
Trace[615862040]: [30.000128816s] [30.000128816s] END
W0504 09:36:17.637774 1 dispatcher.go:133] Failed calling webhook, failing closed validate.nginx.ingress.kubernetes.io: failed calling webhook "validate.nginx.ingress.kubernetes.io": Post https://my-release-ingress-nginx-controller-admission.default.svc:443/extensions/v1beta1/ingresses?timeout=30s: context deadline exceeded

@eltonbfw

eltonbfw commented May 5, 2020

Why close this issue? What is the solution?

@aledbf
Member

aledbf commented May 5, 2020

@eltonbfw update to 0.32.0 and make sure the API server can reach the POD running the ingress controller
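One way to sanity-check that reachability from inside the cluster (a rough sketch; the service name and namespace assume the default ingress-nginx manifests and may differ in your install) is:

```shell
# Confirm the admission service exists and has endpoints backing it
kubectl -n ingress-nginx get svc,endpoints ingress-nginx-controller-admission

# Probe the webhook port from a throwaway pod; any HTTPS response
# (even an error status) means the in-cluster network path is open
kubectl run curl-test --rm -it --image=curlimages/curl --restart=Never -- \
  curl -k -m 5 https://ingress-nginx-controller-admission.ingress-nginx.svc:443
```

Note that on managed clusters the API server reaches the pod over a separate path (e.g. control-plane firewall rules), so this only rules out in-cluster problems.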

@cnlong

cnlong commented May 15, 2020

@eltonbfw update to 0.32.0 and make sure the API server can reach the POD running the ingress controller

I have the same problem, and I use 0.32.0.
What's the solution?
Please, thanks!

@nicholaspier

For the specific issue, my problem did turn out to be an issue with internal communication. @aledbf added notes to the documentation to verify connectivity. I had internal communication issues caused by CentOS 8's move to nftables. In my case, I needed additional "rich" allow rules in firewalld for:

  • Docker network source (172.17.0.0/16)
  • CNI CIDR source
  • Cluster CIDR source
  • Host IP source
  • Masquerading
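As an illustration, allow rules of that shape can be added with firewall-cmd; the CIDRs below are placeholders, so substitute your actual Docker, CNI, and cluster ranges:

```shell
# Allow traffic from the Docker bridge network (example CIDR)
firewall-cmd --permanent --zone=public \
  --add-rich-rule='rule family=ipv4 source address=172.17.0.0/16 accept'

# Allow traffic from the pod (CNI) and cluster service CIDRs (example values)
firewall-cmd --permanent --zone=public \
  --add-rich-rule='rule family=ipv4 source address=10.244.0.0/16 accept'
firewall-cmd --permanent --zone=public \
  --add-rich-rule='rule family=ipv4 source address=10.96.0.0/12 accept'

# Enable masquerading and apply the new rules
firewall-cmd --permanent --zone=public --add-masquerade
firewall-cmd --reload
```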

@andrei-matei

I have the same issue, baremetal install with CentOS 7 worker nodes.

@lesovsky

lesovsky commented May 30, 2020

Have the same issue with 0.32.0 on an HA bare-metal cluster, with strange behaviour.
I have two ingresses, A and B:

---
apiVersion: networking.k8s.io/v1beta1
kind: Ingress
metadata:
  name: service-alpha
  namespace: staging
  annotations:
    kubernetes.io/ingress.class: "nginx"
spec:
  rules:
    - host: alpha.example.org
      http:
        paths:
          - path: /
            backend:
              serviceName: service-alpha
              servicePort: 1080
---
apiVersion: networking.k8s.io/v1beta1
kind: Ingress
metadata:
  name: service-beta
  namespace: staging
  annotations:
    kubernetes.io/ingress.class: "nginx"
    nginx.ingress.kubernetes.io/rewrite-target: /$1
spec:
  rules:
    - host: beta.example.org
      http:
        paths:
          - path: /user/(.*)
            backend:
              serviceName: service-users
              servicePort: 1080
          - path: /data/(.*)
            backend:
              serviceName: service-data
              servicePort: 1080

  • ingress A is created without errors most of the time, but in very rare cases create attempts return an error
  • ingress B is never created and always returns an error
# kubectl apply -f manifests/ingress-beta.yml 
Error from server (InternalError): error when creating "manifests/ingress-beta.yml": Internal error occurred: failed calling webhook "validate.nginx.ingress.kubernetes.io": Post https://ingress-nginx-controller-admission.ingress-nginx.svc:443/extensions/v1beta1/ingresses?timeout=30s: context deadline exceeded (Client.Timeout exceeded while awaiting headers)

In the api-server logs errors look like that

I0530 08:05:56.884549       1 trace.go:116] Trace[898207247]: "Call validating webhook" configuration:ingress-nginx-admission,webhook:validate.nginx.ingress.kubernetes.io,resource:networking.k8s.io/v1beta1, Resource=ingresses,subresource:,operation:CREATE,UID:fdce95ab-e2a9-40f5-9ab3-73a85b603db6 (started: 2020-05-30 08:05:26.883895783 +0000 UTC m=+5434.178340436) (total time: 30.000569226s):
Trace[898207247]: [30.000569226s] [30.000569226s] END
W0530 08:05:56.884664       1 dispatcher.go:133] Failed calling webhook, failing closed validate.nginx.ingress.kubernetes.io: failed calling webhook "validate.nginx.ingress.kubernetes.io": Post https://ingress-nginx-controller-admission.ingress-nginx.svc:443/extensions/v1beta1/ingresses?timeout=30s: context deadline exceeded (Client.Timeout exceeded while awaiting headers)
I0530 08:05:56.885303       1 trace.go:116] Trace[868353513]: "Create" url:/apis/networking.k8s.io/v1beta1/namespaces/staging/ingresses,user-agent:kubectl/v1.18.3 (linux/amd64) kubernetes/2e7996e,client:127.0.0.1 (started: 2020-05-30 08:05:26.882592405 +0000 UTC m=+5434.177037017) (total time: 30.002669278s):
Trace[868353513]: [30.002669278s] [30.002248351s] END

The main question is: why is the first ingress created most of the time, while the second always fails to create?

Upd. Also this comment on SO might be useful in investigating causes of problems.

Upd 2. When the rewrite annotation is removed, the manifest is applied without errors.

Upd 3. It fails with the combination of multiple paths and the rewrite annotation.

@aledbf Looks like a bug.

@tomoyk
Contributor

tomoyk commented Jun 9, 2020

We have this issue on a baremetal k3s cluster. Our HTTP proxy logged this traffic:

gost[515]: 2020/06/09 15:15:37 http.go:151: [http] 192.168.210.21:47396 -> http://:8080 -> ingress-nginx-controller-admission.ingress-nginx.svc:443
gost[515]: 2020/06/09 15:15:37 http.go:241: [route] 192.168.210.21:47396 -> http://:8080 -> ingress-nginx-controller-admission.ingress-nginx.svc:443
gost[515]: 2020/06/09 15:15:37 http.go:262: [http] 192.168.210.21:47396 -> 192.168.210.1:8080 : dial tcp: lookup ingress-nginx-controller-admission.ingress-nginx.svc on 192.168.210.1:53: no such host

@yayoec

yayoec commented Jun 10, 2020

@eltonbfw update to 0.32.0 and make sure the API server can reach the POD running the ingress controller

I have the same problem, and I use 0.32.0.
What's the solution?
Please, thanks!

me too

@andrei-matei

If you are using the bare-metal install from Kelsey Hightower, my suggestion is to install kubelet on your master nodes, start calico/flannel (or whatever you use for CNI), and label your nodes as masters so no other pods are scheduled there. Then your control plane will be able to communicate with your nginx deployment and the issue should be fixed. At least this is how it worked for me.

onedr0p added a commit to onedr0p/home-ops that referenced this issue Jul 9, 2020
@metaversed

@aledbf This issue still occurs

@mikalai-t

mikalai-t commented Jul 15, 2020

@andrei-matei Kelsey's cluster works perfectly even without additional CNI plugins and kubelet systemd services installed on master nodes. All you need is to add a route to the Services CIDR 10.32.0.0/24, using worker node IPs as the next hop, on the master nodes only.
This way I've got ingress-nginx (deployed from the "bare-metal" manifest) and cert-manager webhooks working, but unfortunately not together :( I still don't know why...

Updated: got both of them working

@lbs-rodrigo

@aduncmj I found this solution https://stackoverflow.com/questions/61365202/nginx-ingress-service-ingress-nginx-controller-admission-not-found

kubectl delete -A ValidatingWebhookConfiguration ingress-nginx-admission

@metaversed

@aduncmj I did the same, thank you for sharing the findings. I'm curious whether this can be handled without manual intervention.

@bluehtt

bluehtt commented Jul 25, 2020

@opensourceonly This worked for me, you can try it: you should add a pathType to the Ingress configuration. #5445
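For illustration, a minimal sketch of an Ingress with an explicit pathType in the networking.k8s.io/v1 schema (all names here are placeholders, not taken from this thread):

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: myapp
  annotations:
    kubernetes.io/ingress.class: "nginx"
spec:
  rules:
  - host: myapp.example.com
    http:
      paths:
      - path: /
        pathType: Prefix   # explicit pathType; valid values are Prefix, Exact, ImplementationSpecific
        backend:
          service:
            name: myapp
            port:
              number: 80
```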

@jungrae-prestolabs

jungrae-prestolabs commented Feb 18, 2022

I don't think deleting all ValidatingWebhookConfigurations is the solution to this.
In my case, the cause of the problem was that versions were mixed (e.g. 1.1.1 and 0.47.0):

Error: Internal error occurred: failed calling webhook "validate.nginx.ingress.kubernetes.io": an error on the server ("") has prevented the request from succeeding

@mihaigalos

In my GKE cluster I've manually increased timeoutSeconds to 30.

You can do it via Helm:

controller:
  admissionWebhooks:
    enabled: true
    timeoutSeconds: 45

Hi @tehkapa, what resource do you apply this to? Can you post a yaml containing the spec? Thank you.

@marpada

marpada commented Apr 6, 2022

On EKS, a security group rule needs to be added on the Node Security Group tcp/8443 from the Cluster Security Group.

@matteovivona

matteovivona commented Apr 7, 2022

In my GKE cluster I've manually increased timeoutSeconds to 30.
You can do it via Helm:

controller:
  admissionWebhooks:
    enabled: true
    timeoutSeconds: 45

Hi @tehkapa, what resource do you apply this to? Can you post a yaml containing the spec? Thank you.

@mihaigalos it goes in the global Helm values. You can apply it when you install the ingress via Helm, like this: helm install ingress ingress-nginx/ingress-nginx -f values.yaml

values.yaml:

controller:
  admissionWebhooks:
    enabled: true
    timeoutSeconds: 45

@Clasyc

Clasyc commented May 11, 2022

On EKS, a security group rule needs to be added on the Node Security Group tcp/8443 from the Cluster Security Group.

In case using terraform:

resource "aws_security_group_rule" "webhook_admission_inbound" {
  type                     = "ingress"
  from_port                = 8443
  to_port                  = 8443
  protocol                 = "tcp"
  security_group_id        = module.eks.node_security_group_id
  source_security_group_id = module.eks.cluster_primary_security_group_id
}

resource "aws_security_group_rule" "webhook_admission_outbound" {
  type                     = "egress"
  from_port                = 8443
  to_port                  = 8443
  protocol                 = "tcp"
  security_group_id        = module.eks.node_security_group_id
  source_security_group_id = module.eks.cluster_primary_security_group_id
}

@sbeaulie

I updated from nginx-ingress to ingress-nginx in GKE, so in case this helps anyone: I needed to add a firewall rule to allow 8443 from the API server to my nodes.

As per deploy instructions:
https://kubernetes.github.io/ingress-nginx/deploy/#gce-gke

I'm not sure why it was NOT needed in nginx-ingress.

@chance2021

chance2021 commented Jun 29, 2022

Double-check whether any NetworkPolicy has been set.

Error I was getting...

Error from server (InternalError): error when creating "/tmp/ingress-test.yaml": Internal error occurred: failed calling webhook "validate.nginx.ingress.kubernetes.io": Post "https://ingress-nginx-controller-admission.ingress.svc:443/networking/v1/ingresses?timeout=10s": context deadline exceeded

Once the NetworkPolicy below was applied, the issue was gone:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: networkpolicy
  namespace: default
spec:
  ingress:
  - {}
  podSelector:
    matchLabels:
      app.kubernetes.io/name: ingress-nginx
  policyTypes:
  - Ingress

@chance2021

Make sure both your nginx-ingress pod and service work properly. My case was that I was assigning a wrong public IP which didn't exist in the corresponding resource group in AKS.

@lgpasquale

On EKS, a security group rule needs to be added on the Node Security Group tcp/8443 from the Cluster Security Group.

In case using terraform:

resource "aws_security_group_rule" "webhook_admission_inbound" {
  type                     = "ingress"
  from_port                = 8443
  to_port                  = 8443
  protocol                 = "tcp"
  security_group_id        = module.eks.node_security_group_id
  source_security_group_id = module.eks.cluster_primary_security_group_id
}

resource "aws_security_group_rule" "webhook_admission_outbound" {
  type                     = "egress"
  from_port                = 8443
  to_port                  = 8443
  protocol                 = "tcp"
  security_group_id        = module.eks.node_security_group_id
  source_security_group_id = module.eks.cluster_primary_security_group_id
}

I don't think you need both an ingress and an egress rule but just the ingress one. The first of these two rules should be enough.

For anyone using the terraform-aws-modules/eks/aws module, you can add this to your configuration:

  node_security_group_additional_rules = {
    # nginx-ingress requires the cluster to communicate with the ingress controller
    cluster_to_node = {
      description      = "Cluster to ingress-nginx webhook"
      protocol         = "-1"
      from_port        = 8443
      to_port          = 8443
      type             = "ingress"
      source_cluster_security_group = true
    }
    # Add here any other rule you already have
    # ...
  }

@adiii717

adiii717 commented Sep 7, 2022

node_security_group_additional_rules = {
  # nginx-ingress requires the cluster to communicate with the ingress controller
  cluster_to_node = {
    description                   = "Cluster to ingress-nginx webhook"
    protocol                      = "-1"
    from_port                     = 8443
    to_port                       = 8443
    type                          = "ingress"
    source_cluster_security_group = true
  }
  # Add here any other rule you already have
  # ...
}

Correct, and this resolved the issue for me on EKS 1.23:
https://github.com/terraform-aws-modules/terraform-aws-eks#input_node_security_group_additional_rules

@Clasyc

Clasyc commented Sep 9, 2022

I don't think you need both an ingress and an egress rule but just the ingress one. The first of these two rules should be enough.

You are right, ingress is enough.

@RamyAllam

For GKE private nodes, this should help

gcloud compute firewall-rules create RULE-NAME-master-nginx-ingress \
    --action ALLOW \
    --direction INGRESS \
    --source-ranges CONTROL_PLANE_RANGE \
    --rules tcp:8443 \
    --target-tags TARGET \
    --project GCP_PROJECT

Example

gcloud compute firewall-rules create gke-private-cluster-01-f13afdc6-master-nginx-ingress \
    --action ALLOW \
    --direction INGRESS \
    --source-ranges 172.16.0.0/28 \
    --rules tcp:8443 \
    --target-tags gke-private-cluster-01-f13afdc6-node \
    --project mygcpproject

You can also list the existing rules for the cluster

gcloud compute firewall-rules list \
    --filter 'name~^CLUSTER_NAME' \
    --format 'table(
        name,
        network,
        direction,
        sourceRanges.list():label=SRC_RANGES,
        allowed[].map().firewall_rule().list():label=ALLOW,
        targetTags.list():label=TARGET_TAGS
    )' --project GCP_PROJECT

Reference: https://cloud.google.com/kubernetes-engine/docs/how-to/private-clusters#step_3_add_a_firewall_rule

lemeurherve added a commit to jenkins-infra/aws that referenced this issue Oct 11, 2022
…ingress controller

Fix following error when deploying an exposed service in eks-public:

> Error: release artifact-caching-proxy failed, and has been uninstalled due to atomic being set: Internal error occurred: failed calling webhook "validate.nginx.ingress.kubernetes.io": Post "https://public-nginx-ingress-ingress-nginx-controller-admission.public-nginx-ingress.svc:443/networking/v1/ingresses?timeout=10s": context deadline exceeded

Ref: kubernetes/ingress-nginx#5401 (comment)
@xgt001

xgt001 commented Oct 13, 2022

thanks @Clasyc, your hint worked for me

@sotiriougeorge

sotiriougeorge commented Oct 19, 2022

@Clasyc and everyone also:

On an EKS cluster created from the terraform-aws-modules/eks/aws module (version 17.x though), a security group is automatically created by the module itself for the worker nodes, with a rule that allows traffic from the control plane security group on ports 1025-65535 for TCP.

This rule also includes the pre-defined description "Allow worker pods to receive communication from the cluster control plane".

Does this not cover the case of the security group mentioned above?

If it does, I am still facing this issue, but intermittently, especially when I am deploying massive workloads through Helm (the Ingresses have been checked and are OK as far as their correctness is concerned). It almost seems like a flood-protection mechanism, because if I let it cool down then I don't get it anymore.

Am I missing something here?

@tbondarchuk

@sotiriougeorge Same here: EKS created by the TF module, and from time to time I see those errors. I think the number of errors decreases when the controller is scaled up. At least it seems so for me on prod with two replicas compared to dev with one.

@sotiriougeorge

@sotiriougeorge Same here: EKS created by the TF module, and from time to time I see those errors. I think the number of errors decreases when the controller is scaled up. At least it seems so for me on prod with two replicas compared to dev with one.

Thank you for the sanity check! Appreciated. I will try to scale up to more replicas and see what comes of it. However, it would be good if, through this GitHub issue, some consensus emerged on how to fight this holistically, or on whether anything needs to be changed on the controller side.

@jhodnett2

jhodnett2 commented Nov 12, 2022

On EKS, a security group rule needs to be added on the Node Security Group tcp/8443 from the Cluster Security Group.

For anyone using the terraform-aws-modules/eks/aws module, you can add this to your configuration:

  node_security_group_additional_rules = {
    # nginx-ingress requires the cluster to communicate with the ingress controller
    cluster_to_node = {
      description      = "Cluster to ingress-nginx webhook"
      protocol         = "-1"
      from_port        = 8443
      to_port          = 8443
      type             = "ingress"
      source_cluster_security_group = true
    }
    # Add here any other rule you already have
    # ...
  }

Just a heads up here: when the protocol is set to "-1", it means "All Traffic". This opens up all ports, making the from_port/to_port values moot, which may be too permissive in some cases. Setting it to "tcp" lets you limit the rule to port 8443.

Having hit the same issue and applied this solution, the resulting rule wasn't what I expected: I had trouble locating it afterwards because I was searching by port.
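A tcp-scoped variant of the module snippet above, which keeps the opening limited to port 8443 (same terraform-aws-modules/eks/aws module; the rule key and description are illustrative):

```hcl
  node_security_group_additional_rules = {
    ingress_cluster_to_node_webhook = {
      description                   = "API server to ingress-nginx admission webhook"
      protocol                      = "tcp"
      from_port                     = 8443
      to_port                       = 8443
      type                          = "ingress"
      source_cluster_security_group = true
    }
  }
```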

@stevenyongzion

stevenyongzion commented Apr 8, 2023

For those who are using GKE, this is the sample Terraform code I use to open port 8443:

resource "google_compute_firewall" "port_8443_nginx_controller" {
  name    = "port-nginx-controller-webhook-allow-8443"
  network = google_compute_network.vpc.name
  description = "Refer to https://stackoverflow.com/a/65675908/778932"

  allow {
    protocol = "tcp"
    ports    = ["8443"]
  }

  source_ranges = [var.private_cluster_cidr]
  target_tags   = ["${var.project_name}-pool"]
}

Refer to this to get target_tags.

@PavithraKMR

I too faced the same error when applying the command ->

C:\Users\Pavithra Kanmaniraja\Documents\kubernetes-sample-apps>kubectl apply -f ingressdemons1.yaml -n demons1
Error from server (InternalError): error when creating "ingressdemons1.yaml": Internal error occurred: failed calling webhook "validate.nginx.ingress.kubernetes.io": failed to call webhook: Post "https://nginx-ingress-ingress-nginx-controller-admission.ingress-nginx.svc:443/networking/v1/ingresses?timeout=10s": service "nginx-ingress-ingress-nginx-controller-admission" not found

So I tried removing the namespace from the cluster, but that did not remove everything that was created when I installed ingress. Then I deleted the existing ValidatingWebhookConfiguration with the command ->

C:\Users\Pavithra Kanmaniraja>kubectl delete ValidatingWebhookConfiguration nginx-ingress-ingress-nginx-admission
validatingwebhookconfiguration.admissionregistration.k8s.io "nginx-ingress-ingress-nginx-admission" deleted

After that, I ran the command ->

C:\Users\Pavithra Kanmaniraja>kubectl apply -f https://raw.githubusercontent.com/kubernetes/ingress-nginx/controller-v1.0.4/deploy/static/provider/cloud/deploy.yaml
namespace/ingress-nginx created
serviceaccount/ingress-nginx created
configmap/ingress-nginx-controller created
clusterrole.rbac.authorization.k8s.io/ingress-nginx configured
clusterrolebinding.rbac.authorization.k8s.io/ingress-nginx configured
role.rbac.authorization.k8s.io/ingress-nginx created
rolebinding.rbac.authorization.k8s.io/ingress-nginx created
service/ingress-nginx-controller-admission created
service/ingress-nginx-controller created
deployment.apps/ingress-nginx-controller created
ingressclass.networking.k8s.io/nginx unchanged
validatingwebhookconfiguration.admissionregistration.k8s.io/ingress-nginx-admission created
serviceaccount/ingress-nginx-admission created
clusterrole.rbac.authorization.k8s.io/ingress-nginx-admission configured
clusterrolebinding.rbac.authorization.k8s.io/ingress-nginx-admission configured
role.rbac.authorization.k8s.io/ingress-nginx-admission created
rolebinding.rbac.authorization.k8s.io/ingress-nginx-admission created
job.batch/ingress-nginx-admission-create created
job.batch/ingress-nginx-admission-patch created

Next, I applied my ingress manifest again ->

C:\Users\Pavithra Kanmaniraja\Documents\kubernetes-sample-apps>kubectl apply -f ingressdemons1.yaml -n ingress-nginx
ingress.networking.k8s.io/doksexample-ingress created

Finally, the ingress is created now.
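If the webhook keeps failing and you are willing to forgo manifest validation, the official Helm chart can also install the controller without it (assuming a Helm-based install; note this removes the validation safety net):

```yaml
# values.yaml: skip deploying the admission webhook entirely
controller:
  admissionWebhooks:
    enabled: false
```

This has the same effect as deleting the ValidatingWebhookConfiguration by hand, but it persists across chart upgrades.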

@longwuyuan
Contributor

longwuyuan commented Aug 3, 2023 via email

@akhfzl

akhfzl commented Sep 12, 2023

ingress-nginx
ingress.networking.k8s.io/doksexample-ingress created

"ingress.networking.k8s.io/doksexample-ingress created" is output, not a command; which command produces it?

@hebabaze

You can resolve this by opening ports 443 and 8443 on each machine in your cluster.
On Ubuntu, for example, with ufw:
sudo ufw allow 8443
sudo ufw allow proto tcp from any to any port 8443

@devopstales

devopstales commented Jan 17, 2024

For me this policy solved my issue:

# Cilium format:
---
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: allow-ingress--ingress-nginx-private
  namespace: ingress-system
spec:
  endpointSelector:
    matchLabels:
      app.kubernetes.io/component: controller
      app.kubernetes.io/instance: ingress-private
  ingress:
    - fromEntities:
        - world
      toPorts:
        - ports:
            - port: "443"
        - ports:
            - port: "80"
    - fromEntities:
        - cluster
      toPorts:
        - ports:
            - port: "8443"
  egress:
    - {}
---
# Standard format:
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-api-server-to-ingress-webhook-server
  namespace: ingress-system
spec:
  podSelector:
    matchLabels:
      app.kubernetes.io/component: controller
      app.kubernetes.io/instance: ingress-private
  policyTypes:
    - Ingress
    - Egress
  ingress:
    - from:
        - ipBlock:
            cidr: 0.0.0.0/0
      ports:
        - port: 443
        - port: 80
        - port: 8443
    - from:
        - namespaceSelector: {}
      ports:
        - port: 443
        - port: 80
        - port: 8443
  egress: []

You need to allow communication from the API server to the ingress-controller pod inside the cluster.

@michaelRanivoEpitech

Hello everyone,

I am trying to set up a Kubernetes cluster on Azure VMs running Ubuntu, initialized with kubeadm.
For the initialization I followed the Kubernetes documentation, but when I installed the NGINX Ingress Controller I hit the same problem as this issue. It is not only the ingress controller but also cert-manager: when I try to create an Ingress for the controller, or an Issuer for cert-manager, I always get the same error.
I also bought a book on building a Kubernetes cluster and followed what it says, but the problem persists.
In cert-manager's troubleshooting list I found my error: https://cert-manager.io/docs/troubleshooting/webhook/#error-context-deadline-exceeded . I did the checks they suggest and the cert-manager webhook responds as in their example; I tried the same thing for NGINX's webhook and also got an answer.
They then say it is the api-server that cannot reach the webhooks: https://cert-manager.io/docs/troubleshooting/webhook/#error-io-timeout-connectivity-issue . What I don't understand is that I have recreated my cluster over and over and still get the same errors. Did I initialize my cluster incorrectly, or is there really a connectivity problem between the api-server and the webhooks?
If anyone has articles, documentation, or anything else to help build a Kubernetes cluster step by step, I would appreciate it.
By the way, I am still learning Kubernetes, so...
