`Warning Event: Readiness probe failed: HTTP probe failed with statuscode: 500` occurred during upgrade #12401

dbbDylan · 2024-11-22T10:08:48Z

What happened:

Background: investigating zero-downtime upgrades of the ingress-nginx-controller.
Strategy: I used the helm upgrade --reuse-values command to perform the upgrade.

The upgrade goes smoothly if no requests are sent while it is in progress. However, when Grafana k6 sends a continuous load of HTTP requests, errors occur at the point where the new controller pod is fully initialized and the old pod begins to terminate. The errors last only a brief moment, but they can be reproduced consistently.

Here is the warning event:

And here is the K6 test log:

$ sh run.sh

         /\      Grafana   /‾‾/
    /\  /  \     |\  __   /  /
   /  \/    \    | |/ /  /   ‾‾\
  /          \   |   (  |  (‾)  |
 / __________ \  |_|\_\  \_____/

     execution: local
        script: script.js
        output: -

     scenarios: (100.00%) 1 scenario, 1024 max VUs, 2m30s max duration (incl. graceful stop):
              * default: 1024 looping VUs for 2m0s (gracefulStop: 30s)
      
WARN[0087] Request Failed                                error="Post \"http://my-hostname/v1/tests/post\": EOF"
WARN[0087] Request Failed                                error="Post \"http://my-hostname/v1/tests/post\": read tcp 10.59.89.82:59064->10.47.104.129:80: wsarecv: An existing connection was forcibly closed by the remote host."
WARN[0087] Request Failed                                error="Post \"http://my-hostname/v1/tests/post\": EOF"
WARN[0087] Request Failed                                error="Post \"http://my-hostname/v1/tests/post\": read tcp 10.59.89.82:59082->10.47.104.129:80: wsarecv: An existing connection was forcibly closed by the remote host."
WARN[0087] Request Failed                                error="Post \"http://my-hostname/v1/tests/post\": EOF"

     data_received..................: 37 MB 295 kB/s
     data_sent......................: 12 MB 93 kB/s
     http_req_blocked...............: avg=23.5ms   min=0s       med=0s    max=731.46ms p(90)=0s    p(95)=510.49µs
     http_req_connecting............: avg=14.79ms  min=0s       med=0s    max=343.54ms p(90)=0s    p(95)=0s
     http_req_duration..............: avg=2.81s    min=3.12ms   med=2.8s  max=10.18s   p(90)=4.82s p(95)=5.07s
       { expected_response:true }...: avg=2.81s    min=313.71ms med=2.81s max=10.18s   p(90)=4.83s p(95)=5.07s
     http_req_failed................: 0.26% 117 out of 43956
     http_req_receiving.............: avg=468.21µs min=0s       med=0s    max=14.93ms  p(90)=987µs p(95)=2.21ms
     http_req_sending...............: avg=21.26µs  min=0s       med=0s    max=8.52ms   p(90)=0s    p(95)=0s
     http_req_tls_handshaking.......: avg=0s       min=0s       med=0s    max=0s       p(90)=0s    p(95)=0s
     http_req_waiting...............: avg=2.81s    min=3.12ms   med=2.8s  max=10.18s   p(90)=4.82s p(95)=5.07s
     http_reqs......................: 43956 350.979203/s
     iteration_duration.............: avg=2.83s    min=13.56ms  med=2.82s max=10.18s   p(90)=4.85s p(95)=5.09s
     iterations.....................: 43956 350.979203/s
     vus............................: 10    min=10           max=1024
     vus_max........................: 1024  min=1024         max=1024

                                                                                                                                                                                                                                                      
running (2m05.2s), 0000/1024 VUs, 43956 complete and 0 interrupted iterations                                                                                                                                                                         
default ✓ [======================================] 1024 VUs  2m0s

During this period, I get numerous empty responses, and there are no error logs in the ingress-nginx-controller pod. However, a TCP connection that was established before the upgrade remains uninterrupted (I tested this with the telnet ${my-tcp-service} ${port} command).

So I would like to confirm: does the upgrade cause a short-lived service interruption of the ingress-nginx-controller?

What you expected to happen:

No warnings should occur during the upgrade, and every request should be handled (receive a response), whether or not the returned status code is 200.

NGINX Ingress controller version (exec into the pod and run /nginx-ingress-controller --version): v1.11.2 & v1.11.3

Kubernetes version (use kubectl version):

Client Version: v1.30.2
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: v1.30.5

Environment:

Cloud provider or hardware configuration: all clusters are managed by Gardener, so I do not have permission to check this.
OS (e.g. from /etc/os-release): linux-amd64 (nodes run Garden Linux 1592.3, see the node list below)
Kernel (e.g. uname -a): 6.6.62-cloud-amd64 (from the node list below)
Install tools: Gardener (see above)

Basic cluster related info:

kubectl get nodes -o wide

$ kubectl get nodes -o wide
NAME                                                       STATUS   ROLES    AGE   VERSION   INTERNAL-IP    EXTERNAL-IP   OS-IMAGE              KERNEL-VERSION       CONTAINER-RUNTIME
shoot--gtlcdevqa--dylan-test-worker-tt7i7-z1-6db57-25mdw   Ready    <none>   88m   v1.30.5   10.180.0.213   <none>        Garden Linux 1592.3   6.6.62-cloud-amd64   containerd://1.7.20
shoot--gtlcdevqa--dylan-test-worker-tt7i7-z1-6db57-mn8zv   Ready    <none>   89m   v1.30.5   10.180.0.187   <none>        Garden Linux 1592.3   6.6.62-cloud-amd64   containerd://1.7.20

How was the ingress-nginx-controller installed:

If helm was used then please show output of helm ls -A | grep -i ingress

$ helm ls -A | grep -i ingress
ingress-nginx           ingress-nginx           28              2024-11-18 16:34:27.1373854 +0800 CST   deployed    ingress-nginx-4.11.3            1.11.3

If helm was used then please show output of helm -n <ingresscontrollernamespace> get values <helmreleasename>

$ helm -n ingress-nginx get values ingress-nginx
USER-SUPPLIED VALUES:
controller:
  allowSnippetAnnotations: true
  config:
    client-body-timeout: "360"
    proxy-body-size: 1024m
    proxy-buffer-size: 16k
    proxy-connect-timeout: "30"
    proxy-read-timeout: "3600"
    proxy-send-timeout: "900"
    proxy-set-headers: ingress-nginx/custom-headers
  extraArgs:
    configmap: $(POD_NAMESPACE)/ingress-nginx-controller
    controller-class: k8s.io/ingress-nginx
    default-ssl-certificate: ingress-nginx/gtlconlycert
    enable-ssl-passthrough: "true"
    ingress-class: nginx
    publish-service: $(POD_NAMESPACE)/ingress-nginx-controller
    tcp-services-configmap: $(POD_NAMESPACE)/ingress-nginx-tcp
    validating-webhook: :8443
    validating-webhook-certificate: /usr/local/certificates/cert
    validating-webhook-key: /usr/local/certificates/key
    watch-ingress-without-class: "true"
  metrics:
    enabled: true
    service:
      annotations:
        prometheus.io/port: "10254"
        prometheus.io/scrape: "true"
    serviceMonitor:
      enabled: true
      namespace: kube-prometheus-stack
      scrapeInterval: 500ms
tcp:
  "31080": prod/blackduck-report:1081

Current State of the controller:

kubectl describe ingressclasses

Name:         nginx
Labels:       app.kubernetes.io/component=controller
              app.kubernetes.io/instance=ingress-nginx
              app.kubernetes.io/managed-by=Helm
              app.kubernetes.io/name=ingress-nginx
              app.kubernetes.io/part-of=ingress-nginx
              app.kubernetes.io/version=1.11.3
              helm.sh/chart=ingress-nginx-4.11.3
Annotations:  meta.helm.sh/release-name: ingress-nginx
              meta.helm.sh/release-namespace: ingress-nginx
Controller:   k8s.io/ingress-nginx
Events:       <none>

kubectl -n <ingresscontrollernamespace> get all -A -o wide

$ kubectl -n ingress-nginx get all -o wide
NAME                                            READY   STATUS    RESTARTS   AGE     IP            NODE                                                       NOMINATED NODE   READINESS GATES
pod/ingress-nginx-controller-67fbb67c7b-tpfpt   1/1     Running   0          3d22h   100.64.1.23   shoot--gtlcdevqa--dylan-test-worker-tt7i7-z1-6db57-25mdw   <none>           <none>

NAME                                         TYPE           CLUSTER-IP       EXTERNAL-IP     PORT(S)                                      AGE   SELECTOR
service/ingress-nginx-controller             LoadBalancer   100.111.24.47    10.47.104.129   80:31686/TCP,443:32033/TCP,31080:31568/TCP   25d   app.kubernetes.io/component=controller,app.kubernetes.io/instance=ingress-nginx,app.kubernetes.io/name=ingress-nginx
service/ingress-nginx-controller-admission   ClusterIP      100.106.5.80     <none>          443/TCP                                      25d   app.kubernetes.io/component=controller,app.kubernetes.io/instance=ingress-nginx,app.kubernetes.io/name=ingress-nginx
service/ingress-nginx-controller-metrics     ClusterIP      100.110.133.77   <none>          10254/TCP                                    14d   app.kubernetes.io/component=controller,app.kubernetes.io/instance=ingress-nginx,app.kubernetes.io/name=ingress-nginx

NAME                                       READY   UP-TO-DATE   AVAILABLE   AGE   CONTAINERS   IMAGES                                                                                                                     SELECTOR
deployment.apps/ingress-nginx-controller   1/1     1            1           25d   controller   registry.k8s.io/ingress-nginx/controller:v1.11.3@sha256:d56f135b6462cfc476447cfe564b83a45e8bb7da2774963b00d12161112270b7   app.kubernetes.io/component=controller,app.kubernetes.io/instance=ingress-nginx,app.kubernetes.io/name=ingress-nginx

NAME                                                  DESIRED   CURRENT   READY   AGE    CONTAINERS   IMAGES                                                                                                                     SELECTOR
replicaset.apps/ingress-nginx-controller-56bcbbf9bc   0         0         0       4d1h   controller   registry.k8s.io/ingress-nginx/controller:v1.11.2@sha256:d5f8217feeac4887cb1ed21f27c2674e58be06bd8f5184cacea2a69abaf78dce   app.kubernetes.io/component=controller,app.kubernetes.io/instance=ingress-nginx,app.kubernetes.io/name=ingress-nginx,pod-template-hash=56bcbbf9bc
replicaset.apps/ingress-nginx-controller-67fbb67c7b   1         1         1       4d1h   controller   registry.k8s.io/ingress-nginx/controller:v1.11.3@sha256:d56f135b6462cfc476447cfe564b83a45e8bb7da2774963b00d12161112270b7   app.kubernetes.io/component=controller,app.kubernetes.io/instance=ingress-nginx,app.kubernetes.io/name=ingress-nginx,pod-template-hash=67fbb67c7b

kubectl -n <ingresscontrollernamespace> describe po <ingresscontrollerpodname>

$ kubectl describe po -n ingress-nginx ingress-nginx-controller-67fbb67c7b-tpfpt
Name:             ingress-nginx-controller-67fbb67c7b-tpfpt
Namespace:        ingress-nginx
Priority:         0
Service Account:  ingress-nginx
Node:             shoot--gtlcdevqa--dylan-test-worker-tt7i7-z1-6db57-25mdw/10.180.0.213
Start Time:       Fri, 22 Nov 2024 16:11:19 +0800
Labels:           app.kubernetes.io/component=controller
                  app.kubernetes.io/instance=ingress-nginx
                  app.kubernetes.io/managed-by=Helm
                  app.kubernetes.io/name=ingress-nginx
                  app.kubernetes.io/part-of=ingress-nginx
                  app.kubernetes.io/version=1.11.3
                  helm.sh/chart=ingress-nginx-4.11.3
                  pod-template-hash=67fbb67c7b
Annotations:      cni.projectcalico.org/containerID: 6b2b57de91e25a2c7dbdac5dc865f7c3c09ae62b4b1a1269a1eb4c3070328020
                  cni.projectcalico.org/podIP: 100.64.1.23/32
                  cni.projectcalico.org/podIPs: 100.64.1.23/32
Status:           Running
IP:               100.64.1.23
IPs:
  IP:           100.64.1.23
Controlled By:  ReplicaSet/ingress-nginx-controller-67fbb67c7b
Containers:
  controller:
    Container ID:    containerd://cd4e18fc7e76caaabc2fed13acd26af7fef665f2e01a645503c3d8661a091831
    Image:           registry.k8s.io/ingress-nginx/controller:v1.11.3@sha256:d56f135b6462cfc476447cfe564b83a45e8bb7da2774963b00d12161112270b7
    Image ID:        registry.k8s.io/ingress-nginx/controller@sha256:d56f135b6462cfc476447cfe564b83a45e8bb7da2774963b00d12161112270b7
    Ports:           80/TCP, 443/TCP, 10254/TCP, 8443/TCP, 31080/TCP
    Host Ports:      0/TCP, 0/TCP, 0/TCP, 0/TCP, 0/TCP
    SeccompProfile:  RuntimeDefault
    Args:
      /nginx-ingress-controller
      --publish-service=$(POD_NAMESPACE)/ingress-nginx-controller
      --election-id=ingress-nginx-leader
      --controller-class=k8s.io/ingress-nginx
      --ingress-class=nginx
      --configmap=$(POD_NAMESPACE)/ingress-nginx-controller
      --tcp-services-configmap=$(POD_NAMESPACE)/ingress-nginx-tcp
      --validating-webhook=:8443
      --validating-webhook-certificate=/usr/local/certificates/cert
      --validating-webhook-key=/usr/local/certificates/key
      --configmap=$(POD_NAMESPACE)/ingress-nginx-controller
      --controller-class=k8s.io/ingress-nginx
      --default-ssl-certificate=ingress-nginx/gtlconlycert
      --enable-ssl-passthrough=true
      --ingress-class=nginx
      --publish-service=$(POD_NAMESPACE)/ingress-nginx-controller
      --tcp-services-configmap=$(POD_NAMESPACE)/ingress-nginx-tcp
      --validating-webhook=:8443
      --validating-webhook-certificate=/usr/local/certificates/cert
      --validating-webhook-key=/usr/local/certificates/key
      --watch-ingress-without-class=true
    State:          Running
      Started:      Fri, 22 Nov 2024 16:13:05 +0800
    Ready:          True
    Restart Count:  0
    Requests:
      cpu:      100m
      memory:   90Mi
    Liveness:   http-get http://:10254/healthz delay=10s timeout=1s period=10s #success=1 #failure=5        
    Readiness:  http-get http://:10254/healthz delay=10s timeout=1s period=10s #success=1 #failure=3        
    Environment:
      POD_NAME:                 ingress-nginx-controller-67fbb67c7b-tpfpt (v1:metadata.name)
      POD_NAMESPACE:            ingress-nginx (v1:metadata.namespace)
      LD_PRELOAD:               /usr/local/lib/libmimalloc.so
      KUBERNETES_SERVICE_HOST:  api.dylan-test.gtlcdevqa.internal.canary.k8s.ondemand.com
    Mounts:
      /usr/local/certificates/ from webhook-cert (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-d2v66 (ro)
Conditions:
  Type                        Status
  PodReadyToStartContainers   True
  Initialized                 True
  Ready                       True
  ContainersReady             True
  PodScheduled                True
Volumes:
  webhook-cert:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  ingress-nginx-admission
    Optional:    false
  kube-api-access-d2v66:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   Burstable
Node-Selectors:              kubernetes.io/os=linux
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:                      <none>

kubectl -n <ingresscontrollernamespace> describe svc <ingresscontrollerservicename>

$ kubectl -n ingress-nginx describe svc ingress-nginx-controller
Name:                     ingress-nginx-controller
Namespace:                ingress-nginx
Labels:                   app.kubernetes.io/component=controller
                          app.kubernetes.io/instance=ingress-nginx
                          app.kubernetes.io/managed-by=Helm
                          app.kubernetes.io/name=ingress-nginx
                          app.kubernetes.io/part-of=ingress-nginx
                          app.kubernetes.io/version=1.11.3
                          helm.sh/chart=ingress-nginx-4.11.3
Annotations:              loadbalancer.openstack.org/load-balancer-address: 10.47.104.129
                          loadbalancer.openstack.org/load-balancer-id: 54ef842a-05c0-482a-b3bf-255012af91d8 
                          meta.helm.sh/release-name: ingress-nginx
                          meta.helm.sh/release-namespace: ingress-nginx
Selector:                 app.kubernetes.io/component=controller,app.kubernetes.io/instance=ingress-nginx,app.kubernetes.io/name=ingress-nginx
Type:                     LoadBalancer
IP Family Policy:         SingleStack
IP Families:              IPv4
IP:                       100.111.24.47
IPs:                      100.111.24.47
LoadBalancer Ingress:     10.47.104.129
Port:                     http  80/TCP
TargetPort:               http/TCP
NodePort:                 http  31686/TCP
Endpoints:                100.64.1.23:80
Port:                     https  443/TCP
TargetPort:               https/TCP
NodePort:                 https  32033/TCP
Endpoints:                100.64.1.23:443
Port:                     31080-tcp  31080/TCP
TargetPort:               31080-tcp/TCP
NodePort:                 31080-tcp  31568/TCP
Endpoints:                100.64.1.23:31080
Session Affinity:         None
External Traffic Policy:  Cluster
Events:                   <none>

Current state of ingress object, if applicable:

kubectl -n <appnamespace> get all,ing -o wide

$ kubectl -n web-service get ingress -owide
NAME                      CLASS    HOSTS                      ADDRESS         PORTS   AGE
web-service-gin-ingress   <none>   my-host   10.47.104.129   80      8d

kubectl -n <appnamespace> describe ing <ingressname>

$ kubectl describe ingress web-service-gin-ingress -n web-service 
Name:             web-service-gin-ingress
Labels:           <none>
Namespace:        web-service
Address:          10.47.104.129
Ingress Class:    <none>
Default backend:  <default>
Rules:
  Host                      Path  Backends
  ----                      ----  --------
  my-host
                            /   web-service-gin-service:8080 (100.64.1.4:8080,100.64.1.5:8080,100.64.1.6:8080)
Annotations:                nginx.ingress.kubernetes.io/configuration-snippet: more_set_headers "X-Ingress-Pod-Name: $HOSTNAME";
Events:                     <none>

If applicable, then, your complete and exact curl/grpcurl command (redacted if required) and the response to the curl/grpcurl command with the -v flag

$ GUID=1
$ DATETIME=$(date -u +"%Y-%m-%dT%H:%M:%SZ")
$ curl -X POST "http://my-host/v1/tests/post" -H "Content-Type: application/json" -d "{
    \"id\": \"$GUID\", 
    \"create_time\": \"$DATETIME\",
    \"sleep_time_ms\": 10
}"
{"id":"1","ingress_pod_name_form":"ingress-nginx-controller-67fbb67c7b-tpfpt","create_time":"2024-11-22T09:58:05Z","receive_time":"2024-11-22T09:58:52.425334756Z","finish_time":"2024-11-22T09:58:52.435498011Z","consume_sec":0.010163236}

$ curl -vX POST "http://my-host/v1/tests/post" -H "Content-Type: application/json" -d "{   
    \"id\": \"$GUID\",
    \"create_time\": \"$DATETIME\",
    \"sleep_time_ms\": 10
}"
Note: Unnecessary use of -X or --request, POST is already inferred.
* Host my-host:80 was resolved.
* IPv6: (none)
* IPv4: 10.47.104.129
*   Trying 10.47.104.129:80...
* Connected to dylan-test.gtlc.only.sap (10.47.104.129) port 80
* using HTTP/1.x
> POST /v1/tests/post HTTP/1.1
> Host: dylan-test.gtlc.only.sap
> User-Agent: curl/8.10.1
> Accept: */*
> Content-Type: application/json
> Content-Length: 87
>
* upload completely sent off: 87 bytes
< HTTP/1.1 200 OK
< Date: Fri, 22 Nov 2024 10:00:18 GMT
< Content-Type: application/json; charset=utf-8
< Content-Length: 236
< Connection: keep-alive
< Access-Control-Allow-Credentials: true
< Access-Control-Allow-Headers: Content-Type, Content-Length, Accept-Encoding, X-CSRF-Token, Authorization, accept, origin, Cache-Control, X-Requested-With
< Access-Control-Allow-Methods: POST, OPTIONS, GET, PUT, DELETE
< Access-Control-Allow-Origin: *
< X-Ingress-Pod-Name-From: ingress-nginx-controller-67fbb67c7b-tpfpt
< X-Ingress-Pod-Name: ingress-nginx-controller-67fbb67c7b-tpfpt
<
{"id":"1","ingress_pod_name_form":"ingress-nginx-controller-67fbb67c7b-tpfpt","create_time":"2024-11-22T09:58:05Z","receive_time":"2024-11-22T10:00:18.300485598Z","finish_time":"2024-11-22T10:00:18.310760665Z","consume_sec":0.010275065}* Connection #0 to host my-host left intact


How to reproduce this issue:

To reproduce it, you need only one web service (any pod that can receive HTTP requests will do). Then run this k6 script:

import http from 'k6/http';
import { uuidv4 } from 'https://jslib.k6.io/k6-utils/1.4.0/index.js';

export const options = {
  vus: 1024,
  duration: '120s',
};

function getFormattedDateTimeNow() {
  const now = new Date();
  const isoString = now.toISOString();

  return isoString;
}

function formattedResponseOutput(res) {
  const status = res.status;
  const statusText = res.status_text;
  const to = res.headers['X-Ingress-Pod-Name'];
  const from = res.headers['X-Ingress-Pod-Name-From'];

  if (res.status !== 200) {
    console.log(`[${from}] --> [${to}] : { Status: ${status}, Status Text: ${statusText} }`);
  } else {
    console.log(`[${from}] --> [${to}] : { Status: ${status}, ResponseBody: ${res.body} }`);
  }
}

export default function () {
  const url = 'http://my-host/v1/tests/post';
  const sleep_upper_limit_ms = 5000; // upper bound (ms) for the random backend sleep

  // Request body expected by the test web service (same fields as the curl example).
  const payload = JSON.stringify({
    "id": uuidv4(),
    "create_time": getFormattedDateTimeNow(),
    "sleep_time_ms": Math.floor(Math.random() * (sleep_upper_limit_ms + 1)),
  });

  const params = {
    headers: {
      'Content-Type': 'application/json',
    },
  };

  const res = http.post(url, payload, params);
  formattedResponseOutput(res);
}

Anything else we need to know:

You can use my test image, implemented in Go: image: doublebiao/web-service-gin:v1.0-beta


k8s-ci-robot · 2024-11-22T10:08:57Z

This issue is currently awaiting triage.

If Ingress contributors determine this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

longwuyuan · 2024-11-22T14:46:37Z

/remove-kind bug
/kind support

You have only one controller pod, so yes, you will get a brief disruption during the upgrade.

You can experiment with more than one replica and with values such as minAvailable.

dbbDylan · 2024-11-25T06:15:35Z

Thanks for your support! @longwuyuan

Following your suggestion, I changed my value-specific.yaml:

+    replicaCount: 2

But the same error still occurred when the old pod switched to Terminating:

I also tried adding sleep 15 before executing wait-shutdown, but that did not work either.

longwuyuan · 2024-11-25T06:30:03Z

Those are not the only values. Please explore others.
Each use case is specific. For example, I made more than one suggestion, but your response says you tried only one of them. Try increasing replicas to maybe 3 and setting minAvailable to 1 (https://kubernetes.io/docs/tasks/run-application/configure-pdb/). This is to have at least 1 pod available for new connections.

If it's about graceful draining of established connections, then please look at other config options, such as timers. There is no well-documented use case for this with the controller. Each user finds their most suitable config by trial and error.
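To make the replica/minAvailable combinations discussed in this thread concrete, here is a small sketch that computes how many controller pods a PodDisruptionBudget would let be evicted at once. It assumes every replica is healthy and that minAvailable is an absolute number rather than a percentage; `disruptionsAllowed` is a hypothetical helper that approximates the `status.disruptionsAllowed` field of a PDB, not a Kubernetes API call. Keep in mind that a PDB is consulted by the Eviction API (for example during `kubectl drain`); the pods replaced by a Deployment rolling update are paced by the Deployment's `maxSurge`/`maxUnavailable` settings instead.

```go
package main

import "fmt"

// disruptionsAllowed approximates a PodDisruptionBudget's status.disruptionsAllowed
// for an integer minAvailable: the number of healthy pods that may be evicted
// right now without dropping below minAvailable. It is never negative.
func disruptionsAllowed(currentHealthy, minAvailable int) int {
	if d := currentHealthy - minAvailable; d > 0 {
		return d
	}
	return 0
}

func main() {
	// replicaCount/minAvailable pairs discussed in this thread,
	// assuming all replicas are healthy.
	cases := []struct{ replicas, minAvailable int }{
		{2, 1}, {3, 1}, {3, 2}, {1, 1},
	}
	for _, c := range cases {
		fmt.Printf("replicas=%d minAvailable=%d -> disruptionsAllowed=%d\n",
			c.replicas, c.minAvailable, disruptionsAllowed(c.replicas, c.minAvailable))
	}
}
```

With a single replica and minAvailable 1 this yields 0, so no voluntary eviction is permitted at all, while 3 replicas with minAvailable 1 permit 2.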

dbbDylan · 2024-11-25T08:35:38Z

I've tried a lot of ways:

replicaCount: 2 and minAvailable: 1
replicaCount: 3 and minAvailable: 1
replicaCount: 3 and minAvailable: 2
replicaCount: 1 and minAvailable: 1 and preStop: ["/bin/sh", "-c", "sleep 15s && /wait-shutdown"]

None of them worked.

However, I have found that all the errors come from the old pod while it is executing the “wait-shutdown” script. The old pod still receives requests after the controller starts shutting down and before nginx terminates, which is not what I expected:

So I don't think this is a configuration issue, but rather a brief service interruption during graceful termination. In my opinion, the expected sequence should be:

1. Graceful termination starts.
2. Network traffic is switched away from the old pod (the old pod stops receiving requests).
3. The nginx service stops.
4. The old pod is deleted.

But the current behavior can't guarantee that the second step happens before the third step. Could you double-check this?

Thanks for your strong support again.

dbbDylan · 2024-11-25T08:56:22Z

func (srv *Server) ListenAndServe() error {
	if srv.shuttingDown() {
		return ErrServerClosed  // the fatal error
	}
	addr := srv.Addr
	if addr == "" {
		addr = ":http"
	}
	ln, err := net.Listen("tcp", addr)
	if err != nil {
		return err
	}
	return srv.Serve(ln)
}

dbbDylan · 2024-11-25T09:44:56Z

More information updated:

Once a pod transitions from Running to Terminating, the Endpoints of the ingress-nginx-controller Service should already have been updated to remove its IP. Therefore, I suspect that the issue might not be with the ingress-nginx-controller itself, but rather with the way the k6 load testing tool handles connections. Could you help me confirm this hypothesis?

dbbDylan added the kind/bug label Nov 22, 2024

k8s-ci-robot added the needs-triage label Nov 22, 2024

k8s-ci-robot added the needs-priority label Nov 22, 2024

strongjz added this to [SIG Network] Ingress NGINX Nov 22, 2024

k8s-ci-robot added kind/support and removed kind/bug labels Nov 22, 2024
