
handle api server blips #3043

Closed
slaupster opened this issue Sep 5, 2018 · 6 comments

Comments

@slaupster

NGINX Ingress controller version: 0.17.1

Kubernetes version (use kubectl version): 1.10.7

Environment: Any

  • Cloud provider or hardware configuration: N/A
  • OS (e.g. from /etc/os-release): N/A
  • Kernel (e.g. uname -a): N/A
  • Install tools: N/A
  • Others: N/A

What happened:
When the API server is unavailable, NGINX Ingress stops working, believing no services are available. This causes a total ingress outage when a config update happens even though only the API server is unavailable.
Everything in the data plane is fine.

What you expected to happen:
NGINX Ingress should detect that the underlying Kubernetes client cannot connect to the API server and temporarily disable config updates. This is not ideal, and there is nothing to do other than get the API server back, but it's better than a total outage. Once the connection to the API server is restored and a successful informer cycle has run, config updates should be re-enabled.
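
For illustration only, here is a minimal sketch of the behaviour being asked for; this is not the actual ingress-nginx code, the helper names (apiServerReachable, syncLoop, applyConfig) and the 30-second interval are made up:

    // Sketch only: probe the apiserver before regenerating the NGINX config and
    // keep the last known-good configuration while the apiserver is unreachable.
    package main

    import (
        "context"
        "log"
        "time"

        "k8s.io/client-go/kubernetes"
        "k8s.io/client-go/rest"
    )

    // apiServerReachable is a hypothetical helper; asking for the server version
    // fails quickly when the connection to the apiserver is down.
    func apiServerReachable(client kubernetes.Interface) bool {
        _, err := client.Discovery().ServerVersion()
        return err == nil
    }

    // syncLoop periodically applies configuration, but skips the update while the
    // apiserver cannot be reached so the data plane keeps the previous config.
    func syncLoop(ctx context.Context, client kubernetes.Interface, applyConfig func() error) {
        ticker := time.NewTicker(30 * time.Second)
        defer ticker.Stop()
        for {
            select {
            case <-ctx.Done():
                return
            case <-ticker.C:
                if !apiServerReachable(client) {
                    log.Println("apiserver unreachable; keeping last known-good config")
                    continue
                }
                if err := applyConfig(); err != nil {
                    log.Printf("config update failed: %v", err)
                }
            }
        }
    }

    func main() {
        cfg, err := rest.InClusterConfig()
        if err != nil {
            log.Fatal(err)
        }
        client := kubernetes.NewForConfigOrDie(cfg)
        syncLoop(context.Background(), client, func() error {
            // Placeholder for regenerating and reloading nginx.conf.
            return nil
        })
    }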

How to reproduce it (as minimally and precisely as possible):
Set up an ingress, stop/inhibit the API server, wait for the informers to update (with nothing), and observe that the ingress no longer works even though the actual pods and services are still active and viable.

Anything else we need to know: no

@aledbf
Member

aledbf commented Sep 5, 2018

@slaupster we disable the resync of the informers (#2634).
If the controller is already running, this should not be an issue. That said, the controller will not start if we cannot reach the apiserver (there is nothing we can do about that).
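
For context, a minimal client-go sketch of what disabling the informer resync means (this is not the controller's actual code): a resync period of 0 stops the periodic re-delivery of cached objects, so a running controller keeps its local cache across apiserver blips.

    // Sketch: create informers with resync disabled (period 0), as referenced above.
    package main

    import (
        "log"

        "k8s.io/client-go/informers"
        "k8s.io/client-go/kubernetes"
        "k8s.io/client-go/rest"
        "k8s.io/client-go/tools/cache"
    )

    func main() {
        cfg, err := rest.InClusterConfig()
        if err != nil {
            log.Fatal(err)
        }
        client := kubernetes.NewForConfigOrDie(cfg)

        // A resync period of 0 disables periodic resync; updates still arrive
        // through the watch connection while it is healthy.
        factory := informers.NewSharedInformerFactory(client, 0)
        svcInformer := factory.Core().V1().Services().Informer()

        stop := make(chan struct{})
        defer close(stop)
        factory.Start(stop)

        // Block until the local cache has been populated once.
        if !cache.WaitForCacheSync(stop, svcInformer.HasSynced) {
            log.Fatal("timed out waiting for caches to sync")
        }
        log.Println("service cache synced; periodic resync disabled")
    }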

@slaupster
Author

slaupster commented Sep 5, 2018

Thanks for the reply @aledbf

#2634 made it into 0.16.0 - I've hit this issue more than once with 0.17.1.

Logs look like
ingress.log

Nginx Ingress Pods were running happily for days before and days since, so it recovers fine.

           - /nginx-ingress-controller
           - --default-backend-service={{ .Values.namespace }}/default-http-backend
           - --tcp-services-configmap={{ .Values.namespace }}/tcp-configmap
           - --configmap={{ .Values.namespace }}/nginx-configuration
           - --enable-dynamic-configuration=true
           - --watch-namespace={{ .Values.namespace }}
           - --update-status=false

@aledbf
Member

aledbf commented Sep 5, 2018

E0827 21:13:39.275400 8 reflector.go:205] k8s.io/ingress-nginx/internal/ingress/controller/store/store.go:172: Failed to list *v1beta1.Ingress: Get https://172.21.0.1:443/apis/extensions/v1beta1/namespaces//ingresses?limit=500&resourceVersion=0: dial tcp 172.21.0.1:443: connect: connection timed out

This is expected. The informers (the sync mechanism from client-go) detect connection issues with the apiserver. The content of the informers (services, configmaps, endpoints, secrets) should still be there.
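
To make that concrete, here is a small runnable sketch (using a fake clientset so it runs anywhere; the object names are illustrative, and this is not the controller's code) showing that once an informer has synced, lookups go to its local store, which keeps serving the last known objects even if the connection to the apiserver breaks:

    // Sketch: reads go to the informer's local cache, not to the apiserver.
    package main

    import (
        "log"

        corev1 "k8s.io/api/core/v1"
        metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
        "k8s.io/client-go/informers"
        "k8s.io/client-go/kubernetes/fake"
        listerscorev1 "k8s.io/client-go/listers/core/v1"
        "k8s.io/client-go/tools/cache"
    )

    // cachedEndpointAddresses reads Endpoints from the informer's local store only;
    // no request to the apiserver is made at lookup time.
    func cachedEndpointAddresses(lister listerscorev1.EndpointsLister, namespace, name string) int {
        ep, err := lister.Endpoints(namespace).Get(name)
        if err != nil {
            log.Printf("not in local cache: %v", err)
            return 0
        }
        total := 0
        for _, subset := range ep.Subsets {
            total += len(subset.Addresses)
        }
        return total
    }

    func main() {
        // A fake clientset stands in for the apiserver so the example is self-contained.
        client := fake.NewSimpleClientset(&corev1.Endpoints{
            ObjectMeta: metav1.ObjectMeta{Name: "my-service", Namespace: "default"},
            Subsets: []corev1.EndpointSubset{
                {Addresses: []corev1.EndpointAddress{{IP: "10.0.0.1"}, {IP: "10.0.0.2"}}},
            },
        })

        factory := informers.NewSharedInformerFactory(client, 0)
        lister := factory.Core().V1().Endpoints().Lister()
        informer := factory.Core().V1().Endpoints().Informer()

        stop := make(chan struct{})
        defer close(stop)
        factory.Start(stop)
        if !cache.WaitForCacheSync(stop, informer.HasSynced) {
            log.Fatal("timed out waiting for caches to sync")
        }

        log.Printf("addresses in local cache: %d", cachedEndpointAddresses(lister, "default", "my-service"))
    }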

W0827 21:25:41.767916 8 controller.go:359] Service does not have any active Endpoint

This should not happen. Let me see if I can reproduce it locally.

@aledbf
Member

aledbf commented Sep 8, 2018

@slaupster I cannot reproduce the issue you are describing. Please check the gist https://gist.github.com/aledbf/5a24605f2083558b2d3be2b014c43c44

Scenarios:

  1. single ingress, short unavailability of apiserver
  2. 500 ingresses, short unavailability of apiserver
  3. multiple unavailabilities (minutes to more than an hour) of apiserver

@aledbf
Member

aledbf commented Sep 8, 2018

@slaupster also, when the apiserver returned there wasn't a single reload. From your logs it seems that you have connectivity issues with the master and that some ingress/service changed?

@aledbf
Member

aledbf commented Sep 8, 2018

Closing. Please reopen if you can provide a reproducible scenario of the issue you described.

aledbf closed this as completed on Sep 8, 2018