
[Feature Request] Configuration options for upstream keepalive requests and timeout #3099

Closed
ElvinEfendi opened this issue Sep 15, 2018 · 6 comments
Labels
kind/feature Categorizes issue or PR as related to a new feature.

Comments

@ElvinEfendi (Member) commented Sep 15, 2018

Nginx 1.15.3 introduced keepalive_requests and keepalive_timeout for ngx_http_upstream_module. We currently have upstream-keepalive-connections to configure the number of keepalive connections to upstream. We should also support these two new directives.
One of the use cases is when the keepalive timeout in the backend is less than the time Nginx keeps upstream keepalive connections open. In this scenario the upstream can close the connection while Nginx still thinks it's open and proxies a request through it, which results in a "Connection reset by peer" error. https://theantway.com/2017/11/analyze-connection-reset-error-in-nginx-upstream-with-keep-alive-enabled/ has more info on this.
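For reference, a minimal sketch of how these directives would look in a generated upstream block (the upstream name, server address, and values here are illustrative, not taken from the controller's actual template):

```nginx
upstream upstream_balancer {
    server 10.0.0.1:8080;

    # Reuse up to 32 idle connections per worker (what the existing
    # upstream-keepalive-connections option maps to).
    keepalive 32;

    # New in nginx 1.15.3: close an idle upstream connection after 60s.
    # This should be set LOWER than the backend's own keepalive timeout,
    # so nginx closes the connection before the backend does.
    keepalive_timeout 60s;

    # New in nginx 1.15.3: recycle a connection after 100 requests.
    keepalive_requests 100;
}
```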


@JordanP (Contributor) commented Sep 19, 2018

I'm not sure if this could fix what I'm experiencing in prod (Nginx behind a Google Cloud LB), but I see a lot of 502 errors (in the Stackdriver Logging console) with the reason "backend-connection-closed-before-data-sent-to-client". I read and applied everything mentioned in this article: https://blog.percy.io/tuning-nginx-behind-google-cloud-platform-http-s-load-balancer-305982ddb340 (section 3, "NGINX timeouts: fix a nasty 502 Bad Gateway race condition"), but I still get many (2-3 per second) 502 errors.

@ElvinEfendi (Member, Author) commented

I'd be surprised if these directives helped with that case, since they are for upstream keepalive (in your case the upstream is nginx).

But the issue sounds like the same thing, just between the GCLB and Nginx. I'd double-check that the timeout in the GCLB is less than the keepalive timeout configured in nginx.
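The advice above amounts to making sure nginx keeps idle downstream connections open longer than the load balancer considers them usable, so nginx is never the side that closes a connection the LB is about to reuse. A minimal sketch (the 620s value follows the commonly recommended rule of exceeding GCLB's backend keepalive; check your LB's actual idle timeout):

```nginx
http {
    # Keep idle client (downstream) connections open longer than the load
    # balancer's idle timeout. If the LB holds idle backend connections for
    # up to 600s, nginx should wait slightly longer before closing them.
    keepalive_timeout 620s;
}
```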

Does it correlate with ingress-nginx deploys?

@JordanP (Contributor) commented Sep 20, 2018

I have a couple of high-load microservices that use gce-ingress and don't experience this issue. Yeah, I'd love this to be a misconfiguration, but so far nothing pops out; I just wanted to see if anybody else had experienced this issue. Anyway, I now understand that this feature request is for upstream servers, not clients. Thanks.

@ElvinEfendi (Member, Author) commented

@JordanP for downstream connections you can already do it using https://kubernetes.github.io/ingress-nginx/user-guide/nginx-configuration/configmap/#keep-alive
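For example, the linked keep-alive ConfigMap option could be set like this (the ConfigMap name/namespace depend on how the controller was installed, and the 650 value is illustrative):

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: nginx-configuration   # assumption: default name used by many installs
  namespace: ingress-nginx
data:
  # Downstream (client-facing) keepalive timeout, in seconds.
  keep-alive: "650"
```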

@aledbf aledbf added the kind/feature Categorizes issue or PR as related to a new feature. label Oct 8, 2018
@diazjf commented Oct 10, 2018

@ElvinEfendi I can add this via a ConfigMap change.
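Assuming the new options follow the naming of the existing upstream-keepalive-connections key, the ConfigMap change might look like this (the key names upstream-keepalive-timeout and upstream-keepalive-requests are the obvious candidates, not confirmed in this thread; values are illustrative):

```yaml
data:
  upstream-keepalive-connections: "32"   # existing option
  upstream-keepalive-timeout: "60"       # proposed: maps to keepalive_timeout (seconds)
  upstream-keepalive-requests: "100"     # proposed: maps to keepalive_requests
```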

@ElvinEfendi (Member, Author) commented

@diazjf that would be great!

diazjf pushed a commit to diazjf/ingress that referenced this issue Oct 12, 2018
Allows Upstream Keepalive values like keepalive_timeout and
keepalive_requests to be configured via ConfigMap.

Fixes kubernetes#3099