ELB IP changes can bring the cluster down #598
@danielfm Thanks for the report. I think I have never encountered the issue myself in my production cluster, but I believe this is a BIG issue. For me, the issue seems to be composed of two parts: one is a long TTL other than the ELB's (for example, the TTL of the CNAME record kube-aws creates, which points to the ELB's DNS name), and the other is an issue in ELB and/or kubelet which prevents kubelet from detecting broken connections to apiservers. Is my assumption correct? Anyway, I'm happy to work around the issue in kube-aws.

Possible work-arounds

Perhaps:

* monitor the ELB's backend IPs and restart kubelet whenever they change, so that kubelet can reconnect to one of the active ELB IPs before the k8s node gets marked NotReady. If we go that way, the monitor should be executed periodically with a short interval.
* or implement a DNS round-robin with a health-checking mechanism for serving k8s API endpoints, like suggested and described in #281 #373
This hash changes only when the backend IPs of the ELB have changed.
@danielfm I guess setting
#!/usr/bin/env bash
set -vxe

current_elb_backends_version() {
  dig "${API_ENDPOINT_DNS_NAME:?Missing required env var API_ENDPOINT_DNS_NAME}" +noall +answer +short |
    # Take into account only IPs, even if dig returned a CNAME answer
    # (i.e. when API_ENDPOINT_DNS_NAME is a CNAME rather than an A (or Route 53 "Alias") record).
    grep -o '[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}' |
    # Sort the IPs so that DNS round-robin doesn't unexpectedly trigger a kubelet restart.
    sort |
    sha256sum |
    # sha256sum outputs "<sha256 hash value>  -"; keep only the hash value, dropping the trailing hyphen.
    awk '{print $1}'
}

run_once() {
  local file=$ELB_BACKENDS_VERSION_FILE
  prev_ver=$(cat "$file" || echo)
  current_ver=$(current_elb_backends_version)
  echo comparing the previous version "$prev_ver" and the current version "$current_ver"
  if [ "$prev_ver" == "" -o "$prev_ver" == "$current_ver" ]; then
    echo "the version has not changed. no need to restart kubelet."
    if [ "$KUBELET_RESTART_STRATEGY" == "watchdog" ]; then
      echo "notifying kubelet's watchdog not to trigger a restart of kubelet..."
      local kubelet_pid
      kubelet_pid=$(systemctl show "$KUBELET_SYSTEMD_UNIT_NAME" -p MainPID | cut -d '=' -f 2)
      systemd-notify --pid="$kubelet_pid" WATCHDOG=1
    fi
  else
    echo "the version has changed. need to restart kubelet."
    if [ "$KUBELET_RESTART_STRATEGY" == "systemctl" ]; then
      systemctl restart "$KUBELET_SYSTEMD_UNIT_NAME"
    fi
  fi
  echo "writing $current_ver to $file"
  echo "$current_ver" > "$file"
}

ELB_BACKENDS_VERSION_FILE=${ELB_BACKENDS_VERSION_FILE:-/var/run/coreos/elb-backends-version}
KUBELET_SYSTEMD_UNIT_NAME=${KUBELET_SYSTEMD_UNIT_NAME:-kubelet.service}
KUBELET_RESTART_STRATEGY=${KUBELET_RESTART_STRATEGY:-systemctl}
WATCH_INTERVAL_SEC=${WATCH_INTERVAL_SEC:-3}

systemd-notify --ready
while true; do
  systemd-notify --status "determining if there're changes in elb ips"
  run_once
  systemd-notify --status "sleeping for $WATCH_INTERVAL_SEC seconds"
  sleep $WATCH_INTERVAL_SEC
done
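For context, here is a rough sketch of how a watcher like the one above could be wired up as a systemd service. The unit name, script path, and endpoint value are assumptions for illustration, not what kube-aws actually ships:

```bash
# All names and paths below are hypothetical.
sudo tee /etc/systemd/system/elb-ip-watcher.service >/dev/null <<'EOF'
[Unit]
Description=Restart kubelet when the API endpoint ELB changes its backend IPs
After=network-online.target kubelet.service

[Service]
# Type=notify because the script calls `systemd-notify --ready`;
# NotifyAccess=all because systemd-notify runs as a child of the main shell process.
Type=notify
NotifyAccess=all
Environment=API_ENDPOINT_DNS_NAME=kube-api.example.com
Environment=KUBELET_RESTART_STRATEGY=systemctl
ExecStart=/opt/bin/elb-ip-watcher
Restart=always
RestartSec=10

[Install]
WantedBy=multi-user.target
EOF

sudo systemctl daemon-reload
sudo systemctl enable --now elb-ip-watcher.service
```

If the `watchdog` strategy were used instead of `systemctl`, kubelet.service itself would also need `WatchdogSec=` and `NotifyAccess=all` so the forwarded `WATCHDOG=1` messages are accepted.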
Also - I had never realized it, but we're creating CNAME records for controller ELBs, and those CNAMEs have their own TTL (300 seconds by default in kube-aws), separate from the TTL of the ELB's own A records.

Updated my assumption on the issue in my first comment #598 (comment)

Hmm, the TTL of a CNAME record associated with an ELB's DNS name seems to be capped at 60s, even though kube-aws' default is longer (300 seconds).
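To check what TTL is actually being served for the endpoint, one quick way (assuming `API_ENDPOINT_DNS_NAME` is set as in the script above) is:

```bash
# The second column of each answer line is the remaining TTL in seconds.
dig +noall +answer "${API_ENDPOINT_DNS_NAME:?set me first}"
# Illustrative output only:
#   kube-api.example.com.                       60  IN  CNAME  my-elb-123456.us-east-1.elb.amazonaws.com.
#   my-elb-123456.us-east-1.elb.amazonaws.com.  60  IN  A      52.0.0.10
```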
Anyway, forcibly restarting kubelet so that it reconnects to apiserver(s) when necessary (i.e. when ELB IPs have changed) would still be important.

@mumoshu does it mean that going ELB-less mode, with just a CNAME record in Route53 pointing to controller nodes, is considered dangerous now?

@redbaron Do you mean a single CNAME record for your k8s API endpoint which points to just one controller node's DNS name? I've been assuming that, if you'd go without an ELB, you would have a DNS name associated with one or more A records (rather than a CNAME), each associated with one controller node's public/private IP.

Doesn't change the fact that the final set of A records returned from a DNS request changes when you trigger a controllers ASG update, right? From what I read here, that matches the case when ELBs change their IPs.

@redbaron Ah, so you have a CNAME record which points to a DNS record containing multiple A records, each associated with one of the controller nodes, right? If so, no, as long as you have low TTLs for your DNS records. AFAIK, someone in the upstream issue said that it becomes an issue only when an ELB is involved. Perhaps the ELB doesn't immediately shut down an unnecessary instance and doesn't send FIN? If that's the case, kubelet would be unable to detect broken connections immediately. When your controllers ASG is updated, old, unnecessary nodes would be terminated before they become non-functional, unlike ELB's instances. So your ELB-less mode with CNAME+A records would be safe as long as you have health checks to update the Route 53 record set so that it eventually returns A records only for healthy controller nodes, and you have a low enough TTL.
The ELB-less mode I refer to is a recent feature in kube-aws. I didn't check which records it creates exactly; I just wanted to verify that it is still a safe option considering this bug report.

@redbaron If you're referring to the DNS round-robin for API endpoints, it isn't implemented yet.

I wonder if using an ALB can help here. An L7 load balancer precisely knows all incoming/outgoing requests and can forcibly yet safely close HTTP connections when the ELB scales/changes IP addresses.

@redbaron Thanks, I never realized that ALBs may help. I wish they can help, too.

Today this happened to me for the first time, with a 128-day-old cluster.

Seems like Amazon is doing some serious maintenance work in the ELB infrastructure these days...
@danielfm @camilb @redbaron For me it seems like there are still no obvious fixes other than:
Would you like to proceed with any of them, or any other idea(s)?
I think sudden termination, an unresponsive apiserver, etc. shouldn't cause problems as long as these events are not happening at the same time, but in this case even the ELB setup fails. Keeping the route53 record up-to-date is a bit more complicated without using Lambda and SNS, though. Also, I think it shouldn't cause problems if the route53 record is not fully up-to-date when restarting or replacing controller nodes; as long as they are not replaced at the same time, there should be at least one working controller.
Thanks @tarvip!
Sorry if I wasn't clear enough, but the above concerns are not specific to this problem but more general. I just didn't want to introduce a new issue due to missing health checks / constant updates to route 53 record sets.

Probably that's true - then, my question is whether everyone is ok with e.g. a 50% k8s API error rate persisting for several minutes while one of two controller nodes is being replaced?

If we go without CloudWatch Events + Lambda, we probably need a systemd timer which periodically triggers a script to update the route53 record set, so that it eventually reflects controller nodes being terminated either expectedly or unexpectedly, right?
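For illustration, a rough sketch of what such a periodically-triggered script might look like. The controller tag, hosted zone ID, record name, and TTL are all assumptions, not anything kube-aws currently does:

```bash
#!/usr/bin/env bash
# Hypothetical sketch: refresh a Route 53 round-robin A record with the private IPs
# of currently-running controller instances.
set -euo pipefail

HOSTED_ZONE_ID=${HOSTED_ZONE_ID:?}
RECORD_NAME=${RECORD_NAME:?e.g. kube-api.internal.example.com}

# Discover controller instances via an assumed EC2 tag.
ips=$(aws ec2 describe-instances \
  --filters "Name=tag:kube-aws:role,Values=controller" "Name=instance-state-name,Values=running" \
  --query 'Reservations[].Instances[].PrivateIpAddress' --output text)
[ -n "$ips" ] || { echo "no running controllers found; leaving the record untouched"; exit 0; }

# Build an UPSERT change batch containing one A value per controller.
records=$(for ip in $ips; do printf '{"Value":"%s"},' "$ip"; done)
change_batch=$(cat <<EOF
{"Changes":[{"Action":"UPSERT","ResourceRecordSet":{
  "Name":"${RECORD_NAME}","Type":"A","TTL":30,
  "ResourceRecords":[${records%,}]}}]}
EOF
)

aws route53 change-resource-record-sets \
  --hosted-zone-id "$HOSTED_ZONE_ID" --change-batch "$change_batch"
```

Run from a systemd timer (or cron) every minute or so, this would converge the record towards the current set of controllers without Lambda/SNS, at the cost of the record being stale for up to one interval plus the TTL.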
@mumoshu, is ALB known not to help here?

I think the missing health check is ok - that health check is just a TCP check anyway, and kubelet and kube-proxy can also detect connection failures and recreate connections to another host.

Thanks @redbaron - No, but it isn't known to help either.

Yes, that is one way to solve it.
@tarvip Thanks for the reply!
Ah, makes sense to me! At least it seems worth trying now - thanks for the clarification.
AWS's NLB replacement for ELBs uses one IP per zone/subnet, and those IPs can be your EIPs. So using this new product you can get an LB with a set of fixed IPs that won't change. http://docs.aws.amazon.com/elasticloadbalancing/latest/network/introduction.html The new ALB (for HTTPS) and NLB (for TCP) seem to be AWS's next-gen replacements for the old ELB, which AWS now calls 'Classic Load Balancers'. k8s and kube-aws should probably look to transition to the new products, which also appear to have some advantages, such as fixed IPs - as I see #937 and #945 are doing! 🎉
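For reference, a minimal sketch of creating such an NLB with pre-allocated EIPs via the AWS CLI; the name, subnet IDs, and allocation IDs are placeholders:

```bash
# One EIP per AZ/subnet; the NLB's addresses then stay fixed even if the LB scales.
aws elbv2 create-load-balancer \
  --name kube-apiserver-nlb \
  --type network \
  --scheme internet-facing \
  --subnet-mappings \
    SubnetId=subnet-0aaaaaaa,AllocationId=eipalloc-01111111 \
    SubnetId=subnet-0bbbbbbb,AllocationId=eipalloc-02222222
```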
@whereisaaron Thanks for the suggestion! I agree with your point. Anyway, please let me also add that an ALB was experimented with in #608 and deemed not appropriate as a K8S API load balancer.

Unfortunately NLBs don't support VPC peering on AWS, so some users (including me) will need to use Classic ELBs in conjunction with NLBs to support peered VPCs.
Yes, we see this today in production, and experienced player impact yesterday from this exact issue. We confirmed that the DNS was updated almost immediately. We're going with the kubelet restart tied to DNS change for the time being, but IMHO this is not a good long-term fix.

Seen this today. Our setup uses Consul DNS for kubelet to discover the apiserver, which means the apiserver DNS name resolves to multiple A records pointing to the exact IP addresses of the apiservers, which change every time an apiserver node is replaced. In our case the workers come back eventually, but it took a long while. My feeling is that kubelet is not really respecting DNS TTLs, as all Consul DNS names have their TTL set to 0. Can anyone confirm?

Thanks everyone.

I was under the impression that since some k8s version kubelet has implemented a client-side timeout to mitigate this issue, but I can't remember the exact GitHub issue right now.
I noticed that after the master DNS record changed the underlying IP, all kubelet instances failed for exactly 15 minutes (our master DNS TTL is 0). When it fails, we get the following error.

It recovered on its own, without restarting, after 15 minutes (sharp). It feels more like kubelet (or the apiserver client it uses) is caching the DNS. I'm trying to pin-point the exact line of code which causes this behaviour, but anyone who knows the code-base better might be able to confirm this?

Seeing the following messages right before the worker came back. The last failure was at 13:15:18, then it reported some watch error (10.106.102.105 was the previous master, which got destroyed) and re-resolved the DNS name before the cluster reported the worker as "Ready" again! Maybe this is related to a kubelet watch on the apiserver not being dropped quickly enough when the apiserver endpoint becomes unavailable?
Found a possible line of code which explains the 15min behaviour
It seems like there is a problem. If the controller DNS entry has a 30 second TTL, the kubelet should be able to recover from an IP change within 30s + update period, so about 40s. @javefang you think the kubelet is using this long, up to 15 minute back-off when the old IP goes stale? And so it's not a DNS caching problem, but rather it just stops trying to update the controller for several minutes? For AWS at least, an NLB using fixed EIP addresses would mostly obviate the IP address ever changing, I think? Even if you recreate or move the LB, you can reapply the EIP so nothing changes. However, an extra wrinkle is that we would want worker nodes in multi-AZ clusters to use the EIP of the NLB endpoint in the same AZ. NLBs have one EIP per AZ as I understand it? We saw a similar issue a couple of times where the workers couldn't contact the controllers for ~2 minutes (no IP address change involved). Even though that's well less than the 5-minute eviction time, everything got evicted anyway. Maybe the same back-off issue?
@whereisaaron yep this is indeed taking 15min for kubelet to recover. I have reproduced it with the following setup:
To reproduce:
I'm just curious about the mechanism in kubelet that can cause kubelet to be broken for 15 minutes after any apiserver IP change. We are deploying this on-premise. Tomorrow I'll try to put the 3 apiservers behind a load balancer with a fixed IP to see if that fixes the issue.

UPDATE: putting all apiservers behind a load balancer (round-robin) with a static IP fixed it. Now all workers work fine even if I replace one of the master nodes. So using a fixed-IP load balancer will be my workaround for now. But do you think it's still worth investigating why kubelet doesn't respect the apiserver's DNS TTL?
I believe the 15-minute outage window many of us are experiencing is described in kubernetes/kubernetes#41916 (comment). Reading through issues and pull requests, I don't see where a TCP timeout was implemented on the underlying connection. The timeout on the HTTP request definitely was implemented.
…-connections Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions [here](https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md).

track/close kubelet->API connections on heartbeat failure

xref kubernetes#48638
xref kubernetes-retired/kube-aws#598

We're already typically tracking kubelet -> API connections and have the ability to force close them as part of client cert rotation. If we do that tracking unconditionally, we gain the ability to also force close connections on heartbeat failure as well. It's a big hammer (means reestablishing pod watches, etc), but so is having all your pods evicted because you didn't heartbeat.

This intentionally does minimal refactoring/extraction of the cert connection tracking transport in case we want to backport this.

* The first commit unconditionally sets up the connection-tracking dialer, and moves all the cert management logic inside an if-block that gets skipped if no certificate manager is provided (view with whitespace ignored to see what actually changed).
* The second commit plumbs the connection-closing function to the heartbeat loop and calls it on repeated failures.

Follow-ups:
* consider backporting this to 1.10, 1.9, 1.8
* refactor the connection managing dialer to not be so tightly bound to the client certificate management

/sig node
/sig api-machinery

```release-note
kubelet: fix hangs in updating Node status after network interruptions/changes between the kubelet and API server
```
All these work around the real problem: that the connections are kept forever.

They don't live forever, they live for the operating system TCP timeout limit (typically 15 minutes by default).
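For reference, on Linux that OS-level limit is governed by `net.ipv4.tcp_retries2` (default 15 retransmissions, which works out to roughly 13-15 minutes before a connection with unacknowledged data is aborted). Lowering it node-wide is a blunt but commonly used mitigation, independent of any kubelet-side fix; a sketch, with the persistence path being an assumption about your distro:

```bash
# Inspect the current value (the Linux default is 15).
sysctl net.ipv4.tcp_retries2

# Lower it so dead kubelet->apiserver connections are torn down within tens of
# seconds instead of ~15 minutes. This applies system-wide, so weigh it against
# tolerance for flaky networks on all other TCP traffic from the node.
echo 'net.ipv4.tcp_retries2 = 5' | sudo tee /etc/sysctl.d/90-tcp-retries2.conf
sudo sysctl --system
```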
I haven't seen this happening anymore in some of the latest versions of Kubernetes 1.8.x (and I suspect the same is true for newer versions as well), so maybe we can close this?

Yes, and these 15 minutes are too long for many cases, like here.

The fix merged into the last several releases of Kubernetes was to drop/re-establish the apiserver connections from the kubelet if the heartbeat times out twice in a row. Reconnecting every 10 minutes or every hour would still let nodes go unavailable.
From what I've seen in other Go projects: if you use connection pooling and send requests frequently enough that the keepalive idle timeout is never reached, you run into this issue. If you disable pooling and make only one request per connection, you don't have this issue, but you get higher latency and overhead, which is why keepalive makes sense. By the way, the old Apache httpd had not only a keepalive idle timeout but also a keepalive max request count, which helped a lot with many of these problems.
Issues go stale after 90d of inactivity. If this issue is safe to close now please do so with /close. Send feedback to sig-testing, kubernetes/test-infra and/or fejta.

Stale issues rot after 30d of inactivity. If this issue is safe to close now please do so with /close. Send feedback to sig-testing, kubernetes/test-infra and/or fejta.

Rotten issues close after 30d of inactivity. Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
@fejta-bot: Closing this issue. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
I ran into kubernetes/kubernetes#41916 twice in the last 3 days in my production cluster, with almost 50% of worker nodes transitioning to NotReady state almost simultaneously on both days, causing a brief downtime in critical services due to Kubernetes' default (and aggressive) eviction policy for failing nodes.

I just contacted AWS support to validate the hypothesis of the ELB changing IPs at the time of both incidents, and the answer was yes.
My configuration (multi-node control plane with ELB) matches exactly the one in that issue, and probably most kube-aws users are subject to this.
Has anyone else run into this at some point?