
Cluster.Bootstrap causes network socket port exhaustion due to socket leak during cluster formation #2571

Closed · Arkatufus opened this issue Jun 17, 2024 · 7 comments · Fixed by #2601

@Arkatufus (Contributor) commented Jun 17, 2024

It has been observed that Cluster.Bootstrap can cause network socket port exhaustion because TCP holds socket ports open in the TIME_WAIT state when Cluster.Bootstrap fails to form a cluster immediately.

This has been observed especially in conjunction with Akka.Discovery.Azure.

@Arkatufus (Contributor, Author) commented:

The problem has been isolated to the Ceen.Httpd package; socket usage was stable when Ceen was replaced with Kestrel.

PS D:\git\akkadotnet\Akka.Management\src\discovery\examples\SocketLeakTest> .\netstat.ps1
Total number of open connections with local or foreign address port in the range 15885-16000: 758
PS D:\git\akkadotnet\Akka.Management\src\discovery\examples\SocketLeakTest> .\netstat.ps1
Total number of open connections with local or foreign address port in the range 15885-16000: 760
PS D:\git\akkadotnet\Akka.Management\src\discovery\examples\SocketLeakTest> .\netstat.ps1
Total number of open connections with local or foreign address port in the range 15885-16000: 753
PS D:\git\akkadotnet\Akka.Management\src\discovery\examples\SocketLeakTest> .\netstat.ps1
Total number of open connections with local or foreign address port in the range 15885-16000: 741
PS D:\git\akkadotnet\Akka.Management\src\discovery\examples\SocketLeakTest> .\netstat.ps1
Total number of open connections with local or foreign address port in the range 15885-16000: 754
PS D:\git\akkadotnet\Akka.Management\src\discovery\examples\SocketLeakTest> .\netstat.ps1
Total number of open connections with local or foreign address port in the range 15885-16000: 760
PS D:\git\akkadotnet\Akka.Management\src\discovery\examples\SocketLeakTest> .\netstat.ps1
Total number of open connections with local or foreign address port in the range 15885-16000: 760
PS D:\git\akkadotnet\Akka.Management\src\discovery\examples\SocketLeakTest> .\netstat.ps1
Total number of open connections with local or foreign address port in the range 15885-16000: 749
PS D:\git\akkadotnet\Akka.Management\src\discovery\examples\SocketLeakTest> .\netstat.ps1
Total number of open connections with local or foreign address port in the range 15885-16000: 742
PS D:\git\akkadotnet\Akka.Management\src\discovery\examples\SocketLeakTest> .\netstat.ps1
Total number of open connections with local or foreign address port in the range 15885-16000: 752
PS D:\git\akkadotnet\Akka.Management\src\discovery\examples\SocketLeakTest> .\netstat.ps1
Total number of open connections with local or foreign address port in the range 15885-16000: 760
PS D:\git\akkadotnet\Akka.Management\src\discovery\examples\SocketLeakTest> .\netstat.ps1
Total number of open connections with local or foreign address port in the range 15885-16000: 760
PS D:\git\akkadotnet\Akka.Management\src\discovery\examples\SocketLeakTest> .\netstat.ps1
Total number of open connections with local or foreign address port in the range 15885-16000: 750
PS D:\git\akkadotnet\Akka.Management\src\discovery\examples\SocketLeakTest> .\netstat.ps1
Total number of open connections with local or foreign address port in the range 15885-16000: 742
PS D:\git\akkadotnet\Akka.Management\src\discovery\examples\SocketLeakTest> .\netstat.ps1
Total number of open connections with local or foreign address port in the range 15885-16000: 752
PS D:\git\akkadotnet\Akka.Management\src\discovery\examples\SocketLeakTest> .\netstat.ps1
Total number of open connections with local or foreign address port in the range 15885-16000: 760

@Aaronontheweb (Member) commented:

Update: I've been investigating this in our test lab, and we can reproduce the issue on Linux too, which makes me think the problem is related to how aggressively we try to re-connect during cluster formation.

Cluster Fully Formed (20/20 nodes)

Only ~20 active TCP connections per node, which makes sense - most of these are Akka.Remote, an OTLP exporter, and maybe a few others.

[image: lab-experiment-1-control]

Cluster Unable to Form (18/20 nodes)

Roughly 1,100 active TCP connections per node. This looks like hyper-aggressive retries, not a TCP-handling issue.

[image: lab-experiment-1]

@Aaronontheweb (Member) commented:

Another piece of evidence in favor of the "aggressive retries" theory: look at the step function of active TCP connections once cluster formation does occur:

[image: step function of active TCP connections during cluster formation]

The oldest nodes have significantly more open TCP connections than the nodes started later in the deployment by Kubernetes. This looks more like a "thundering herd" problem than a resource leak.

@Aaronontheweb (Member) commented:

We did some more work on this over the weekend and captured data from additional experiments - the problem is definitely caused by how frequently Akka.Management's cluster bootstrapper HTTP-polls its peers:

TCP Connectivity Data

1s interval - ~1100 connections per node

[image: TCP connection counts at 1s probe interval]

5s interval - ~260-280 connections per node

[image: TCP connection counts at 5s probe interval]

10s interval - ~100-105 connections per node

[image: TCP connection counts at 10s probe interval]

The key setting at play here is akka.management.cluster.bootstrap.contact-point.probe-interval, which defaults to 1s. If we increase it to 5s, we see a much smaller number of concurrent TCP connections.
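For anyone who wants to apply that mitigation now, the override would look roughly like this in HOCON (a minimal sketch; pick an interval appropriate for your environment):

akka.management.cluster.bootstrap.contact-point {
  # default is 1s; a larger interval dramatically reduces concurrent TCP connections
  probe-interval = 5s
}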

Cluster Formation Times

akka.management.cluster.bootstrap.contact-point.probe-interval = 1s

Running a 22-node cluster using Akka.Discovery.KubernetesApi, we see the following end-to-end cluster formation times with akka.management.cluster.bootstrap.contact-point.probe-interval = 1s, the default. We also have a hard 20-nodes-must-be-up requirement configured for Cluster.Bootstrap, so cluster formation can't occur until the 20th node has come online.

[image: cluster formation times with probe-interval = 1s]

It takes an average of about 30s for the cluster to fully form - this is mostly due to the time it takes Kubernetes to spin up all of the pods. The oldest nodes in the cluster have a longer average and the youngest ones a shorter one, which is why you see this distribution.
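For reference, the hard 20-node requirement mentioned above is expressed through the contact-point discovery settings; a rough sketch, assuming the setting lives under contact-point-discovery as in the Akka.Management reference configuration (the value reflects our lab setup):

akka.management.cluster.bootstrap.contact-point-discovery {
  # cluster formation cannot begin until this many contact points have checked in
  required-contact-point-nr = 20
}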

akka.management.cluster.bootstrap.contact-point.probe-interval = 5s

Same exact environment / reproduction sample as before, just with the probing interval set to 5s:

[image: cluster formation results with probe-interval = 5s]

The cluster never forms, apparently due to a bug in the logic around "timing out" the freshness of a node's last healthy check-in: the timeout we use for that check is configured independently of the polling interval, and that is a bug.

Next Steps

@Arkatufus already identified this issue and is preparing a fix for it. If that fix works, then the port exhaustion problems can be addressed by just increasing the probing interval. We are going to test this in our lab and confirm before making concrete recommendations to affected users. Just wanted to post an update to let everyone know that this is being urgently addressed.

@Aaronontheweb (Member) commented:

One other setting can alleviate a major stressor that contributes to this port exhaustion problem:

# Does a successful response have to be received by all contact points.
# Used by the LowestAddressJoinDecider
# Can be set to false in environments where old contact points may still be in service discovery
# or when using local discovery and cluster formation is desired without starting all the nodes
# Required-contact-point-nr still needs to be met
contact-with-all-contact-points = true

Set that to false and it will also significantly reduce the amount of TCP traffic. I'll put up some data supporting that in the next day or so as well. Changing this setting can, in theory, open the possibility of a split brain forming, but IMHO that should be quite rare in practice.
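Putting both mitigations from this thread together, an affected deployment might apply something like the following (a sketch only - the placement of contact-with-all-contact-points under contact-point-discovery is my assumption based on the reference config excerpt above):

akka.management.cluster.bootstrap {
  contact-point {
    # probe peers less aggressively (default is 1s)
    probe-interval = 5s
  }
  contact-point-discovery {
    # don't wait for a successful response from every discovered contact point
    contact-with-all-contact-points = false
  }
}

As noted above, relaxing contact-with-all-contact-points trades a small theoretical split-brain risk for far less probing traffic.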

@Aaronontheweb (Member) commented:

Related fix: #2589

@Aaronontheweb (Member) commented:

Should we just change the default polling interval to 5s? That should help put this issue to bed.
