Cluster.Bootstrap causes network socket port exhaustion due to socket leak during cluster formation #2571

Arkatufus · 2024-06-17T16:12:11Z

Cluster.Bootstrap has been observed to cause network socket port exhaustion: when it fails to form a cluster immediately, TCP keeps the socket ports open in the TIME_WAIT state.

This has been observed especially in conjunction with Akka.Discovery.Azure.

Arkatufus · 2024-06-17T19:41:38Z

Problem isolated to the Ceen.Httpd package. Socket usage was stable when Ceen was replaced with Kestrel.

PS D:\git\akkadotnet\Akka.Management\src\discovery\examples\SocketLeakTest> .\netstat.ps1
Total number of open connections with local or foreign address port in the range 15885-16000: 758
PS D:\git\akkadotnet\Akka.Management\src\discovery\examples\SocketLeakTest> .\netstat.ps1
Total number of open connections with local or foreign address port in the range 15885-16000: 760
PS D:\git\akkadotnet\Akka.Management\src\discovery\examples\SocketLeakTest> .\netstat.ps1
Total number of open connections with local or foreign address port in the range 15885-16000: 753
PS D:\git\akkadotnet\Akka.Management\src\discovery\examples\SocketLeakTest> .\netstat.ps1
Total number of open connections with local or foreign address port in the range 15885-16000: 741
PS D:\git\akkadotnet\Akka.Management\src\discovery\examples\SocketLeakTest> .\netstat.ps1
Total number of open connections with local or foreign address port in the range 15885-16000: 754
PS D:\git\akkadotnet\Akka.Management\src\discovery\examples\SocketLeakTest> .\netstat.ps1
Total number of open connections with local or foreign address port in the range 15885-16000: 760
PS D:\git\akkadotnet\Akka.Management\src\discovery\examples\SocketLeakTest> .\netstat.ps1
Total number of open connections with local or foreign address port in the range 15885-16000: 760
PS D:\git\akkadotnet\Akka.Management\src\discovery\examples\SocketLeakTest> .\netstat.ps1
Total number of open connections with local or foreign address port in the range 15885-16000: 749
PS D:\git\akkadotnet\Akka.Management\src\discovery\examples\SocketLeakTest> .\netstat.ps1
Total number of open connections with local or foreign address port in the range 15885-16000: 742
PS D:\git\akkadotnet\Akka.Management\src\discovery\examples\SocketLeakTest> .\netstat.ps1
Total number of open connections with local or foreign address port in the range 15885-16000: 752
PS D:\git\akkadotnet\Akka.Management\src\discovery\examples\SocketLeakTest> .\netstat.ps1
Total number of open connections with local or foreign address port in the range 15885-16000: 760
PS D:\git\akkadotnet\Akka.Management\src\discovery\examples\SocketLeakTest> .\netstat.ps1
Total number of open connections with local or foreign address port in the range 15885-16000: 760
PS D:\git\akkadotnet\Akka.Management\src\discovery\examples\SocketLeakTest> .\netstat.ps1
Total number of open connections with local or foreign address port in the range 15885-16000: 750
PS D:\git\akkadotnet\Akka.Management\src\discovery\examples\SocketLeakTest> .\netstat.ps1
Total number of open connections with local or foreign address port in the range 15885-16000: 742
PS D:\git\akkadotnet\Akka.Management\src\discovery\examples\SocketLeakTest> .\netstat.ps1
Total number of open connections with local or foreign address port in the range 15885-16000: 752
PS D:\git\akkadotnet\Akka.Management\src\discovery\examples\SocketLeakTest> .\netstat.ps1
Total number of open connections with local or foreign address port in the range 15885-16000: 760

Aaronontheweb · 2024-06-20T14:39:39Z

Update here since I've been investigating this using our test lab - we can reproduce this issue on Linux too, and it's making me think that the problem might be related to how aggressively we try to re-connect during cluster formation.

Cluster Fully Formed (20/20 nodes)

Only ~20 active TCP connections per node, which makes sense: most of these are Akka.Remote, an OTLP exporter, and maybe a few others.

Cluster Unable to Form (18/20 nodes)

~1100 active TCP connections per node. This looks like hyper-aggressive retries, not some kind of TCP handling issue.

Aaronontheweb · 2024-06-20T14:50:15Z

Another piece of evidence in favor of the "aggressive retries" theory of the case. Look at the step function of active TCP connections when cluster formation does occur:

The oldest nodes have significantly more open TCP connections than the newer nodes that were started later during the deployment by Kubernetes. This looks more like a "Thundering Herd" problem rather than a resource leak.

Aaronontheweb · 2024-06-24T16:45:33Z

We did some more work on this over the weekend and captured more data from more experiments - the problem is definitely caused by how frequently Akka.Management's cluster bootstrapper is HTTP-polling its peers:

TCP Connectivity Data

1s interval - ~1100 connections per node

5s interval - ~260-280 connections per node

10s interval - ~100-105 connections per node

The key setting at play here is `akka.management.cluster.bootstrap.contact-point.probe-interval`, which defaults to 1s. If we increase it to 5s we see a much smaller number of concurrent TCP connections.

Cluster Formation Times

`akka.management.cluster.bootstrap.contact-point.probe-interval = 1s`

Running a 22-node cluster using Akka.Discovery.KubernetesApi, we see the following end-to-end cluster formation times with `akka.management.cluster.bootstrap.contact-point.probe-interval = 1s`, the default. We also have a hard 20-nodes-must-be-up requirement configured for Cluster.Bootstrap, so cluster formation can't occur until the 20th node has come online.

It takes about 30s on average for a cluster to fully form - this is mostly due to the amount of time it takes Kubernetes to spin up all of the pods. The oldest nodes in the cluster have a longer average and the youngest ones have a shorter one, which explains this time distribution.
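For reference, the setup described above might be expressed in HOCON roughly as follows. This is only a sketch, not the lab's actual configuration: mapping the "20 nodes must be up" requirement to `required-contact-point-nr` is an assumption, and the discovery settings are left out.

```hocon
akka.management.cluster.bootstrap {
  contact-point-discovery {
    # Assumption: the hard 20-node requirement is expressed through this setting
    required-contact-point-nr = 20
  }
  contact-point {
    # The default value, written out explicitly for comparison with the 5s run
    probe-interval = 1s
  }
}
```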

`akka.management.cluster.bootstrap.contact-point.probe-interval = 5s`

Same exact environment / reproduction sample as before, just with the probing interval set to 5s:

The cluster never forms, and this is apparently due to a bug in the logic around "timing out" the freshness of a node's last healthy check-in - the configuration we use for this setting is totally independent of the polling interval and that is a bug.

Next Steps

@Arkatufus already identified this issue and is preparing a fix for it. If that fix works, then the port exhaustion problems can be addressed by just increasing the probing interval. We are going to test this in our lab and confirm before making concrete recommendations to affected users. Just wanted to post an update to let everyone know that this is being urgently addressed.
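Assuming the fix lands, raising the probing interval would be a small HOCON override along these lines. Treat it as a sketch: 5s is simply the value tested above, not a confirmed recommendation, and without the fix the cluster never formed at this interval in our test.

```hocon
akka.management.cluster.bootstrap.contact-point {
  # Default is 1s; 5s gave ~260-280 connections per node instead of ~1100
  probe-interval = 5s
}
```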

Aaronontheweb · 2024-06-24T16:54:06Z

One other setting that can alleviate major stressors that contribute to this port exhaustion problem:

Akka.Management/src/management/Akka.Management/Resources/reference.conf, lines 151 to 156 in d7812ff:

    # Does a successful response have to be received by all contact points.
    # Used by the LowestAddressJoinDecider
    # Can be set to false in environments where old contact points may still be in service discovery
    # or when using local discovery and cluster formation is desired without starting all the nodes
    # Required-contact-point-nr still needs to be met
    contact-with-all-contact-points = true

Set that to false and this will also significantly reduce the amount of TCP traffic. I'll put up some data supporting that in the next day or so as well. Changing this setting can, in theory, open the possibility of a split brain forming, but IMHO that should be quite rare in practice.
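As a sketch, the override could look like the following in an application's HOCON. The enclosing `contact-point-discovery` section is an assumption based on where this key sits in reference.conf, so verify the path against the version you are running.

```hocon
akka.management.cluster.bootstrap.contact-point-discovery {
  # Do not require a successful response from every discovered contact point.
  # required-contact-point-nr still needs to be met.
  contact-with-all-contact-points = false
}
```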

Aaronontheweb · 2024-06-24T17:31:02Z

Related fix: #2589

Aaronontheweb · 2024-06-26T14:44:42Z

Should we just change the default polling interval to 5s? That should help put this issue to bed.

Arkatufus added the bug, cluster-bootstrap, and akka-discovery labels Jun 17, 2024

Arkatufus mentioned this issue Jun 27, 2024

Update probe-interval and stale contact point timeout calculation #2601

Merged

Aaronontheweb closed this as completed in #2601 Jul 1, 2024
