Network connection leak #4679

arkpar · 2020-01-20T10:06:56Z

Currently substrate is based on libp2p 0.14.0-alpha.1 which has a few issues:

Network opens a lot of TCP connections. After running for a few hours on kusama the number of connections reported by lsof is over 2000.
Each connection preallocates at least 176 kbytes of memory for buffers in the noise protocol.
Which leads to memory usage in hundreds of megabytes.
Each connection spawns a short-lived thread, which leads to a leak in sysinfo crate.

Each of these should be fixed as soon as possible. Substrate already has a limit for protocol connections. That limit must be enforced as a global limit for TCP connections, including Kademlia and all other protocols.

All buffers should be preallocated to some small size, like 1k. The rest should be allocated lazily.

The text was updated successfully, but these errors were encountered:

twittner · 2020-01-20T11:17:16Z

Network opens a lot of TCP connections. After running for a few hours on kusama the number of connections reported by lsof is over 2000.

If I am not mistaken most of those connections are not established yet, e.g.

> ls -l /proc/(pgrep polkadot)/fd | wc -l
1407
> lsof -n -p (pgrep polkadot) | rg ESTABLISHED | wc -l
185
> lsof -n -p (pgrep polkadot) | rg SYN_SENT | wc -l
1134

Each connection spawns a short-lived thread, which leads to a leak in sysinfo crate.

These threads are actually not as short lived as one would hope as for unreachable destinations each runs until the maximum number of initial SYN retries has been reached (cf. man 7 tcp). E.g. on a Linux system:

tcp_syn_retries (integer; default: 5; since Linux 2.2)
    The  maximum  number  of times initial SYNs for an active TCP
    connection attempt will be retransmitted. This value should not
    be higher than 255.  The default value is 5, which corresponds
    to approximately 180 seconds.

arkpar · 2020-01-20T11:30:20Z

If I am not mistaken most of those connections are not established yet, e.g.

This might be the case. However I've also put a counter of NoiseOuput instances that went to thousands after a few hours as well. It is not clear then why a connection that is not established yet requires preallocated buffers. And also why there are that many connection attempts in the first place.

These threads are actually not as short lived as one would hope as for unreachable destinations each runs until the maximum number of initial SYN retries has been reached (cf. man 7 tcp). E.g. on a Linux system:
tcp_syn_retries (integer; default: 5; since Linux 2.2)
    The  maximum  number  of times initial SYNs for an active TCP
    connection attempt will be retransmitted. This value should not
    be higher than 255.  The default value is 5, which corresponds
    to approximately 180 seconds.

By short-lived I mean any thread is not kept around for the lifetime of the application. Anyway that's enough to cause issues with sysinfo and generally waste system resources.

twittner · 2020-01-20T14:42:04Z

If I am not mistaken most of those connections are not established yet, e.g.

This might be the case. However I've also put a counter of NoiseOuput instances that went to thousands after a few hours as well. It is not clear then why a connection that is not established yet requires preallocated buffers.

The buffers are not preallocated but require an established connection. I was using twittner/rust-libp2p@fdd5bb5 to count the number of NoiseOutput instances and it correlates strongly with the number of established connections, not the ones in SYN_SENT state.

And also why there are that many connection attempts in the first place.

I agree, this is puzzling. I suppose there are many more nodes discovered, but I have no answer to this yet.

twittner · 2020-01-21T10:01:24Z

And also why there are that many connection attempts in the first place.

I agree, this is puzzling. I suppose there are many more nodes discovered, but I have no answer to this yet.

I think that during discovery we repeatedly attempt to connect to many unreachable addresses which is not too uncommon. Discovery is scheduled to run every 60s. When queries time out (after 60s), the background threads continue to run until std::net::TcpStream::connect completes, which is roughly after 3 minutes. Thus instead of just one we may end up with up to three connection attempts for every unreachable address at any given time.

twittner · 2020-02-05T09:23:57Z

paritytech/polkadot#810 switched to async-std's master branch which fixes the spawning of a background thread for TcpStream::connect. In the meantime this version has also been published as async-std v1.5.

tomaka · 2020-02-05T16:27:40Z

Network opens a lot of TCP connections. After running for a few hours on kusama the number of connections reported by lsof is over 2000.

I've been debugging this today. It's going rather slowly, as I have to recompile Polkadot every single time.

I created a version of Polkadot where the Kademlia discovery mechanism is running at a quick rate (10 seconds instead of 60 seconds), but the results of discovery are not transmitted to the PSM.
The result is that we constantly stay between 7 and 9 TCP connections (one per bootnode, plus a few extra connections from time to time for Kademlia).

However, when the results of discovery are transmitted to the PSM but the substrate-specific protocol is prevented from opening new connections, the number of TCP connections explodes.

The first test confirms that Kademlia alone works properly and isn't responsible for this explosion.
The second test shows that the problem comes from connections that are maintained alive by the PSM, rather than new connections being open.

tomaka · 2020-02-05T17:35:09Z

Another test where discovery is normal but the PSM is modified to reject all connections and never open new ones: connections explode.
This is weird, but is a good clue.

tomaka · 2020-02-06T18:31:13Z

I'm still not sure how we reach 2000 connections, but if we increase the reputation penalty for not being responsive, the number of TCP connections is much closer to the number of peers we're connected with for Substrate.

substrate/client/peerset/src/lib.rs

Line 32 in 6e0a487

const DISCONNECT_REPUTATION_CHANGE: i32 = -10;

For instance, with a value of -256, and 18 connected peers, I'm between 20 and 30 established TCP connections (including the one for the telemetry), and there's no explosion.

It's however in general difficult to reach peers (regardless of this issue) because many nodes are unresponsive.

EDIT: with 25 peers, around 34 to 44 connections. It's still a bit weird.

EDIT: 27 peers. 29 to 46 connections.

EDIT: 26 peers. 41 to 56 connections. It's really weird.
The difficulty of the diagnosis comes from the fact that once Kademlia opens a connection, then Substrate is free to use it for Substrate-related purposes (and vice-versa).
Additionally, the DHT is quite polluted by obsolete addresses and/or nodes of other chains, which makes lots of connection attempts fail.

EDIT: 35 peers. 45 to 62 connections.

This PR refactors the metrics measuring and Prometheus exposing entity in sc-service into its own submodule and extends the parameters it exposes by: - system load average (over one, five and 15min) - the TCP connection state of the process (lsof), refs #5304 - number of tokio threads - number of known forks - counter for items in each unbounded queue (with internal unbounded channels) - number of file descriptors opened by this process (*nix only at this point) - number of system threads (*nix only at this point) refs #4679 Co-authored-by: Max Inden <mail@max-inden.de> Co-authored-by: Ashley <ashley.ruglys@gmail.com>

tomaka · 2020-04-09T09:06:58Z

The large number of connections has been fixed by #4889 and #5086, and after #5571 we now have Prometheus metrics that lets us track this, meaning that we should be able to notice if that happens again.

arkpar added the I3-bug The node fails to follow expected behavior. label Jan 20, 2020

gnunicorn added this to the 2.0 milestone Jan 20, 2020

This was referenced Feb 11, 2020

Increase the reputation penality for being offline #4889

Merged

Update libp2p to 0.16 #4928

Merged

gavofyork mentioned this issue Mar 2, 2020

Responsible Memory Usage paritytech/polkadot-sdk#562

Open

gavofyork mentioned this issue Mar 18, 2020

Prepare for Polkadot launch and Substrate 2.0 freeze #4961

Closed

24 tasks

gnunicorn mentioned this issue Mar 18, 2020

Track additional metrics around resource consumption #5304

Closed

4 tasks

gnunicorn mentioned this issue Mar 26, 2020

Additional Metrics collected and exposed via prometheus #5414

Merged

8 tasks

tomaka closed this as completed Apr 9, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Network connection leak #4679

Network connection leak #4679

arkpar commented Jan 20, 2020 •

edited

Loading

twittner commented Jan 20, 2020

arkpar commented Jan 20, 2020

twittner commented Jan 20, 2020

twittner commented Jan 21, 2020

twittner commented Feb 5, 2020

tomaka commented Feb 5, 2020 •

edited

Loading

tomaka commented Feb 5, 2020 •

edited

Loading

tomaka commented Feb 6, 2020 •

edited

Loading

tomaka commented Apr 9, 2020 •

edited

Loading

Network connection leak #4679

Network connection leak #4679

Comments

arkpar commented Jan 20, 2020 • edited Loading

twittner commented Jan 20, 2020

arkpar commented Jan 20, 2020

twittner commented Jan 20, 2020

twittner commented Jan 21, 2020

twittner commented Feb 5, 2020

tomaka commented Feb 5, 2020 • edited Loading

tomaka commented Feb 5, 2020 • edited Loading

tomaka commented Feb 6, 2020 • edited Loading

tomaka commented Apr 9, 2020 • edited Loading

arkpar commented Jan 20, 2020 •

edited

Loading

tomaka commented Feb 5, 2020 •

edited

Loading

tomaka commented Feb 5, 2020 •

edited

Loading

tomaka commented Feb 6, 2020 •

edited

Loading

tomaka commented Apr 9, 2020 •

edited

Loading