This repository has been archived by the owner on Nov 15, 2023. It is now read-only.

Too many TCP connections open when authority-discovery is enabled #5612

Closed
tomaka opened this issue Apr 12, 2020 · 5 comments

Comments

@tomaka
Contributor

tomaka commented Apr 12, 2020

While we are no longer "leaking" TCP connections, the number of open connections doesn't drop below 200 to 300 when authority discovery is enabled.

We should figure out why.

@tomaka
Contributor Author

tomaka commented May 20, 2020

Here are some details.

The following metrics have been pulled from nodes running on a recent commit of the Polkadot master branch.

Every ten minutes, the number of libp2p tasks (in orange below) currently running on an average node increases to around 3000:
(note that the graph uses a logarithmic scale)

Screenshot from 2020-05-20 16-38-08

Unsurprisingly, the CPU consumption of the network worker task (in red) peaks every 10 minutes as well:

Screenshot from 2020-05-20 16-39-09

And the number of times a call to NetworkWorker::poll takes more than one second also peaks every 10 minutes (network worker in blue):

Screenshot from 2020-05-20 16-40-07
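For readers who want to reproduce this kind of measurement, here is a minimal sketch of counting polls that exceed a threshold. The `SlowPollCounter` type and its methods are hypothetical names made up for illustration; this is not how Substrate actually records the metric, only one plausible way to do it with the standard library.

```rust
use std::time::{Duration, Instant};

/// Hypothetical helper (not part of Substrate): counts how many polls
/// took longer than a configured threshold.
struct SlowPollCounter {
    threshold: Duration,
    slow_polls: u64,
}

impl SlowPollCounter {
    fn new(threshold: Duration) -> Self {
        SlowPollCounter { threshold, slow_polls: 0 }
    }

    /// Records one already-measured poll duration.
    fn observe(&mut self, elapsed: Duration) {
        if elapsed > self.threshold {
            self.slow_polls += 1;
        }
    }

    /// Runs `poll_once`, measures its wall-clock time, records it,
    /// and returns whatever `poll_once` returned.
    fn time<T>(&mut self, poll_once: impl FnOnce() -> T) -> T {
        let start = Instant::now();
        let out = poll_once();
        self.observe(start.elapsed());
        out
    }
}
```

With a one-second threshold (`SlowPollCounter::new(Duration::from_secs(1))`), wrapping each poll in `time` and exporting `slow_polls` periodically would give a graph similar in spirit to the one above.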

While I don't have a formal proof that these come from the authority discovery, the only thing that "ticks" every ten minutes, as far as I know, is the authority discovery.

@arkpar
Member

arkpar commented May 20, 2020

@mxinden could you take a look?

@tomaka
Contributor Author

tomaka commented May 20, 2020

Additionally, here is the number of active TCP connections on an average node:

In red and green, the established connections. In bright red, the number of pending attempts. In yellow, the number of connections that have recently been closed.

Screenshot from 2020-05-20 16-43-18

If you exclude the 150 slots used for gossiping (I think our nodes are configured with 150 slots, but I'm not totally sure of that number), all the other connections are used for Kademlia purposes.

I opened this issue because I believe that this number is a bit high, but maybe it is normal by design, or maybe it is caused by a bug somewhere that leaves connections open.

Tackling this issue consists of figuring out whether this is normal and, if not, decreasing the number.
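As a rough way to check these numbers on a Linux host, the sketch below counts sockets per TCP state from text in the format of `/proc/net/tcp` and subtracts the gossip slots to estimate how many established connections are left over. The function names are hypothetical, the 150-slot figure is the unconfirmed number mentioned above, and mapping "pending" to `SYN_SENT` and "recently closed" to `TIME_WAIT` is an assumption; the graph itself was not necessarily produced this way.

```rust
use std::collections::HashMap;

// TCP state codes as they appear (in hex) in /proc/net/tcp.
const TCP_ESTABLISHED: u8 = 0x01;
const TCP_SYN_SENT: u8 = 0x02; // assumed to correspond to "pending attempts"
const TCP_TIME_WAIT: u8 = 0x06; // assumed to correspond to "recently closed"

/// Counts sockets per TCP state from text in the `/proc/net/tcp` format.
fn count_tcp_states(proc_net_tcp: &str) -> HashMap<u8, usize> {
    let mut counts = HashMap::new();
    // Skip the header line; the fourth whitespace-separated column is the state.
    for line in proc_net_tcp.lines().skip(1) {
        if let Some(state_hex) = line.split_whitespace().nth(3) {
            if let Ok(state) = u8::from_str_radix(state_hex, 16) {
                *counts.entry(state).or_insert(0) += 1;
            }
        }
    }
    counts
}

/// Established connections left once the gossip slots are set aside.
fn connections_beyond_gossip(established: usize, gossip_slots: usize) -> usize {
    established.saturating_sub(gossip_slots)
}
```

The input can be obtained with `std::fs::read_to_string("/proc/net/tcp")`; IPv6 sockets are listed separately in `/proc/net/tcp6`. Note that this counts every TCP socket on the host, not only those owned by the node.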

@tomaka
Contributor Author

tomaka commented Aug 14, 2020

Might be fixed by libp2p/rust-libp2p#1698

@tomaka
Contributor Author

tomaka commented Oct 14, 2020

I believe this is now fixed. See the description of paritytech/polkadot#1807 (comment) for details.

tomaka closed this as completed Oct 14, 2020