This repository has been archived by the owner on Nov 15, 2023. It is now read-only.

Increased p2p Traffic on Relay Chain #12797

Closed
2 tasks done
notlesh opened this issue Nov 28, 2022 · 7 comments

Comments

@notlesh
Contributor

notlesh commented Nov 28, 2022

Is there an existing issue?

  • I have searched the existing issues

Experiencing problems? Have you tried our Stack Exchange first?

  • This is not a support question.

Description of bug

After a recent client upgrade on Moonbeam, we are seeing a drastic increase in p2p traffic on the relay side. The traffic is associated with numerous peer connections which never seem to be reflected as connected peers. A workaround is to use --reserved-only, likely because it disables automatic peer discovery.

You can see here the effect of running a node without --reserved-only (on the left of each graph) and restarting it with --reserved-only (on the right):

image

Note particularly that the number of connections is volatile after the restart for a brief time before dropping to its expected steady-state.

This problem isn't mitigated by restricting --in-peers and --out-peers.

Perhaps related: #12799

Steps to reproduce

  • run with and without --reserved-only and observe network stats
  • OR compare e.g. a Kusama/Polkadot client on <= v0.9.26 vs >= v0.9.27
  • OR compare moonbeam client <= v0.26.1 vs >= v0.27.0 (observed only on Relay client, not parachain)
@notlesh
Contributor Author

notlesh commented Dec 5, 2022

Any thoughts on this? We have a workaround, but consider it very much temporary as it isolates our nodes into an island.

@altonen
Contributor

altonen commented Dec 7, 2022

I am looking into this and #12799, and I can confirm that there is an increase in bandwidth usage between versions v0.9.26 and v0.9.27. It looks like v0.9.27 is able to establish and maintain more connections, which could explain this increase in bandwidth. The behavior of v0.9.26 could be related to paritytech/polkadot-sdk#528

Is it possible that the lower right graph is incorrectly displaying only reserved peers? If you graph substrate_sub_libp2p_peers_count or substrate_sync_peers, does it show a similar pattern when you run the node with and without --reserved-only?
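For anyone who wants to compare the two metrics without a dashboard, here is a minimal sketch that reads a gauge value out of Prometheus text-format output. The sample text and its `chain` label are made up for illustration, and the parser assumes each sample line ends with the value (no trailing timestamp).

```python
from typing import Optional


def read_gauge(metrics_text: str, name: str) -> Optional[float]:
    """Return the first sample value for `name` in Prometheus text format."""
    for line in metrics_text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        # A sample line looks like: metric_name{label="x"} 42
        metric, _, value = line.rpartition(" ")
        if metric == name or metric.startswith(name + "{"):
            return float(value)
    return None


# Illustrative sample only; these are not values taken from a real node.
SAMPLE = """\
# TYPE substrate_sub_libp2p_peers_count gauge
substrate_sub_libp2p_peers_count{chain="polkadot"} 40
substrate_sync_peers{chain="polkadot"} 40
"""

peers = read_gauge(SAMPLE, "substrate_sub_libp2p_peers_count")
sync_peers = read_gauge(SAMPLE, "substrate_sync_peers")
```

In practice the text would come from the node's Prometheus endpoint; the exact host, port, and path depend on how the node was started, so treat any URL as an assumption to verify locally.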

@altonen
Contributor

altonen commented Dec 8, 2022

I've been analyzing the network traffic this morning and I can't find anything concerning in the network usage. During startup more bandwidth is used, which seems to be associated with the number of connections the node has open as it switches between 20 and 50 connected peers. After startup, the peer count stabilizes and the average bandwidth of the two versions is basically the same. This instability during startup warrants further research to see if we can fix it and reduce the bandwidth consumption.

I'll close this in favor of paritytech/polkadot-sdk#528 and try to start working on that issue soon.

Can you describe the workaround you have in place?

@purestakeoskar

@altonen Thanks for looking at this issue. The bottom right graph is displaying substrate_sub_libp2p_peers_count, and the metric substrate_sync_peers is showing the exact same thing.

As for the workaround: we have a set of full nodes running as peering nodes. These nodes have all the other nodes as reserved nodes, and all the other nodes have the peering nodes as reserved nodes.
So the trick is to set in-peers = 0 and out-peers = the number of peering nodes, and additionally to enable the --reserved-only flag.

Config before 12:25:

--in-peers=0
--out-peers=<number of peering nodes>

Config after 12:25:

--in-peers=0
--out-peers=<number of peering nodes>
--reserved-only
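The two configurations above can be sketched as a small helper that builds the argument list for a non-peering node. This is only an illustration: `relay_args` and the example multiaddresses are hypothetical, and passing each peering node through `--reserved-nodes` is an assumption about how the reserved nodes were configured, since the comment does not show that part.

```python
from typing import List


def relay_args(peering_nodes: List[str], reserved_only: bool) -> List[str]:
    """Build relay-chain network flags for a node that only talks to peering nodes."""
    # No inbound slots; one outbound slot per peering node.
    args = ["--in-peers=0", f"--out-peers={len(peering_nodes)}"]
    # Assumption: each peering node is listed as a reserved node.
    for addr in peering_nodes:
        args += ["--reserved-nodes", addr]
    # The flag that was added at 12:25 in the graphs.
    if reserved_only:
        args.append("--reserved-only")
    return args


# Hypothetical multiaddresses, for illustration only.
PEERING_NODES = [
    "/dns/peering-1.example.invalid/tcp/30333/p2p/<peer-id-1>",
    "/dns/peering-2.example.invalid/tcp/30333/p2p/<peer-id-2>",
]

before = relay_args(PEERING_NODES, reserved_only=False)
after = relay_args(PEERING_NODES, reserved_only=True)
```

The only difference between `before` and `after` is the trailing `--reserved-only`, which matches the before/after split described in the comment.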

@altonen
Contributor

altonen commented Dec 9, 2022

Does reducing the number of inbound light peers using --in-peers-light <num> fix the issue?

@the-right-joyce the-right-joyce moved this to Backlog 🗒 in Networking Dec 13, 2022
@purestakeoskar

@altonen after setting --in-peers-light=0, the bandwidth usage is back to normal. The change has a much higher impact on Polkadot than on Kusama, where it is barely noticeable.

Case 1

Before change

--in-peers=0
--out-peers=25

After change

--in-peers=0
--out-peers=25
--in-peers-light=0

Here we can see the impact of the 100 light peers that are allowed by default:
image
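As a rough back-of-the-envelope check for Case 1, the sketch below adds up the peer slots the flags allow. It assumes the three limits simply add and ignores reserved nodes, which is a simplification rather than a statement of how the node accounts for slots internally; `max_peer_slots` is a hypothetical helper.

```python
# Default number of inbound light peers, as stated in the comment above.
DEFAULT_IN_PEERS_LIGHT = 100


def max_peer_slots(in_peers: int, out_peers: int,
                   in_peers_light: int = DEFAULT_IN_PEERS_LIGHT) -> int:
    """Upper bound on peer slots, assuming the three limits simply add up."""
    return in_peers + out_peers + in_peers_light


# Case 1 before the change: --in-peers=0 --out-peers=25 (light peers at default).
slots_before = max_peer_slots(in_peers=0, out_peers=25)

# Case 1 after the change: --in-peers-light=0 added.
slots_after = max_peer_slots(in_peers=0, out_peers=25, in_peers_light=0)
```

Under these assumptions the slot budget drops from 125 to 25, which is consistent with light peers accounting for most of the extra connections in this case.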

Case 2

In this case we have 0 in-peers, 0 out-peers, 4 reserved nodes, and the --reserved-only flag set.
Before change

--in-peers=0
--out-peers=0
--reserved-only

After change

--in-peers=0
--out-peers=0
--in-peers-light=0

Here the change is barely noticeable, as the light peers are already disabled by the --reserved-only flag.
image

@altonen
Contributor

altonen commented Feb 6, 2023

@purestakeoskar

Can this issue be closed?

@bkchr bkchr closed this as completed Apr 26, 2023
@github-project-automation github-project-automation bot moved this from Backlog 🗒 to Blocked ⛔️ in Networking Apr 26, 2023
@altonen altonen moved this from Blocked ⛔️ to Done ✅ in Networking May 16, 2023