Improve peer performance for NAT'd nodes #5345

AgeManning · 2024-03-04T06:28:02Z

Lighthouse manages peers best by pruning excess peers, which gives the best possible steady-state selection of peers over the long term.

So far this process has only worked for operators who have correctly port-forwarded their Lighthouse node. Nodes behind NATs never get excess peers, so they miss out on this peer selection logic.

These nodes often see `InsufficientPeers` warnings because their peer set has not been optimized.

This PR adds logic that lets discovery search for more peers than necessary when the inbound peer count is small (which indicates the Lighthouse node is not globally reachable). By searching for and then dialing excess peers, the pruning logic now also applies to these nodes, giving them a better set of peers.
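As a rough illustration of the idea described above (not the actual Lighthouse code), the decision could be sketched as follows. `PeerCounts`, `peers_to_discover`, and the example numbers are hypothetical; only the constant name `INBOUND_PEERS_NAT` and its value come from the diff under review. The sketch also assumes discovery is only consulted while the connected count is below the target.

```rust
/// Threshold taken from the diff under review: fewer inbound peers than this
/// is treated as a hint that the node is not reachable from outside.
pub const INBOUND_PEERS_NAT: usize = 5;

/// Hypothetical snapshot of peer counts (not the real peer manager state).
#[derive(Debug, Clone, Copy)]
pub struct PeerCounts {
    pub connected: usize,
    pub inbound: usize,
    pub target: usize,
    pub max: usize,
}

/// Returns how many additional peers discovery should look for.
pub fn peers_to_discover(c: PeerCounts) -> usize {
    // Assumption: discovery is only triggered while we are below the target.
    if c.connected >= c.target {
        return 0;
    }
    // Few inbound peers: aim for the maximum so that pruning has excess
    // peers to choose from. Otherwise just fill up to the target.
    let goal = if c.inbound < INBOUND_PEERS_NAT {
        c.max
    } else {
        c.target
    };
    goal.saturating_sub(c.connected)
}
```

With a hypothetical target of 50 and maximum of 55, a node at 40 peers with a single inbound connection would search for 15 peers, while the same node with 10 inbound connections would search for 10.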

AgeManning · 2024-03-04T06:31:16Z

The downside is that after pruning, this logic won't be triggered again until a peer disconnects or the peer count falls below the target.
This should still help NAT'd peers, since I notice disconnects happening all the time. The best solution remains to port-forward correctly.

Open to suggestions if people want to invoke discovery more often, but I'm conscious of over-using it, as it's very noisy.

pawanjay176

I don't think this is the root cause of our issues, but it should help non-port-forwarded peers nevertheless.

pawanjay176 · 2024-03-05T05:28:24Z

beacon_node/lighthouse_network/src/peer_manager/mod.rs

@@ -61,6 +61,8 @@ pub const MIN_OUTBOUND_ONLY_FACTOR: f32 = 0.2;
 /// limit is 55, and we are at 55 peers, the following parameter provisions a few more slots of
 /// dialing priority peers we need for validator duties.
 pub const PRIORITY_PEER_EXCESS: f32 = 0.2;
+/// The number of inbound peers that are connected that indicate this node is not behind a NAT.
+pub const INBOUND_PEERS_NAT: usize = 5;


Why 5? How's the node getting any inbound peers at all if it does not have a reachable address?!

jxs

The logic seems sound, I just have the same question as Pawan.

jxs · 2024-03-05T18:52:00Z

I think the node may still be reached by peers that are public, i.e. the node dials a public peer, so the router adds that public address to its table and the node may therefore be dialed by that peer in the future, right?

AgeManning · 2024-03-06T02:17:26Z

Yeah, I came here to change it. The original number was just something I made up as a rough threshold.
I was worried about the case where the NAT was open at one point in time and later closed, maybe via UPnP or something. Previous peers would slowly disconnect, but some may stay long-term.

I think the better logic now is to just always discover up to max_peers, regardless of inbound peers. We have been seeing cases of non-NAT'd peers not reaching the excess, so we might as well always dial up to the max and let the peer manager prune down.
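A minimal sketch of this simplified rule, under the assumption stated earlier in the thread that discovery is only triggered while the peer count is below the target. The function names and signatures here are illustrative only and are not taken from the Lighthouse source.

```rust
/// Returns how many new peers discovery should be asked for.
/// Hypothetical helper: always aim for `max_peers`, regardless of how many
/// inbound peers are connected.
pub fn wanted_peers(
    connected: usize,
    dialing: usize,
    target_peers: usize,
    max_peers: usize,
) -> usize {
    // Dials already in flight count toward the total we expect to reach.
    let in_progress = connected.saturating_add(dialing);
    // Assumption: nothing to do once the target has been reached.
    if in_progress >= target_peers {
        return 0;
    }
    // Otherwise overshoot up to the maximum and rely on pruning afterwards.
    max_peers.saturating_sub(in_progress)
}

/// Returns how many peers pruning would remove to get back to the target.
pub fn peers_to_prune(connected: usize, target_peers: usize) -> usize {
    connected.saturating_sub(target_peers)
}
```

For example, with a hypothetical target of 50 and maximum of 55, a node with 45 connected peers and 2 in-flight dials would ask for 8 more; if all of them connect, pruning removes the 5 least useful peers to return to the target.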

AgeManning · 2024-03-06T02:21:42Z

Ok, diff is really small now, but impact could be substantial

AgeManning · 2024-03-07T06:42:46Z

@Mergifyio queue

mergify · 2024-03-07T06:42:57Z

queue

🛑 Command `queue` cancelled because of a new `queue` command with different arguments

mergify · 2024-03-07T06:43:27Z

queue

🟠 The pull request is the 2nd in the queue to be merged

pawanjay176 · 2024-03-07T11:57:18Z

@Mergifyio dequeue

mergify · 2024-03-07T11:57:24Z

dequeue

✅ The pull request has been removed from the queue `default`

pawanjay176 · 2024-03-07T11:57:50Z

@Mergifyio requeue

mergify · 2024-03-07T11:57:58Z

requeue

✅ This pull request will be re-embarked automatically

The followup queue command will be automatically executed to re-embark the pull request

mergify · 2024-03-07T11:57:59Z

queue

✅ The pull request has been merged automatically

The pull request has been merged automatically at de91c77

Peer discovery for Natd peers

af66843

AgeManning added the ready-for-review The code is ready for review label Mar 4, 2024

AgeManning requested review from jxs and pawanjay176 March 4, 2024 06:28

pawanjay176 approved these changes Mar 5, 2024

jxs reviewed Mar 5, 2024

AgeManning added 2 commits March 6, 2024 13:20

Reduce logic, discover up to max peers

4e3d14c

Reduce diff

1f8d215

Merge latest unstable

d5dee22

AgeManning added ready-for-merge This PR is ready to merge. and removed ready-for-review The code is ready for review labels Mar 7, 2024

mergify bot added a commit that referenced this pull request Mar 7, 2024

Merge of #5345

eb05601

This was referenced Mar 7, 2024

merge queue: embarking unstable (b961457) and [#5357 + #5345] together #5369

Closed

Attempt to publish to at least mesh_n peers #5357

Merged

mergify bot added a commit that referenced this pull request Mar 7, 2024

Merge of #5345

6839f46

mergify bot mentioned this pull request Mar 7, 2024

merge queue: embarking unstable (b961457) and [#5357 + #5345] together #5370

Closed

mergify bot added a commit that referenced this pull request Mar 7, 2024

Merge of #5345

94c7b2f

This was referenced Mar 7, 2024

merge queue: embarking unstable (fc8f1a4) and [#5344 + #5321 + #5311 + #5345] together #5371

Closed

Correct the metrics for topic subscriptions #5344

Merged

This was referenced Mar 7, 2024

Update CI actions to alleviate deprecation warnings #5321

Merged

Reduce load on validator subscription channels #5311

Merged

mergify bot merged commit de91c77 into sigp:unstable Mar 7, 2024
29 checks passed
