
feat(networking): add service slots to peer manager #1473

Merged
merged 8 commits into from
Jan 26, 2023

Conversation

alrevuelta
Contributor

@alrevuelta alrevuelta commented Jan 3, 2023

Related #1461

Background:

  • Until we further decentralise the Waku network, service peers (peers supporting service protocols such as store, lightpush, etc.) have to be treated differently from relay peers.
  • For example, looking at this from the store protocol's perspective: since we don't have any consensus on the data that is stored, there is some level of trust in the node we are querying. So in most cases we want to specify the nodes that are used for these services. Note that this feature already exists, but this PR fixes some of the issues around it.

Summary of changes:

  • This PR adds the concept of serviceSlots, which contains a set of "slotted" or "preferred" peers for the services, or in other words, any protocol that is not WakuRelayProtocol.
    • WakuStoreCodec
    • WakuFilterCodec
    • WakuLightPushCodec
    • WakuPeerExchangeCodec
  • The peer manager constantly checks that we are connected to these "slot" peers (if any), and if not, it attempts a connection at a fixed interval. This can speed things up if at some point we need something from those peers, since we are already connected.
  • Now every time we use selectPeer() we get a different peer depending on the protocol that is requested, with the following priorities:
    • For relay we just get the first peer of the peerstore, same behaviour as before.
    • For non-relay protocols (service peers), we now look up the service peers first, and if there are none, we return one from the peerstore. In other words, peers specified with setstore, setxxx or the CLI flags always take precedence.
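The selection priority just described can be sketched as follows. This is an illustrative Python model only: the actual implementation is Nim (nwaku), and all names here (`PeerManagerSketch`, `service_slots`, `peer_store`, the codec string) are assumptions for the sketch, not the real API.

```python
from typing import Optional

# Illustrative codec string; the real Waku codec identifiers differ.
WAKU_RELAY_CODEC = "/vac/waku/relay/2.0.0"

class PeerManagerSketch:
    """Toy model of the selectPeer() priority described above."""

    def __init__(self) -> None:
        self.service_slots: dict[str, str] = {}     # protocol codec -> slotted peer
        self.peer_store: dict[str, list[str]] = {}  # protocol codec -> known peers

    def select_peer(self, proto: str) -> Optional[str]:
        if proto != WAKU_RELAY_CODEC:
            # Service protocols: a slotted peer (e.g. set via CLI flags)
            # takes precedence over anything in the peer store.
            if proto in self.service_slots:
                return self.service_slots[proto]
        # Relay, or no slot configured: first matching peer in the peer store,
        # which is the previous behaviour.
        candidates = self.peer_store.get(proto, [])
        return candidates[0] if candidates else None
```

A slotted peer wins even when the peer store also knows peers for that protocol; the peer store is only the fallback.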

@status-im-auto
Collaborator

status-im-auto commented Jan 3, 2023

Jenkins Builds

Older builds (2)
Commit #️⃣ Finished (UTC) Duration Platform Result
aede3f2 #1 2023-01-03 23:03:49 ~16 min macos 📄log
1ae87f3 #1 2023-01-24 10:07:09 ~18 min linux 📄log
Commit #️⃣ Finished (UTC) Duration Platform Result
8879340 #2 2023-01-24 22:53:38 ~6 min macos 📄log
✔️ c93875e #3 2023-01-25 23:21:22 ~34 min macos 📦bin

@alrevuelta alrevuelta marked this pull request as ready for review January 24, 2023 14:42
@alrevuelta alrevuelta requested review from jm-clius and LNSD January 24, 2023 14:45
LNSD
LNSD previously requested changes Jan 24, 2023
Contributor

@LNSD LNSD left a comment

Please check my comments.

if pm.peerStore.connectedness(servicePeer.peerId) != Connected:
# Attempt to dial peer. Note that service peers do not respect any backoff
let conn = await pm.dialPeer(servicePeer.peerId, servicePeer.addrs, serviceProto)
if conn.isNone:
Contributor

Suggested change
if conn.isNone:
if conn.isNone():

# Log a summary of slot peers connected/not connected
let connectedServicePeers = toSeq(pm.serviceSlots.pairs).filterIt(pm.peerStore.connectedness(it[1].peerId) == Connected)
if connectedServicePeers.len > 0:
info "Connected service peers",
Contributor

As I commented in another PR, this should be set to debug level. The abuse of the info log level is bloating the node logs with debug information.

# Ensure we are always connected to the slotted service peers
proc serviceConnectivityLoop*(pm: PeerManager) {.async.} =
pm.serviceLoopUp = true
info "Starting service connectivity loop"
Contributor

As I commented in another PR, this should be set to debug level. The abuse of the info log level is bloating the node logs with debug information.

@@ -335,7 +359,9 @@ proc connectToNodes*(pm: PeerManager,

# Ensures a healthy amount of connected relay peers
proc relayConnectivityLoop*(pm: PeerManager) {.async.} =
while true:
pm.relayLoopUp = true
info "Starting relay connectivity loop"
Contributor

As I commented in another PR, this should be set to debug level. The abuse of the info log level is bloating the node logs with debug information.

Contributor

Agree re keeping info clearer. Also goes for e.g. "Adding peer to service slots" above. Ideally the metrics dump every minute should be enough to convey all the info necessary to monitor a node that's not being debugged, and it should be possible to extend it with more important metrics as they get added. (I understand that many other p2p apps log maintenance pulses at info level, but nwaku has too many of them not to clutter the logs, IMO.)

Contributor Author

I don't hold a strong opinion, so I can move it to debug. But my rationale behind this:

Both "Starting relay connectivity loop" and "Starting service connectivity loop" are a one-time thing, and mimic "mounting relay protocol", "mounting libp2p ping protocol", "starting relay protocol", etc., which I think are quite relevant.

I also think "Adding peer to service slots" is quite relevant, since it shows the service peers we are using, and the impact of this is quite high (e.g. in store). It is also a one-time thing (taken from CLI flags).

Regarding logging connected peers every few seconds, sure, I can use the metrics dump every minute for that. IIRC the connected-peers metric is not there (just the delta, i.e. new connections since the last cycle)?

Contributor Author

done! using debug now

servicePeers = connectedServicePeers.mapIt(it[1].addrs),
respectiveProtocols = connectedServicePeers.mapIt(it[0])

let notConnectedServicePeers = toSeq(pm.serviceSlots.pairs).filterIt(pm.peerStore.connectedness(it[1].peerId) != Connected)
Contributor

The cognitive load of this line makes it hard to understand. Please create procs that reduce this complexity; this one-liner is hard to debug.

Contributor

Magic numbers: it[1]

Where does the [1] come from? Where does the [0] at L414 come from?
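For context, the flagged indices come from iterating a table as (key, value) pairs: index 0 is the protocol and index 1 is the slotted peer. One readability fix is to unpack the pair into named variables instead of indexing, sketched here in Python with hypothetical data (the real code is Nim using `toSeq`/`filterIt`):

```python
# Hypothetical data: service_slots maps protocol codec -> slotted peer,
# mirroring the (it[0], it[1]) pairs the reviewer flagged.
service_slots = {"store": "peerA", "lightpush": "peerB"}
connected = {"peerA"}  # peer ids we are currently connected to

# Index-based style: it[0]/it[1] hide what the tuple fields mean.
connected_pairs = [it for it in service_slots.items() if it[1] in connected]

# Named unpacking expresses the same filter self-documentingly.
connected_named = [(proto, peer) for proto, peer in service_slots.items()
                   if peer in connected]

assert connected_pairs == connected_named  # identical result, clearer intent
```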

Contributor Author

removed the loop, so this code does not exist anymore.


proc selectPeer*(pm: PeerManager, proto: string): Option[RemotePeerInfo] =
# Selects the best peer for a given protocol
let peers = pm.peerStore.peers().filterIt(it.protos.contains(proto))
Contributor

Extract a proc: peerStore.getPeersByProto(proto: string)
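The suggested extraction amounts to moving the inline protocol filter behind a named query on the peer store. A minimal sketch, in Python with illustrative names (the reviewer's `getPeersByProto` is a proposed Nim proc, not an existing API):

```python
class PeerStoreSketch:
    """Illustrative peer store; each record carries the protocols a peer supports."""

    def __init__(self, peers: list[dict]) -> None:
        self.peers = peers

    def get_peers_by_proto(self, proto: str) -> list[dict]:
        # Named query replacing the inline filterIt(...) one-liner.
        return [p for p in self.peers if proto in p["protos"]]

store = PeerStoreSketch([
    {"id": "a", "protos": ["store", "relay"]},
    {"id": "b", "protos": ["relay"]},
])
```

The call site then reads as a question about the store rather than a filter expression to decode.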

Comment on lines 1010 to 1011
node.peerManager.serviceLoopUp = false
node.peerManager.relayLoopUp = false
Contributor

☠️ ☠️ ☠️ ☠️ ☠️ ☠️

Add a stop method to the different event loops.
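The lifecycle pattern being asked for (which the author later adds as start()/stop() on the PeerManager) can be sketched as follows. This is an asyncio sketch with illustrative names, not the nwaku API; the point is that loops poll a single owned flag rather than callers flipping per-loop booleans from outside.

```python
import asyncio

class PeerManagerSketch:
    """Loops check one `started` flag; start()/stop() own the lifecycle."""

    def __init__(self, interval: float = 0.01) -> None:
        self.started = False
        self.interval = interval
        self.ticks = 0           # stand-in for reconnect/maintenance work done
        self._task = None

    async def _connectivity_loop(self) -> None:
        while self.started:
            self.ticks += 1
            await asyncio.sleep(self.interval)

    def start(self) -> None:
        self.started = True
        self._task = asyncio.ensure_future(self._connectivity_loop())

    async def stop(self) -> None:
        self.started = False     # the loop exits on its next iteration
        if self._task is not None:
            await self._task     # wait for a clean exit

async def demo() -> int:
    pm = PeerManagerSketch()
    pm.start()
    await asyncio.sleep(0.05)
    await pm.stop()
    return pm.ticks
```

With this shape, node.stop() only needs to await pm.stop(), and no external code touches the loop flags directly.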


await sleepAsync(ServicePeersInterval)

proc selectPeer*(pm: PeerManager, proto: string): Option[RemotePeerInfo] =
Contributor

The proc name is too generic. Rename this method to something like selectPeerByProto

if pm.serviceSlots.len == 0:
warn "No service peers configured, but service loop is running"
for serviceProto, servicePeer in pm.serviceSlots.pairs:
if pm.peerStore.connectedness(servicePeer.peerId) != Connected:
Contributor

Add isConnected(peerId) method to peerStore.

Suggested change
if pm.peerStore.connectedness(servicePeer.peerId) != Connected:
if not pm.peerStore.isConnected(servicePeer.peerId):

Contributor Author

added isConnected function!

waku/v2/node/peer_manager/peer_manager.nim (outdated, resolved)
Contributor

@jm-clius jm-clius left a comment

Thanks. In general LGTM! Ready to approve, but I would like clarity on what the advantage of maintaining connectivity to service peers is. Please also see comments around tying loop lifecycles more formally to a clean node.stop() shutdown, and reining in the use of info. :)

while true:
pm.relayLoopUp = true
info "Starting relay connectivity loop"
while pm.relayLoopUp:
Contributor

Why can't this be tied to node.started so that the all loops' existence is cleared by the formal node.stop() method? Not sure we want to keep track via individual variables of each loop's scheduling, especially since some are started conditionally, others under all conditions, etc.

Contributor

...realising this may not be accessible from here. Perhaps then add a stop() method and a (private) .started bool to the PeerManager, which gets formally stopped when we do node.stop()? I think these loops can at least be tied to the lifecycle of the peer manager and should not need to be tracked separately.

Contributor Author

added start() and stop()!

@@ -364,3 +390,53 @@ proc relayConnectivityLoop*(pm: PeerManager) {.async.} =
await pm.connectToNodes(outsideBackoffPeers[0..<numPeersToConnect], WakuRelayCodec)

await sleepAsync(ConnectivityLoopInterval)

# Ensure we are always connected to the slotted service peers
proc serviceConnectivityLoop*(pm: PeerManager) {.async.} =
Contributor

What would be the main advantage of maintaining our connections to service peers? Most service protocols are very opportunistic and may work fine with ad-hoc connections (e.g. when making a store query, filter query, etc.). In fact, we assume that client nodes making use of services may often have connectivity restrictions, so will for periods of time be unable to reach the service nodes. It is up to the application then to attempt connecting to service nodes (store, filter) to retrieve messages it may have missed during bad connectivity.

Certainly agree that we need to keep slots for service peers, though.

Contributor Author

The main reasons are to i) shorten the time, since the connection is already established, and ii) detect slot-peer problems ASAP, rather than waiting until we need them. But I see your point.

Will remove it, but perhaps we will still get similar behaviour? I mean, with keepAlive and one store request we will stay connected, until of course the keepAlive fails or the connection is dropped.

Contributor Author

removed the serviceConnectivityLoop

@alrevuelta alrevuelta dismissed LNSD’s stale review January 26, 2023 09:00

changes were addressed

@alrevuelta alrevuelta requested review from jm-clius and LNSD January 26, 2023 09:01
@alrevuelta
Contributor Author

Changes were addressed. Can we get this in? Approval required.

Contributor

@jm-clius jm-clius left a comment

LGTM!
