-
Notifications
You must be signed in to change notification settings - Fork 43
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: graft/prune events and mesh peer tagging #383
feat: graft/prune events and mesh peer tagging #383
Conversation
b82f169
to
09123be
Compare
cba3b5f
to
eb75c83
Compare
eb75c83
to
e41c02c
Compare
comment from @dapplion
|
Following a suggestion from @achingbrain we could do the tagging / untagging after the sending an rpc object to a peer (e.g. https://github.com/ChainSafe/js-libp2p-gossipsub/blob/master/src/index.ts#L1241) to unblock grafting / pruning operations, this should avoid the performance penalty. what do you think @dapplion ? |
I think it's important to establish a baseline measurement to ensure we have a basis for comparison of different solutions before we make architectural decisions predicated on performance impact. @dapplion do you have a specific metric in mind that changes here must not affect? If you do, is there a benchmark that we can use as the baseline measurement? If not, can one be created? |
Codecov ReportAttention:
Additional details and impacted files@@ Coverage Diff @@
## master #383 +/- ##
==========================================
+ Coverage 78.52% 78.76% +0.24%
==========================================
Files 46 46
Lines 10868 10992 +124
Branches 1058 1077 +19
==========================================
+ Hits 8534 8658 +124
Misses 2334 2334 ☔ View full report in Codecov by Sentry. |
90b3bf6
to
8177270
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As argued in libp2p/js-libp2p#369 (comment) I do not believe this feature is understood enough to be a net positive in js-libp2p. Collapsing a multidimensional score of "peer usefulness" into a single number can lead to unintended consequences. Note that Go and Rust gossipsub implementations do not use tagging.
The go-libp2p-gossipsub implementation uses a tagTracer which applies tags to peer connections based on their behaviour, leveraging the TagPeer methods from the connection manager implementation Peers are tagged based on the following criteria:
So perhaps we need to expand this PR to accommodate those different types of tags and penalties when adjusting a peer-score . What do you think @achingbrain @wemeetagain ? |
bbcc209
to
57ab517
Compare
What happened to the diff? +14,000 -42,000 |
This is just for testing purposes which is why the commit is labelled DNM |
After deploying this on Feb 7th and observing on our larger cluster I have some observations. In summary Pros
Cons
for more details: The screenshots are in order of unstable ( Gossip score (avg / min / max)UnstableFeat2We can observe that unstable on average had more low scores that deviated from the average than feat2 although the variance is not significant over the last 6 days. Peer count per gossip score thresholdUnstableFeat2We can observe that on feat2 there is substantially less variance over the last 6 days than unstable (around 9 at max) as opposed to spikes of 50 peers on unstable , the peer count remained relatively stable in feat2 Process CPU user secondsUnstableFeat2Slightly more on average CPU usage with feat2 Process Heap BytesUnstableFeat2There is consistently higher heap usage with feat2 with an average higher than unstable’s max. Received messages per secondUnstableFeat2On average feat2 is receiving more messages per second over the last 6 days. Would be interested to hear your thoughts @dapplion @wemeetagain @tuyennhv |
Just sending a reminder @dapplion @tuyennhv |
What specific input do you need from us? |
I wanted to know if you thought these fluctuations were significant enough to be warranted as a performance penalty otherwise we could go ahead with this PR. |
Updated @wemeetagain mind you the value is set to |
I don't see the reference to the tag being set to 0 in that comment |
My bad, I had misinterpreted the comment that says tag values should be between 0-100 to imply that for all cases, I realize in libp2p when untagging we set the metadata to |
@wemeetagain the test that actually checks for the tagging is flakey but I am not sure why, could it be that the peerStore is being cleared somewhere else? |
I've also updated |
src/index.ts
Outdated
|
||
// rust-libp2p | ||
// - peer_score.graft() | ||
// - Self::control_pool_add() | ||
// - peer_added_to_mesh() | ||
}) | ||
|
||
if (this.opts?.taggingEnabled ?? false) { | ||
Array.from(toAdd).map(async (id) => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
forEach or for-of
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
probably for-of is better, then no need to create an array
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
then I would have to make this whole method async
and the motivation was to move away from unnecessary promises. i.e. promises would only be created for consumers who enable tagging.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if peerStore.merge
throws?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be refactored to a for-of
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
then I would have to make this whole method
async
and the motivation was to move away from unnecessary promises. i.e. promises would only be created for consumers who enable tagging.
libp2p triage update -
|
src/index.ts
Outdated
|
||
// rust-libp2p | ||
// - peer_score.graft() | ||
// - Self::control_pool_add() | ||
// - peer_added_to_mesh() | ||
}) | ||
|
||
if (this.opts?.taggingEnabled ?? false) { | ||
Array.from(toAdd).map(async (id) => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be refactored to a for-of
src/index.ts
Outdated
@@ -1474,10 +1479,30 @@ export class GossipSub extends TypedEventEmitter<GossipsubEvents> implements Pub | |||
const now = Date.now() | |||
let doPX = this.opts.doPX | |||
|
|||
if (this.opts?.taggingEnabled ?? false) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you make sure that these peer tagging sections are at the bottom of each code section they're in? Currently, peer tagging is awaited before the functionality of the event that triggers the peer tagging, which blocks core gossipsub functionality (often timing-sensitive) on this feature.
doStuff() {
// let all important stuff happen first
...
// then do peer tagging
if (this.opts?.tagMeshPeers) {...}
}
Really this feature can/should be refactored to be a side-effect of grafting / pruning.
export type GraftPeer = {peerId: string, topic: string, direction: Direction}
export type PrunePeer = GraftPeer
export interface GossipsubEvents extends PubSubEvents {
'gossipsub:heartbeat': CustomEvent
'gossipsub:message': CustomEvent<GossipsubMessage>
'gossipsub:graft': CustomEvent<GraftPeer>
'gossipsub:prune': CustomEvent<PrunePeer>
}
class Gossipsub ... {
...
tagMeshPeer = (evt: CustomEvent<GraftPeer>) => {
merge(...).catch(...)
}
untagMeshPeer = (evt: CustomEvent<PrunePeer>) => {
merge(...).catch(...)
}
start() {
...
if (this.opts?.tagMeshPeers) {
this.on('gossipsub:graft', this.tagMeshPeer)
this.on('gossipsub:prune', this.untagMeshPeer)
}
}
stop() {
...
if (this.opts?.tagMeshPeers) {
this.off('gossipsub:graft', this.tagMeshPeer)
this.off('gossipsub:prune', this.untagMeshPeer)
}
}
handleGraft(graft) {
// actually handle the graft
// then emit 'gossipsub:graft'
}
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this can ge done in a follow up PR if necessary considering not only is this an optional feature but it one immediate use case is where it can be used prune a connection via the connection manager in where that connection should not be gossiped to, in the case where it is enabled.
@achingbrain can you review this? I'm happy with it. |
this.components.peerStore.merge(peerIdFromString(peerId), { | ||
tags: { | ||
[topic]: { | ||
value: 100 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This value could be parameterised?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, small nit with the hard-coded tag value but it's not essential.
'gossipsub:graft'
and'gossipsub:prune'
events with the following detail:tagMeshPeers
(default totrue
) which will tag the mesh peers in the peer store (This may be used in turn by the connection manager to rank connections, eg for disconnection purposes)