feat: graft/prune events and mesh peer tagging #383

Merged
merged 17 commits into from
Feb 27, 2024

Conversation

maschad
Contributor

@maschad maschad commented Dec 14, 2022

  • Add 'gossipsub:graft' and 'gossipsub:prune' events with the following detail:
export interface MeshPeer {
  peerId: string
  topic: string
  direction: Direction
}
  • Add a new configuration option tagMeshPeers (defaults to true), which tags the mesh peers in the peer store. (This may in turn be used by the connection manager to rank connections, e.g. for disconnection purposes.)
  • Closes Tag mesh peers on graft, remove tags on prune #380
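
For illustration, a consumer of the two events might mirror mesh membership per topic along these lines. This is only a sketch: the `MeshView` class is hypothetical, the `Direction` union is an assumption (the PR only references the type by name), and the handlers are called directly here instead of being registered on a real GossipSub instance.

```typescript
// Assumed shape of Direction; the PR only references the type by name.
type Direction = 'inbound' | 'outbound'

// Event detail as described in the PR.
interface MeshPeer {
  peerId: string
  topic: string
  direction: Direction
}

// Hypothetical helper: mirrors mesh membership per topic from graft/prune events.
class MeshView {
  private readonly byTopic = new Map<string, Set<string>>()

  // Handler for 'gossipsub:graft': record the peer under its topic.
  onGraft = (evt: { detail: MeshPeer }): void => {
    const { peerId, topic } = evt.detail
    let peers = this.byTopic.get(topic)
    if (peers === undefined) {
      peers = new Set<string>()
      this.byTopic.set(topic, peers)
    }
    peers.add(peerId)
  }

  // Handler for 'gossipsub:prune': forget the peer, drop empty topics.
  onPrune = (evt: { detail: MeshPeer }): void => {
    const { peerId, topic } = evt.detail
    const peers = this.byTopic.get(topic)
    if (peers === undefined) return
    peers.delete(peerId)
    if (peers.size === 0) this.byTopic.delete(topic)
  }

  // Sorted list of peers currently tracked for a topic.
  peers (topic: string): string[] {
    return Array.from(this.byTopic.get(topic) ?? []).sort()
  }
}

// Simulate the events by calling the handlers directly.
const view = new MeshView()
view.onGraft({ detail: { peerId: 'peer-a', topic: 'blocks', direction: 'outbound' } })
view.onGraft({ detail: { peerId: 'peer-b', topic: 'blocks', direction: 'inbound' } })
view.onPrune({ detail: { peerId: 'peer-a', topic: 'blocks', direction: 'outbound' } })
console.log(view.peers('blocks'))
```

In real use the handlers would presumably be attached as listeners for 'gossipsub:graft' and 'gossipsub:prune' on the pubsub instance; that wiring is left out because it depends on the library version.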

@maschad maschad changed the title from "fix: Tag mesh peers on graft, remove tags on prunehttps://github.com/ChainSafe/js-libp2p-gossipsub/issues/380" to "fix: Tag mesh peers on graft, remove tags on prune" Dec 14, 2022
maschad added a commit to maschad/js-libp2p-gossipsub that referenced this pull request Dec 23, 2022
maschad added a commit to maschad/js-libp2p-gossipsub that referenced this pull request Jan 3, 2023
@wemeetagain
Member

comment from @dapplion

Hey! I strongly oppose merging the peer tagging feature before having strong guarantees this won't have a performance penalty.

@maschad
Contributor Author

maschad commented Jan 10, 2023

comment from @dapplion

Hey! I strongly oppose merging the peer tagging feature before having strong guarantees this won't have a performance penalty.

Following a suggestion from @achingbrain, we could do the tagging / untagging after sending an RPC object to a peer (e.g. https://github.com/ChainSafe/js-libp2p-gossipsub/blob/master/src/index.ts#L1241) to unblock grafting / pruning operations; this should avoid the performance penalty.

what do you think @dapplion ?

@achingbrain
Collaborator

this should avoid the performance penalty

I think it's important to establish a baseline measurement to ensure we have a basis for comparison of different solutions before we make architectural decisions predicated on performance impact.

@dapplion do you have a specific metric in mind that changes here must not affect? If you do, is there a benchmark that we can use as the baseline measurement? If not, can one be created?

@maschad maschad marked this pull request as ready for review January 19, 2023 15:11
@maschad maschad requested a review from a team as a code owner January 19, 2023 15:11
@codecov-commenter

codecov-commenter commented Jan 19, 2023

Codecov Report

Attention: 9 lines in your changes are missing coverage. Please review.

Comparison is base (f255ae4) 78.52% compared to head (86ef3d8) 78.76%.

Files Patch % Lines
src/index.ts 89.77% 9 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master     #383      +/-   ##
==========================================
+ Coverage   78.52%   78.76%   +0.24%     
==========================================
  Files          46       46              
  Lines       10868    10992     +124     
  Branches     1058     1077      +19     
==========================================
+ Hits         8534     8658     +124     
  Misses       2334     2334              


Contributor

@dapplion dapplion left a comment

As argued in libp2p/js-libp2p#369 (comment) I do not believe this feature is understood enough to be a net positive in js-libp2p. Collapsing a multidimensional score of "peer usefulness" into a single number can lead to unintended consequences. Note that Go and Rust gossipsub implementations do not use tagging.

@maschad
Contributor Author

maschad commented Jan 23, 2023

As argued in libp2p/js-libp2p#369 (comment) I do not believe this feature is understood enough to be a net positive in js-libp2p. Collapsing a multidimensional score of "peer usefulness" into a single number can lead to unintended consequences. Note that Go and Rust gossipsub implementations do not use tagging.

The go-libp2p-gossipsub implementation uses a tagTracer which applies tags to peer connections based on their behaviour, leveraging the TagPeer methods from the connection manager implementation

Peers are tagged based on the following criteria:

  • Directly connected peers are tagged with GossipSubConnTagValueDirectPeer (default 1000).
  • Mesh peers are tagged with a value of GossipSubConnTagValueMeshPeer (default 20).
    If a peer is in multiple topic meshes, they'll be tagged for each.
  • For each message that we receive, we bump a delivery tag for the peer that delivered the message first (using the GossipSubConnTagBumpMessageDelivery variable).
  • The delivery tags have a maximum value, GossipSubConnTagMessageDeliveryCap, and they decay at a rate of GossipSubConnTagDecayAmount / GossipSubConnTagDecayInterval.
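
To make the arithmetic in the list above concrete, here is a rough model of a capped, decaying delivery tag and of how the tags might add up for one connection. It is a sketch, not a port of the Go tagTracer: the class and function names are hypothetical, the bump / cap / decay numbers are illustrative and not claimed to be the Go defaults, and it assumes the tag values are simply summed per connection.

```typescript
// Illustrative parameters only; these are NOT claimed to be the Go defaults.
interface DeliveryTagParams {
  bump: number // added each time the peer is first to deliver a message
  cap: number // maximum delivery tag value
  decayAmount: number // subtracted once per decay interval
  decayIntervalMs: number
}

// Hypothetical model of a capped, decaying delivery tag.
class DeliveryTag {
  private value = 0
  private readonly params: DeliveryTagParams

  constructor (params: DeliveryTagParams) {
    this.params = params
  }

  // Called when this peer was the first to deliver a message.
  bump (): number {
    this.value = Math.min(this.params.cap, this.value + this.params.bump)
    return this.value
  }

  // Apply decay for the whole intervals contained in elapsedMs.
  decay (elapsedMs: number): number {
    const intervals = Math.floor(elapsedMs / this.params.decayIntervalMs)
    this.value = Math.max(0, this.value - intervals * this.params.decayAmount)
    return this.value
  }
}

// Defaults quoted in the list above: direct peer 1000, mesh peer 20 per topic.
const DIRECT_PEER_TAG = 1000
const MESH_PEER_TAG = 20

// Assumption: the connection weight is the plain sum of all tag values.
function connectionWeight (isDirect: boolean, meshTopics: number, deliveryTag: number): number {
  return (isDirect ? DIRECT_PEER_TAG : 0) + meshTopics * MESH_PEER_TAG + deliveryTag
}

const tag = new DeliveryTag({ bump: 1, cap: 15, decayAmount: 1, decayIntervalMs: 600_000 })
let afterBumps = 0
for (let i = 0; i < 20; i++) afterBumps = tag.bump() // 20 bumps, capped at 15
const afterDecay = tag.decay(1_800_000) // three intervals elapse: 15 - 3 = 12
const weight = connectionWeight(false, 2, afterDecay) // 2 * 20 + 12 = 52
console.log({ afterBumps, afterDecay, weight })
```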

So perhaps we need to expand this PR to accommodate those different types of tags and penalties when adjusting a peer score. What do you think @achingbrain @wemeetagain?

@maschad maschad requested a review from dapplion January 23, 2023 19:27
@maschad maschad force-pushed the fix/tag-peers-on-graft branch 3 times, most recently from bbcc209 to 57ab517 Compare February 7, 2023 17:42
@dapplion
Contributor

What happened to the diff? +14,000 -42,000

@maschad
Contributor Author

maschad commented Feb 13, 2023

What happened to the diff? +14,000 -42,000

This is just for testing purposes which is why the commit is labelled DNM

@maschad
Contributor Author

maschad commented Feb 13, 2023

After deploying this on Feb 7th and observing it on our larger cluster, I have some observations. In summary:

Pros

  • Improved throughput
  • More stable peer count
  • Better gossip scores

Cons

  • Increased heap usage
  • Slightly increased CPU usage

In more detail:

The screenshots are in order of unstable (unstable-lg1k-hzax41) then feat2 (feat2-lg1k-hzax41) respectively.

Gossip score (avg / min / max)

Unstable

[Screenshot: gossip score (avg / min / max), unstable]

Feat2

[Screenshot: gossip score (avg / min / max), feat2]

We can observe that unstable had, on average, more low scores deviating from the average than feat2, although the variance is not significant over the last 6 days.

Peer count per gossip score threshold

Unstable

[Screenshot: peer count per gossip score threshold, unstable]

Feat2

[Screenshot: peer count per gossip score threshold, feat2]

We can observe that feat2 shows substantially less variance over the last 6 days (around 9 peers at most), as opposed to spikes of 50 peers on unstable; the peer count remained relatively stable on feat2.

Process CPU user seconds

Unstable

[Screenshot: process CPU user seconds, unstable]

Feat2

[Screenshot: process CPU user seconds, feat2]

Slightly higher average CPU usage with feat2.

Process Heap Bytes

Unstable

[Screenshot: process heap bytes, unstable]

Feat2

[Screenshot: process heap bytes, feat2]

Heap usage is consistently higher with feat2, with an average higher than unstable’s max.

Received messages per second

Unstable

[Screenshot: received messages per second, unstable]

Feat2

[Screenshot: received messages per second, feat2]

On average, feat2 received more messages per second over the last 6 days.

Would be interested to hear your thoughts @dapplion @wemeetagain @tuyennhv

@maschad
Contributor Author

maschad commented Feb 28, 2023

Just sending a reminder @dapplion @tuyennhv

@dapplion
Contributor

dapplion commented Mar 1, 2023

Just sending a reminder @dapplion @tuyennhv

What specific input do you need from us?

@maschad
Contributor Author

maschad commented Mar 1, 2023

Just sending a reminder @dapplion @tuyennhv

What specific input do you need from us?

I wanted to know whether you thought these fluctuations were significant enough to count as a performance penalty; otherwise we could go ahead with this PR.

@maschad
Contributor Author

maschad commented Oct 9, 2023

Updated @wemeetagain. Mind you, the value is set to 0 rather than the tag being removed, as referenced in #380 (comment)

@wemeetagain
Member

Updated @wemeetagain. Mind you, the value is set to 0 rather than the tag being removed, as referenced in #380 (comment)

I don't see the reference to the tag being set to 0 in that comment

@maschad
Contributor Author

maschad commented Oct 10, 2023

My bad, I had misinterpreted the comment saying tag values should be between 0-100 as applying to all cases. I realize that in libp2p, when untagging, we set the metadata to undefined; I have done so in c117cf9
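
As a rough illustration of the difference between tagging and untagging, the sketch below uses an in-memory stand-in for the peer store. `StubPeerStore`, `tagUpdate` and `untagUpdate` are hypothetical names, peer ids are plain strings here instead of PeerId objects, and the "undefined removes the tag" behaviour is modelled from the comment above, not taken from the real peer store implementation.

```typescript
// Update shape modelled on the merge call in this PR: tags keyed by topic.
type TagUpdate = { tags: Record<string, { value: number } | undefined> }

// In-memory stand-in for a peer store; only tag merging is modelled.
class StubPeerStore {
  readonly tags = new Map<string, Map<string, number>>()

  async merge (peerId: string, update: TagUpdate): Promise<void> {
    const current = this.tags.get(peerId) ?? new Map<string, number>()
    for (const [name, tag] of Object.entries(update.tags)) {
      if (tag === undefined) {
        current.delete(name) // untag: an undefined entry removes the tag
      } else {
        current.set(name, tag.value) // tag: store the numeric value
      }
    }
    this.tags.set(peerId, current)
  }
}

// Hypothetical helpers; the tag value is a parameter with 100 as the default.
function tagUpdate (topic: string, value = 100): TagUpdate {
  return { tags: { [topic]: { value } } }
}

function untagUpdate (topic: string): TagUpdate {
  return { tags: { [topic]: undefined } }
}

const store = new StubPeerStore()
// The stub applies each change before its first await, so the state below
// is already updated when merge returns its promise.
void store.merge('peer-a', tagUpdate('blocks'))
void store.merge('peer-a', tagUpdate('attestations', 50))
void store.merge('peer-a', untagUpdate('blocks'))
const remaining = Array.from(store.tags.get('peer-a') ?? new Map<string, number>())
console.log(remaining)
```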

@maschad
Contributor Author

maschad commented Oct 10, 2023

@wemeetagain the test that actually checks for the tagging is flaky, but I am not sure why. Could it be that the peerStore is being cleared somewhere else?

@maschad maschad requested a review from dapplion October 23, 2023 21:51
@maschad
Contributor Author

maschad commented Oct 23, 2023

I've also updated package-lock.json as npm was reporting critical vulnerabilities.

src/index.ts Outdated

// rust-libp2p
// - peer_score.graft()
// - Self::control_pool_add()
// - peer_added_to_mesh()
})

if (this.opts?.taggingEnabled ?? false) {
Array.from(toAdd).map(async (id) => {
Member

forEach or for-of

Member

probably for-of is better, then no need to create an array

Contributor Author

Then I would have to make this whole method async, and the motivation was to move away from unnecessary promises, i.e. promises would only be created for consumers who enable tagging.

Contributor

What if peerStore.merge throws?

Member

This should be refactored to a for-of

Contributor Author

Then I would have to make this whole method async, and the motivation was to move away from unnecessary promises, i.e. promises would only be created for consumers who enable tagging.
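
One way to reconcile the points in this thread (for-of, no async method, and merge possibly throwing) might look like the sketch below. It is not the code that was merged: `PeerStoreLike`, `tagMeshPeers` and the `onError` callback are hypothetical, and peer ids are plain strings to keep the example self-contained.

```typescript
// Update shape modelled on the merge call in this PR: tags keyed by topic.
type TagUpdate = { tags: Record<string, { value: number } | undefined> }

// Minimal stand-in for the part of the peer store used here.
interface PeerStoreLike {
  merge: (peerId: string, update: TagUpdate) => Promise<void>
}

// Iterate the Set directly (no intermediate array) and keep the caller
// synchronous: each merge is started, and failures go to onError.
function tagMeshPeers (
  store: PeerStoreLike,
  toAdd: Set<string>,
  topic: string,
  onError: (peerId: string, err: unknown) => void
): number {
  let started = 0
  for (const peerId of toAdd) {
    try {
      // Fire and forget: a rejected promise is reported, never awaited.
      void store
        .merge(peerId, { tags: { [topic]: { value: 100 } } })
        .catch((err: unknown) => { onError(peerId, err) })
      started++
    } catch (err) {
      // Covers a merge implementation that throws synchronously.
      onError(peerId, err)
    }
  }
  return started
}

// Stub store: one peer fails synchronously, one asynchronously.
const calls: string[] = []
const errors: string[] = []
const stubStore: PeerStoreLike = {
  merge (peerId, update) {
    if (peerId === 'peer-sync-fail') throw new Error('store closed')
    calls.push(`${peerId}:${Object.keys(update.tags).join(',')}`)
    return peerId === 'peer-async-fail'
      ? Promise.reject(new Error('write failed'))
      : Promise.resolve()
  }
}

const started = tagMeshPeers(
  stubStore,
  new Set(['peer-a', 'peer-sync-fail', 'peer-async-fail']),
  'topic-x',
  (peerId) => { errors.push(peerId) }
)
console.log({ started, calls })
```

The asynchronous failure reaches `onError` on a later tick, so grafting itself is never blocked on the peer store.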

@achingbrain
Collaborator

achingbrain commented Jan 16, 2024

libp2p triage update -

  • @maschad to pick up the final changes:
  • Error handling to be improved
  • Then review
  • Then merge!

src/index.ts Outdated
@@ -1474,10 +1479,30 @@ export class GossipSub extends TypedEventEmitter<GossipsubEvents> implements Pub
const now = Date.now()
let doPX = this.opts.doPX

if (this.opts?.taggingEnabled ?? false) {
Member

can you make sure that these peer tagging sections are at the bottom of each code section they're in? Currently, peer tagging is awaited before the functionality of the event that triggers the peer tagging, which blocks core gossipsub functionality (often timing-sensitive) on this feature.

  doStuff() {
    // let all important stuff happen first
    ...
    // then do peer tagging
    if (this.opts?.tagMeshPeers) {...}
  }

Really this feature can/should be refactored to be a side-effect of grafting / pruning.

export type GraftPeer = {peerId: string, topic: string, direction: Direction}
export type PrunePeer = GraftPeer

export interface GossipsubEvents extends PubSubEvents {
  'gossipsub:heartbeat': CustomEvent
  'gossipsub:message': CustomEvent<GossipsubMessage>
  'gossipsub:graft': CustomEvent<GraftPeer>
  'gossipsub:prune': CustomEvent<PrunePeer>
}

class Gossipsub ... {
  ...

  tagMeshPeer = (evt: CustomEvent<GraftPeer>) => {
    merge(...).catch(...)
  }
  untagMeshPeer = (evt: CustomEvent<PrunePeer>) => {
    merge(...).catch(...)
  }
  
  start() {
    ...
    if (this.opts?.tagMeshPeers) {
      this.on('gossipsub:graft', this.tagMeshPeer)
      this.on('gossipsub:prune', this.untagMeshPeer)
    }
  }
  stop() {
    ...
    if (this.opts?.tagMeshPeers) {
      this.off('gossipsub:graft', this.tagMeshPeer)
      this.off('gossipsub:prune', this.untagMeshPeer)
    }
  }
  
  handleGraft(graft) {
    // actually handle the graft
    // then emit 'gossipsub:graft'
  }
}

Contributor Author

I think this can be done in a follow-up PR if necessary, considering that this is an optional feature and that one immediate use case, when it is enabled, is letting the connection manager prune a connection that should not be gossiped to.

@wemeetagain wemeetagain changed the title from "fix: Tag mesh peers on graft, remove tags on prune" to "feat: graft/prune events and mesh peer tagging" Feb 7, 2024
@wemeetagain
Member

@achingbrain can you review this? I'm happy with it.

this.components.peerStore.merge(peerIdFromString(peerId), {
tags: {
[topic]: {
value: 100
Collaborator

This value could be parameterised?

Collaborator

@achingbrain achingbrain left a comment

LGTM, small nit with the hard-coded tag value but it's not essential.

@wemeetagain wemeetagain merged commit 42b5b92 into ChainSafe:master Feb 27, 2024
9 checks passed
Development

Successfully merging this pull request may close these issues.

Tag mesh peers on graft, remove tags on prune
6 participants