EIP-7594: Passive sampling #3717
Since a node doesn't know when it should request samples from its peers, it's better to receive them passively from the subnets. This method is called "passive sampling". The former method, now called "active sampling", will be used only if the node wants to sample past slots or if passive sampling fails.
### Passive sampling
A few moments before each slot, the node SHOULD subscribe to `SAMPLES_PER_SLOT` column subnets to receive samples from its peers. A node uses the `get_custody_columns` helper to determine which column subnets to subscribe to. This should be easy to do because the node already has a diverse set of peers.
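As a minimal sketch of the subnet-selection step, assuming column indices map to gossip subnets by a simple modulo: the function name `subnets_for_columns` and the subnet count used in the example are hypothetical, and the column list stands in for whatever `get_custody_columns` returns.

```python
from typing import Iterable, List


def subnets_for_columns(columns: Iterable[int], subnet_count: int) -> List[int]:
    """Return the sorted, de-duplicated subnets covering the given columns.

    Assumption for illustration: a column belongs to subnet
    `column_index % subnet_count`.
    """
    if subnet_count <= 0:
        raise ValueError("subnet_count must be positive")
    return sorted({column % subnet_count for column in columns})


# Hypothetical column list, standing in for the output of `get_custody_columns`.
example_columns = [0, 33, 64, 127]
print(subnets_for_columns(example_columns, subnet_count=32))
```

Two columns that land on the same subnet need only one subscription, which is why the result is de-duplicated.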
This seems to defeat the purpose of only subscribing to gossip from custodied subnets. If you subscribe to an extra `SAMPLES_PER_SLOT` column subnets before each slot, it is equivalent to increasing the size of the total subnets custodied. I don't think subscribing/unsubscribing quickly makes much of a difference here.
So, how about defining "passive sampling" as receiving samples from the custodied subnets (instead of extra subnets) and deciding that the data is available if the node receives such samples? This hasn't been specified anywhere in the spec yet.
This sounds like the wrong term; I don't think being part of a gossip subnet can be thought of as sampling. The level of amplification is 8x on a subnet vs a simple req/resp.
> The level of amplification is 8x on a subnet vs a simple req/resp.

It doesn't really have to be 8x. You can just connect to a single node as a mesh peer, so the bandwidth used will be the same as req/resp.

> This sounds like the wrong term; I don't think being part of a gossip subnet can be thought of as sampling.

I disagree on this. Sampling means taking some portion of something. Subscribing to some columns/subnets means taking some of all the columns, so I think the term still makes sense.
> So, how about defining "passive sampling" as receiving samples from the custodied subnets (instead of extra subnets) and deciding that the data is available if the node receives such samples? This hasn't been specified anywhere in the spec yet.

I would like to change my mind on this. I think it's okay to be subscribed to extra subnets.

> it is equivalent to increasing the size of the total subnets custodied

It's not really equivalent because you don't keep the past samples for the extra subnets. You just get them and throw them away.
In terms of bandwidth usage, as I mentioned in the previous comment, you can reduce the mesh degree to 1 so that you use only as much bandwidth as req/resp.
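A rough way to see the bandwidth argument, as a sketch only: assume that in the worst case every mesh peer forwards a full copy of each sample, so the number of copies received scales with the mesh degree. The function name and the sample count below are hypothetical, and real GossipSub traffic also includes control messages that this ignores.

```python
def worst_case_copies(samples_per_slot: int, mesh_degree: int) -> int:
    """Upper bound on sample copies received per slot.

    Assumption: each of the `mesh_degree` mesh peers forwards every sample once.
    """
    if samples_per_slot < 0 or mesh_degree < 0:
        raise ValueError("arguments must be non-negative")
    return samples_per_slot * mesh_degree


ASSUMED_SAMPLES_PER_SLOT = 8  # illustrative value, not taken from the spec

for degree in (8, 1):
    copies = worst_case_copies(ASSUMED_SAMPLES_PER_SLOT, degree)
    print(f"mesh degree {degree}: up to {copies} copies per slot")
```

Under this assumption, a mesh degree of 1 receives one copy per sample, which matches a single req/resp exchange, while a degree of 8 receives up to eight.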
This requires us to be able to have dynamic mesh sizes for separate topics. Currently, in go-libp2p-pubsub and possibly in other language implementations, all topics have the same mesh size. Also having a mesh size of 1 can be problematic for the general network as you would have increased latency on the propagation of a message. A remote peer will not know whether a connected peer has a mesh size of 8 or 1. On some paths, data columns might take a lot longer to be propagated.
Maybe for this use case you would want a new protocol message, where instead of random gossip, a peer simply returns the most recently seen message IDs for a particular topic?
> Also having a mesh size of 1 can be problematic for the general network as you would have increased latency on the propagation of a message

I think that's okay because, even if the message is delayed or not received at all, passive sampling acts only as a complement to active sampling.
Notice that, without passive sampling, the sampling node has to wait until the sampling time to request the samples. With passive sampling (even with a mesh size of 1), there is a good chance that it will get the samples before the sampling time.
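To make the "complement" idea concrete, here is a minimal sketch assuming a per-slot tracker: samples that arrive over gossip are only recorded (the payload is not retained), and at sampling time only the still-missing columns are requested actively. The class `SlotSampler` and the `request_sample` callback are hypothetical names, not part of the spec.

```python
from typing import Callable, Iterable, Set


class SlotSampler:
    """Tracks which wanted columns have been seen for a single slot."""

    def __init__(self, wanted_columns: Iterable[int]) -> None:
        self.wanted: Set[int] = set(wanted_columns)
        self.seen: Set[int] = set()

    def on_gossip_sample(self, column_index: int) -> None:
        # Passive path: note the arrival and drop the payload.
        if column_index in self.wanted:
            self.seen.add(column_index)

    def missing(self) -> Set[int]:
        return self.wanted - self.seen

    def finish(self, request_sample: Callable[[int], bool]) -> bool:
        # Active fallback: request only the columns gossip did not deliver.
        for column_index in sorted(self.missing()):
            if request_sample(column_index):
                self.seen.add(column_index)
        # The slot counts as sampled only if every wanted column was obtained.
        return not self.missing()
```

If gossip delivers everything before the sampling time, `finish` issues no requests at all; if gossip delivers nothing, it behaves like plain active sampling.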
> Maybe for this use case you would want a new protocol message, where instead of random gossip, a peer simply returns the most recently seen message IDs for a particular topic?

In fact, I have an upgrade to GossipSub in mind which will probably help with this issue. I will create a PR for it in a few days.
> > Maybe for this use case you would want a new protocol message, where instead of random gossip, a peer simply returns the most recently seen message IDs for a particular topic?
>
> In fact, I have an upgrade to GossipSub in mind which will probably help with this issue. I will create a PR for it in a few days.

Here it is: libp2p/specs#617
I think we should just increase the custody requirement, though not all the way to

I don't quite follow this. Is this off-topic? The purpose of this PR is that nodes don't have to guess when the samples arrive at their peers.

I think this is still at the research stage, so I will close this PR for now.
Since a node doesn't know when it should request samples from its peers, it's better to receive them passively from the subnets, so that the node doesn't have to guess when to request the samples. This method is called "passive sampling".
The former method, now called "active sampling", will be used only if the node wants to sample past slots or if passive sampling fails.