
P2P: Subnet content resolution protocol #475

Closed
adlrocha opened this issue Feb 16, 2023 · 20 comments


adlrocha commented Feb 16, 2023

Background

While a child subnet C is required to sync with its parent for its operation, a parent subnet P is not required to sync with all of its children. This means that while C can directly pull the top-down messages that need to be proposed and executed in C by reading the state of the parent through the IPC agent and getting the raw messages, this is not possible for bottom-up messages. Bottom-up messages are propagated inside checkpoints as a cid that points to the aggregate of all the messages propagated from C to P. P does not have direct access to the state of C to get the raw messages behind that cid and conveniently propose them for execution. This is where the subnet content resolution protocol comes into play.

This protocol can be used by any participant of IPC to resolve content stored in the state of a specific subnet. The caller performs a request specifying the type of content being resolved and the cid of the content, and any participant of that subnet will pull the content from its state and share it in response to the request. This protocol is run by all IPC agents participating in an IPC subnet. Initially, the only type supported for resolution will be CrossMsgs, but in the future additional content types and handlers can be registered in the protocol.

Design

This design is inspired by the way we implemented the protocol in the MVP, but if you can come up with a simpler or more efficient design, by all means feel free to propose it and implement it that way. As we don't have a registry of all the IPC agents or peers participating in a subnet, we leverage GossipSub for the operation of the protocol. Each IPC agent is subscribed to an independent topic for each of the subnets it is syncing with. Thus, if an IPC agent is syncing with P and C, it will be automatically subscribed to /ipc/resolve/P and /ipc/resolve/C.

In the MVP, the protocol was designed as an asynchronous request-response protocol on top of a broadcast layer (i.e. GossipSub). We implemented three types of messages:

/// Protocol supported messages
enum Messages {
  Pull,
  Push,
  Response,
}

/// Supported types of content
enum ContentType {
  CrossMsgs(Vec<Msg>),
}

/// Requests pulling some content from a subnet
struct Pull {
   source: Option<MultiAddr>,   // multiaddr of the peer initiating the request
   source_sn: Option<SubnetID>, // source subnetID
   content_type: ContentType,   // type of content being requested (`type` is a reserved word in Rust)
   cid: Cid,                    // cid of the content
}

/// Response to a pull request.
struct Response<T: Serialize> {
   content_type: ContentType,  // type of content being requested
   content: T,                 // content resolved
}

/// Proactively pushes new content into a subnet to let
/// nodes decide if they want to preemptively cache it.
/// (Its structure is the same as `Response`, but it is
/// handled differently by agents.)
struct Push<T: Serialize> {
   content_type: ContentType,  // type of content being requested
   content: T,                 // content resolved
}
  • When an agent wants to resolve some content from a subnet, it broadcasts a Pull message to the relevant broadcast topic for the destination subnet, sharing information about the content to be resolved and, optionally, either the source subnet or the multiaddress of the source agent making the request.
  • When one of the agents subscribed to that subnet and syncing with its state sees the request, it either broadcasts a Response message to the topic of the source subnet, if one was specified in the request, or it connects directly to the MultiAddr of the initiator of the request and sends the Response to them.
    • Broadcasting the response allows for caching and de-duplication, but also increases the load on the network.
  • Finally, when a checkpoint with cross-messages is propagated from C to P, agents in C may choose to broadcast a Push message to the topic of P, so that validators' agents in P can preemptively cache the content and later propose the messages without having to resolve the content in the destination subnet. (A rough dispatch sketch for these messages follows this list.)
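
A rough sketch of how an agent might dispatch these messages as they arrive on a subnet topic. It assumes an envelope enum whose variants carry their payloads; fetch, broadcast_response, send_direct, cache, and complete_pending are hypothetical helpers, not part of the MVP code:

// Hypothetical dispatch for messages arriving on /ipc/resolve/<subnet>.
// Only the flow mirrors the design above; every helper is assumed.
fn handle(envelope: Envelope) {
    match envelope {
        Envelope::Pull(pull) => {
            // Serve the request only if our local state has the content.
            if let Some(content) = fetch(&pull.cid) {
                match (&pull.source_sn, &pull.source) {
                    // Reply on the source subnet's topic (cacheable, but noisier)...
                    (Some(subnet), _) => broadcast_response(subnet, content),
                    // ...or connect directly to the requester's multiaddr.
                    (None, Some(addr)) => send_direct(addr, content),
                    (None, None) => {} // nowhere to reply
                }
            }
        }
        // Optionally pre-cache pushed content for future block proposals.
        Envelope::Push(push) => cache(push),
        // A response fulfils one of our own pending Pull requests.
        Envelope::Response(resp) => complete_pending(resp),
    }
}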

Alternatives

Gossipsub + Bitswap

One alternative to this protocol would be to directly use Bitswap to resolve any cid from our existing connections. We could use GossipSub exclusively for peer discovery, i.e. all IPC agents would subscribe to an /ipc/agents/<subnet_id> topic for each subnet, to mesh with other IPC agents syncing with those subnets and establish connections that Bitswap can then leverage to resolve content. For this to work, all the content that we want to be "resolvable" through the IPC agent needs to be cached in a local datastore.
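
A minimal sketch of the discovery side of this alternative, using rust-libp2p's gossipsub behaviour (APIs vary across library versions, so treat the exact calls as assumptions):

use libp2p::gossipsub;
use libp2p::PeerId;

// Hypothetical helper: which peers do we know are syncing a given subnet?
// Assumes we subscribed to /ipc/agents/<subnet_id> for our own subnets earlier.
fn subnet_providers(gossipsub: &gossipsub::Behaviour, subnet_id: &str) -> Vec<PeerId> {
    let topic = gossipsub::IdentTopic::new(format!("/ipc/agents/{}", subnet_id));
    let hash = topic.hash();
    gossipsub
        .all_peers()
        .filter(|(_, topics)| topics.contains(&&hash))
        .map(|(peer, _)| *peer)
        .collect()
}

These peers would then become the providers handed to Bitswap queries.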

Point-to-point + DHT or Gossipsub for peer discovery.

Another option is to leverage a DHT for each subnet, or to subscribe to specific topics for each of the subnets in order to discover peers syncing with the same subnets, and then build a direct peer-to-peer protocol for content resolution with the same kinds of messages proposed above for the MVP implementation. A point-to-point libp2p protocol on top of some peer discovery protocol could actually be the most efficient in terms of number of messages and network load.
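
For illustration, the point-to-point direction could sit on rust-libp2p's request-response behaviour. A sketch assuming a hypothetical ResolveCodec for the Pull/Response types above (the constructor and protocol types differ between library versions):

use libp2p::request_response::{self, ProtocolSupport};
use libp2p::StreamProtocol;

// One named protocol per use case; peer discovery tells us who speaks it.
let behaviour: request_response::Behaviour<ResolveCodec> =
    request_response::Behaviour::new(
        std::iter::once((StreamProtocol::new("/ipc/resolve/1.0.0"), ProtocolSupport::Full)),
        request_response::Config::default(),
    );
// Once discovery yields a peer syncing the target subnet:
// let request_id = behaviour.send_request(&peer_id, pull_message);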


aakoshh commented Feb 20, 2023

For the record, we had a discussion about this on Zoom:

  • The IPC agent is meant as the separate process running between the Parent and Child Lotus instances.
  • IPC agents subscribe to GossipSub topics that connect them to each other across validators.
  • An IPC agent uses JSON-RPC to pull CIDs from the Child Lotus instance.
  • The JSON-RPC call is forwarded to the Gateway actor, where the single CID in a checkpoint is resolved not just to a list of included CIDs but directly to a list of raw Messages, and the whole thing is returned. A more general approach of asking the IPLD store would just return a list of CIDs.
  • The Parent Lotus instance asks the IPC agent during block proposal whether it has anything to add, as opposed to the IPC agent pushing a message to the Lotus mempool via RPC. Lotus effectively has two mempools this way.
    • Lotus and Forest will need updating to handle the possibility that a message included in a block is not one of the regular chain messages they expect, but a checkpoint.
  • Bitswap is limited to 2MB messages (which should be enough if we can resolve CIDs recursively rather than in one lump message, since we are not dealing with any content that wasn't normal Filecoin content at some point).


aakoshh commented Feb 20, 2023

It's not entirely clear what Msg is in the following snippet:

/// Supported types of content
enum ContentType {
  CrossMsgs(Vec<Msg>),
}

First I thought it was a fully fledged raw message, but later usage seems to contradict this, because ContentType appears in requests as well as in the response, alongside the actual content. So I assume it tells how content: T should be deserialized?
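
A sketch of that reading, with ContentType reduced to a pure tag (this is one interpretation of the design, not the actual MVP types):

// ContentType as a discriminant only: the payload stays opaque
// until the tag says how to decode it.
enum ContentTag {
    CrossMsgs, // `content` deserializes to Vec<Msg>
}

struct Response {
    tag: ContentTag,   // how to interpret `content`
    content: Vec<u8>,  // opaque bytes until the tag is consulted
}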


aakoshh commented Feb 20, 2023

I had questions about Bitswap, notably how it handles recursion.

I saw that, for example, Forest implements BitswapStore for RocksDB, which is a trait in the libp2p-bitswap library, a different implementation from the iroh-bitswap linked above.

If we look at the implementation it has this part:

impl BitswapStore for RocksDb {
    type Params = libipld::DefaultParams;
...
    fn missing_blocks(&mut self, cid: &Cid) -> anyhow::Result<Vec<Cid>> {
        bitswap_missing_blocks::<_, Self::Params>(self, cid)
    }
}

Following on to bitswap_missing_blocks, we can see that what it does is load the CID as a libipld::Block type and use its references method to get a list of all the CIDs contained in it.

To do so, the Block requires the Ipld type to implement References for the Codec that the StoreParams use. In our case, libipld::DefaultParams specifies IpldCodec (as opposed to DagCbor), which implements References here. If we have a look, IpldCodec::DagCbor is one of its options: depending on which codec the Block was encoded with, it forwards the call either to DagCbor, which reads the binary format of a CBOR and looks for CIDs in it, or to the default implementation on the Ipld data structures.
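
In isolation, the mechanism looks roughly like this (libipld ~0.14 API; exact paths may differ between versions):

use libipld::{Block, Cid, DefaultParams};

// Collect every CID referenced by a block: the primitive that
// bitswap_missing_blocks builds on.
fn child_cids(block: &Block<DefaultParams>) -> anyhow::Result<Vec<Cid>> {
    let mut cids = Vec::new();
    // Scans the encoded bytes according to the block's codec
    // (e.g. DagCbor) and collects the embedded CIDs.
    block.references(&mut cids)?;
    Ok(cids)
}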

In the case of Forest, it will indeed try to follow all CIDs recursively by returning them as missing. The question was: what happens if a CID field is not resolvable, for example because it is a cryptographic commitment to something that will be revealed later? To find the answer, we'll have to look into the libp2p-bitswap implementation itself.

To be clear, the scenario is:

  1. We have a CID of a checkpoint, we ask for it over bitswap
  2. We receive the bytes; if we use Forest's approach, we load them into a Block and return a list of the CIDs it references as missing.
  3. TODO: What does libp2p-bitswap do here? Does it insert the Block into the BitswapStore, or not yet, because it has missing parts?
  4. TODO: There are limits to the number of wants a peer accepts - what if there are hundreds of messages in the checkpoint?
  5. We receive a Block for every message in the checkpoint, which should be FVM Message types. These can have CIDs in the params, but params is a RawBytes type, so it would be opaque to DagCbor and not cause more recursive lookups.
  6. If, for any reason, we used another non-opaque type which had CIDs, such that they would not be resolvable, and we used Forest's generic approach again, then we'd return a missing CID that cannot be resolved. The question is again like in step 3: would the system save this Block? Would it notify us that it's partially available? Probably not.

So, to cover all bases, I think we can use bitswap in one of the following ways:

  1. either make sure all CIDs can be resolved, and not worry about FVM payloads because they are opaque bytes, or
  2. parse the data into our custom types where CIDs may be unresolvable, and return only the CIDs which should be resolvable, ie. not use Forest's generic approach

Thinking about Fendermint, where I want to include CIDs in the blocks for async resolution (ie. not resolve them before the block is executed, but by some future time when the same CID is re-proposed for execution, so that I can keep using Tendermint without stalling consensus while resolution is happening), option 1 should be fine: I am proposing a CID for resolution precisely because someone must have it, and Tendermint Core itself doesn't use Bitswap, so it won't get into trouble.

adlrocha commented

First I thought it was a fully fledged raw message, but later usage seems to contradict this, because ContentType appears in requests as well as in the response, alongside the actual content. So I assume it tells how content: T should be deserialized?

Correct. I abused the notation a bit. This is an unnecessary generalization for M2, where I was already considering support for more than one content type in the protocol.

By the way, if you feel discussing this on the issue is a bit inefficient (there is no support for threads, resolving conversations, etc.), we can move to a GitHub discussion, or we can kick off a proper design doc for this already.

adlrocha commented

I had questions about Bitswap, notably how it handles recursion.

When Bitswap receives an IPLD block, it inspects it for pending links to be resolved and broadcasts a WANT message for those CIDs. As we don't need recursive resolution of CIDs for now, maybe it is easier to not use Bitswap and write a lighter libp2p protocol that, for each of our existing connections, checks whether the peer is part of the subnet and can serve the content we are looking for. Bitswap is known to flood the network with a lot of requests to overcome its lack of content routing capabilities, so if we choose to go with it, maybe we should wrap it in some higher-level protocol of our own to work around this and focus the search for content a bit more on peers in a specific subnet.

We should maybe take a first stab at writing a design doc so we can clarify all these details. WDYT?

We have a CID of a checkpoint, we ask for it over bitswap

Maybe I misunderstood you here, but a quick clarification: there's no need to resolve a checkpoint from a Cid. The checkpoint for a subnet C with cross-messages is committed in the state of P, so P already has access to the checkpoint. It is the committed checkpoint that includes the cid of the raw messages in one of its fields.


aakoshh commented Feb 20, 2023

@adlrocha thank you for responding to my comments here. I think it's as good a place as any other; at least it's an obvious one. Currently I am investigating the libp2p-bitswap implementation and wanted to use this place to take some notes. If we conclude that we need to come up with a new protocol, I agree that a separate design doc would be more appropriate.

As we don't need recursive resolution of CIDs

In your design you don't need recursion because you put all messages into a single data structure assembled by the Gateway actor. However, I think it would be great if we designed for recursiveness in message resolution between actors (and between Fendermint applications), because checkpointing is inherently recursive; we shouldn't discard that. At least I thought IPLD data can be forwarded by enveloping, so that the core of the message can travel on unchanged and always be available, rather than as slightly modified copies.

there's no need to resolve a checkpoint from a Cid.

I was thinking about the use case of Fendermint, where I wanted to include CIDs for resolution, so the first step would always be to resolve the CID. I am not yet at a place where I can enumerate what messages they can resolve to; I will probably create an enum just for that, where one of the options will be a cross message and another a simple message which was turned into a CID to lower the cost of re-inclusion in case a block could not be finalized. In that model it would be ForResolution(checkpoint-CID) -> Checkpoint(message-list-CID, quorum-cert) -> Vec<message-CID> -> Vec<Message>.
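
For illustration, the enum being described might look something like this (names are placeholders, nothing here is settled):

use libipld::Cid;

// Hypothetical enum of the things a proposed CID may resolve to.
enum Resolvable {
    // A checkpoint with cross-messages:
    // Checkpoint(message-list-CID, quorum-cert) -> Vec<message-CID> -> Vec<Message>.
    Checkpoint(Cid),
    // A simple message turned into a CID to lower the cost of
    // re-inclusion when a block could not be finalized.
    Deferred(Cid),
}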

The checkpoint for a subnet C with cross-messages is committed in the state of P, so P already has access to the checkpoint.

Hm, I'm missing the step where the checkpoint is pushed into the state of the parent. I thought that the parent asks the agent for anything to include, and it gets the full checkpoint, not a checkpoint with just a CID.

maybe it is easier to not use Bitswap and write a lighter libp2p protocol that, for each of our existing connections, checks whether the peer is part of the subnet and can serve the content we are looking for.

That might be true, although from what I understand about the MVP, it also exhibits flooding behaviour by broadcasting the Pull and then the Response as well, which in this case are larger messages, especially if the granularity of the request/response is a full checkpoint that doesn't account for the availability of individual messages. I am wary that we'll end up reinventing bitswap.

At least in the case of Fendermint I believe Bitswap is the right fit and it's worth investigating. I am not skilled with libp2p so for me there's no such thing as a "lighter libp2p protocol" 🙂

Bitswap is known to flood the network with a lot of requests to overcome its lack of content routing capabilities, ... and focus the search for content a bit more on peers in a specific subnet.

Great point. I am looking at the NetworkBehaviour that implements Bitswap and I see that it's possible to add and remove peers, so we can probably have a separate bitswap instance for each subnet, rather than a global swarm where every want-block is passed to any peer on any subnet.
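
Scoping a query to a subnet could look roughly like this with libp2p-bitswap's get-style API (method names as I remember them from the library; treat them as assumptions):

use libipld::{Cid, DefaultParams};
use libp2p::PeerId;
use libp2p_bitswap::{Bitswap, QueryId};

// Offer only the peers known to sync this subnet as providers,
// instead of every peer in a global swarm.
fn resolve_in_subnet(
    bitswap: &mut Bitswap<DefaultParams>,
    cid: Cid,
    subnet_peers: Vec<PeerId>,
) -> QueryId {
    bitswap.get(cid, subnet_peers.into_iter())
}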


aakoshh commented Feb 20, 2023

What about https://github.com/retrieval-markets-lab/rs-graphsync ?

EDIT: After reading your summary of graphsync using IPLD selectors, it's even more of an overkill for our situation than bitswap.


aakoshh commented Feb 20, 2023

I also noticed that some of the improvements you mention in your fantastic blog post have been included in the library; for example, during a get they only send out one want-block, and the rest are want-have.


aakoshh commented Feb 20, 2023

Random notes regarding libp2p-bitswap:

  • 1.2 seems to be implemented, despite the behaviour declaring only 1.0 and 1.1; at least it handles want-have messages
  • the library does not track wants for future completion
  • if none of the peers passed to the initial get has the data, the query errors out with not-found
  • so we must pass in all the peers of a subnet we can imagine
  • it will not benefit from the data spreading through the parent subnet, because those peers are not listed as initial providers

NB the iroh-bitswap implementation does support managing want-lists for future completion. Maybe that's one reason the library seems so much larger.

It's not immediately obvious how IPC agents are supposed to learn where to connect to other subnets' agents. I assume with the GossipSub approach there would have been a single address for all agents, and you just select your topic; but here, if we want separate swarms where you can target all the agents of a subnet for resolution, you have to learn the addresses from somewhere first.

Possibly we'd have to combine the two solutions, as @adlrocha suggested with "Gossipsub for peer discovery": we could publish an agent's participation in a subnet to GossipSub, which is nice because the membership info would spread, and then take that membership and use it for running Bitswap queries. With this combo we can have a single swarm as well, and just select the providers according to the subnet.

adlrocha commented

Amazing analysis, @aakoshh 🙌

Possibly we'd have to combine the two solution as @adlrocha suggested with "Gossipsub for peer discovery": we could publish the participation of an agent in a subnet to GossipSub, which is cool because the membership info would spread, and then take that membership and use it for running Bitswap queries. With this combo, we can have a single swarm as well, just select the providers according to the subnet.

I like this approach because it also offers a broadcast layer shared by all members, for the case where you need to broadcast some control messages. It probably can't be used for the resolution itself, as that may flood the network, but it can be used for heartbeats or proactive caching purposes.


aakoshh commented Feb 21, 2023

@adlrocha thanks again for following up on all my comments on Slack and here as well. Here's a concrete proposal of something that I think would be general enough and would work. It requires almost no new protocol development on our side, just stitching together existing tech, and we already have examples of such stitching, although I have to say it's a bit daunting.

When I wrote my Forest School document, based on my description of handling events around Bitswap, it looks like Forest was at version 0.4, whereas now it's at version 0.6 and a lot has changed: the behaviour of getting and inserting data into the blockstore has been moved into the library (I'm not sure whether it previously lived there as well as in Forest or not).

If we look at the ForestBehaviour it has multiple parts:

  • gossipsub
  • discovery with kademlia
  • bitswap
  • hello, ping, and identify (not sure what that is ATM)
  • chain exchange

Bitswap is used to resolve messages in blocks, while the block of CIDs is received either as a gossipsub message or pulled through chain exchange.

The want_block method looks up all peers from the discovery behaviour and uses them to do a sync with bitswap.

The way the events are handled internally is complicated 😵

So, in a similar vein we could do the following:

  • Use kademlia for discovery, with all its hardening in place
  • Use gossipsub topics to signal membership by all agents in a subnet subscribing to a topic, which is known to everyone via the all_peers method of the Gossipsub behaviour. Gossipsub will use heartbeats internally to make sure peers are real.
  • Use bitswap to sync or get the CID in the checkpoint to receive all its constituents and finally get a notification when all the parts are received, passing in all the peers who are subscribed to the subnet topic according to Gossipsub.

I am not sure how reusable something like ForestBehaviour is, when it has already stitched together multiple lower level behaviours, ie. whether we could use the same behaviour in the Agent and Fendermint, but at least we have a blueprint. Hopefully we can make it a bit simpler, but maybe this is the way.
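
To make the shape concrete, a rough composite in the spirit of ForestBehaviour (the field types are assumptions and vary considerably across rust-libp2p versions):

use libipld::DefaultParams;
use libp2p::{gossipsub, kad, swarm::NetworkBehaviour};
use libp2p_bitswap::Bitswap;

// One behaviour stitching the three roles together, as Forest does.
#[derive(NetworkBehaviour)]
struct AgentBehaviour {
    // Subnet membership topics (/ipc/agents/<subnet_id>) and control messages.
    gossipsub: gossipsub::Behaviour,
    // Peer discovery, with Kademlia's hardening in place.
    kademlia: kad::Behaviour<kad::store::MemoryStore>,
    // Content resolution, fed with the peers Gossipsub reports per subnet.
    bitswap: Bitswap<DefaultParams>,
}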


aakoshh commented Feb 21, 2023

One thing that isn't entirely clear to me is that Forest feeds the discovered addresses to Bitswap, but it doesn't do so with Gossipsub. Does Gossipsub have its own way to discover peers?


aakoshh commented Feb 21, 2023

From what I can see, Gossipsub must be using the on_swarm_events callback to maintain its list of connections.

The DiscoveryBehaviour does the same thing and emits the events which trigger the passing of the connections to Bitswap, before forwarding the event to Kademlia, if it's enabled. In fact, this is the only place where a DiscoveryOut event is emitted; notably, it doesn't emit one when Kademlia yields new peers.

The reason possibly lies further down where Kademlia is polled and can return a Dial event, which I assume instructs the Swarm to connect to a discovered peer.

The question is:

  • Does Gossipsub know about the subscriptions of peers that it's not connected to, through gossip?
  • What if we aren't connected to any peers in a subnet because Kademlia didn't instruct us to be, ie. we know which PeerIds are subscribed to a topic, but we aren't connected to any of them to ask via Bitswap, and we don't know their addresses either? Maybe in this case we should also emit Dial events?

adlrocha commented

Does Gossipsub know about the subscriptions of peers that it's not connected to, through gossip?

I don't think this is the case. It knows about the, on average, 6 connections of its mesh, and from there it knows that its messages will, with high probability, be broadcast to the rest of the subscribers of the topic, but it doesn't know who they are.

What if we aren't connected to any peers in a subnet because Kademlia didn't instruct us to be, ie. we know which PeerIds are subscribed to a topic, but we aren't connected to any of them to ask via Bitswap, and we don't know their addresses either? Maybe in this case we should also emit Dial events?

I don't think I got this. Kademlia and Gossipsub are orthogonal. If we choose to use a DHT for membership management, each subnet should keep its own DHT and we can be sure that all members will be tracked there. The issue is that, while it is more reliable because the DHT keeps the whole membership, it is less dynamic than leveraging GossipSub.

Use kademlia for discovery, with all its hardening in place
Use gossipsub topics to signal membership by all agents in a subnet subscribing to a topic, which is known to everyone via the all_peers method of the Gossipsub behaviour. Gossipsub will use heartbeats internally to make sure peers are real.
Use bitswap to sync or get the CID in the checkpoint to receive all its constituents and finally get a notification when all the parts are received, passing in all the peers who are subscribed to the subnet topic according to Gossipsub.

This may be a bit of an overkill. We could maybe use Gossipsub for membership and, for now, a point-to-point request-response libp2p protocol like chain exchange for the actual exchange. We could then introduce Bitswap in the future (it seems harder to integrate). That being said, let me ping the CoD team; IIRC they had a similar issue, let's see what they ended up doing. Following up in Slack.


aakoshh commented Feb 21, 2023

It knows about the, on average, 6 connections of its mesh,

I don't think that's exactly true. It definitely knows about more peers, but by default it only uses 6 connections per topic. I saw this yesterday in join: when you join a topic, you pick 6 connections, filling them up with random ones if you have to, which means there are more to choose from. You can see this here: there are collections that track who is subscribed to what, and then there is, separately, the mesh; later on it says the mesh keys are the topics we are currently subscribed to. Also, when we receive a subscription, it is recorded and then optionally added to the mesh if there are fewer than 6 peers in it.

Kademlia and Gossipsub are orthogonal.

I don't think they are, not completely. I think Gossipsub only adds connected peers when the Swarm connects to them (see the link above), and the Swarm only connects to peers if you tell it to do so, which can be because they are explicit/persistent peers you configured the node to always connect to, or because the Kademlia behaviour told it to connect, because it had to run a query against them during the discovery process. They seem orthogonal, and they kind of are, but the Gossipsub connections exist as a byproduct of Kademlia (and the discovery behaviour driving it) doing its regular thing.

My current thesis is that we need some kind of peer discovery process, because Gossipsub won't connect to anything on its own.

each subnet should keep its own DHT

Not necessarily; I mean we can have a single DHT to discover all IPC agents in existence, then decide based on Gossipsub whom to contact when we need to resolve content from certain subnets. I doubt we'd run out of space in the K-buckets to track that many agents. It's just possible that we need to prompt the Swarm to connect to the ones we want, not just hope that we are already connected to some. And here we need an address, which we normally get from Kademlia. We cannot ask Gossipsub for addresses, just PeerIds; if we want to use Gossipsub for learning where to connect to, we have to publish that information explicitly and ask the Swarm to connect; basically do Kademlia over Gossipsub.
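
For example, the explicit publication could be a record on a well-known discovery topic (a hypothetical shape; nothing like this exists yet):

use serde::{Deserialize, Serialize};

// Hypothetical record an agent publishes periodically over Gossipsub,
// so others learn both its PeerId and dialable addresses per subnet.
#[derive(Serialize, Deserialize)]
struct MembershipRecord {
    peer_id: Vec<u8>,        // the agent's PeerId, serialized
    addresses: Vec<String>,  // multiaddrs the agent can be dialed on
    subnets: Vec<String>,    // subnet IDs the agent is syncing with
}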

Another option would be to let Kademlia connect, but keep track of these addresses even if it disconnects, and make sure we are always connected to some of those peers in the target subnets by telling the Swarm to dial them.

We could maybe use Gossipsub for membership and, for now, a point-to-point request-response libp2p protocol like chain exchange for the actual exchange. We could then introduce Bitswap in the future (it seems harder to integrate).

I disagree: I think Bitswap is the right solution here, and the one easier to integrate as well, because it needs zero custom development.

I might still be misunderstanding how your approach to including the cross messages works, but I thought that once you do your resolution, you have to put the CID of the cross message into an actual block for execution; otherwise, how would any other peer know that this is the block in which a validator decided to do this? This means that during historical syncing and the gossiping of blocks, you will encounter CIDs in blocks that can be either the current (Signed)Message type or the cross message, and these CIDs are already resolved by Bitswap (see the links above). Tapping into Bitswap would actually do it the way things are supposed to work.


aakoshh commented Feb 21, 2023

@adlrocha it has been a long time since I read it, but https://github.com/ethereum/devp2p/blob/master/discv5/discv5.md combines peer discovery with topics.


aakoshh commented Feb 21, 2023

Just noticed that Forest now implements its own Bitswap: https://github.com/ChainSafe/forest/blob/main/node/forest_libp2p/bitswap/README.md

adlrocha commented

It definitely knows about more peers, but by default it only uses 6 connections per topic.

It definitely knows about more connections, but it can only be sure about 6 of those being subscribed to a specific topic (I think we are saying the same thing :) )

My current thesis is that we need some kind of peer discovery process, because Gossipsub won't connect to anything on its own.

I agree with this. There needs to be a bootstrap infrastructure for agents (or a protocol like mDNS for bootstrapping, although I am afraid that wouldn't work for us).

Another option would be to let Kademlia connect, but keep track of these addresses even if it disconnects, and make sure we are always connected to some of those peers in the target subnets by telling the Swarm to dial them.

A peer could probably also populate its entry with information about the subnets it is subscribed to. Actually, I feel this approach is really similar to that of discv5 (it has also been a while since I last read the protocol).

To summarize the points made so far, we seem to agree on:

  • The need for bootstrapping infrastructure to establish a first batch of connections and peer-exchange with other agents.
  • Bitswap seems to be the best option for data exchange, as it is content-addressed and content-agnostic, allowing us to accommodate any kind of additional content we need to support in the future.
  • We need some kind of peer discovery mechanism in order to be able to run a Bitswap exchange with an agent running a full node in the subnet we want to pull content from. Gossipsub and the Kademlia DHT are our two candidates for peer discovery, with Gossipsub being more dynamic and easier to integrate, and Kademlia more reliable but requiring more logic to accommodate our case.


aakoshh commented Feb 23, 2023

but it can only be sure about 6 of those being subscribed to a specific topic (I think we are saying the same thing :) )

Not exactly, I meant that it knows others are subscribed, but it doesn't gossip that topic to them.

There needs to be a bootstrap infrastructure for agents

Yes, and thanks for reaching out to the Bacalhau people 🧜‍♂️ They use Gossipsub Peer Exchange to learn about the addresses of other peers in the network during pruning; alas, this is not available in the Rust version of the library. I don't think this is the right time for us to add that feature, so I vote we start with Kademlia. Judging by the Forest code, it doesn't require too much code to wrap it in a structure similar to their DiscoveryBehaviour.

Assuming we know from Gossipsub who the agents in a specific subnet are that we can contact to get our stuff, what may be difficult is making sure we are actually able to maintain connections to some of them. If we try to connect to one and the connection pool of the Swarm is already full, the outgoing connection will be rejected. It's as if we'd need a separate Swarm for each subnet we are interested in, to make sure we have enough capacity reserved for each. Or we have to code it into one of our network behaviours to make sure we are connected to some peers in each subnet, which is probably the better approach. Or maybe we can just not limit outgoing connections (a small sketch of this last option follows).
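
That last option is mostly configuration; a sketch with the ConnectionLimits API (names as of rust-libp2p ~0.51, so treat them as assumptions):

use libp2p::swarm::ConnectionLimits;

// Cap inbound connections but leave outbound unlimited, so dials towards
// per-subnet peers are never rejected by a full pool. The limits are then
// handed to the SwarmBuilder when constructing the swarm.
let limits = ConnectionLimits::default()
    .with_max_established_incoming(Some(200)) // arbitrary example cap
    .with_max_established_outgoing(None);     // None = no limit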


aakoshh commented Mar 8, 2023

Most of the issues are completed and the library is ready to be tested. Let's close this and use the offshoots we created from now on.

@aakoshh closed this as completed Mar 8, 2023
@jsoares transferred this issue from consensus-shipyard/ipc-libs Dec 19, 2023
@jsoares added the s:ipc label Dec 19, 2023
@jsoares closed this as not planned Mar 13, 2024