Overall, LGTM, but I am having a hard time putting all the pieces together and understanding when each of the behaviors kicks in. Maybe it is worth having a quick sync once the whole protocol is done for you to guide us through the code 🙏

Also, up until now we are using vanilla Bitswap, right? That assumes that the content being retrieved lives in the peers' `BitswapStore`, which is not currently the case for the agent.
```rust
Ok(Self {
    ping: Default::default(),
    identify: identify::Behaviour::new(identify::Config::new(
        "ipfs/0.1.0".into(),
```
Out of curiosity, why do we need to pass this `ipfs/` protocol version for the identify config?
I don't know, I copied it as-is from Forest. The `KademliaConfig` says the protocol name can be used to distinguish different groups of peers, but the `IdentifyConfig` just says:

```rust
/// Application-specific version of the protocol family used by the peer,
/// e.g. `ipfs/1.0.0` or `polkadot/1.0.0`.
pub protocol_version: String,
```

I fixed the `0.1.0` to be `1.0.0`; it looked like a typo. I suspect it doesn't matter what it is; it could be `ipc`? I can't see it used, but I only looked at the `Identify` behaviour.
For the record, when my nodes connect to each other, I see these protocols being supported:

`["/ipfs/ping/1.0.0", "/ipfs/id/1.0.0", "/ipfs/id/push/1.0.0", "/ipc/smoke-test/kad/1.0.0", "/meshsub/1.1.0", "/meshsub/1.0.0", "/ipfs-embed/bitswap/1.0.0"]`

So this exact string doesn't even appear 🤷
I see they are using it for authentication purposes. They tag the specific DHT they interact with.
```rust
let transport = build_transport(config.network.local_key.clone());
let behaviour = Behaviour::new(config.network, config.discovery, config.membership, store)?;

// NOTE: Hardcoded values from Forest. Will leave them as is until we know we need to change.
```
Makes sense; at our current scale these values seem conservative enough.
```rust
// We could keep the Swarm alive to serve content to others,
// but we ourselves are unable to send requests. Let's treat
// this as time to quit.
None => { break; }
```
What if the client is dropped and we need to spawn a new one from a Service that is already running? We shouldn't have to restart the service for this, right? Maybe the user should take care never to drop the client during the lifetime of the service.
In order to spawn a client, the `Service` would have to hold a copy of the `Sender<Request>` channel (which is what happens in Forest). But the `Service` instance is consumed by `run`, so you can't call anything on it any more, you can't hold a reference to it, and you would not be able to clone a new `Sender`. And because it holds a copy of the `Sender` itself, it would never experience this condition either, unless it dropped it manually.

So I decided that a cleaner way is for `Service::new` to return the `Service` and the `Sender` (wrapped in a `Client`). Then you can make as many clones of the `Client` as you want, before or after you `run` the `Service`.

But you are right, you have to make sure not to drop all `Client` instances. This is easy, though: if you did, and you needed another one, you wouldn't be able to get it from anywhere, so the compiler wouldn't let this happen. And if you never need to make a `Client` call (after telling it what subnets you support), then yeah, you just need to make sure you keep an instance in your application state anyway.
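A minimal sketch of that shape, assuming tokio channels; the `Request` variants and the constructor arguments here are placeholders, not the PR's actual definitions:

```rust
use tokio::sync::mpsc::{unbounded_channel, UnboundedReceiver, UnboundedSender};

enum Request {
    /// Placeholder for the real commands, e.g. resolving a CID from a subnet.
    Resolve { /* cid, subnet_id, response channel */ },
}

/// Cloneable handle around the sender half of the command channel.
#[derive(Clone)]
pub struct Client {
    tx: UnboundedSender<Request>,
}

pub struct Service {
    rx: UnboundedReceiver<Request>,
}

impl Service {
    /// Create the service together with the only way to talk to it.
    pub fn new() -> (Self, Client) {
        let (tx, rx) = unbounded_channel();
        (Self { rx }, Client { tx })
    }

    /// Consumes the service. When every `Client` clone has been dropped,
    /// `recv` returns `None` and the loop ends, matching the
    /// `None => break` branch quoted above.
    pub async fn run(mut self) {
        while let Some(request) = self.rx.recv().await {
            match request {
                Request::Resolve { .. } => { /* drive the Swarm / Bitswap query */ }
            }
        }
    }
}
```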
```rust
membership::Event::Skipped(peer_id) => {
    // Don't repeatedly look up peers we can't add to the routing table.
    if self.background_lookup_filter.insert(&peer_id) {
        self.discovery_mut().background_lookup(peer_id)
```
Are we refreshing the bloom filter after some time? I don't know whether accumulating a lot of peers could at some point lead to skipping legitimate peers.
I am also thinking about this, but I haven't come up with a nice way to do it yet. A fresh start every now and then seems like a blunt and easy option, but it leads to a burst of lookups afterwards. Maybe a better protection would be to use a cache like the ones the `guava` library provides on the JVM: it has limits on memory and a TTL as well.
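As an illustration of that TTL idea (not the PR's code; `K` stands in for `libp2p::PeerId`), a cache-like filter could evict expired entries on each insert, so a peer skipped long ago becomes eligible for a background lookup again:

```rust
use std::collections::HashMap;
use std::hash::Hash;
use std::time::{Duration, Instant};

/// TTL-based replacement for the grow-only Bloom filter: remember when each
/// peer was last skipped and forget entries after a TTL.
pub struct TtlFilter<K> {
    ttl: Duration,
    seen: HashMap<K, Instant>,
}

impl<K: Hash + Eq> TtlFilter<K> {
    pub fn new(ttl: Duration) -> Self {
        Self { ttl, seen: HashMap::new() }
    }

    /// Returns `true` if the peer has not been seen within the TTL,
    /// mirroring the `insert` call on the Bloom filter above.
    pub fn insert(&mut self, peer_id: K) -> bool {
        let now = Instant::now();
        let ttl = self.ttl;
        // Evict expired entries so memory stays bounded by the peers
        // seen within a single TTL window.
        self.seen.retain(|_, at| now.duration_since(*at) < ttl);
        self.seen.insert(peer_id, now).is_none()
    }
}
```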
```rust
let tcp_transport =
    || libp2p::tcp::tokio::Transport::new(libp2p::tcp::Config::new().nodelay(true));
```
I don't know if it is already supported in rust-libp2p, but I would love it if we could switch to QUIC in the future and only fall back to TCP where it's not supported :)
I will check. FWIW in #64 I factor out transport creation, so we can switch easily.
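For reference, recent rust-libp2p releases do ship a QUIC transport, and the usual pattern is to try QUIC first and fall back to TCP via `OrTransport`. The sketch below assumes a recent libp2p version; the exact `noise`/`yamux`/`quic` constructor names have changed between releases, so treat this as the shape rather than drop-in code:

```rust
use futures::future::Either;
use libp2p::core::{muxing::StreamMuxerBox, transport::{Boxed, OrTransport}, upgrade};
use libp2p::{identity::Keypair, noise, quic, tcp, yamux, PeerId, Transport};

fn build_transport(local_key: Keypair) -> Boxed<(PeerId, StreamMuxerBox)> {
    // QUIC brings its own encryption and stream multiplexing.
    let quic_transport = quic::tokio::Transport::new(quic::Config::new(&local_key));

    // Plain TCP still needs noise + yamux layered on top.
    let tcp_transport = tcp::tokio::Transport::new(tcp::Config::new().nodelay(true))
        .upgrade(upgrade::Version::V1)
        .authenticate(noise::Config::new(&local_key).expect("valid noise keys"))
        .multiplex(yamux::Config::default())
        .boxed();

    // Try QUIC, fall back to TCP, and unify both outputs into a muxer box.
    OrTransport::new(quic_transport, tcp_transport)
        .map(|either, _| match either {
            Either::Left((peer_id, muxer)) => (peer_id, StreamMuxerBox::new(muxer)),
            Either::Right((peer_id, muxer)) => (peer_id, StreamMuxerBox::new(muxer)),
        })
        .boxed()
}
```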
```rust
Request::Resolve(cid, subnet_id, response_channel) => {
    let peers = self.membership_mut().providers_of_subnet(&subnet_id);
    if peers.is_empty() {
        send_resolve_result(response_channel, Err(anyhow!(NoKnownPeers(subnet_id))));
```
This would mean that no membership was picked up through gossipsub for the subnet, right? In the future we could add a fallback that optimistically asks our direct connections which subnets they are part of, in case we get lucky and find a provider.
Yes, most likely we asked too early and we should come back later, or perhaps find a known peer and add it to the static address list.
I am also thinking about how to make the gossip better for newcomers, to lessen the trade-off of periodic republishing and let newcomers learn about the providers ASAP.
Here's an idea: when we receive a gossip from a provider we haven't known about before, we publish our own in return. They must be a newcomer, so they could use our own information for sure. Perhaps with some cool-off period, like no repeated publishing within 10 seconds (just above the gossipsub broadcast time frame).
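A sketch of that cool-off idea (the names and the `String` stand-in for `PeerId` are illustrative, not from the PR):

```rust
use std::collections::HashSet;
use std::time::{Duration, Instant};

/// Decides whether to publish our own membership in response to a gossip
/// from a provider we haven't seen before, with a cool-off between publishes.
pub struct NewcomerResponder {
    known_providers: HashSet<String>,
    last_publish: Option<Instant>,
    cooloff: Duration,
}

impl NewcomerResponder {
    pub fn new(cooloff: Duration) -> Self {
        Self {
            known_providers: HashSet::new(),
            last_publish: None,
            cooloff,
        }
    }

    /// Called when a provider record arrives over gossipsub; returns `true`
    /// when the provider is new to us and the cool-off period has passed.
    pub fn should_republish(&mut self, provider: String) -> bool {
        let is_new = self.known_providers.insert(provider);
        let cooled_down = self
            .last_publish
            .map_or(true, |at| at.elapsed() >= self.cooloff);
        if is_new && cooled_down {
            self.last_publish = Some(Instant::now());
            true
        } else {
            false
        }
    }
}
```

The membership behaviour would call `should_republish` on every incoming provider record and publish its own record only when it returns `true`.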
That's right, this Bitswap library interacts with the store. The agent will have to provide an implementation of the store that is fit for purpose: with Fendermint they can share the actual database, so this is easier. The open question is whether databases of the same validator across the different levels of the hierarchy could be shared.
I know you had a different idea of how to resolve content: going through the Gateway actor itself, something like "give the contents of the checkpoint identified by this CID", where the Gateway would collect every message that belongs to the checkpoint, whereas Bitswap goes CID by CID until the whole thing is pulled. So, I reckon your implementation of the `BitswapStore` […]

I recognise that this may not work well for you if you want to respond to a CID with some data that doesn't actually correspond to that CID, which would be impossible, so I urge you to find a compromise.
This will never be the case. I looked at the `BitswapStore` implementation in the smoke test, and for our case it may just suffice to add a wrapper for the async calls to the gateway or the relevant subnet state. 🙏
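As a rough illustration of such a wrapper (the exact `BitswapStore` trait depends on the libp2p-bitswap version in use, and `StateStore` is a hypothetical handle to the gateway or subnet state, so this only mirrors the shape of the trait rather than implementing it):

```rust
use anyhow::Result;
use libipld::Cid;

/// Hypothetical read/write access to the application's own state database.
pub trait StateStore {
    fn get_block(&self, cid: &Cid) -> Result<Option<Vec<u8>>>;
    fn put_block(&self, cid: &Cid, data: &[u8]) -> Result<()>;
    /// Children of a block we already have, used to walk the DAG.
    fn links(&self, cid: &Cid) -> Result<Vec<Cid>>;
}

/// Adapter so Bitswap can serve content straight out of the application's
/// own database instead of a separate blockstore.
pub struct StateBitswapStore<S> {
    inner: S,
}

impl<S: StateStore> StateBitswapStore<S> {
    pub fn contains(&mut self, cid: &Cid) -> Result<bool> {
        Ok(self.inner.get_block(cid)?.is_some())
    }

    pub fn get(&mut self, cid: &Cid) -> Result<Option<Vec<u8>>> {
        self.inner.get_block(cid)
    }

    pub fn insert(&mut self, cid: &Cid, data: &[u8]) -> Result<()> {
        self.inner.put_block(cid, data)
    }

    /// Which parts of the DAG under `cid` are still missing locally;
    /// Bitswap uses this to decide what to request next, CID by CID.
    pub fn missing_blocks(&mut self, cid: &Cid) -> Result<Vec<Cid>> {
        let mut missing = Vec::new();
        let mut stack = vec![cid.clone()];
        while let Some(cid) = stack.pop() {
            match self.inner.get_block(&cid)? {
                Some(_) => stack.extend(self.inner.links(&cid)?),
                None => missing.push(cid),
            }
        }
        Ok(missing)
    }
}
```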
Closes consensus-shipyard/ipc#465

The PR creates a `Service` and a `Client` to act as the public interface of content resolution.

The `Service` is what users of the library have to instantiate; it creates all the behaviours created in consensus-shipyard/ipc#34, consensus-shipyard/ipc#467 and consensus-shipyard/ipc#466, along with a `Client`, which can be used to send requests/commands to it over a channel. The `Client` can be cloned, so that a separate instance can be given to any component that needs to talk to the `Service`.

Instead of a request/response queue as in Forest, the `Client` hides the query behind an async `resolve` method. The caller needs to say which CID they want to resolve and from which subnet. The `Service` returns an error immediately if there are no peers known in that subnet.

The other two methods at the moment are to set the list of subnets the agent can provide data for, and to pin subnets which have known IDs (from config or from the ledger). Both take a list, but the first overrides any current value, while the second adds to it.