Overall, LGTM, but I am having a hard time putting all the pieces together and understanding when each of the behaviors kicks in. Maybe it is worth having a quick sync once the whole protocol is done for you to guide us through the code 🙏

Also, up until now we are using vanilla Bitswap, right? That assumes that the content being retrieved lives in the peers' `BitswapStore`, which is not currently the case for the agent.
```rust
Ok(Self {
    ping: Default::default(),
    identify: identify::Behaviour::new(identify::Config::new(
        "ipfs/0.1.0".into(),
```
Out of curiosity, why do we need to pass this `ipfs/` protocol version for the identify config?
I don't know, I copied it as-is from Forest. The `KademliaConfig` says the protocol name can be used to distinguish different groups of peers, but the `IdentifyConfig` just says:

```rust
/// Application-specific version of the protocol family used by the peer,
/// e.g. `ipfs/1.0.0` or `polkadot/1.0.0`.
pub protocol_version: String,
```

I fixed the `0.1.0` to be `1.0.0`; it looked like a typo. I suspect it doesn't matter what it is; it could be `ipc`? I can't see it used, but I only looked at the `Identify` behaviour.
For the record, when my nodes connect to each other, I see these protocols being supported:

`["/ipfs/ping/1.0.0", "/ipfs/id/1.0.0", "/ipfs/id/push/1.0.0", "/ipc/smoke-test/kad/1.0.0", "/meshsub/1.1.0", "/meshsub/1.0.0", "/ipfs-embed/bitswap/1.0.0"]`

So this exact string doesn't even appear 🤷
I see they are using it for authentication purposes. They tag the specific DHT they interact with.
```rust
let transport = build_transport(config.network.local_key.clone());
let behaviour = Behaviour::new(config.network, config.discovery, config.membership, store)?;

// NOTE: Hardcoded values from Forest. Will leave them as is until we know we need to change.
```
Makes sense; at our current scale these values seem conservative enough.
```rust
// We could keep the Swarm alive to serve content to others,
// but we ourselves are unable to send requests. Let's treat
// this as time to quit.
None => { break; }
```
What if the client is dropped and we need to spawn a new one from a Service that is already running? We shouldn't have to restart the service for this, right? Maybe the user should take care never to drop the client during the lifetime of the service.
In order to spawn a client, the `Service` would have to hold a copy of the `Sender<Request>` channel (which is what happens in Forest). But the `Service` instance is consumed by `run`, so you can't call anything on it any more, you can't hold a reference to it, and you would not be able to clone a new `Sender`. And because it holds a copy of the `Sender` itself, it would never experience this condition either, unless it dropped it manually.

So I decided that a cleaner way is for `Service::new` to return the `Service` and the `Sender` (wrapped in a `Client`). Then you can make as many clones of the `Client` as you want, before or after you `run` the `Service`.

But you are right, you have to make sure not to drop all `Client` instances. This is easy, though: if you did, and you needed another one, you wouldn't be able to get it from anywhere, so the compiler wouldn't let this happen. And if you never need to make a `Client` call (after telling it what subnets you support), then yeah, you just need to make sure you keep an instance in your application state anyway.
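A minimal sketch of that shape, assuming tokio channels; the `Request` variants and the constructor arguments here are placeholders, not the PR's actual definitions:

```rust
use tokio::sync::mpsc::{unbounded_channel, UnboundedReceiver, UnboundedSender};

enum Request {
    /// Placeholder for the real commands, e.g. resolving a CID from a subnet.
    Resolve { /* cid, subnet_id, response channel */ },
}

/// Cloneable handle around the sender half of the command channel.
#[derive(Clone)]
pub struct Client {
    tx: UnboundedSender<Request>,
}

pub struct Service {
    rx: UnboundedReceiver<Request>,
}

impl Service {
    /// Create the service together with the only way to talk to it.
    pub fn new() -> (Self, Client) {
        let (tx, rx) = unbounded_channel();
        (Self { rx }, Client { tx })
    }

    /// Consumes the service. When every `Client` clone has been dropped,
    /// `recv` returns `None` and the loop ends, matching the
    /// `None => break` branch quoted above.
    pub async fn run(mut self) {
        while let Some(request) = self.rx.recv().await {
            match request {
                Request::Resolve { .. } => { /* drive the Swarm / Bitswap query */ }
            }
        }
    }
}
```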
```rust
membership::Event::Skipped(peer_id) => {
    // Don't repeatedly look up peers we can't add to the routing table.
    if self.background_lookup_filter.insert(&peer_id) {
        self.discovery_mut().background_lookup(peer_id)
```
Are we refreshing the bloom filter after some time? I don't know whether accumulating a lot of peers could at some point lead to skipping legitimate peers.
I am also thinking about this, but I haven't come up with a nice way to do it yet. A fresh start every now and then seems like a blunt and easy option, but it leads to a burst of lookups afterwards. Maybe a better protection would be to use a cache like the ones the `guava` library provides on the JVM: it has limits on memory and a TTL as well.
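As an illustration of that TTL idea (not the PR's code; `K` stands in for `libp2p::PeerId`), a cache-like filter could evict expired entries on each insert, so a peer skipped long ago becomes eligible for a background lookup again:

```rust
use std::collections::HashMap;
use std::hash::Hash;
use std::time::{Duration, Instant};

/// TTL-based replacement for the grow-only Bloom filter: remember when each
/// peer was last skipped and forget entries after a TTL.
pub struct TtlFilter<K> {
    ttl: Duration,
    seen: HashMap<K, Instant>,
}

impl<K: Hash + Eq> TtlFilter<K> {
    pub fn new(ttl: Duration) -> Self {
        Self { ttl, seen: HashMap::new() }
    }

    /// Returns `true` if the peer has not been seen within the TTL,
    /// mirroring the `insert` call on the Bloom filter above.
    pub fn insert(&mut self, peer_id: K) -> bool {
        let now = Instant::now();
        let ttl = self.ttl;
        // Evict expired entries so memory stays bounded by the peers
        // seen within a single TTL window.
        self.seen.retain(|_, at| now.duration_since(*at) < ttl);
        self.seen.insert(peer_id, now).is_none()
    }
}
```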
```rust
let tcp_transport =
    || libp2p::tcp::tokio::Transport::new(libp2p::tcp::Config::new().nodelay(true));
```
I don't know if it is already supported in rust-libp2p, but I would love it if we could switch to QUIC in the future and only fall back to TCP where it's not supported :)
I will check. FWIW in #64 I factor out transport creation, so we can switch easily.
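For reference, recent rust-libp2p releases do ship a QUIC transport, and the usual pattern is to try QUIC first and fall back to TCP via `OrTransport`. The sketch below assumes a recent libp2p version; the exact `noise`/`yamux`/`quic` constructor names have changed between releases, so treat this as the shape rather than drop-in code:

```rust
use futures::future::Either;
use libp2p::core::{muxing::StreamMuxerBox, transport::{Boxed, OrTransport}, upgrade};
use libp2p::{identity::Keypair, noise, quic, tcp, yamux, PeerId, Transport};

fn build_transport(local_key: Keypair) -> Boxed<(PeerId, StreamMuxerBox)> {
    // QUIC brings its own encryption and stream multiplexing.
    let quic_transport = quic::tokio::Transport::new(quic::Config::new(&local_key));

    // Plain TCP still needs noise + yamux layered on top.
    let tcp_transport = tcp::tokio::Transport::new(tcp::Config::new().nodelay(true))
        .upgrade(upgrade::Version::V1)
        .authenticate(noise::Config::new(&local_key).expect("valid noise keys"))
        .multiplex(yamux::Config::default())
        .boxed();

    // Try QUIC, fall back to TCP, and unify both outputs into a muxer box.
    OrTransport::new(quic_transport, tcp_transport)
        .map(|either, _| match either {
            Either::Left((peer_id, muxer)) => (peer_id, StreamMuxerBox::new(muxer)),
            Either::Right((peer_id, muxer)) => (peer_id, StreamMuxerBox::new(muxer)),
        })
        .boxed()
}
```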
```rust
Request::Resolve(cid, subnet_id, response_channel) => {
    let peers = self.membership_mut().providers_of_subnet(&subnet_id);
    if peers.is_empty() {
        send_resolve_result(response_channel, Err(anyhow!(NoKnownPeers(subnet_id))));
```
This would mean that no membership was picked up through gossipsub for the subnet, right? In the future we could add a fallback that optimistically asks our direct connections which subnets they are part of, in case we get lucky and find a provider.
Yes, most likely we asked too early and we should come back later, or perhaps find a known peer and add it to the static address list.
I am also thinking about how to make the gossip better for newcomers, to lessen the trade-off of periodic republishing and let newcomers learn about the providers ASAP.
Here's an idea: when we receive a gossip from a provider we haven't known about before, we publish our own in return. They must be a newcomer, so they could use our own information for sure. Perhaps with some cool-off period, like no repeated publishing within 10 seconds (just above the gossipsub broadcast time frame).
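A sketch of that cool-off idea (the names and the `String` stand-in for `PeerId` are illustrative, not from the PR):

```rust
use std::collections::HashSet;
use std::time::{Duration, Instant};

/// Decides whether to publish our own membership in response to a gossip
/// from a provider we haven't seen before, with a cool-off between publishes.
pub struct NewcomerResponder {
    known_providers: HashSet<String>,
    last_publish: Option<Instant>,
    cooloff: Duration,
}

impl NewcomerResponder {
    pub fn new(cooloff: Duration) -> Self {
        Self {
            known_providers: HashSet::new(),
            last_publish: None,
            cooloff,
        }
    }

    /// Called when a provider record arrives over gossipsub; returns `true`
    /// when the provider is new to us and the cool-off period has passed.
    pub fn should_republish(&mut self, provider: String) -> bool {
        let is_new = self.known_providers.insert(provider);
        let cooled_down = self
            .last_publish
            .map_or(true, |at| at.elapsed() >= self.cooloff);
        if is_new && cooled_down {
            self.last_publish = Some(Instant::now());
            true
        } else {
            false
        }
    }
}
```

The membership behaviour would call `should_republish` on every incoming provider record and publish its own record only when it returns `true`.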
That's right, this Bitswap library interacts with the store. The agent will have to provide an implementation of the store that is fit for purpose: with Fendermint they can share the actual database, so this is easier. The open question is whether databases of the same validator across the different levels of the hierarchy could be shared.
I know you had a different idea of how to resolve content: going through the Gateway actor itself, something like "give the contents of the checkpoint identified by this CID", where the Gateway would collect every message that belongs to the checkpoint, whereas Bitswap goes CID by CID until the whole thing is pulled. So, I reckon your implementation of the `BitswapStore` […]

I recognise that this may not work well for you if you want to respond to a CID with some data that doesn't actually correspond to that CID, which would be impossible, so I urge you to find a compromise.
This will never be the case. I looked at the `BitswapStore` implementation in the smoke test, and for our case it may just suffice to add a wrapper for the async calls to the gateway or the relevant subnet state. 🙏
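As a rough illustration of such a wrapper (the exact `BitswapStore` trait depends on the libp2p-bitswap version in use, and `StateStore` is a hypothetical handle to the gateway or subnet state, so this only mirrors the shape of the trait rather than implementing it):

```rust
use anyhow::Result;
use libipld::Cid;

/// Hypothetical read/write access to the application's own state database.
pub trait StateStore {
    fn get_block(&self, cid: &Cid) -> Result<Option<Vec<u8>>>;
    fn put_block(&self, cid: &Cid, data: &[u8]) -> Result<()>;
    /// Children of a block we already have, used to walk the DAG.
    fn links(&self, cid: &Cid) -> Result<Vec<Cid>>;
}

/// Adapter so Bitswap can serve content straight out of the application's
/// own database instead of a separate blockstore.
pub struct StateBitswapStore<S> {
    inner: S,
}

impl<S: StateStore> StateBitswapStore<S> {
    pub fn contains(&mut self, cid: &Cid) -> Result<bool> {
        Ok(self.inner.get_block(cid)?.is_some())
    }

    pub fn get(&mut self, cid: &Cid) -> Result<Option<Vec<u8>>> {
        self.inner.get_block(cid)
    }

    pub fn insert(&mut self, cid: &Cid, data: &[u8]) -> Result<()> {
        self.inner.put_block(cid, data)
    }

    /// Which parts of the DAG under `cid` are still missing locally;
    /// Bitswap uses this to decide what to request next, CID by CID.
    pub fn missing_blocks(&mut self, cid: &Cid) -> Result<Vec<Cid>> {
        let mut missing = Vec::new();
        let mut stack = vec![cid.clone()];
        while let Some(cid) = stack.pop() {
            match self.inner.get_block(&cid)? {
                Some(_) => stack.extend(self.inner.links(&cid)?),
                None => missing.push(cid),
            }
        }
        Ok(missing)
    }
}
```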
Closes consensus-shipyard/ipc#465

The PR creates a `Service` and a `Client` to act as the public interface of content resolution.

The `Service` is what users of the library have to instantiate; it creates all the behaviours created in consensus-shipyard/ipc#34, consensus-shipyard/ipc#467 and consensus-shipyard/ipc#466, along with a `Client`, which can be used to send requests/commands to it over a channel. The `Client` can be cloned, so that a separate instance can be given to any component that needs to talk to the `Service`.

Instead of a request/response queue as in Forest, the `Client` hides the query behind an async `resolve` method. The caller needs to say which CID they want to resolve and from which subnet. The `Service` returns an error immediately if there are no peers known in that subnet.

The other two methods at the moment are to set the list of subnets the agent can provide data for, and to pin subnets which have known IDs (from config or from the ledger). Both take a list, but the first overrides any current value, while the second adds to it.