Thoughts on the next level of content routing for ipfs #162
Currently ipfs uses a DHT for all content routing. This works quite well for
many use cases and is generally reliable, fast, and durable. The problem we are
now facing is that it does not scale well. Sending out provider records for
every single block of every file to the DHT uses an obscene amount of
bandwidth, not to mention the increased CPU and memory load. To continue
improving ipfs, we need to find a better solution.
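To get a feel for the scale, here is a rough back-of-the-envelope sketch. The 256 KiB chunk size and the replication factor of 20 are assumptions chosen for illustration, not numbers taken from this issue, and the estimate ignores intermediate DAG nodes and the lookup traffic needed to find the closest peers.

```python
import math


def provide_messages(file_size, chunk_size=256 * 1024, replication=20):
    """Estimate how many provider-record messages one file causes.

    Assumes (hypothetically) that every block is announced separately
    and that each announcement is sent to `replication` DHT peers.
    """
    blocks = max(1, math.ceil(file_size / chunk_size))
    return blocks * replication


# A 1 GiB file splits into 4096 blocks under these assumptions.
print(provide_messages(1024 ** 3))
```

Even under these simplified assumptions the message count grows linearly with the number of blocks, which is the scaling problem described above.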
Delegated routing
The idea of delegated routing is fairly simple: select another node in the
network to perform all of your content routing for you. This is a good
solution, but it assumes a few things: there are other nodes in the network
willing to do this for you, you trust those nodes, and you can reach those
nodes. This sort of routing is going to be just about required for mobile
applications of ipfs.
Advantages
If you have a low-latency connection to your selected delegate, and the
delegate is well connected in the network, routing queries should complete
very fast. From the user's perspective, all that is involved is a single RPC
between them and the delegate, so the resources required on the client are
minimal. Another nice feature is that if a single delegate is shared across
multiple clients, results can be cached to further reduce resource usage.
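A minimal sketch of the caching idea, assuming the delegate wraps some backend lookup (for example a DHT query). The class name `CachingDelegate`, the `backend` callable, and the TTL value are all hypothetical and only there to illustrate the point.

```python
import time


class CachingDelegate:
    """Answers provider lookups for many clients, caching results."""

    def __init__(self, backend, ttl=60.0, clock=time.monotonic):
        self._backend = backend  # callable: key -> iterable of peer IDs
        self._ttl = ttl          # seconds a cached answer stays valid
        self._clock = clock
        self._cache = {}         # key -> (expiry time, providers)
        self.backend_calls = 0

    def find_providers(self, key):
        now = self._clock()
        hit = self._cache.get(key)
        if hit is not None and hit[0] > now:
            return hit[1]  # served from cache, no network work
        self.backend_calls += 1
        providers = tuple(self._backend(key))
        self._cache[key] = (now + self._ttl, providers)
        return providers
```

Every client asking for the same key within the TTL costs the delegate one backend query in total, rather than one per client.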
Disadvantages
In order for this to work, you have to have a node on the network willing to be
your delegate. This means that you either have to control your own node out in
the wild, or convince someone else to let you use theirs. This isn't always
easy, especially for casual or mobile users. One solution here is to allow
the ipfs gateway nodes to be open routing delegates, but that may end up
putting an extreme level of stress on the gateways and also makes the whole
system just a bit more centralized (which happens to be something we're opposed
to).
Trackers
The idea of trackers is quite similar to delegated routing. The primary
difference is that while delegated routing is agnostic to how the routing is
performed, the intention was still that the delegates were DHT nodes on the
main network. With trackers, the delegate simply stores all the routing
information it handles locally and answers queries from that store. This is
what BitTorrent uses (in addition to the Mainline DHT).
Advantages
Compared to delegated routing in general, the main advantage is that all
routing queries are resolved by the tracker itself and do not require extra
work in the background to find the data. This, in an ideal world, is the
fastest content routing system.
Disadvantages
Trackers have limited knowledge of what is available in the network. To find
content using a tracker, other users who have the content you want must also
be using that tracker. Trackers also don't scale well to larger systems: they
put too much load on a single point of failure.
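As a rough illustration, a tracker can be sketched as nothing more than a local map from content keys to the peers that announced them. The `Tracker` class and its method names are hypothetical, not an existing ipfs API.

```python
from collections import defaultdict


class Tracker:
    """Stores provider records locally and answers from that store only."""

    def __init__(self):
        self._providers = defaultdict(set)  # key -> set of peer IDs

    def announce(self, key, peer_id):
        self._providers[key].add(peer_id)

    def find_providers(self, key):
        # No background search: unknown keys simply return nothing.
        return set(self._providers.get(key, ()))


tracker = Tracker()
tracker.announce("QmExampleKey", "peerA")
print(tracker.find_providers("QmExampleKey"))  # known to this tracker
print(tracker.find_providers("QmOtherKey"))    # never announced here
```

The second lookup shows the limitation described above: content that was never announced to this particular tracker cannot be found through it.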
Hybrid Delegates
The idea of a 'hybrid' delegate system is something I've been thinking of for a
little while now. The basic concept is to use multiple delegates, as well as
falling back to the DHT in certain cases. First, nodes can mark themselves as
'supernodes' and announce through some medium that they are willing to fulfill
routing queries. Other peers can discover these through some mechanism: mDNS,
the DHT, a preconfigured list, or by chance when connecting for other
purposes. Once supernodes are discovered, a node configured to use this system
can select the 'best' set of them and send all routing queries to those nodes.
Depending on how much you trust each of those nodes, you can select more or
fewer of them to query and optionally fall back to making DHT queries. You
should also be able to mark certain supernodes as trusted in your
configuration so you remember them and can use 'just them' to fulfill
queries.
This idea is really a few different things mashed together: tiered routing
(the ability to query multiple content routing providers intelligently),
delegated routing as discussed above, and integrated ways to discover these
delegates. One scenario I'm imagining with this system is that you could run a
'supernode' on your LAN or in your datacenter for all the other ipfs nodes
around you to automatically discover (via mDNS) and use.
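A minimal sketch of how selection and fallback could fit together. What 'best' means is not defined above, so this assumes trusted supernodes are preferred and the rest are ranked by measured latency; the function names and the use of `ConnectionError` for an unreachable delegate are likewise assumptions.

```python
def select_supernodes(discovered, trusted=(), limit=3):
    """Pick supernodes to use: trusted ones first, then lowest latency.

    `discovered` maps a supernode name to its measured latency in ms.
    """
    ranked = sorted(discovered, key=lambda n: (n not in trusted, discovered[n]))
    return ranked[:limit]


def find_providers(key, delegates, dht_lookup):
    """Query each delegate, falling back to the DHT if none has an answer.

    `delegates` is a list of callables (key -> iterable of peer IDs);
    `dht_lookup` has the same shape and is only called as a last resort.
    """
    found = set()
    for lookup in delegates:
        try:
            found.update(lookup(key))
        except ConnectionError:
            continue  # unreachable delegate: try the next one
    if not found:
        found.update(dht_lookup(key))
    return found
```

Setting `limit` higher queries more delegates when individual ones are less trusted, and passing only trusted names in `discovered` gives the 'just them' behaviour.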
TODO: make this section more coherent.
Other things that could help
Batched Providing
Currently, the content routing interface only allows a single object to be
provided, or its providers to be looked up, at a time. Changing this interface
to allow multiple requests to be batched together should reduce resource
consumption significantly.
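One way batching could look from the caller's side, as a sketch only. The `BatchingRouter` wrapper, the `send_batch` callable, and the batch size are hypothetical and do not reflect the current content routing interface.

```python
class BatchingRouter:
    """Collects provide requests and hands them off in groups."""

    def __init__(self, send_batch, max_batch=100):
        self._send_batch = send_batch  # callable: list of keys -> None
        self._max_batch = max_batch
        self._pending = []

    def provide(self, key):
        self._pending.append(key)
        if len(self._pending) >= self._max_batch:
            self.flush()

    def flush(self):
        # Send whatever is queued, even if the batch is not full.
        if self._pending:
            batch, self._pending = self._pending, []
            self._send_batch(batch)
```

Callers keep using `provide` one key at a time, while the underlying routing system sees one request per batch and can group keys destined for the same peers.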
Multicast
The DHT sends the same message to a large number of different peers throughout
the network, requiring it to form individual connections to each of them and
send out the message many times. If a multicast system were to be implemented
in libp2p, it could be used to reduce the outgoing bandwidth required by the
DHT.