
Multiaddr Based Content Routing #11

Open
aschmahmann opened this issue Apr 11, 2019 · 1 comment
Comments


aschmahmann commented Apr 11, 2019

Context

A standard IPFS data request causes the Exchange (i.e. Bitswap) to search the Content Routing system for a set of PeerInfo objects (which are just PeerIDs + their multiaddrs). The Exchange then takes these PeerInfo objects and requests data from the peers.

This means that all data reachable through the Content Routing system must be proxied by libp2p peers. But if the data is already available elsewhere, shouldn't we be able to access it directly?

Proposal

I would like to be able to request data from multiaddrs that do not correspond to libp2p peers. For example, if we want to store data with some cloud storage provider like AWS S3 we could put a provide record in the DHT that Hash(Data) lives at /http/mybucket.s3.amazonaws.com/Data.

Motivation

While we could instead run a compute node, like EC2, with a set of IPFS Cluster daemons backed by an S3 datastore, that is certainly more costly. This becomes even more interesting if we can "draft" data that's already publicly available over HTTP into IPFS.

Implications for future work

While the first iteration of this idea is conceptually fairly simple, it has implications for some of our ongoing endeavors. For instance, if we have 1000 small blocks hosted at /http/mysite.com/Data1-1000 that are all part of a single IPLD object, we can't just provide the root IPLD node, since there's no peer that can tell us where the other 999 blocks are. There are various ways we could extend the protocol to tell retrievers where the other 999 blocks are, but it's not as simple as with the existing peer-based retrieval.

Additionally, we would likely face increasing demand to support large files that are available over HTTP. Since we don't want users to download a lot of data before it can be verified, we'd probably want to extend the protocol so that peers advertising the content in the DHT can add (references to) hashes of chunks of the large file, which could then be verified individually. Similarly, we'd want the ability to download byte ranges when presented with a multiaddr that supports that functionality.


I think implementing this functionality could make running "pinning" services much easier and less expensive, as well as greatly increase the amount of content accessible via IPFS. But what about you @Stebalien @raulk @bigs ?


mikeal commented Apr 12, 2019

This aligns well with a project I'm working on in IPLD for centralized Block storage over HTTP.

Also, this is similar to a prior discussion I started for adding an equivalent to Bittorrent's webseed feature ipld/ipld#57

You'll need to settle on a base encoding for the CIDs, I'm planning on base32 to align with the rest of our shift to base32.
